
Staff Site Reliability Engineer
Full-time
Location Type: Hybrid
Location: Munich, Germany
About the role
- Engage in and improve the full service lifecycle from initial design through deployment, operation, and continuous improvement.
- Prepare services for production by participating in system design reviews, developing shared frameworks and platforms, planning capacity, and conducting launch assessments.
- Operate, monitor, and maintain live services, designing observability stacks and dashboards to track key metrics and improve operational insight.
- Ensure sustainable scalability through automation, continuously evolving systems to increase reliability and delivery speed.
- Collaborate with product and engineering teams to define SLOs and error budgets, and to ensure services are reliable, scalable, and observable.
- Lead incident management processes, including on-call rotations, outage management, post-mortems, and root cause analysis.
- Identify and eliminate toil through process automation, and create playbooks and automated runbooks to reduce MTTR.
- Define resilience strategies and implement chaos testing to proactively uncover weaknesses and validate recovery procedures.
- Mentor, train, and grow the community by guiding engineers across teams in reliability best practices and tooling.
Requirements
- Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
- 8+ years of experience in SaaS software development on distributed systems, using languages such as Kotlin/Java, TypeScript, and Python, and technologies such as infrastructure as code (IaC), Docker, and Kubernetes.
- 2+ years of experience as an SRE or in a similar role, designing, operating, analyzing, and troubleshooting distributed systems in agile environments.
- Strong knowledge of modern application and infrastructure monitoring concepts (experience with Datadog and/or AWS is an advantage).
- Systematic problem-solving and debugging skills, with a strong sense of ownership and a bias toward establishing mechanisms that scale across the entire company.
- Excellent written, verbal, and documentation skills.
- Collaborative team player, able to communicate effectively across disciplines.
Benefits
- Receive a competitive reward package, reevaluated each year, that includes salary, benefits, and pre-IPO equity.
- Enjoy 28 days of paid vacation, plus an additional day after 2 and 4 years.
- Make an impact on the environment and society with one fully paid Impact Day.
- Receive generous family leave, child support, mental health support, and sabbatical opportunities.
- We enjoy gathering for meals, cultural initiatives, and events like local Summer Sessions and year-end celebrations. There are also healthy snacks, drinks, and a weekly catered lunch.