Staff Site Reliability Engineer

Employment Type: Full-time

Location Type: Hybrid

Location: Munich, Germany

About the role

  • Engage in and improve the full service lifecycle from initial design through deployment, operation, and continuous improvement.
  • Prepare services for production by engaging in system design reviews, developing shared frameworks and platforms, planning capacity and conducting launch assessments.
  • Operate, monitor, and maintain live services, designing observability stacks and dashboards to track key metrics and improve operational insight.
  • Ensure sustainable scalability through automation, driving continuous evolution to increase reliability and delivery speed.
  • Collaborate with product and engineering teams to define SLOs and error budgets and to ensure services are reliable, scalable, and observable.
  • Lead incident management processes, including on-call rotations, managing outages, driving post-mortems and conducting root cause analysis.
  • Identify and reduce toil through process automation, creating playbooks and automated runbooks to reduce MTTR.
  • Define resilience strategies and implement chaos testing to proactively uncover weaknesses and validate recovery strategies.
  • Mentor, train and grow the community. Guide engineers across teams in reliability best practices and tooling.

Requirements

  • Bachelor’s degree in Computer Science, a related field, or equivalent practical experience.
  • 8+ years of experience with SaaS software development in distributed systems using languages such as Kotlin/Java, TypeScript, and Python, and technologies like IaC, Docker, and Kubernetes.
  • 2+ years of experience as an SRE or in a similar role designing, operating, analyzing, and troubleshooting distributed systems in agile environments.
  • Strong knowledge of modern application and infrastructure monitoring concepts (Datadog and/or AWS experience advantageous).
  • Systematic problem-solving and debugging skills, with a strong sense of ownership and a bias towards establishing mechanisms that scale across the entire company.
  • Excellent written, verbal, and documentation skills.
  • Collaborative team player, able to communicate effectively across disciplines.

Benefits

  • Receive a competitive reward package – reevaluated each year – that includes salary, benefits, and pre-IPO equity.
  • Enjoy 28 days of paid vacation, plus an additional day after 2 and 4 years.
  • Make an impact on the environment and society with one fully paid Impact Day.
  • Receive generous family leave, child support, mental health support, and sabbatical opportunities.
  • We enjoy gathering for meals, cultural initiatives, and events like local Summer Sessions and year-end celebrations. There are also healthy snacks, drinks, and a weekly catered lunch.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Kotlin, Java, TypeScript, Python, Infrastructure as Code (IaC), Docker, Kubernetes, SaaS software development, system design, chaos testing
Soft Skills
systematic problem solving, debugging skills, ownership, collaboration, communication, mentoring, training, leadership, documentation skills, team player
Certifications
Bachelor’s degree in Computer Science, related field