Veeam Software

Staff Site Reliability Engineer

Full-time

Origin: 🇺🇸 United States

Job Level

Lead

Tech Stack

Azure, Cloud, Distributed Systems, Go, Grafana, Java, JavaScript, Kubernetes, Prometheus, Terraform, TypeScript

About the role

  • Act as a technical authority, mentoring senior engineers and guiding design choices to improve service reliability and resilience
  • Lead the definition of SLIs, SLOs, and error budgets, and drive adherence to them across engineering teams
  • Collaborate with Staff peers and partner with development and product teams to design for failure and operationalize reliability from the start
  • Drive company-wide adoption of observability best practices and tooling; ensure metrics, logs, and traces provide deep, actionable insights
  • Lead complex incident responses, postmortems, and systemic reliability improvements while promoting a blameless culture
  • Lead initiatives in infrastructure as code, deployment automation, and resilience testing; influence chaos engineering and release validation frameworks
  • Partner with platform and security teams to ensure production readiness and represent the SRE team in technical leadership forums and product planning

Requirements

  • 8+ years of experience in a Software Engineering or SRE role, including technical leadership
  • Demonstrated experience mentoring and guiding senior engineers
  • Deep expertise in building distributed systems on public cloud (Azure preferred)
  • Strong programming skills (e.g., JavaScript, TypeScript, Go, Java, or C#)
  • Hands-on experience with observability tooling (e.g., Prometheus, Grafana, OpenTelemetry)
  • Mastery of infrastructure automation tools (Terraform, Pulumi) and container orchestration (Kubernetes)
  • Ability to communicate clearly across geographies and disciplines
  • Experience leading SRE initiatives across multiple product teams (preferred)
  • Background in chaos engineering, incident learning, or performance and load testing (preferred)
  • Familiarity with global compliance standards (ISO, SOC 2, GDPR, FedRAMP, CMMC) (preferred)