Veeam Software

Senior Site Reliability Engineer – Government, Sovereign Cloud

Senior Site Reliability Engineer for Veeam's Government & Sovereign Cloud environments, helping build a global SRE function with an emphasis on high availability and operational excellence.

Posted 4/27/2026 • Full-time • Remote • California • 🇺🇸 United States • Senior • 💰 $138,900 - $231,400 per year

Tech Stack

Tools & technologies
AWS • Azure • Cloud • Dagger • Distributed Systems • Go • Grafana • Java • JavaScript • Kubernetes • Prometheus • Terraform • TypeScript

About the role

Key responsibilities & impact
  • Get up to speed on the full platform — all VDC workloads, dependencies, and risk areas. Much of this will happen through code, docs, and conversations rather than direct environment access.
  • Work with SMEs across the org to fill knowledge gaps and build onboarding material for the team.
  • Write and maintain runbooks, architecture docs, and operational guides.
  • Design infrastructure for high availability and fault tolerance on Azure (including Azure Government).
  • Define SLIs, SLOs, and error budgets where none exist today.
  • Run incident response and blameless postmortems. Turn incidents into improvements.
  • Identify reliability risks across modern and legacy workloads and build practical remediation plans that work within compliance constraints.
  • Close observability gaps — define instrumentation requirements and drive implementation.
  • Set alerting, telemetry, and monitoring standards with partner teams.
  • Build automation to reduce toil and support fleet management.
  • Participate in on-call rotations.
  • Work with IaC, CI/CD, deployment automation, and config management — including in air-gapped or compliance-restricted environments.
  • Build and maintain testing, canary deployment, and release validation pipelines.
  • Integrate chaos engineering and monitoring tools, adapting choices to meet regulatory requirements.
  • Work across product, platform, security, legal, compliance, and operations teams.
  • Own problems end-to-end — identify gaps, drive solutions, don't wait for direction.
  • Mentor other engineers and help spread SRE practices across the org.

Requirements

What you’ll need
  • 7+ years in Software Engineering, with 3+ years in SRE, Platform Engineering, or similar — across multi-service platforms, not just single-service environments.
  • Experience with Government or Sovereign Cloud (e.g., Azure Government, AWS GovCloud).
  • Experience in regulated compliance environments — government (FedRAMP, CMMC, IL2/IL4/IL5), financial (PCI-DSS, SOX), or healthcare (HIPAA, HITRUST). You understand how compliance shapes architecture and operations.
  • Strong experience building and running production services on cloud infrastructure (Azure preferred, including Azure Government).
  • Able to learn large, complex platforms quickly with limited guidance — comfortable building understanding from code, docs, and architecture artifacts when direct environment access is restricted.
  • Can investigate systems independently and produce clear docs, risk assessments, and improvement plans.
  • Comfortable working across teams — engineering, product, security, compliance, operations.
  • Programming skills in one or more of: TypeScript/JS, Go, Java, C#, or similar.
  • Experience with monitoring and observability tools (e.g., Prometheus, Grafana, OpenTelemetry, ELK stack).
  • Experience with IaC (Terraform, Terragrunt, Pulumi) and container orchestration (Kubernetes).
  • Experience with CI/CD and GitOps tooling — GitHub Actions, Azure DevOps, GitLab CI, ArgoCD, FluxCD, or Dagger.
  • Solid grasp of distributed systems, networking, and cloud-native architecture.
  • Clear written and verbal communication skills.

Benefits

Comp & perks
  • Unlimited paid time off, 12 paid holidays, plus 4 extra global VeeaMe Days for self-care and 24 paid volunteer hours annually through Veeam Cares
  • Paid parental leave: 8 weeks for all parents, 16 weeks for birthing parents
  • Medical, dental, and vision coverage starting on your first day
  • Mental health support, therapy sessions, and digital wellness tools via our Employee Assistance Program
  • 401(k) retirement plan with company matching contributions
  • Fertility, adoption, and surrogacy support through Maven
  • AirVet: 24/7 virtual veterinary care at no cost
  • Legal services, identity protection, and supplemental health insurance options
  • Tax-advantaged spending accounts for healthcare, dependent care, and commuting
  • Opportunities to learn and grow through on-demand libraries (LinkedIn Learning, O’Reilly), mentoring, workshops, and learning events like our annual Global Day of Learning

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Software Engineering • Site Reliability Engineering (SRE) • Platform Engineering • Cloud Infrastructure • Programming (TypeScript, JavaScript, Go, Java, C#) • Infrastructure as Code (IaC) • Continuous Integration/Continuous Deployment (CI/CD) • Distributed Systems • Monitoring and Observability • Chaos Engineering
Soft Skills
Clear Communication • Mentoring • Problem Solving • Collaboration • Independent Investigation • Documentation • Risk Assessment • Adaptability • Learning Agility • Teamwork
Certifications
FedRAMP • CMMC • PCI-DSS • SOX • HIPAA • HITRUST