Veeam Software

Manager, Site Reliability Engineering

Full-time

Location Type: Remote

Location: Czech Republic

About the role

  • Hire, onboard, and grow your SRE team; coach career development and performance
  • Foster a psychologically safe, blameless culture that favors learning over blame and emphasizes engineering over firefighting
  • Ensure sustainable operational coverage; monitor on-call health and workload
  • Track and cap toil so engineers spend the majority of time on project work that reduces future toil
  • Establish and operationalize SLIs/SLOs and error budgets with service owners; run reliability reviews and hold teams accountable to outcomes
  • Define reliability standards, runbooks, readiness checklists, and alerting patterns (including SLO-based alerting)
  • Partner with product/EMs to align reliability work with service goals and customer experience, not as a gate but as an enabler
  • Ensure incident response readiness; lead/coordinate major incidents; drive fast, high-quality postmortems and systemic fixes
  • Measure MTTR, change failure rate, SLO posture, and repeat-incident reduction; publish learnings broadly
  • Lead software-first reliability investments: observability, deployment safety (canary/blue-green), resilience testing/chaos, and self-service guardrails
  • Drive platform improvements (IaC, CI/CD, Kubernetes) and internal tools that scale operations and improve developer experience

Requirements

  • 7+ years in Software, Platform, and/or Reliability Engineering with 2+ years managing engineers
  • Demonstrable experience leading engineering teams to predictably deliver outcomes
  • Experience leading cross-functional initiatives collaboratively with peers through influence
  • Experience with public cloud (Azure preferred), Kubernetes, IaC (Terraform, Pulumi), CI/CD (GitHub Actions, ArgoCD, Azure DevOps), and observability (OpenTelemetry, Elastic, Datadog, Prometheus, Grafana)
  • Coding background with experience improving service reliability
  • Hands-on incident management and postmortem practice; excellent cross-geo communication
  • Willingness to participate in an on-call rotation (typically during daytime hours, including weekends/holidays)

Benefits

  • 25 vacation days, 4 sick days, 21 paid medical leave days, plus 4 extra global VeeaMe Days for self-care and 24 paid volunteer hours annually through Veeam Cares
  • Premium private medical insurance for employees and dependents
  • Daily meal vouchers for restaurants and groceries (180 CZK per working day)
  • Flexible cafeteria platform with thousands of lifestyle benefit options
  • Multisport Card for gym and wellness, with family add-on options
  • Annual public transport reimbursement up to a set limit
  • Corporate mobile plan with optional family tariff
  • Opportunities to learn and grow through on-demand libraries (LinkedIn Learning, O’Reilly), mentoring, workshops and learning events like our annual Global Day of Learning