
Manager, Site Reliability Engineering
Veeam Software
full-time
Location Type: Remote
Location: Czech Republic
About the role
- Hire, onboard, and grow your SRE team; coach career development and performance
- Foster a psychologically safe, blameless culture that favors learning and emphasizes engineering over firefighting
- Ensure sustainable operational coverage; monitor on-call health and workload
- Track and cap toil so engineers spend the majority of their time on project work that reduces future toil
- Establish and operationalize SLIs/SLOs and error budgets with service owners; run reliability reviews and hold teams accountable for outcomes
- Define reliability standards, runbooks, readiness checklists, and alerting patterns (including SLO-based alerting)
- Partner with product managers and engineering managers (EMs) to align reliability work with service goals and customer experience, acting as an enabler rather than a gate
- Ensure incident response readiness; lead/coordinate major incidents; drive fast, high-quality postmortems and systemic fixes
- Measure MTTR, change failure rate, SLO posture, and repeat-incident reduction; share lessons learned broadly
- Lead software-first reliability investments: observability, deployment safety (canary/blue-green), resilience testing/chaos, and self-service guardrails
- Drive platform improvements (IaC, CI/CD, Kubernetes) and internal tools that scale operations and improve developer experience
Requirements
- 7+ years in Software, Platform, and/or Reliability Engineering with 2+ years managing engineers
- Demonstrable experience leading engineering teams to predictably deliver outcomes
- Experience leading cross-functional initiatives collaboratively, influencing peers without relying on formal authority
- Experience with public cloud (Azure preferred), Kubernetes, IaC (Terraform, Pulumi), CI/CD (GitHub Actions, ArgoCD, Azure DevOps), and observability (OpenTelemetry, Elastic, Datadog, Prometheus, Grafana)
- Coding background with experience improving service reliability
- Hands-on incident management and postmortem practice; excellent cross-geo communication
- Willingness to participate in an on-call rotation (typically during daytime hours, including weekends/holidays)
Benefits
- 25 vacation days, 4 sick days, 21 paid medical leave days, plus 4 extra global VeeaMe Days for self-care and 24 paid volunteer hours annually through Veeam Cares
- Premium private medical insurance for employees and dependents
- Daily meal vouchers for restaurants and groceries (180 CZK per working day)
- Flexible cafeteria platform with thousands of lifestyle benefit options
- Multisport Card for gym and wellness, with family add-on options
- Annual public transport reimbursement up to a set limit
- Corporate mobile plan with optional family tariff
- Opportunities to learn and grow through on-demand libraries (LinkedIn Learning, O’Reilly), mentoring, workshops and learning events like our annual Global Day of Learning
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Software Engineering, Platform Engineering, Reliability Engineering, Incident Management, Service Reliability, Observability, Infrastructure as Code, Continuous Integration, Continuous Deployment, Postmortem Practice
Soft Skills
Team Leadership, Coaching, Collaboration, Influence, Communication, Psychological Safety, Performance Management, Cross-Functional Leadership, Problem Solving, Cultural Development