Backblaze

Senior Manager, SRE

full-time

Origin: 🇺🇸 United States

Salary

💰 $175,000 - $215,000 per year

Job Level

Senior

Tech Stack

Ansible, AWS, Azure, Cloud, Distributed Systems, Docker, Go, Google Cloud Platform, Grafana, Kubernetes, Linux, Microservices, Prometheus, Python, Terraform

About the role

  • Build, lead, and mentor a team of SREs across multiple regions and time zones.
  • Define the long-term vision and roadmap for SRE, aligning with organizational objectives.
  • Partner with product and engineering to ensure reliability is embedded in design, development, and operations.
  • Own the end-to-end reliability of critical customer-facing services.
  • Establish and maintain SLOs, SLIs, and error budgets to measure and enforce service quality.
  • Drive root cause analysis and problem management for major incidents, ensuring long-term fixes are prioritized.
  • Champion adoption of ITIL/OSS processes (incident, change, problem, and capacity management).
  • Expand automation in deployment, monitoring, testing, and incident response to reduce toil.
  • Oversee observability platforms (e.g., Catchpoint, Grafana, Moogsoft/BigPanda, Prometheus, Datadog).
  • Ensure robust configuration, capacity, and change management practices.
  • Partner with Network Engineering, DevOps, NOC, and Product Engineering on scalable, resilient architecture.
  • Support business continuity, disaster recovery, and compliance requirements.
  • Engage with vendors and service providers to manage SLAs and performance outcomes.
  • Hire, coach, and develop engineers and managers, creating strong career paths within SRE.
  • Foster a culture of reliability, accountability, and continuous improvement.
  • Lead succession planning and leadership pipeline development.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or related field (Master’s preferred).
  • 10+ years in infrastructure, reliability, or operations engineering roles.
  • 5+ years in people leadership with experience managing managers and global teams.
  • Deep expertise in Linux operating systems (administration, performance tuning, troubleshooting, security hardening).
  • Strong knowledge of distributed systems, cloud platforms (AWS, GCP, Azure, private cloud), and networking fundamentals.
  • Solid background in observability, monitoring, logging, and alerting frameworks.
  • Proficiency with automation (Python, Go, Ansible, Terraform, CI/CD pipelines).
  • Familiarity with containers (Kubernetes, Docker) and microservices architectures.
  • Strong understanding of ITIL/OSS frameworks, SLO/error budget practices, and incident management at scale.
  • Proven ability to manage large-scale, high-availability environments.
  • Strong communication skills with executive presence; able to translate technical topics into business outcomes.
  • Demonstrated success in building and maturing high-performing SRE/operations teams.
  • Preferred: Experience in a service provider, CDN, or large-scale SaaS environment.
  • Preferred: Familiarity with compliance and regulatory frameworks (SOC 2, ISO 27001, GDPR).
  • Preferred: Track record of driving cultural transformation toward reliability-first principles.