Engineering Manager – SRE

AuthZed

full-time

Posted on: 1/17/2026

Location Type: Remote

Location: United States

Job Level

Senior Lead

Tech Stack

AWS, Azure, Cloud, Go, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Python, Terraform

About the role

Lead a global team of Site Reliability Engineers delivering infrastructure automation, observability, and operational scalability across multi-cloud and multi-region Kubernetes-based architectures.
Recruit, hire, onboard, and develop engineers while elevating the overall strength of the team.
Act as a player-coach by contributing to critical projects while mentoring engineers and supporting their professional growth.
Participate in on-call rotations at a sustainable level to stay grounded in real operational issues.
Guide project planning by defining milestones, identifying dependencies, and working toward timely and meaningful delivery.
Identify toil and lead initiatives to eliminate it through engineering solutions.
Drive automation and platform engineering: safer deploys, progressive delivery, guardrails, and paved paths that reduce toil.
Collaborate with product and engineering to ship features such as self-service workflows, and to set infrastructure-as-code expectations, with reliability baked in.
Serve as a senior escalation point for complex incident triage and root cause analysis.

Requirements

10+ years of experience in infrastructure, SRE, or platform engineering roles.
5+ years of team management or technical leadership in SRE or Platform Engineering.
Experience managing distributed teams across US, Canada, EU, and global time zones.
Experience leading or mentoring SRE/Infrastructure/Platform teams in a production SaaS environment.
Strong grasp of SRE fundamentals: SLOs/SLIs, error budgets, incident management, capacity planning, and operational excellence.
Extensive experience with AWS, GCP, and Azure managed services.
Strong programming skills and experience writing production-quality automation or tooling (e.g., Go, Python, Bash).
Hands-on experience with Kubernetes, Kubernetes Operators/Controllers, containerized workloads, and Infrastructure as Code (Terraform, Pulumi).
Experience with monitoring and observability systems (e.g., Prometheus, Grafana, logging/tracing pipelines).
Excellent communication: can translate reliability tradeoffs to product/exec stakeholders and write crisp incident/postmortem artifacts.
Proven ability to translate operational pain points into engineering deliverables.

Benefits

Comprehensive benefits including healthcare (US-based) and other insurance.
Stock options at an early-stage startup.
Competitive salary based on experience.
Fully remote work with a flexible schedule to accommodate different time zones.
Twice-yearly travel for team offsites focused on team bonding, collaboration, and having fun!

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

infrastructure engineering, site reliability engineering, platform engineering, SLOs, SLIs, error budgets, incident management, capacity planning, automation, programming

Soft Skills

team management, technical leadership, mentoring, communication, collaboration, project planning, problem-solving, operational excellence, professional growth, incident triage