AuthZed

Engineering Manager – SRE

full-time

Location Type: Remote

Location: United States

About the role

  • Lead a global team of Site Reliability Engineers delivering infrastructure automation, observability, and operational scalability across multi-cloud, multi-region, Kubernetes-based architectures.
  • Recruit, hire, onboard, and develop engineers while elevating the overall strength of the team.
  • Act as a player-coach by contributing to critical projects while mentoring engineers and supporting their professional growth.
  • Participate in on-call rotations at a sustainable level to stay grounded in real operational issues.
  • Guide project planning by defining milestones, identifying dependencies, and working toward timely and meaningful delivery.
  • Identify toil and lead initiatives to eliminate it through engineering solutions.
  • Drive automation and platform engineering: safer deploys, progressive delivery, guardrails, and paved paths that reduce toil.
  • Collaborate with product and engineering to ship features like self-service workflows and infra-as-code expectations with reliability baked in.
  • Serve as a senior escalation point for complex incident triage and root cause analysis.

Requirements

  • 10+ years of experience in infrastructure, SRE, or platform engineering roles.
  • 5+ years of team management or technical leadership in SRE or Platform Engineering.
  • Experience managing distributed teams across the US, Canada, the EU, and other global time zones.
  • Experience leading or mentoring SRE/Infrastructure/Platform teams in a production SaaS environment.
  • Strong grasp of SRE fundamentals: SLOs/SLIs, error budgets, incident management, capacity planning, and operational excellence.
  • Extensive experience with AWS, GCP, and Azure managed services.
  • Strong programming skills and experience writing production-quality automation or tooling (e.g., Go, Python, Bash).
  • Hands-on experience with Kubernetes, Kubernetes Operators/Controllers, containerized workloads, and Infrastructure as Code (Terraform, Pulumi).
  • Experience with monitoring and observability systems (e.g., Prometheus, Grafana, logging/tracing pipelines).
  • Excellent communication: can translate reliability tradeoffs to product/exec stakeholders and write crisp incident/postmortem artifacts.
  • Proven ability to translate operational pain points into engineering deliverables.

Benefits

  • Comprehensive benefits including healthcare (US-based) and other insurance.
  • Stock options at an early-stage startup.
  • Competitive salary based on experience.
  • Fully remote work with a flexible schedule to accommodate different time zones.
  • Twice-yearly travel for team offsites focused on team bonding, collaboration, and having fun!

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
infrastructure engineering, site reliability engineering, platform engineering, SLOs, SLIs, error budgets, incident management, capacity planning, automation, programming
Soft Skills
team management, technical leadership, mentoring, communication, collaboration, project planning, problem-solving, operational excellence, professional growth, incident triage