Softcard (acquired by Google)

Manager – Site Reliability Engineering, SRE

Posted 5/8/2026 • Full-time • Birmingham, Alabama, 🇺🇸 United States • Senior • Lead

Tech Stack

Tools & technologies
  • Cloud
  • Google Cloud Platform
  • Kubernetes
  • SDLC
  • Terraform

About the role

Key responsibilities & impact
  • Lead, mentor, and grow a high-performing team of Site Reliability Engineers, fostering a culture of ownership, continuous improvement, and operational excellence
  • Implement and champion Site Reliability Engineering principles and DevOps best practices within the team to ensure service reliability, availability, and performance
  • Define and track key SRE metrics such as service uptime and incident response and resolution times
  • Drive automation efforts including CI/CD pipeline enhancements, infrastructure-as-code practices, and self-service infrastructure provisioning to increase deployment velocity while reducing manual toil
  • Own and continuously improve observability practices including system monitoring, logging, alerting, and diagnostics to ensure rapid issue detection and resolution
  • Participate in incident response processes including incident management, root cause analysis, post-mortems, and continuous improvement to enhance system resilience
  • Partner closely with software engineering, product management, architecture, and security teams to embed reliability and security early in the software development lifecycle (SDLC)
  • Oversee the management and scalability of cloud infrastructure environments, primarily on Google Cloud Platform (GCP), with a focus on Kubernetes, container orchestration, and hybrid cloud integrations
  • Advocate for and apply best practices in performance tuning, capacity planning, and system design for high availability
  • Develop and execute a long-term roadmap for our hybrid cloud platform, aligning with evolving business objectives and technology trends
  • Establish and monitor key performance indicators (KPIs), service level indicators (SLIs), and service level objectives (SLOs) to drive system health and stability

Requirements

What you’ll need
  • Typically requires a bachelor's degree and 7 years of experience in a technology and/or software engineering role, or an equivalent combination of education and experience
  • Proven experience working in large, complex enterprise environments (Fortune 500 or equivalent)
  • Strong understanding and demonstrated implementation of Site Reliability Engineering (SRE) principles at scale
  • Hands-on experience with infrastructure-as-code (IaC) tools such as Terraform and ArgoCD
  • In-depth knowledge and practical experience with CI/CD pipelines and automation of software delivery
  • Significant hands-on experience in Site Reliability Engineering or related roles focused on cloud infrastructure reliability
  • Strong software engineering background with proficiency in infrastructure-as-code tools (e.g., Terraform, ArgoCD) and CI/CD automation
  • Deep knowledge of cloud platforms, specifically Google Cloud Platform (GCP), Kubernetes, container orchestration, and cloud-native architecture
  • Familiarity with monitoring and observability tools such as Dynatrace, Datadog, or equivalents
  • Experience managing high-availability systems in 24/7 operational environments
  • Ability to collaborate cross-functionally and drive alignment across engineering, product, and security teams

Benefits

Comp & perks
  • Health insurance
  • Retirement plans
  • Paid time off
  • Flexible work arrangements
  • Professional development

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
  • Site Reliability Engineering (SRE)
  • DevOps best practices
  • infrastructure-as-code (IaC)
  • CI/CD pipelines
  • cloud infrastructure reliability
  • performance tuning
  • capacity planning
  • system design
  • Kubernetes
  • cloud-native architecture
Soft Skills
  • leadership
  • mentoring
  • collaboration
  • continuous improvement
  • problem-solving
  • communication
  • cross-functional alignment
  • ownership
  • operational excellence
  • incident management