Tech Stack
Tools & technologies: AWS, Azure, Cloud, Docker, Grafana, Jenkins, Kubernetes, Linux, Prometheus, Python, Terraform
About the role
Key responsibilities & impact
- Lead, mentor, and develop Level 1 and Level 2 SRE Support Engineers
- Manage 24x7 support coverage, shift planning, workforce utilization, and operational readiness
- Establish clear escalation matrices and support ownership models
- Drive skill upliftment across cloud technologies, troubleshooting, and SRE practices
- Manage support delivery for multiple enterprise managed services customers
- Understand customer expectations, business priorities, and critical workloads
- Act as senior escalation point for high-priority incidents and service concerns
- Ensure proactive communication during outages, incidents, and service requests
- Define and monitor Service Level Indicators (SLIs) for availability, latency, error rates, throughput, and ticket responsiveness
- Establish and govern Service Level Objectives (SLOs) aligned to customer needs
- Manage Error Budgets and balance reliability with speed of change
- Improve operational reliability through automation, standardization, and continuous improvement
- Reduce toil and repetitive manual support tasks
- Lead major incident management bridges and restoration activities
- Coordinate with Level 3 teams, cloud vendors, and customer stakeholders
- Drive Root Cause Analysis (RCA) and preventive corrective actions
- Ensure controlled execution of change management, patching, releases, and maintenance
- Track contractual SLAs, operational KPIs, MTTR, MTTD, ticket aging, and backlog health
- Publish weekly/monthly service review dashboards
- Highlight risks, recurring issues, and improvement opportunities
- Ensure audit readiness and governance compliance
- Oversee customer workloads on Amazon Web Services, Microsoft Azure, and Google Cloud
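For candidates less familiar with the error budget responsibility listed above, here is a minimal sketch of how a remaining budget might be computed from an SLO target. The function name, parameters, and sample figures are illustrative assumptions, not taken from the employer's tooling.

```python
def error_budget_remaining(slo_target: float, total_events: int, failed_events: int) -> float:
    """Return the unspent fraction of the error budget (negative if overspent).

    slo_target is the SLO as a fraction, e.g. 0.999 for "99.9% of requests succeed".
    All names here are hypothetical and chosen only for illustration.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - slo_target) * total_events
    return 1.0 - failed_events / allowed_failures


# Assumed sample window: 1,000,000 requests, 250 failures, 99.9% SLO.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"Error budget remaining: {remaining:.0%}")
```

In this sketch a value near zero or below would suggest slowing the pace of change in favor of reliability work, which is the balance the role is asked to manage.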
Requirements
What you’ll need
- 12+ years of overall experience
- 3+ years in a team leadership, support management, or SRE management role
- Strong hands-on experience with one or more cloud platforms: Amazon Web Services / Microsoft Azure / Google Cloud
- Good understanding of compute, storage, networking, IAM, backup, DR, and security controls
- Experience with Linux and/or Windows server administration
- Knowledge of containers and orchestration platforms such as Kubernetes / Docker
- Strong knowledge of SRE principles and best practices
- Experience designing and tracking SLI, SLO, SLA frameworks
- Practical understanding of Error Budget policy management
- Expertise in incident response, on-call operations, postmortems, and resilience engineering
- Familiarity with capacity planning, availability engineering, and performance optimization
- Hands-on experience with monitoring tools: Amazon CloudWatch, Azure Monitor, Google Cloud Operations Suite, Datadog, Grafana, Prometheus
- Experience with scripting: Python / Bash / PowerShell
- Infrastructure as Code using Terraform or similar
- CI/CD exposure using GitHub Actions, Jenkins, or similar tools
- Proven experience managing technical support or SRE operations teams
- Strong customer-facing communication skills
Benefits
Comp & perks
- Health insurance
- Flexible working hours
- Professional development opportunities
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
- cloud technologies
- Linux administration
- Windows server administration
- containers
- Kubernetes
- Docker
- SRE principles
- Error Budget management
- monitoring tools
- scripting
Soft Skills
- team leadership
- support management
- customer-facing communication
- incident response
- proactive communication
- mentoring
- problem-solving
- collaboration
- organizational skills
- continuous improvement
