
Site Reliability Engineer 5, Ads SRE
Netflix
full-time
Location Type: Remote
Location: United States
About the role
- Help guide the reliability of the Netflix Ad Suite.
- Design, implement, and maintain scalable and reliable infrastructure.
- Collaborate with engineering and product teams for observability and reliability.
- Coordinate capacity planning for Dynamic Ad Insertion.
- Develop automation tools for monitoring, deployment, and incident response.
- Participate in on-call rotations for the health of the Netflix Ad Suite.
- Implement a robust incident response framework.
- Identify sources of instability and champion a culture of reliability.
Requirements
- 5+ years of experience as a Site Reliability Engineer (SRE), Production Engineer, or similar role supporting business-critical, high-traffic services.
- Ability to write code to solve problems.
- Proficiency in one or more languages such as Python, Go, or Java.
- Hands-on experience with cloud providers such as AWS, Azure, or GCP.
- Experience with Infrastructure as Code tools such as Terraform.
- Experience with container orchestration systems such as Kubernetes.
- Understanding of large-scale distributed systems, including their common failure modes and edge cases.
- Excellent communication skills and a proven ability to build relationships with engineering partners.
- Ability to calmly navigate complex production issues, identify root causes, and implement lasting solutions.
- Commitment to continuous improvement and scaling expertise.
Benefits
- Inclusive hiring practices
- 10-15% Travel Expectation
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Site Reliability Engineer, Production Engineer, Python, Go, Java, AWS, Azure, GCP, Terraform, Kubernetes
Soft Skills
communication skills, relationship building, problem solving, calm under pressure, root cause analysis, continuous improvement, scaling expertise