Senior Site Reliability Engineer – Observability
Lambda
Senior Site Reliability Engineer deploying observability platforms for AI cloud infrastructure at Lambda. Collaborating with engineering teams to enhance product reliability and system monitoring.
Posted 5/9/2026 • full-time • San Francisco • California, Washington • 🇺🇸 United States • Senior • 💰 $240,000 - $401,000 per year
Tech Stack
Tools & technologies
- Go
- Kubernetes
About the role
Key responsibilities & impact
- Deploy and operate observability platforms for logging, metrics, and distributed tracing.
- Automate the deployment and operation of these observability systems.
- Set up monitoring for modern AI/HPC cluster infrastructure.
- Develop platform software to make observability adoptable and improve product reliability.
- Lead members of other engineering teams in development of solutions for their monitoring challenges.
Requirements
What you’ll need
- Have 8+ years of experience in software engineering, with 3+ years in Go
- Have 5+ years of experience in Site Reliability Engineering practices
- Possess proven understanding of Observability tools and practices
- Have experience with application deployment and monitoring using Kubernetes
- Have strong experience with modern DevOps practices
- Expect quality and reliability from the solutions you build
- Enjoy collaborating across team boundaries to help our engineering teams meet their observability needs
Benefits
Comp & perks
- Health, dental, and vision coverage for you and your dependents
- Wellness and commuter stipends for select roles
- 401(k) plan with 2% company match (USA employees)
- Flexible paid time off plan that we all actually use
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
- Go
- Site Reliability Engineering
- Observability tools
- Kubernetes
- DevOps practices
- Software development
- Monitoring
- Automation
- Distributed tracing
- Logging
Soft Skills
- Collaboration
- Leadership
- Problem-solving
- Quality assurance
- Reliability focus