Staff Site Reliability Engineer
Twelve Labs
Staff Site Reliability Engineer responsible for production reliability and infrastructure for multimodal AI models at Twelve Labs. Collaborates with product teams to ensure system health and performance.
Posted 5/8/2026 • full-time • San Francisco • California • 🇺🇸 United States • Lead • 💰 $220,000 - $250,000 per year
Tech Stack
Tools & technologies: Ansible, AWS, Cloud, Grafana, Kubernetes, Prometheus, Terraform
About the role
Key responsibilities & impact
- Own production reliability end to end, from deployment through monitoring, incident response, and postmortem-driven improvement.
- Partner with the product engineering teams to ensure their services are reliable, observable, and operable by design.
- Build and maintain observability systems (metrics, logging, tracing, alerting) that give the team clear signal on system health and performance.
- Design and operate cloud infrastructure supporting AI/ML workloads.
- Drive incident response — detect, diagnose, mitigate, and prevent production issues. Build the runbooks, automation, and guardrails that reduce mean time to recovery.
- Identify and eliminate toil through automation, self-healing systems, and better tooling.
Requirements
What you’ll need
- 7+ years of experience operating production infrastructure systems, not just building them.
- Strong hands-on experience with AWS and Kubernetes in production environments.
- Solid fundamentals in OS internals, networking, storage, and compute — the kind that help you debug a problem at 3am without documentation.
- Deep practical experience with observability (Prometheus/Grafana/Loki or equivalent), Infrastructure as Code (Terraform, Ansible), and CI/CD.
- Track record of owning services end to end — deployment, monitoring, incident response, and postmortem follow-through.
Benefits
Comp & perks
- An open and inclusive culture and work environment
- Work closely with a collaborative, mission-driven team on cutting-edge AI technology
- Full health, dental, and vision benefits
- Extremely flexible PTO and parental leave policy. Office closed the week of Christmas and New Year's
- Monthly wellness stipend
- Annual Learning & Development stipend to invest in your growth
- Global offices in San Francisco and Seoul, and coworking office memberships for remote team members
- Visa support where applicable
- Transportation stipend
- Daily lunch & dinner provided
ATS Keywords
Hard Skills & Tools
production reliability, incident response, observability, cloud infrastructure, AI/ML workloads, automation, Infrastructure as Code, CI/CD, OS internals, networking
Soft Skills
problem-solving, collaboration, communication, ownership, proactive improvement