Senior Site Reliability Engineer
Sanity.io
SRE managing scalable content operations infrastructure for an AI-powered platform. Collaborating with dev teams and ensuring reliability for systems with high request volume.
Posted 7/2/2026 • full-time • Remote • Connecticut, Massachusetts, New Jersey, New York, Pennsylvania, Rhode Island, Vermont • 🇺🇸 United States • Senior
Tech Stack
Tools & technologies
- Cloud
- Distributed Systems
- Google Cloud Platform
- Kubernetes
- Prometheus
About the role
Key responsibilities & impact
- Design, build, and operate the shared platform foundations engineers ship on every day: GCP infrastructure, Kubernetes, networking, routing, CI/CD, and observability.
- Diagnose and troubleshoot complex distributed systems running at high request volume.
- Ensure observability and analyze the behavior of our stack.
- Contribute to in-flight work like modernizing our edge, caching, and gateway layers onto Fastly and tightening observability across the platform.
- Raise the reliability bar through better dashboards, alert severity, paging standards, on-call readiness, and incident response.
- Make deployment boring in the best way: build golden paths, production readiness checks, safe rollouts, and useful automation so engineers have fewer places to look before they ship.
- Mentor engineers and raise the technical bar through code review, design review, and pairing.
- Participate in our on-call rotation and help our developer on-call rollout land well.
Requirements
What you’ll need
- Based in the United States, with reasonable overlap with European engineering hours.
- Experience with SRE/DevOps tools, processes, and culture.
- 5+ years of experience as part of an SRE on-call rotation.
- Analytical approach to designing, diagnosing, and optimizing infrastructure.
- Experience with managing scalable, highly available, cloud-based applications, ideally with high request volume and customer-facing uptime expectations.
- Experience with Kubernetes for orchestrating, scaling, and managing containerized applications in cloud-based environments.
- Experience building CI/CD pipelines.
- Experience with an observability stack (Prometheus, etc.).
- Comfortable working across CDNs, edge, gateways, and caching layers, or eager to go deep there.
- You improve on-call and reliability by building systems, standards, and feedback loops that make production healthier over time.
- You are comfortable dealing with incidents and outages and have built a practical, thoughtful communication style for handling high-pressure situations.
- An open but considered approach to new technologies.
Benefits
Comp & perks
- A highly skilled, inspiring, and supportive team
- Real infrastructure scale and meaningful, hands-on work changing how it runs
- Positive, flexible, and trust-based work environment that encourages long-term professional and personal growth
- A global, multi-culturally diverse group of colleagues and customers
- Comprehensive health plans and perks
- A healthy work-life balance that accommodates individual and family needs
- Competitive stock options program and location-based salary
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
- Cloud-Based Application Management
- Distributed Systems Diagnosis
- Production Readiness Checks
- Incident Response
- Automation for Deployment
Soft Skills
- Analytical Problem Solving
- Effective Communication in High-Pressure Situations
- Mentoring and Code Review