Sanity.io

Senior Site Reliability Engineer

Senior SRE role managing scalable content operations infrastructure for an AI-powered platform, collaborating with development teams and ensuring reliability for systems with high request volume.

Posted 7/2/2026 • Full-time • Remote • Connecticut, Massachusetts, New Jersey, New York, Pennsylvania, Rhode Island, Vermont • 🇺🇸 United States • Senior

Tech Stack

Tools & technologies
  • Cloud
  • Distributed Systems
  • Google Cloud Platform
  • Kubernetes
  • Prometheus

About the role

Key responsibilities & impact
  • Design, build, and operate the shared platform foundations engineers ship on every day: GCP infrastructure, Kubernetes, networking, routing, CI/CD, and observability.
  • Diagnose and troubleshoot complex distributed systems running at high request volume.
  • Ensure observability and analyze the behavior of our stack.
  • Contribute to in-flight work like modernizing our edge, caching, and gateway layers onto Fastly and tightening observability across the platform.
  • Raise the reliability bar through better dashboards, alert severity, paging standards, on-call readiness, and incident response.
  • Make deployment boring in the best way: build golden paths, production readiness checks, safe rollouts, and useful automation so engineers have fewer places to look before they ship.
  • Mentor engineers and raise the technical bar through code review, design review, and pairing.
  • Participate in our on-call rotation and help our developer on-call rollout land well.

Requirements

What you’ll need
  • Based in the United States, with reasonable overlap with European engineering hours.
  • Experience with SRE/DevOps tools, processes, and culture.
  • 5+ years of experience as part of an SRE on-call rotation.
  • Analytical approach to designing, diagnosing, and optimizing infrastructure.
  • Experience with managing scalable, highly available, cloud-based applications, ideally with high request volume and customer-facing uptime expectations.
  • Experience with Kubernetes for orchestrating, scaling, and managing containerized applications in cloud-based environments.
  • Experience building CI/CD pipelines.
  • Experience with an observability stack (Prometheus and similar tools).
  • Comfortable working across CDNs, edge, gateways, and caching layers, or eager to go deep there.
  • You improve on-call and reliability by building systems, standards, and feedback loops that make production healthier over time.
  • You are comfortable dealing with incidents and outages and have built a practical, thoughtful communication style for handling high-pressure situations.
  • An open but considered approach to new technologies.

Benefits

Comp & perks
  • A highly skilled, inspiring, and supportive team
  • Real infrastructure scale and meaningful, hands-on work changing how it runs
  • Positive, flexible, and trust-based work environment that encourages long-term professional and personal growth
  • A global, multiculturally diverse group of colleagues and customers
  • Comprehensive health plans and perks
  • A healthy work-life balance that accommodates individual and family needs
  • Competitive stock options program and location-based salary

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
  • Cloud-Based Application Management
  • Distributed Systems Diagnosis
  • Production Readiness Checks
  • Incident Response
  • Automation for Deployment
Soft Skills
  • Analytical Problem Solving
  • Effective Communication in High-Pressure Situations
  • Mentoring and Code Review