Senior Site Reliability Engineer
Adobe
Site Reliability Engineer for Adobe's Project Graph, ensuring the stability of its HTTP APIs and async compute platform. Collaborates with backend engineers to enhance system performance and reliability.
Posted 6/19/2026 • full-time • San Jose • California, Washington • 🇺🇸 United States • Senior • 💰 $159,200 - $301,600 per year
Tech Stack
Tools & technologies: AWS, Cloud, Docker, JavaScript, Kubernetes, Node.js, Postgres, Redis, Terraform, TypeScript
About the role
Key responsibilities & impact
- Define and enforce SLOs, SLIs, and error budgets for Project Graph's HTTP APIs and async compute platform.
- Build and maintain observability—metrics, logging, tracing, and alerting—so issues are caught and diagnosed quickly.
- Lead incident response, run blameless postmortems, and drive the follow-up work that prevents recurrence.
- Improve the reliability and scalability of an async job scheduling system built on top of Kubernetes and Postgres.
- Maintain and improve CI/CD systems to keep delivery fast, safe, and reliable.
- Own database data protection, backup, and resilience—including backup strategy, recovery testing, and disaster recovery planning.
- Design and implement cloud infrastructure and automation to meet reliability, performance, and cost goals.
- Reduce operational toil through tooling and automation, and partner with developers to build reliability in from the start.
- Participate in an on-call rotation.
Requirements
What you’ll need
- Bachelor's degree in Computer Science or equivalent experience.
- 5-10 years of experience in site reliability engineering, infrastructure, or backend software development with a strong operational focus.
- Expertise with Kubernetes in production, including scaling, troubleshooting, and tuning.
- Expertise with Docker and containerization.
- Strong experience with bash and CI/CD tools, like CircleCI.
- Strong hands-on experience in at least one server-side language; we use Node.js/TypeScript.
- Experience operating data stores such as Postgres, Redis, or similar in production; we run on AWS Aurora (Postgres-compatible), so familiarity with managed/Aurora environments is a plus.
- Experience with database backup, resilience, and disaster recovery—designing backup strategies, testing recovery, and meeting RPO/RTO targets.
- Experience with Terraform and AWS.
- Hands-on experience with observability tooling (metrics, logging, distributed tracing) and alerting.
- Familiarity with HTTP API security.
- A track record of incident response and a systematic, blameless approach to learning from failures.
- An interest in and ability to learn new technologies.
- Ability to tackle problems in a sustainable way, always striving to improve our processes and learn.
- Excellent verbal and written communication skills; can effectively articulate complex ideas and influence others through well-reasoned explanations.
Benefits
Comp & perks
- Health insurance
- 401(k) matching
- Paid time off
- Flexible work hours
- Professional development opportunities
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
site reliability engineering, infrastructure, backend software development, Kubernetes, Docker, bash, Node.js, TypeScript, Postgres, Terraform
Soft Skills
incident response, blameless postmortems, problem-solving, process improvement, communication, collaboration, learning agility
Certifications
Bachelor's degree in Computer Science