Staff SRE, Ads
Reddit, Inc.
Staff Site Reliability Engineer leading reliability initiatives across Ads domains at Reddit. Partnering with engineering to improve operational excellence and platform resilience.
Tech Stack
Tools & technologies
- Cloud
- Distributed Systems
- Go
- Linux
- Python
About the role
Key responsibilities & impact
- Lead reliability initiatives across multiple Ads domains including ad serving, auctions, targeting, reporting, measurement, and billing.
- Partner with engineering leadership to improve reliability, scalability, operational excellence, and engineering efficiency across the Ads organization.
- Drive architecture reviews and influence technical decisions impacting critical revenue-generating systems.
- Design and build platforms, tooling, and automation that improve reliability and developer productivity at scale.
- Participate in on-call rotations, lead complex incident investigations, and coordinate cross-functional response efforts during major production events.
- Identify systemic reliability risks and drive long-term solutions that improve platform resilience.
- Establish reliability metrics around advertiser-critical user journeys such as campaign creation, ad delivery, auction participation, reporting, attribution, and billing.
- Mentor engineers and provide technical leadership across multiple teams.
- Influence roadmap planning and ensure reliability considerations are incorporated into product and infrastructure investments.
Requirements
What you’ll need
- 8+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or related roles operating large-scale distributed systems.
- Strong experience supporting high-traffic, user-facing production environments.
- Deep understanding of distributed systems, networking, Linux systems, and cloud-native architectures.
- Experience designing highly available systems with strong operational and reliability practices.
- Strong understanding of observability systems including metrics, logging, tracing, and alerting.
- Good programming skills in languages such as Go, Python, or similar.
- Experience improving reliability through SLOs, automation, incident management, and performance optimization.
- Demonstrated ability to troubleshoot complex issues across a modern distributed system stack.
- Strong collaboration and communication skills with the ability to influence technical direction across teams.
Benefits
Comp & perks
- Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
- Family Planning Support
- Gender-Affirming Care
- Mental Health & Coaching Benefits
- Private Pension Plan with Employer Matching
- 100% employer-sponsored group medical plan
- Income Replacement Programs
- Flexible Vacation & Paid Volunteer Time Off
- Generous Paid Parental Leave
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
- Site Reliability Engineering
- Infrastructure Engineering
- distributed systems
- Linux systems
- cloud native architectures
- observability systems
- Go
- Python
- automation
- performance optimization
Soft Skills
- collaboration
- communication
- technical leadership
- mentoring
- influencing technical direction
- troubleshooting
- incident management
- operational excellence
- scalability
- developer productivity