Staff SRE, Ads
Reddit, Inc.
Staff SRE leading reliability initiatives across Ads domains at Reddit. Mentoring engineers and improving infrastructure reliability for critical revenue-generating systems.
Tech Stack
Tools & technologies
Cloud, Distributed Systems, Go, Linux, Python
About the role
Key responsibilities & impact
- Lead reliability initiatives across multiple Ads domains including ad serving, auctions, targeting, reporting, measurement, and billing.
- Partner with engineering leadership to improve reliability, scalability, operational excellence, and engineering efficiency across the Ads organization.
- Drive architecture reviews and influence technical decisions impacting critical revenue-generating systems.
- Design and build platforms, tooling, and automation that improve reliability and developer productivity at scale.
- Participate in on-call rotations, lead complex incident investigations, and coordinate cross-functional response efforts during major production events.
- Identify systemic reliability risks and drive long-term solutions that improve platform resilience.
- Establish reliability metrics around advertiser-critical user journeys such as campaign creation, ad delivery, auction participation, reporting, attribution, and billing.
- Mentor engineers and provide technical leadership across multiple teams.
- Influence roadmap planning and ensure reliability considerations are incorporated into product and infrastructure investments.
Requirements
What you’ll need
- 8+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or related roles operating large-scale distributed systems.
- Strong experience supporting high-traffic, user-facing production environments.
- Deep understanding of distributed systems, networking, Linux systems, and cloud-native architectures.
- Experience designing highly available systems with strong operational and reliability practices.
- Strong understanding of observability systems including metrics, logging, tracing, and alerting.
- Good programming skills in languages such as Go, Python, or similar.
- Experience improving reliability through SLOs, automation, incident management, and performance optimization.
- Demonstrated ability to troubleshoot complex issues across a modern distributed system stack.
- Strong collaboration and communication skills with the ability to influence technical direction across teams.
Benefits
Comp & perks
- Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
- Family Planning Support
- Gender-Affirming Care
- Mental Health & Coaching Benefits
- Private Medical, Dental, and Vision Benefits
- Personal Retirement Savings Account with matching contribution
- Cycle to Work and Tax Saver schemes
- Flexible Vacation & Paid Volunteer Time Off
- Generous Paid Parental Leave
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Site Reliability Engineering, Infrastructure Engineering, distributed systems, Linux systems, cloud native architectures, Go, Python, SLOs, automation, incident management
Soft Skills
collaboration, communication, technical leadership, mentoring, influencing, troubleshooting, operational excellence, engineering efficiency, cross-functional coordination, problem-solving