
Senior Site Reliability Engineer
Pylon
Full-time
Location Type: Hybrid
Location: Palo Alto • California • United States
Salary
💰 $140,000 - $220,000 per year
About the role
- Own reliability and operational excellence for Pylon's production systems.
- Design and implement monitoring, alerting, and incident response processes that scale as we grow.
- Build tooling that makes the entire engineering team more effective.
- Establish on-call rotations and runbooks.
- Ensure our platform can handle the demands of a regulated, high-stakes financial product.
- Spend 50%+ of your time writing code: building infrastructure tooling, automating away operational toil, making reliability improvements, and developing productivity tools.
Requirements
- 4+ years of experience in SRE, infrastructure, or platform engineering roles
- Experience working on a team of SREs at a company with mature SRE practices (not solo SRE roles)
- Real on-call experience at scale in a large production environment (you've carried the pager and lived through incidents)
- Deep AWS expertise (ECS, RDS, networking, security)
- Strong experience with declarative infrastructure (Terraform, CDK, or similar)
- Nix experience (we use it and want to expand its adoption)
- Track record of building reliability tooling and automation
- Ability to design and implement monitoring, alerting, and observability systems from first principles
- Comfortable working in a regulated environment where "breaking things" is not an option
Benefits
- Equity
- Benefits
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
SRE, infrastructure engineering, platform engineering, AWS, Terraform, CDK, Nix, monitoring systems, alerting systems, observability systems
Soft Skills
operational excellence, team collaboration, incident response, reliability improvements, automation, tooling development, on-call experience, design skills, problem-solving, adaptability