
Senior Site Reliability Engineer – Infrastructure
Underdog Fantasy
full-time
Location Type: Remote
Location: United States
Salary
💰 $160,000 - $240,000 per year
About the role
- Own and maintain the incident response process, including defining procedures, tools, and best practices
- Guide teams in establishing and monitoring Service Level Objectives (SLOs), including setting up alerts and reporting systems
- Lead capacity planning initiatives, focusing on both short- and long-term scalability while optimizing costs
- Develop and implement disaster recovery plans, including regular testing and regulatory compliance
- Collaborate with teams on architecture decisions to ensure high availability and scalability
- Manage launch and event planning for high-traffic occasions, focusing on infrastructure preparation and capacity management (a.k.a. Launch Readiness)
- Act as an internal expert and consultant for monitoring tools like Datadog and PagerDuty and infrastructure like AWS and Kubernetes
- Emphasize automation and tooling to scale our workload
- Contribute across codebases in Ruby, Python, Go, TypeScript, Swift, and Kotlin as needed to support the initiatives described above
Requirements
- You are a strong written and verbal communicator
- You are collaborative by nature
- You enjoy using research, data, and experiments to make decisions; you believe “Hope is not a strategy.”
- You enjoy working directly with customers (generally engineers or other people inside the company)
- You think long-term about what is best for the business and its customers
- You are excited to take ownership
- You are very comfortable around an IDE and working with multiple languages, multiple web application frameworks, AWS services, Kubernetes, and PostgreSQL
- You can work independently to learn new languages/technologies as needed
- You enjoy deploying changes to production quickly, multiple times a week if necessary
Benefits
- Unlimited PTO (we're extremely flexible, except during the few weeks leading up to and into the NFL season)
- 16 weeks of fully paid parental leave
- Home office stipend
- A connected, virtual-first culture with a highly engaged distributed workforce
- 5% 401(k) match, FSA, and company-paid health, dental, and vision plan options for employees and dependents
Applicant Tracking System Keywords
Hard skills
Ruby, Python, Go, TypeScript, Swift, Kotlin, disaster recovery, capacity planning, automation, monitoring
Soft skills
written communication, verbal communication, collaboration, data-driven decision making, customer engagement, ownership, independence, long-term thinking, adaptability, problem-solving