
Site Reliability Engineer
Origami Risk
full-time
Location Type: Remote
Location: Illinois • United States
Salary
💰 $100,000 - $120,000 per year
About the role
- Leads post-incident investigations for the Site Reliability team.
- Conducts in-depth post-incident analyses to identify root causes and develops preventive strategies.
- Drafts clear and insightful root cause analyses (RCAs) for delivery to customers.
- Cross-trains colleagues on how best to use observability tools during incident and performance investigations.
- Provides visibility to all stakeholders throughout the entire Site Reliability process.
- Collaborates with cross-functional teams to implement system enhancements that improve scalability and stability.
- Develops client-focused dashboards/alerts to proactively identify performance challenges.
- Monitors and continuously improves our time-to-resolution metrics.
- Maintains and configures core observability tools to ensure optimal performance and that key metrics and data are available for incident response and performance investigations.
- Provides an actionable feedback loop to Observability and Engineering teams to improve MELT (metrics, events, logs, and traces) and development patterns.
- Contributes to the development of automation tools to streamline incident response.
- Works proactively to prevent incidents and reduce their impact on our platform.
- Partners with the larger Cloud Operations, SRE, and Engineering teams, and with the business at large, to advance our SaaS platforms.
- Participates in on-call rotation with other team members as needed.
- Other duties as assigned.
Requirements
- Bachelor's degree in Computer Science or related field (or equivalent experience)
- 5+ years of proven experience in a Site Reliability Engineering role.
- Strong knowledge of SRE best practices and incident management protocols
- Deep experience using and/or configuring New Relic, Datadog, Sumo Logic, or similar observability tools
- Proficiency in reading and writing code (e.g., JavaScript, .NET, SQL)
- Familiarity with cloud platforms (e.g., AWS, Azure) and architectural patterns
- Excellent problem-solving skills and a data-driven approach to incident analysis
- Prior experience operating within a Public Cloud environment (AWS strongly preferred)
- Experience troubleshooting C#/.NET-based web applications to identify bugs and performance challenges
- Solid knowledge of SaaS operations
- Ability to succeed when facing ambiguity and differing levels of operational maturity
- Advanced written and verbal communication skills
- Windows and SQL Server troubleshooting skills preferred
- Knowledge of Continuous Integration and Continuous Delivery (CI/CD) pipelines preferred
- Experience working in an Infrastructure as Code (IaC) environment preferred
- Previous experience as a Software Engineer and/or System Administrator is a plus