Origami Risk

Site Reliability Engineer

full-time

Location Type: Remote

Location: Illinois, United States

Salary

💰 $100,000 - $120,000 per year

About the role

  • Leads post-incident investigations for the Site Reliability team.
  • Conducts in-depth post-incident analyses to identify root causes and develops preventive strategies.
  • Drafts clear and insightful RCAs for customer delivery.
  • Cross-trains colleagues on how to best leverage observability tools during incident and performance investigations.
  • Provides visibility to all stakeholders throughout the entire Site Reliability process.
  • Collaborates with cross-functional teams to implement system improvements that enhance scalability and stability.
  • Develops client-focused dashboards/alerts to proactively identify performance challenges.
  • Monitors and continuously improves our time-to-resolution metrics.
  • Maintains and configures core observability tools so that they perform optimally and key metrics/data are available for incident response and performance investigations.
  • Provides an actionable feedback loop to the Observability and Engineering teams to improve MELT (metrics, events, logs, traces) and development patterns.
  • Contributes to the development of automation tools to streamline incident response.
  • Works proactively to prevent incidents and reduce their impact on our platform.
  • Partners with the larger Cloud Operations, SRE, and Engineering teams, and with the business at large, to advance our SaaS platforms.
  • Participates in the on-call rotation with other team members as needed.
  • Other duties as assigned.

Requirements

  • Bachelor's degree in Computer Science or related field (or equivalent experience)
  • 5+ years of proven experience in a Site Reliability Engineering role
  • Strong knowledge of SRE best practices and incident management protocols
  • Deep experience using and/or configuring New Relic, Datadog, Sumo Logic, or similar observability tools
  • Proficiency in reading and writing code (e.g., JavaScript, .NET, SQL)
  • Familiarity with cloud platforms (e.g., AWS, Azure) and architectural patterns
  • Excellent problem-solving skills and a data-driven approach to incident analysis
  • Prior experience operating within a Public Cloud environment (AWS strongly preferred)
  • Experience troubleshooting C#/.NET-based web applications to identify bugs and performance challenges
  • Solid knowledge of SaaS operations
  • Ability to succeed when facing ambiguity and differing levels of operational maturity
  • Advanced written and verbal communication skills
  • Windows and SQL Server troubleshooting skills preferred
  • Knowledge of Continuous Integration and Continuous Delivery (CI/CD) pipelines preferred
  • Experience working in an Infrastructure as Code (IaC) environment preferred
  • Previous experience as a Software Engineer and/or System Administrator is a plus