Tech Stack
Cloud, Distributed Systems, Python
About the role
- Collaborate directly with our strategic enterprise accounts and product teams
- Design and run operational processes to monitor strategic customers
- Proactively identify and implement opportunities to scale support operations
- Configure and use advanced monitoring and alerting workflows
- Contribute to reliability reviews and preparedness for new features
- Design and refine incident response processes and documentation
- Analyze operational metrics and incident root cause analyses (RCAs) to identify improvements
- Provide support coverage during holidays and weekends
Requirements
- Bachelor’s degree in Computer Science or related field
- 8+ years of experience in technical operations roles such as SRE or NOC
- Deep familiarity with modern monitoring, alerting, and observability practices
- Proven experience leading incident response for high-severity outages
- Strong skills in scripting or software engineering (e.g., Python or similar)
- Solid understanding of cloud infrastructure and distributed systems fundamentals
- Effective at working cross-functionally in a high-trust environment
- 3 days in the office per week
- Relocation assistance for new employees
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
scripting, software engineering, Python, monitoring practices, alerting practices, observability practices, incident response, cloud infrastructure, distributed systems, operational metrics
Soft skills
collaboration, proactive identification, process design, cross-functional teamwork, high-trust environment
Certifications
Bachelor’s degree in Computer Science