Salary
💰 $184,100 - $216,600 per year
About the role
- Ensure systems are robust, efficient, and scalable as part of engineering operations
- Collaborate closely with application engineering teams to enhance observability across platforms
- Spearhead the adoption of SRE best practices and embed reliability and efficiency into engineering culture
- Operate and improve monitoring, alerting, metrics, logging, and application performance monitoring
- Participate in on-call rotations and ensure Service Level Objectives (SLOs) are met for internal and external customers
- Influence both technology and culture across teams and drive change to improve reliability
Requirements
- 8+ years of experience with Site Reliability Engineering and/or DevOps, with a proven ability to work independently on projects and tasks
- Experience collaborating across engineering and product teams to drive change
- Strong Site Reliability/DevOps experience, including knowledge of how to apply SRE fundamentals properly in a mission-critical environment
- Experience with on-call rotations in mission-critical environments to ensure Service Level Objectives are met for internal and external customers
- 5+ years of Kubernetes experience
- 5+ years of AWS experience
- Experience with monitoring, alerting, metrics, logging, and application performance monitoring
- US-based (this is a 100% remote, US-only role) with legal authorization to work in the United States
- Preferred: Seasoned, proficient handling of complex incidents according to best practices
- Preferred: Collaborative working style and ability to thrive in a remote environment
- Preferred: High self-awareness and a growth mindset; “knows what they don’t know” during incidents and when working through complex problems
- Preferred: 1+ year of Harness experience
- Preferred: Previous experience with mission-critical systems and enterprises; commitment to excellence in stressful situations