
Senior Site Reliability Engineer
AppOmni
full-time
Location Type: Hybrid
Location: California • Colorado • United States
Salary
💰 $180,000 - $200,000 per year
About the role
- Ensure the reliability, scalability, and performance of our systems and infrastructure
- Monitor system availability
- Implement automation for deployment and maintenance tasks
- Proactively identify areas for optimization
- Collaborate with the development team to establish and refine service-level objectives
- Drive incident response and postmortem analysis to minimize service disruptions
Requirements
- Excellent technical and non-technical communication skills
- Prior experience as an SRE, or in a related discipline, responsible for maintaining high availability of a cloud-based application
- Troubleshooting performance bottlenecks
- Configuring monitoring and alerting
- Conducting incident response in a blameless environment
- A knack for reducing manual toil through automation and systematic thinking
- Prior experience with CI/CD tools and processes, including pipelines-as-code (GitHub Actions, CircleCI)
- At least 5 years of hands-on experience with Python or Golang
- A solid background in configuration management and infrastructure-as-code (Terraform)
- Solid experience in monitoring/observability systems (Grafana, Prometheus, etc.)
- Demonstrated knowledge of container orchestration (Kubernetes/GKE)
- Experience managing Kubernetes platforms and resources, and using Kubernetes deployment tools and patterns (Helm, GitOps, Knative)
Benefits
- Generous PTO
- Company and floating holidays
- Parental and family leave
- Health insurance (medical, dental, vision with HSA option)
- EAP
- Company-provided life insurance
- AD&D
- STD/LTD
- Supplemental life insurance options
- 401(k) with Roth
- Monthly wellness benefit reimbursement
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Python, Golang, CI/CD, infrastructure-as-code, configuration management, monitoring systems, observability systems, container orchestration, Kubernetes, automation
Soft Skills
technical communication, non-technical communication, troubleshooting, systematic thinking, incident response, collaboration, optimization, blameless environment, proactive identification, service-level objectives