
Senior Site Reliability Engineer
Latitude.sh
full-time
Location Type: Remote
Location: United States
About the role
- Continuously improve Latitude.sh’s platform reliability and performance
- Design, build, and maintain tools to automate operational tasks and incident response
- Implement and improve observability solutions, including monitoring, alerting, and tracing
- Collaborate with engineering and platform teams to design scalable and resilient systems
- Participate in on-call rotations and lead post-incident reviews with a focus on learning
- Develop and document processes and runbooks that ensure operational excellence
- Contribute to defining SLOs/SLIs and driving adoption of reliability metrics across teams
Requirements
- Strong verbal and written English communication skills
- Advanced knowledge of Linux/Unix systems in production environments
- Experience with Kubernetes and container orchestration
- Proficiency with infrastructure automation tools (e.g., Terraform, Ansible)
- Experience with observability stacks (e.g., Prometheus, Grafana, Loki, ELK)
- Familiarity with scripting and programming languages such as Bash, Python, Go, or Ruby
- Working knowledge of Git and CI/CD pipelines
- Solid understanding of incident management and root cause analysis processes
- Knowledge of cloud-native reliability and security best practices
Benefits
- Paid Time Off
- Competitive Compensation
- Annual Bonus based on company and team performance
- Flexible work hours
- Opportunities for professional growth and development
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Linux, Unix, Kubernetes, Terraform, Ansible, Prometheus, Grafana, Loki, ELK, Bash
Soft Skills
communication, collaboration, leadership, documentation, incident management, root cause analysis