Senior Site Reliability Engineer

Latitude.sh

full-time

Posted on: 3/20/2026

Location Type: Remote

About the role

Continuously improve Latitude.sh’s platform reliability and performance
Design, build, and maintain tools to automate operational tasks and incident response
Implement and improve observability solutions, including monitoring, alerting, and tracing
Collaborate with engineering and platform teams to design scalable and resilient systems
Participate in on-call rotations and lead post-incident reviews with a focus on learning
Develop and document processes and runbooks that ensure operational excellence
Contribute to defining SLOs/SLIs and to the adoption of reliability metrics across teams

Requirements

Strong verbal and written English communication skills
Advanced knowledge of Linux/Unix systems in production environments
Experience with Kubernetes and container orchestration
Proficiency with infrastructure automation tools (e.g., Terraform, Ansible)
Experience with observability stacks (e.g., Prometheus, Grafana, Loki, ELK)
Familiarity with scripting and programming languages such as Bash, Python, Go, or Ruby
Working knowledge of Git and CI/CD pipelines
Solid understanding of incident management and root cause analysis processes
Knowledge of cloud-native reliability and security best practices

Benefits

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Linux, Unix, Kubernetes, Terraform, Ansible, Prometheus, Grafana, Loki, ELK, Bash

Soft Skills

communication, collaboration, leadership, documentation, incident management, root cause analysis