ARA

Senior Site Reliability Engineer

The Senior Site Reliability Engineer partners with software developers and IT staff to enhance system design and operational readiness, and reports on improvements in platform stability and availability.

Posted 7/3/2026 • Full-time • Remote • New Mexico • 🇺🇸 United States • Senior

Tech Stack

Tools & technologies
Go, Kubernetes, Linux, Python

About the role

Key responsibilities & impact
  • Partner with software developers, platform engineers, and IT staff to improve system design, operability, deployment safety, and production support readiness.
  • Define and maintain operational standards, runbooks, support procedures, escalation paths, and service-level objectives.
  • Evaluate system architecture and changes to ensure they balance functional requirements, service quality, reliability, security, and compliance needs.
  • Drive continuous improvement in platform stability, maintenance, and availability.
  • Provide advanced technical support and troubleshooting for complex platform and service issues affecting internal users and stakeholders.

Requirements

What you’ll need
  • 8+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, Systems Engineering, or related infrastructure roles supporting production services.
  • Strong experience with Linux systems administration and troubleshooting in enterprise environments.
  • Strong experience operating and maintaining on-prem Kubernetes platforms and all related components including CRI, CNI, and CSI plugins.
  • Experience deploying and maintaining applications on Kubernetes using Helm, Kustomize, and similar tooling.
  • Experience supporting DevOps tooling such as GitLab, Artifactory, Jira, and Confluence.
  • Experience with GitOps tools such as FluxCD or ArgoCD.
  • Proficiency scripting with at least one of Python, Go, or Bash.
  • Strong experience designing, maintaining, and maturing observability tooling including monitoring, dashboards, logging and tracing, and supporting SLOs.
  • Strong understanding of reliability engineering concepts: service health indicators; high availability design, failure reduction, and testing; operational readiness practices, including developing documentation, runbooks, and architectural descriptions; and incident response, root cause analysis, and remediation/recovery.
  • Ability to obtain a security clearance, which requires U.S. citizenship.

Benefits

Comp & perks
  • Equal Opportunity Employer/Protected Veterans/Individuals with Disabilities
