Senior Site Reliability Engineer

ARA

Partner with software developers, platform engineers, and IT staff to improve system design, operability, deployment safety, and production support readiness.

Posted 4/4/2026 • Full-time • Remote • New Mexico • 🇺🇸 United States • Senior

Tech Stack

Tools & technologies

AWS, Azure, Cloud, Go, Kubernetes, Linux, Python

About the role

Key responsibilities & impact

Partner with software developers, platform engineers, and IT staff to improve system design, operability, deployment safety, and production support readiness.
Define and maintain operational standards, runbooks, support procedures, escalation paths, and service-level objectives.
Evaluate system architecture and changes to ensure they balance functional requirements, service quality, reliability, security, and compliance needs.
Drive continuous improvement in platform stability, maintenance, and availability.
Provide advanced technical support and troubleshooting for complex platform and service issues affecting internal users and stakeholders.

Requirements

What you’ll need

8+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, Systems Engineering, or related infrastructure roles supporting production services.
Strong experience with Linux systems administration and troubleshooting in enterprise environments.
Strong experience operating and maintaining on-prem Kubernetes platforms and all related components including CRI, CNI, and CSI plugins.
Experience deploying and maintaining applications on Kubernetes using Helm, Kustomize, and similar tooling.
Experience supporting DevOps tooling such as GitLab, Artifactory, Jira, and Confluence.
Experience with GitOps tools such as FluxCD or ArgoCD.
Proficiency scripting with at least one of Python, Go, or Bash.
Strong experience designing, maintaining, and maturing observability tooling including monitoring, dashboards, logging and tracing, and supporting SLOs.
Strong understanding of reliability engineering concepts: service health indicators; high availability design, failure reduction, and testing; operational readiness practices, including developing documentation, runbooks, and architectural descriptions; and incident response, root cause analysis, and remediation/recovery.
Ability to obtain a security clearance, which includes U.S. citizenship.

Preferred:
Experience with multiple Linux distributions including Ubuntu.
Experience with at least one of the following: Tanzu Kubernetes, Nutanix Kubernetes Platform, Canonical Kubernetes.
Experience with cloud platforms such as AWS and Azure.
Experience with infrastructure automation and configuration management.
Experience managing AI tooling on Kubernetes, including MCP servers, LLM platforms (vLLM, Ollama), and Kubeflow.
Experience with security and compliance considerations in regulated environments.
DoD experience.
Active or inactive Secret Security Clearance.

Benefits

Comp & perks

About ARA

ARA is a 100% employee-owned applied research and engineering company that provides technically rigorous solutions across national security, infrastructure, health, and energy domains. The firm specializes in C4ISR and space technologies, unmanned systems and autonomy, sensors and advanced security systems, AR/VR and synthetic environments, AI/ML, electromagnetics and explosive testing, biodefense and physiological modeling, and disaster risk and infrastructure engineering. ARA delivers research, engineering, prototyping, and mission-focused technical services to government and commercial customers.

Company size: 1001 - 5000 employees
Industries: Aerospace, Artificial Intelligence, Science
Funding: $12M grant (2023-04)

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Site Reliability Engineering, DevOps, Platform Engineering, Systems Engineering, Linux systems administration, Kubernetes, GitOps, Python, Go, Bash

Soft Skills

technical support, troubleshooting, continuous improvement, operational readiness, incident response, root cause analysis, documentation, communication

Certifications

Secret Security Clearance