Senior DevOps Engineer

TrueML

Senior DevOps Engineer focusing on cloud architecture and CI/CD at TrueML, enhancing infrastructure scalability and reliability. The role combines hands-on technical execution with team collaboration.

Posted 6/9/2026 • Full-time • Remote • 🇺🇸 United States • Senior • 💰 $120,000 - $155,000 per year

ATS Keywords

Tailor your resume

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills

Infrastructure as Code (IaC), CI/CD, cloud-native architecture, AWS, Kubernetes, Docker, Terraform, GitHub Actions, GitLab CI, Jenkins

Soft Skills

cost-optimization, incident management, problem-solving, team collaboration, communication

Tools & Technologies

AIOps, Datadog, Observe, Cline, GitHub Copilot

Industry Keywords

High Availability (HA), Disaster Recovery (DR), Site Reliability Engineering (SRE), SLIs, SLOs, Error Budgets

Tech Stack

Tools & technologies

AWS, Cloud, Docker, Go, Jenkins, Kubernetes, Python, Terraform

About the role

Key responsibilities & impact

Implement the technical roadmap for Infrastructure as Code (IaC), CI/CD evolution, and cloud-native architecture to support TrueML’s scaling needs.
Design, develop, and maintain self-service internal platforms to reduce developer cognitive load, enabling feature teams to deploy and manage services with minimal friction at increased velocity.
Act as a core steward for cloud spend (AWS), proactively identifying and driving cost-optimization initiatives across our infrastructure.
Build and maintain infrastructure architecture that supports strict High Availability (HA) requirements and robust Disaster Recovery (DR) protocols across multiple regions.
Implement and evolve comprehensive monitoring, logging, and distributed tracing systems, leveraging AIOps to move from reactive to predictive system maintenance.

Requirements

What you’ll need

Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
6+ years of experience in DevOps, Site Reliability Engineering (SRE), or Software Engineering, working within high-performing senior engineering teams.
Expert-level mastery of AWS and hands-on experience managing multi-region, high-availability deployments.
Advanced experience with Kubernetes (K8s) and Docker, including cluster management, networking, and scaling in production environments.
High proficiency in Terraform to drive consistency and automation across all infrastructure layers (experience with Atlantis is a plus).
Deep experience designing and maintaining complex pipelines (GitHub Actions, GitLab CI, or Jenkins) and mastery of scripting languages like Python, Go, or Bash.
Hands-on experience with modern monitoring, observability, and tracing stacks (Datadog, Observe) and a firm grasp of SRE principles (SLIs/SLOs/Error Budgets).
Experience acting as an Incident Commander or critical responder for high-severity outages.
Experience integrating AI-assisted productivity tools (Cline, GitHub Copilot) into your engineering workflow to accelerate delivery, troubleshooting, and system monitoring.

Benefits

Comp & perks

Flexible vacation
Medical/dental/vision insurance
Traditional/Roth retirement savings options
Company-paid disability and life insurance
Flexible Spending Account & Limited FSA
Family-friendly parental leave, volunteer and voting time off
On-demand wellness platform access for you and 5 friends and family
PerkSpot discount program for 900+ merchants nationwide