Senior Manager, DevOps

TrueML

Senior Manager, DevOps leading infrastructure and platform engineering efforts at TrueML. Focus on cloud architecture and CI/CD standards for machine learning-driven products.

Posted 4/21/2026 • Full-time • Remote • California • 🇺🇸 United States • Senior • 💰 $150,000 - $220,000 per year

Tech Stack

Tools & technologies

AWS, Cloud, Docker, Go, Jenkins, Kubernetes, Python, Terraform

About the role

Key responsibilities & impact

Define and execute the long-term strategic vision for Infrastructure as Code (IaC), CI/CD evolution, and cloud-native architecture to support TrueML’s scaling needs.
Lead the design and implementation of self-service internal platforms to reduce developer cognitive load, enabling feature teams to deploy and manage services with minimal friction at increased velocity.
Act as the primary stakeholder for cloud spend (AWS); drive cost-optimization initiatives and lead contract negotiations for the DevOps toolstack and third-party vendors.
Ensure the infrastructure architecture supports strict High Availability (HA) requirements and robust Disaster Recovery (DR) protocols, maintaining system integrity across multiple regions.
Oversee the implementation and evolution of comprehensive monitoring, logging, and distributed tracing systems, leveraging AIOps to move from reactive to predictive system maintenance.
Champion security by design by integrating automated vulnerability scanning, secret management, and compliance checks directly into the automated build pipelines.
Serve as the ultimate escalation point for major production outages, facilitating blameless post-mortem reviews that focus on systemic improvements rather than individual error.
Maintain deep technical currency in container orchestration (Kubernetes), serverless patterns, and modern automation frameworks to provide meaningful mentorship and architectural guidance to senior engineering staff.

Requirements

What you’ll need

Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
10+ years of experience in DevOps, Site Reliability Engineering (SRE), or Software Engineering, including 5+ years of experience managing engineers.
Expert-level mastery of AWS and experience managing multi-region, high-availability deployments.
Advanced experience with Kubernetes (K8s) and Docker, including cluster management, networking, and scaling in a production environment.
Proficiency in Terraform to drive consistency and automation across all infrastructure layers. Experience with Atlantis is a plus.
Deep experience designing and maintaining complex pipelines (GitHub Actions, GitLab CI, or Jenkins) and mastery of scripting languages like Python, Go, or Bash.
Hands-on experience with modern monitoring, observability, and tracing stacks (Datadog, Observe) and a firm grasp of SRE principles (SLIs/SLOs/Error Budgets).
Experience acting as an Incident Commander for high-severity outages and fostering a "blameless" post-mortem culture.
Demonstrated ability to influence executive leadership and collaborate cross-functionally with Product, Engineering, and Security teams.
Experience integrating AI-assisted productivity tools (Cline, GitHub Copilot) into the engineering workflow to accelerate delivery.

Benefits

Comp & perks

About the company

TrueML

51 - 200 employees • Fintech • Finance • B2C

TrueML is a leading company in the fintech sector, known for its innovative solutions that prioritize customer experience in the financial services industry. The company, along with its family of companies like TrueAccord, focuses on developing intelligent, digital-first communication platforms and products that revolutionize the consumer experience in financial health management. TrueML leverages the expertise of a dynamic team of data scientists, financial services experts, and customer experience specialists to create technology that addresses roadblocks to consumers' financial well-being, ensuring inclusivity and accessibility in financial systems. Founded in 2013 by Ohad Samet, TrueML continues to disrupt traditional financial services by making them more consumer-friendly and effective.

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Infrastructure as Code (IaC), CI/CD, cloud-native architecture, AWS, Kubernetes, Docker, Terraform, GitHub Actions, GitLab CI, Jenkins

Soft Skills

leadership, mentorship, collaboration, influence, incident management, blameless post-mortem culture, cost optimization, strategic vision, communication, problem-solving