Raft

Principal MLOps Engineer

full-time

Location Type: Remote

Location: Colorado, Florida, United States

Salary

💰 $150,000 - $200,000 per year

About the role

  • Design, build, and maintain secure, scalable MLOps infrastructure and deployment pipelines for production ML systems
  • Help mature Raft’s internal ML platform and model lifecycle capabilities, including model packaging, registry/catalog workflows, deployment, monitoring, and operational support
  • Deploy and manage machine learning workloads on Kubernetes, including GPU-enabled clusters
  • Support model serving and inference infrastructure for a range of ML use cases, including traditional ML, computer vision, speech/audio, and LLM-based systems
  • Build and maintain CI/CD workflows for ML services, model artifacts, and platform components
  • Partner closely with ML engineers, software engineers, and product teams to move models from experimentation to reliable operational deployment
  • Improve observability, reliability, security, and maintainability across ML infrastructure and services
  • Help evaluate and standardize runtime patterns, serving frameworks, and deployment architectures for production ML workloads
  • Contribute to infrastructure decisions across edge, on-prem, and cloud-hosted deployment environments
  • Support compliance-driven deployment practices and secure software supply chain requirements in defense environments
  • Get hands-on with customers at the most forward-leaning places in the Department of War

Requirements

  • 7+ years of relevant hands-on experience in software engineering, platform engineering, DevOps, MLOps, or related technical roles
  • 5+ years of experience with Docker and Kubernetes in production environments
  • 5+ years of experience supporting enterprise cloud infrastructure or applications in AWS, Azure, or similar environments
  • Strong experience provisioning, operating, and troubleshooting Kubernetes clusters in production
  • Experience building and maintaining machine learning platforms, infrastructure, or pipelines used by engineering or data science teams
  • Practical experience deploying machine learning workloads on Kubernetes
  • Experience managing clusters or workloads that use GPUs
  • Strong understanding of Helm and Kubernetes deployment patterns
  • Strong scripting or programming skills, preferably in Python
  • Experience with modern software engineering practices including Git, CI/CD, DevOps, and Agile/Scrum workflows
  • Strong troubleshooting, systems thinking, and communication skills
  • Ability to work independently and collaboratively in a fast-moving environment
  • Ability to obtain and maintain a Top Secret clearance
  • Ability to obtain Security+ certification within the first 90 days of employment

Benefits

  • Highly competitive salary
  • Fully covered healthcare, dental, and vision
  • 401(k) and company match
  • Take-as-you-need PTO + 11 paid holidays
  • Education & training benefits
  • Annual budget for your tech/gadget needs
  • Monthly box of yummy snacks to eat while doing meaningful work
  • Remote, hybrid, and flexible work options
  • Team off-site in fun places!
  • Generous referral bonuses
  • And more!

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
MLOps, Kubernetes, Docker, AWS, Azure, Python, CI/CD, DevOps, machine learning, Helm
Soft Skills
troubleshooting, systems thinking, communication, independent work, collaboration, adaptability, problem-solving, organizational skills, leadership, customer engagement
Certifications
Top Secret clearance, Security+