
Principal MLOps Engineer
Raft
full-time
Location Type: Remote
Location: Colorado • Florida • United States
Salary
💰 $150,000 - $200,000 per year
About the role
- Design, build, and maintain secure, scalable MLOps infrastructure and deployment pipelines for production ML systems
- Help mature Raft’s internal ML platform and model lifecycle capabilities, including model packaging, registry/catalog workflows, deployment, monitoring, and operational support
- Deploy and manage machine learning workloads on Kubernetes, including GPU-enabled clusters
- Support model serving and inference infrastructure for a range of ML use cases, including traditional ML, computer vision, speech/audio, and LLM-based systems
- Build and maintain CI/CD workflows for ML services, model artifacts, and platform components
- Partner closely with ML engineers, software engineers, and product teams to move models from experimentation to reliable operational deployment
- Improve observability, reliability, security, and maintainability across ML infrastructure and services
- Help evaluate and standardize runtime patterns, serving frameworks, and deployment architectures for production ML workloads
- Contribute to infrastructure decisions across edge, on-prem, and cloud-hosted deployment environments
- Support compliance-driven deployment practices and secure software supply chain requirements in defense environments
- Get hands-on with customers at the most forward-leaning places in the Department of War
Requirements
- 7+ years of relevant hands-on experience in software engineering, platform engineering, DevOps, MLOps, or related technical roles
- 5+ years of experience with Docker and Kubernetes in production environments
- 5+ years of experience supporting enterprise cloud infrastructure or applications in AWS, Azure, or similar environments
- Strong experience provisioning, operating, and troubleshooting Kubernetes clusters in production
- Experience building and maintaining machine learning platforms, infrastructure, or pipelines used by engineering or data science teams
- Practical experience deploying machine learning workloads on Kubernetes
- Experience managing clusters or workloads that use GPUs
- Strong understanding of Helm and Kubernetes deployment patterns
- Strong scripting or programming skills, preferably in Python
- Experience with modern software engineering practices including Git, CI/CD, DevOps, and Agile/Scrum workflows
- Strong troubleshooting, systems thinking, and communication skills
- Ability to work independently and collaboratively in a fast-moving environment
- Ability to obtain and maintain a Top Secret clearance
- Ability to obtain Security+ certification within the first 90 days of employment
Benefits
- Highly competitive salary
- Fully covered healthcare, dental, and vision
- 401(k) and company match
- Take-as-you-need PTO plus 11 paid holidays
- Education & training benefits
- Annual budget for your tech/gadget needs
- Monthly box of yummy snacks to eat while doing meaningful work
- Remote, hybrid, and flexible work options
- Team off-sites in fun places!
- Generous referral bonuses
- And more!
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
MLOps, Kubernetes, Docker, AWS, Azure, Python, CI/CD, DevOps, machine learning, Helm
Soft Skills
troubleshooting, systems thinking, communication, independent work, collaboration, adaptability, problem-solving, organizational skills, leadership, customer engagement
Certifications
Top Secret clearance, Security+