Machine Learning Operations Engineer

K1X

Full-time

Posted on: 4/9/2026

Location Type: Remote

Location: Illinois • United States

Job Level

Mid-Level, Senior

Tech Stack

Cloud, Distributed Systems, Docker, Grafana, Kubernetes, Microservices, Prometheus, Python

About the role

Design and build scalable ML infrastructure to support model training, evaluation, and deployment.
Develop and maintain containerized environments using Docker and Kubernetes.
Build and manage distributed training pipelines and orchestration workflows.
Implement and maintain ML lifecycle tooling such as MLflow for experiment tracking and reproducibility.
Own production inference systems, including NVIDIA Triton Inference Server.
Design and operate low-latency, high-availability model serving architectures.
Implement CI/CD pipelines for ML deployment, versioning, and rollback strategies.
Build and maintain data pipelines integrated with Snowflake and related data systems.
Implement monitoring, logging, and alerting for model performance, drift detection, and system health.
Partner with ML Engineers to improve developer experience and accelerate delivery.

Requirements

Bachelor’s or Master’s degree in Computer Science, Engineering, or equivalent experience.
5+ years of experience in software engineering, DevOps, or MLOps roles.
Strong proficiency in Python and experience building production-grade systems.
Hands-on experience with Docker, Kubernetes, and distributed systems.
Experience building and maintaining CI/CD pipelines.
Familiarity with ML lifecycle tools such as MLflow or similar.
Experience working with cloud-based data platforms such as Snowflake.
Strong understanding of system design, APIs, and microservices architectures.
Proven debugging and troubleshooting ability across distributed systems.
Experience managing inference infrastructure such as NVIDIA Triton Inference Server.
Experience building large-scale training infrastructure including GPU workloads and distributed training.
Familiarity with feature stores, data versioning, and experiment tracking systems.
Experience supporting NLP or document processing pipelines.
Exposure to observability tools such as Prometheus, Grafana, or similar.
Experience working in SaaS environments with high availability, productivity, and performance requirements.
A strong bias toward automation, scalability, and continuous improvement.
A collaborative mindset and ability to work cross-functionally with engineering and data teams.

Benefits

Unlimited Vacation Policy + Sick Time + Holidays
Paid Parental Leave
Fully Remote Opportunity
Healthcare Benefits and 401K
Growing Startup Culture

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

machine learning infrastructure, Python, Docker, Kubernetes, CI/CD pipelines, MLflow, NVIDIA Triton Inference Server, distributed systems, data pipelines, GPU workloads

Soft Skills

collaborative mindset, debugging, troubleshooting, strong bias toward automation, continuous improvement, cross-functional teamwork, strong understanding of system design, communication, organizational skills, leadership

Certifications

Bachelor’s degree in Computer Science, Master’s degree in Engineering