K1X

Machine Learning Operations Engineer

Full-time

Location Type: Remote

Location: Illinois, United States

About the role

  • Design and build scalable ML infrastructure to support model training, evaluation, and deployment.
  • Develop and maintain containerized environments using Docker and Kubernetes.
  • Build and manage distributed training pipelines and orchestration workflows.
  • Implement and maintain ML lifecycle tooling such as MLflow for experiment tracking and reproducibility.
  • Own production inference systems, including NVIDIA Triton Inference Server.
  • Design and operate low-latency, high-availability model serving architectures.
  • Implement CI/CD pipelines for ML deployment, versioning, and rollback strategies.
  • Build and maintain data pipelines integrated with Snowflake and related data systems.
  • Implement monitoring, logging, and alerting for model performance, drift detection, and system health.
  • Partner with ML Engineers to improve developer experience and accelerate delivery.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or equivalent experience.
  • 5+ years of experience in software engineering, DevOps, or MLOps roles.
  • Strong proficiency in Python and experience building production-grade systems.
  • Hands-on experience with Docker, Kubernetes, and distributed systems.
  • Experience building and maintaining CI/CD pipelines.
  • Familiarity with ML lifecycle tools such as MLflow or similar.
  • Experience working with cloud-based data platforms such as Snowflake.
  • Strong understanding of system design, APIs, and microservices architectures.
  • Proven debugging and troubleshooting ability across distributed systems.
  • Experience managing inference infrastructure such as NVIDIA Triton Inference Server.
  • Experience building large-scale training infrastructure including GPU workloads and distributed training.
  • Familiarity with feature stores, data versioning, and experiment tracking systems.
  • Experience supporting NLP or document processing pipelines.
  • Exposure to observability tools such as Prometheus, Grafana, or similar.
  • Experience working in SaaS environments with high availability, productivity, and performance requirements.
  • A strong bias toward automation, scalability, and continuous improvement.
  • A collaborative mindset and ability to work cross-functionally with engineering and data teams.

Benefits

  • Unlimited Vacation Policy + Sick Time + Holidays
  • Paid Parental Leave
  • Fully Remote Opportunity
  • Healthcare Benefits and 401K
  • Growing Startup Culture

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
machine learning infrastructure, Python, Docker, Kubernetes, CI/CD pipelines, MLflow, NVIDIA Triton Inference Server, distributed systems, data pipelines, GPU workloads
Soft Skills
collaborative mindset, debugging, troubleshooting, strong bias toward automation, continuous improvement, cross-functional teamwork, strong understanding of system design, communication, organizational skills, leadership
Certifications
Bachelor’s degree in Computer Science, Master’s degree in Engineering