Mosai

Machine Learning Operations Engineer

Full-time

Location Type: Remote

Location: Remote • Florida • 🇺🇸 United States

Job Level

Mid-Level, Senior

Tech Stack

AWS, Azure, Cloud, Docker, Python

About the role

  • Design, build, and maintain scalable data pipelines supporting model training, inference, batch processing, and real-time analytics workflows.
  • Monitor production ML pipelines to identify anomalies, performance degradations, or failures related to data quality, logic defects, or infrastructure issues.
  • Execute rapid troubleshooting and root-cause analysis followed by timely remediation, validation, and full regression testing prior to redeployment.
  • Collaborate with Data Science, Engineering, and Product teams to operationalize machine learning models—including LLM-based and MCP-orchestrated systems—ensuring seamless integration into production environments.
  • Develop CI/CD workflows, model deployment strategies, and automated testing frameworks to support reliable, repeatable releases.
  • Implement and maintain observability tooling (logging, monitoring, alerting) to ensure high availability and traceability of ML systems.
  • Manage and optimize cloud infrastructure across Azure and AWS for compute, storage, orchestration, and security needs.
  • Create and maintain documentation, runbooks, and best practices for model operations and system maintenance.
  • Perform all other job-related duties as assigned.

Requirements

  • Bachelor’s Degree in Computer Science, Engineering, Data Science, or equivalent work experience.
  • 5–7 years of combined experience in Data Science, MLOps, Machine Learning Engineering, or related fields.
  • Advanced proficiency in Python, Jupyter, and common ML/analytics frameworks.
  • Hands-on experience with Snowflake or similar cloud data warehousing environments.
  • Strong working knowledge of both Azure and AWS cloud platforms, including compute orchestration, networking, and security best practices.
  • Demonstrated experience operationalizing traditional ML models as well as LLM-based and MCP-orchestrated systems.
  • Experience with CI/CD tools, containerization (Docker), infrastructure-as-code, and ML pipeline frameworks.
  • Strong ability to diagnose and resolve pipeline failures, data anomalies, and complex system issues.
  • Excellent problem-solving skills, attention to detail, and a proactive, self-directed work ethic.
  • Strong communication skills and comfort working in fast-paced, cross-functional environments.

Benefits

  • Health insurance
  • 401(k) plan
  • Professional development opportunities