
Machine Learning Operations Engineer
Mosai
full-time
Posted on:
Location Type: Remote
Location: Remote • Florida • 🇺🇸 United States
Job Level
Mid-Level, Senior
Tech Stack
AWS, Azure, Cloud, Docker, Python
About the role
- Design, build, and maintain scalable data pipelines supporting model training, inference, batch processing, and real-time analytics workflows.
- Monitor production ML pipelines to identify anomalies, performance degradations, or failures related to data quality, logic defects, or infrastructure issues.
- Execute rapid troubleshooting and root-cause analysis followed by timely remediation, validation, and full regression testing prior to redeployment.
- Collaborate with Data Science, Engineering, and Product teams to operationalize machine learning models—including LLM-based and MCP-orchestrated systems—ensuring seamless integration into production environments.
- Develop CI/CD workflows, model deployment strategies, and automated testing frameworks to support reliable, repeatable releases.
- Implement and maintain observability tooling (logging, monitoring, alerting) to ensure high availability and traceability of ML systems.
- Manage and optimize cloud infrastructure across Azure and AWS for compute, storage, orchestration, and security needs.
- Create and maintain documentation, runbooks, and best practices for model operations and system maintenance.
- Perform all other job-related duties as assigned.
Requirements
- Bachelor’s Degree in Computer Science, Engineering, Data Science, or equivalent work experience.
- 5–7 years of combined experience in Data Science, MLOps, Machine Learning Engineering, or related fields.
- Advanced proficiency in Python, Jupyter, and common ML/analytics frameworks.
- Hands-on experience with Snowflake or similar cloud data warehousing environments.
- Strong working knowledge of both Azure and AWS cloud platforms, including compute orchestration, networking, and security best practices.
- Demonstrated experience operationalizing traditional ML models as well as LLM-based and MCP-orchestrated systems.
- Experience with CI/CD tools, containerization (Docker), infrastructure-as-code, and ML pipeline frameworks.
- Strong ability to diagnose and resolve pipeline failures, data anomalies, and complex system issues.
- Excellent problem-solving skills, attention to detail, and a proactive, self-directed work ethic.
- Strong communication skills and comfort working in fast-paced, cross-functional environments.
Benefits
- Health insurance
- 401(k) plan
- Professional development opportunities
Applicant Tracking System Keywords
Hard skills
Python, Jupyter, ML frameworks, CI/CD, Docker, infrastructure-as-code, data pipelines, model deployment, observability tooling, root-cause analysis
Soft skills
problem-solving, attention to detail, proactive work ethic, communication, collaboration
Certifications
Bachelor’s Degree in Computer Science, Bachelor’s Degree in Engineering, Bachelor’s Degree in Data Science