
ML Ops Engineer
Zelis
Full-time
Location Type: Remote
Location: New Jersey • United States
Salary
💰 $127,000–$160,550 per year
About the role
- Build and maintain monitoring infrastructure for conventional machine learning models, with capabilities for performance tracking, drift detection, and alerting.
- Research, evaluate, and implement monitoring strategies and tools for Generative AI systems, including LLMs and Agentic AI architectures.
- Collaborate with ML Engineers, Data Scientists, and DevOps teams to deploy, manage, and monitor models in production.
- Develop and support scalable, secure, and automated data pipelines using Snowflake, SQL, and Python for training, serving, and monitoring ML and GenAI models.
- Leverage AutoML tools (e.g., SageMaker Autopilot) and ML workflow frameworks (e.g., MLflow, Kubeflow) to streamline experimentation and deployment.
- Design dashboards and reporting systems to visualize model health metrics and surface key operational insights.
- Ensure auditability, reproducibility, and compliance for model performance and data flow in production environments, with consideration for regulatory standards like GDPR and HIPAA.
- Maintain CI/CD workflows and version-controlled codebases (e.g., Git) for ML infrastructure and pipelines.
- Utilize containerization and orchestration technologies (e.g., Docker, Kubernetes) to manage scalable ML infrastructure.
- Leverage tools such as Streamlit and Python visualization libraries to present insights from model and data monitoring.
- Perform root cause analyses on model degradation or data quality issues, and proactively implement improvements.
- Stay current on industry developments related to ML observability, model governance, responsible GenAI practices, and AI security.
- Contribute to analytics projects and data engineering initiatives as needed.
- Provide off-hours support for critical deployments or urgent data/model issues.
Requirements
- 2–5 years of experience in ML Ops, ML Engineering, or a related role with a focus on production-level model monitoring, automation, and deployment.
- Strong experience with ML observability tools or custom-built monitoring systems.
- Experience with monitoring LLMs and Generative AI models, including prompt evaluation, hallucination tracking, and agent behavior auditing.
- Experience in deploying and managing ML workloads using containerization and orchestration platforms such as Docker, Kubernetes, Kubeflow, or TensorFlow Extended.
- Familiarity with AutoML pipelines and workflow management tools (e.g., SageMaker Autopilot, MLflow).
- Experience working in cloud environments, preferably AWS (e.g., SageMaker, S3, Lambda, ECS/EKS).
- Understanding of ML lifecycle tools (e.g., MLflow, SageMaker Pipelines) and CI/CD practices.
- Strong security and compliance awareness, particularly related to model/data governance (e.g., HIPAA, GDPR).
- Proficiency in Python and key data libraries (Pandas, NumPy, Matplotlib, etc.).
- Advanced SQL skills and experience with Snowflake or similar data warehousing platforms.
- Proficiency with version control (Git) and agile development methodologies.
- Strong collaboration and communication skills, with the ability to explain technical issues to both technical and non-technical stakeholders.
- Bachelor’s degree in Computer Science, Engineering, Data Science, or a related field, or equivalent industry experience.
- Domain experience in healthcare data (claims, payments) is preferred.
Benefits
- 401(k) plan with employer match
- Flexible paid time off
- Holidays
- Parental leave
- Life and disability insurance
- Health benefits, including medical, dental, vision, and prescription drug coverage
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
machine learning, monitoring infrastructure, data pipelines, Python, SQL, AutoML, ML observability, containerization, orchestration, data governance
Soft Skills
collaboration, communication, problem-solving, root cause analysis, technical explanation