
Senior ML Ops Engineer
Fortive
full-time
Location Type: Remote
Location: India
About the role
- Own the ML platform infrastructure (training, serving, feature store, model registry) with attention to cost, reliability, and security.
- Lead initiatives to improve observability and monitoring for data pipelines and ML services, including data drift, model performance, and latency.
- Provide on-call support for production ML services; drive incident management and disaster recovery for ML workloads.
- Lead the implementation and adoption of MLOps automation (CI/CD for ML, model packaging, deployment, rollback, and retraining orchestration).
- Partner with Data Science and Engineering to improve reproducibility, experiment tracking, and model governance (versioning, lineage, approvals).
- Establish quality gates for datasets, features, and models (tests, validation, bias/risk checks) before promotion to production.
- Drive platform and tooling improvements (build in-house frameworks, templates, and reusable components to accelerate ML delivery).
- Champion Responsible AI practices: auditability, explainability, access controls, and compliance processes.
- Implement and maintain model monitoring systems to track:
  - Prediction accuracy and performance metrics over time.
  - Data drift and concept drift detection to trigger retraining workflows.
  - Latency and resource utilization for inference services.
  - Alerts and dashboards for anomalies, failures, and SLA breaches.
- Develop automated retraining and rollback strategies based on monitoring insights.
Requirements
- 8+ years of experience in Cloud Ops.
- 3+ years of experience in MLOps, ML engineering, or platform/DevOps roles supporting ML in production.
- Proficient with containerization and orchestration: Docker, Kubernetes.
- Experience building CI/CD pipelines for ML (GitHub Actions, GitLab CI, Jenkins).
- Proficient with ML lifecycle tooling: MLflow, Kubeflow, TFX, model registries.
- Strong Python skills and familiarity with ML frameworks (TensorFlow, PyTorch).
- Experience deploying online/batch inference services and optimizing for latency and throughput.
- Proficient with cloud platforms (AWS/GCP/Azure) and managed ML services.
- Knowledge of data engineering foundations: feature stores, data validation, lineage.
- Experience with observability: logs, metrics, traces (Prometheus, Grafana) and model/data drift monitoring.
- Solid understanding of security and governance for ML systems.
- Bachelor’s/Master’s degree in Computer Science, Engineering, Data Science, or related fields.
- Experience with infrastructure as code (Terraform, CloudFormation).
- Familiarity with feature stores and data quality frameworks.
- Hands-on experience with real-time/streaming data and online feature serving.
- Experience with model explainability and Responsible AI risk checks.
- Certifications in cloud ML services or Kubernetes are a plus.
Applicant Tracking System Keywords
Hard skills
MLOps, ML engineering, Cloud Ops, Python, Docker, Kubernetes, CI/CD, MLflow, Kubeflow, TensorFlow
Soft skills
leadership, communication, incident management, disaster recovery, collaboration, problem-solving, observability, quality assurance, responsible AI practices, monitoring
Certifications
cloud ML services, Kubernetes