Platform Engineer

• Design, improve, and operate MLOps pipelines for training, deploying, and managing ML models in production.
• Build and maintain CI/CD-style workflows for model packaging, versioning, and deployment across environments.
• Operate and optimise AWS-based infrastructure for AI workloads, including compute, storage, and networking components.
• Manage GPU-enabled workloads, addressing scalability, reliability, and cost-efficiency for high-load AI applications.
• Implement monitoring and alerting for deployed models, focusing on system health, performance, and operational stability.
• Own and evolve shared tooling such as MLflow, Docker-based workflows, and deployment frameworks to improve developer productivity.
• Work closely with infrastructure, SRE, and engineering teams to align AI platform practices with broader system standards.
• Support live AI services by diagnosing deployment, scaling, and infrastructure issues that affect AI features.
• Ensure reproducibility, traceability, and governance across the full ML lifecycle, from experimentation to production.

AI Platform Engineer – ML Ops

Job Level

Tech Stack

About the role

Requirements
