Tech Stack
Airflow, AWS, Azure, Cloud, Docker, Kafka, Kubernetes, Python, Spark, Terraform
About the Role
- Design, implement, and maintain Nuuly’s ML platform to support feature stores, training pipelines, inference services, and retraining workflows
- Develop automated pipelines and infrastructure for model deployment, configuration management, and reproducibility across environments
- Architect cloud-native solutions for distributed training, high-throughput batch scoring, and low-latency real-time inference
- Build systems to monitor model health, detect drift, track infrastructure performance, and integrate with alerting/incident response
- Manage containerized workloads (Docker, Kubernetes) and provision cloud infrastructure with Terraform, Helm, and related IaC tools
- Partner with data scientists, ML engineers, and data engineers to translate model requirements into scalable, reliable platform capabilities
- Evaluate and integrate emerging technologies in MLOps, cloud, and data infrastructure to advance platform efficiency and scalability
Requirements
- 4–6 years of relevant experience in ML platform engineering, MLOps, or cloud infrastructure for ML systems
- Strong proficiency in Python and software engineering best practices
- Deep expertise in Kubernetes, containerization, and cloud-native architectures
- Proven experience with CI/CD, orchestration, and ML lifecycle tools (e.g., Argo, Airflow, Kubeflow, MLflow)
- Hands-on experience with cloud platforms (Google Cloud, AWS, or Azure) and infrastructure-as-code (Terraform, Helm)
- Strong knowledge of observability practices: monitoring, logging, metrics, and alerting for ML/production systems
- Familiarity with distributed computing and streaming platforms (Kafka, Spark, Flink)
- Bonus: Experience implementing security, compliance, and cost-optimization strategies in ML platforms