Salary
💰 $185,000 - $210,000 per year
Tech Stack
AWS, Azure, Cloud, Google Cloud Platform, Python, PyTorch, TensorFlow
About the role
- Design, build and maintain infrastructure for deploying, monitoring and updating machine learning models at scale
- Automate end-to-end model pipelines, from ingestion and preprocessing to training, validation and deployment
- Implement monitoring for model performance, accuracy, drift and latency
- Ensure ML systems are secure, cost-efficient and scalable in production
- Document and continuously improve ML infrastructure, workflows and tooling
- Partner with ML engineers, scientists and product teams to move models seamlessly from research to production
- Apply software engineering best practices (testing, CI/CD, version control) to ML systems
Requirements
- 5+ years of experience in MLOps, including ownership of production ML systems
- Bachelor’s or Master’s in Computer Science or a related field
- Strong expertise in Python and ML/DS libraries (e.g. TensorFlow, PyTorch)
- Experience with machine learning lifecycle management tools
- Hands-on experience deploying and monitoring deep learning models
- Strong knowledge of cloud platforms such as AWS, Azure, or GCP