Tech Stack
Ansible, AWS, Azure, Cloud, Docker, Grafana, Kubernetes, NoSQL, Prometheus, Python, PyTorch, SQL, TensorFlow, Terraform
About the role
- Drive operational excellence of Autodesk's AI/ML Platform by implementing and optimizing MLOps practices
- Design and implement automated deployment pipelines for machine learning models
- Collaborate to design, implement, and maintain scalable infrastructure for model training, inference, and data processing
- Develop and maintain monitoring and logging systems to track model performance and system health
- Work closely with data engineers to ensure efficient data pipelines for training and validation
- Implement version control systems for machine learning models and contribute to model governance practices
- Contribute to governance, trust, data privacy, and ethical considerations in AI/ML solutions
- Enforce security best practices and compliance standards across MLOps processes
- Identify opportunities for automation and optimization across the MLOps lifecycle
- Play a key role in troubleshooting, incident response, and system recovery
Requirements
- BS or MS in Computer Science or a related field
- 5+ years of hands-on experience in DevOps and MLOps, with a focus on deploying and managing machine learning models in production environments
- Proficiency in Infrastructure as Code (IaC) using tools such as Terraform or Ansible
- Strong expertise in containerization and orchestration technologies (Docker, Kubernetes)
- Experience setting up and managing CI/CD pipelines for machine learning projects
- Strong scripting skills in Python, Bash, or similar languages for automation
- Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK Stack)
- Understanding of security best practices in MLOps, including data encryption, access controls, and compliance standards
- Excellent collaboration and communication skills
- Proven ability to troubleshoot and resolve complex operational issues
- Preferred: Experience with cloud platforms (especially AWS or Azure)
- Preferred: Familiarity with databases and data storage solutions (SQL, NoSQL, data lakes)
- Preferred: Exposure to machine learning frameworks (TensorFlow, PyTorch)
- Preferred: Experience with Git for version control and Jira for project management
- Preferred: Familiarity with Agile development methodologies