Tech Stack
Airflow, AWS, Azure, Cloud, Jenkins, Kubernetes, PySpark, Terraform
About the role
- Lead and help grow our ML Platform team
- Design and implement scalable MLOps infrastructure for the ML lifecycle
- Build and maintain feature stores to ensure consistent feature management
- Automate ML workflows with CI/CD pipelines for models
- Implement resource management and orchestration using Kubernetes, Airflow, or similar
- Monitor and debug production ML systems using monitoring and alerting tools
- Collaborate with data scientists and server engineers to understand infrastructure needs
- Operationalize production ML models, ensuring scalability, reliability, and monitoring
Requirements
- 4+ years of experience in MLOps, ML Engineering, or DevOps for machine learning
- Experience with machine learning algorithms and the model development process
- Strong expertise in cloud platforms (AWS or Azure) for ML deployment
- Experience with CI/CD pipelines (e.g., GitHub Actions, Jenkins) for ML
- Strong knowledge of model monitoring tools (e.g., Evidently AI, MLflow)
- Hands-on experience with orchestration frameworks (e.g., Airflow, Kubeflow)
- Familiarity with PySpark and Databricks is a plus
- Familiarity with infrastructure-as-code (Terraform, CloudFormation) is a plus
- Able to effectively leverage AI-powered development tools (e.g., Cursor, Augment, Factory) to enhance productivity, code quality, and collaboration
- Willingness to travel at least twice a year for company offsites