Lead the design, development, and maintenance of scalable ML infrastructure on AWS, utilizing services like AWS SageMaker for model training and deployment.
Collaborate with product teams to develop MVPs for AI-driven features and enable rapid iteration and market testing.
Create and enhance monitoring and alerting frameworks to ensure high performance, reliability, and minimal downtime of ML models.
Enable cross-departmental use of AI/ML models, including Generative AI solutions, for various organizational use cases.
Provide production support: debug and resolve issues related to ML models in production and participate in on-call rotations.
Design and scale ML architecture to support rapid user growth, optimizing for robustness and cost-effectiveness.
Mentor team members, conduct code reviews, and elevate overall team capabilities through knowledge sharing and collaboration.
Stay current with the latest advancements in machine learning technologies and AWS services, and drive adoption of cutting-edge solutions.
Requirements
Bachelor's degree in Computer Science, Computer Engineering, Machine Learning, Statistics, Physics, or a relevant technical field, or equivalent practical experience.
At least 6 years of experience in machine learning engineering, with demonstrated success in deploying scalable ML models in a production environment.
Deep expertise in one or more of the following areas: machine learning, recommendation systems, pattern recognition, data mining, artificial intelligence, or related technical fields.
Proven track record of developing machine learning models from inception to business impact.
Proficiency with Python (required); experience with Golang is a plus.
Demonstrated technical leadership in guiding teams, owning end-to-end projects, and setting technical direction.
Experience working with relational databases and data warehouses, and using SQL to explore them.
Strong familiarity with AWS cloud services, especially AWS SageMaker, for deploying and scaling ML solutions.
Knowledge of Kubernetes, Docker, and CI/CD pipelines for efficient deployment and management of ML models.
Comfortable with monitoring and observability tools tailored for ML models (e.g., Prometheus, Grafana, AWS CloudWatch).
Experience developing recommender systems or enhancing user experiences through personalized recommendations.
Solid foundation in data processing and pipeline frameworks (e.g., Apache Spark, Kafka) for handling real-time data streams.
Willingness to provide production support and participate in on-call rotations for operational troubleshooting and incident resolution.