Tech Stack
AWS, Azure, Cloud, Google Cloud Platform, PySpark, Python, PyTorch, Scikit-Learn, SQL, TensorFlow
About the role
- At Definitive Healthcare, lead the design and implementation of cutting-edge AI/ML systems that deliver transformative business outcomes
- Take ownership of end-to-end ML solutions from architecture and modeling to production and performance optimization
- Lead design and implementation of scalable, production-grade ML systems in cloud environments focused on performance, reliability, and reproducibility
- Collaborate with product managers and senior stakeholders to define and prioritize ML initiatives aligned with business goals
- Oversee architecture and evolution of data pipelines for multi-terabyte datasets, ensuring efficiency and reliability
- Guide development of high-impact features and label sets across healthcare and consumer analytics domains
- Lead experimentation strategy including A/B tests, advanced validation methods, and lifecycle management using tools like MLflow and Databricks
- Drive continual model improvement through automated retraining, model decay analysis, and bias mitigation
- Champion rapid prototyping and proof-of-concept development to evaluate emerging technologies and ML techniques
- Lead technical explorations into new ML architectures (e.g., foundation models, causal inference, time series deep learning)
- Serve as technical leader and trusted advisor across product, engineering, data, and executive teams
- Set standards for code quality, performance, and documentation, and mentor junior engineers in best practices
Requirements
- Bachelor’s or Master’s degree in Computer Science, Machine Learning, Data Science, or a related field (or equivalent practical experience)
- 5+ years of industry experience as an ML Engineer, Data Scientist, or Data Engineer, with a focus on deploying and scaling ML systems
- Deep expertise in Python, SQL, and PySpark for distributed data processing
- Proficiency in libraries like scikit-learn, PyTorch, and XGBoost
- Proven experience designing robust ML pipelines, leveraging tools like MLflow or equivalent
- Strong command of ML frameworks (e.g., scikit-learn, TensorFlow, XGBoost, PyTorch)
- Hands-on experience deploying models in cloud-based environments (AWS, GCP, Azure, and Databricks)
- Proven ability to manage end-to-end ML lifecycles at scale, including data ingestion, training, evaluation, deployment, and monitoring
- Excellent communication skills and demonstrated ability to influence cross-functional teams
- Experience working with healthcare claims, EHR, or life sciences datasets (preferred)
- Advanced degree (M.S. or Ph.D.) in Computer Science, Data Science, or related technical field (preferred)
- Strong knowledge of MLOps practices including CI/CD for ML, automated retraining, and model versioning (preferred)
- Experience with deep learning architectures for time series forecasting, sequential data, or hierarchical modeling (preferred)
- Proficiency in designing evaluation protocols and defining performance metrics to rigorously assess model effectiveness (preferred)
- Comfort operating in fast-paced, high-ownership environments and prioritizing multiple high-impact projects (preferred)