Develop state-of-the-art models across modalities (text, image, speech) and apply them to diverse use cases and domains
Run pre-training and post-training, and deploy state-of-the-art models on clusters with thousands of GPUs
Generate and curate data for pre-training and post-training, and run model evaluations to ensure performance exceeds expectations
Develop tools and frameworks to facilitate data generation, model training, evaluation and deployment
Collaborate with cross-functional teams to tackle complex use cases using agents and RAG pipelines
Manage research projects and communications with client research teams
Deliver high-impact AI solutions by working with external and internal science, engineering, and product teams
Requirements
Fluent in English with excellent communication skills
Expert in PyTorch or JAX
Proficient in writing clean, readable, high-performance, fault-tolerant Python code
Comfortable contributing to and navigating a large codebase independently
Experience running pre-training/post-training and deploying models on large GPU clusters (e.g., handling OOM errors and NCCL issues)
Experience generating and curating data for pre-training and post-training, and performing evaluations
Ability to develop tools and frameworks for data generation, model training, evaluation and deployment
Experience collaborating with cross-functional teams and managing client research communications
Track record of success through personal projects, professional work, or academic research
Preferred: PhD or Master's in Mathematics, Physics, Machine Learning, or Computer Science & Engineering (exceptional candidates from other backgrounds are encouraged to apply)
Nice-to-have: research experience in agents, multi-modality, robotics, diffusion models, or time series
Nice-to-have: contributions to a large codebase (open source or industry)
Nice-to-have: publications in top academic journals or conferences