Salary
💰 $160,000 - $299,000 per year
Tech Stack
Distributed Systems, Python, PyTorch
About the role
- Designing and implementing post-training algorithms for LLMs and DLMs
- Driving efficiency and scalability improvements across training pipelines and serving systems
- Collaborating with researchers to translate cutting-edge ideas into production-ready implementations
- Exploring new paradigms for evaluation
- Demonstrating strong engineering practices and contributing to open-source communities
- Working with researchers and engineers to build next-generation generative AI systems
Requirements
- PhD in Computer Science, Electrical Engineering, or related field, or equivalent research experience in LLMs, systems, or related areas
- 2+ years of experience in machine learning, systems, distributed computing, or large-scale model training
- Proficiency in Python with hands-on experience in frameworks such as PyTorch
- Solid background in computer science fundamentals: algorithms, data structures, parallel/distributed computing, and systems programming
- Proven ability to collaborate across research and engineering teams in cross-functional environments
- (Nice to have) Expertise in post-training LLMs with novel algorithmic/data pipelines
- (Nice to have) Experience developing and scaling large distributed systems for deep learning
- (Nice to have) Contributions to open-source LLM systems or large-scale AI infrastructure