Salary
💰 $195,000 - $250,000 per year
Tech Stack
Python, PyTorch, TensorFlow
About the role
- Conduct advanced research in reinforcement learning, focusing on RLHF, DPO, PPO, and multi-agent systems
- Develop custom solutions for real-world applications and challenging problem domains
- Develop advanced reasoning models to enhance logical and contextual understanding for tasks like code generation and structured decision-making
- Develop and fine-tune large language models, exploring post-training optimization techniques
- Create and curate post-training data using synthetic data processing and human annotation pipelines
- Collaborate with cross-functional teams to integrate research findings into real-world applications and products
- Publish high-quality research papers in top-tier conferences and journals (e.g., NeurIPS, ICML, ICLR, ACL)
- Stay up to date with the latest developments in AI, RL, and LLMs, and identify opportunities for innovation
Requirements
- PhD in Computer Science, Machine Learning, Artificial Intelligence, or a closely related field (or equivalent research experience)
- Deep understanding of reinforcement learning algorithms, including RLHF, DPO, PPO, and multi-agent systems
- Strong expertise in large language models, including fine-tuning and post-training optimization techniques
- Proven track record of publishing in top-tier AI conferences or journals (e.g., NeurIPS, ICML, ICLR, ACL)
- Proficiency in programming languages and frameworks commonly used in AI research, such as Python, TensorFlow, or PyTorch
- Excellent problem-solving skills, with a strong analytical and mathematical foundation
- Ability to work both independently and collaboratively in a fast-paced research environment
- Preferred: Experience with large-scale training and deployment of RL models or LLMs
- Preferred: Experience delivering customer solutions in vertical domains such as healthcare and finance
- Preferred: Familiarity with distributed computing and efficient training paradigms
- Preferred: Experience developing reasoning models for mathematical problem-solving, code generation, or structured decision-making
- Preferred: Background in human-computer interaction or related interdisciplinary fields