AI Research Scientist – Reinforcement Learning

Snowflake

full-time

Posted on: 9/7/2025

Location: California • 🇺🇸 United States

💰 $195,000 - $250,000 per year

Mid-Level, Senior

Python, PyTorch, TensorFlow

About the role

Conduct advanced research in reinforcement learning, focusing on RLHF, DPO, PPO, and multi-agent systems
Develop custom solutions for practical, real-world applications and challenging problem domains
Develop advanced reasoning models to enhance logical and contextual understanding for tasks like code generation and structured decision-making
Develop and fine-tune large language models, exploring post-training optimization techniques
Create and curate post-training data from synthetic data processing or human-annotated data pipelines
Collaborate with cross-functional teams to integrate research findings into real-world applications and products
Publish high-quality research papers in top-tier conferences and journals (e.g., NeurIPS, ICML, ICLR, ACL)
Stay up to date with the latest developments in AI, RL, and LLMs, and identify opportunities for innovation

Requirements

PhD in Computer Science, Machine Learning, Artificial Intelligence, or a closely related field (or equivalent research experience)
Deep understanding of reinforcement learning algorithms, including RLHF, DPO, PPO, and multi-agent systems
Strong expertise in large language models, including fine-tuning and post-training optimization techniques
Proven track record of publishing in top-tier AI conferences or journals (e.g., NeurIPS, ICML, ICLR, ACL)
Proficiency in programming languages and frameworks commonly used in AI research, such as Python, TensorFlow, or PyTorch
Excellent problem-solving skills, with a strong analytical and mathematical foundation
Ability to work both independently and collaboratively in a fast-paced research environment
Preferred: Experience with large-scale training and deployment of RL models or LLMs
Preferred: Customer solution experience across vertical domains like healthcare and finance
Preferred: Familiarity with distributed computing and efficient training paradigms
Preferred: Experience developing reasoning models for mathematical problem-solving, code generation, or structured decision-making
Preferred: Background in human-computer interaction or related interdisciplinary fields