Thermo Fisher Scientific

Staff Research Scientist – Reinforcement Learning

Staff Research Scientist at Centific, designing AI-driven simulation systems for enterprises and training LLM agents. The role leads reinforcement learning efforts and shapes technical direction for the team.

Posted 6/10/2026 • Full-time • Remote • California • 🇺🇸 United States • Lead • 💰 $200,000 - $250,000 per year

Tech Stack

Tools & technologies
Python

About the role

Key responsibilities & impact
  • Design simulation environments and digital twins for enterprise workflows
  • Post-train LLM agents using RLHF, DPO, GRPO, PPO, and emerging methods
  • Build pipelines that convert human-labeled traces and verifiable signals into training data
  • Architect multi-turn, tool-using agents with closed learning loops
  • Design reward functions and verifiers that resist reward hacking and reflect real task outcomes
  • Set the technical bar across the team — architecture, code review, engineering standards
  • Mentor researchers and engineers; drive technical direction through influence
  • Translate research into production; contribute to publications

Requirements

What you’ll need
  • 7+ years in ML/AI research or engineering; 3+ years at senior/staff level
  • MS or PhD in Computer Science, Machine Learning, or related field (or equivalent)
  • 5+ years hands-on RL (environment design, reward engineering, policy optimization) with at least one production deployment
  • 3+ years fine-tuning LLMs with hands-on RL post-training (RLHF, DPO, GRPO, PPO)
  • Expert-level implementation of RLHF pipelines, reward modeling (Bradley-Terry), DPO, and KTO
  • Working knowledge of modern post-training and rollout-serving libraries (TRL, veRL, OpenRLHF, SkyRL)
  • Experience building LLM-based agents: tool use, multi-turn reasoning, trajectory evaluation
  • Strong Python and software engineering skills — comfortable building production pipelines, not just notebooks
  • Deep expertise in MDPs, policy gradient methods (PPO, SAC), and temporal difference learning
  • Hands-on experience with Gymnasium-based environments and reward engineering (sparse vs. dense)

Benefits

Comp & perks
  • N/A

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
machine learning, artificial intelligence, reinforcement learning, reward engineering, policy optimization, LLM post-training, reward modeling, policy gradient methods, temporal difference learning, environment design
Soft Skills
mentoring, technical direction, influence, code review, engineering standards
Certifications
MS in Computer Science, PhD in Machine Learning