Staff Research Scientist – Reinforcement Learning
Thermo Fisher Scientific
Staff Research Scientist at Centific designing AI-driven simulation systems for enterprises and training LLM agents. Leading efforts in reinforcement learning and shaping technical direction for a talented team.
Posted 6/10/2026 • full-time • Remote • California • 🇺🇸 United States • Lead • 💰 $200,000 - $250,000 per year
Tech Stack
Tools & technologies
- Python
About the role
Key responsibilities & impact
- Design simulation environments and digital twins for enterprise workflows
- Post-train LLM agents using RLHF, DPO, GRPO, PPO, and emerging methods
- Build pipelines that convert human-labeled traces and verifiable signals into training data
- Architect multi-turn, tool-using agents with closed learning loops
- Design reward functions and verifiers that resist reward hacking and reflect real task outcomes
- Set the technical bar across the team — architecture, code review, engineering standards
- Mentor researchers and engineers; drive technical direction through influence
- Translate research into production; contribute to publications
Requirements
What you’ll need
- 7+ years in ML/AI research or engineering; 3+ years at senior/staff level
- MS or PhD in Computer Science, Machine Learning, or related field (or equivalent)
- 5+ years hands-on RL (environment design, reward engineering, policy optimization), with at least one production deployment
LLM Post-Training
- 3+ years fine-tuning LLMs with hands-on RL post-training (RLHF, DPO, GRPO, PPO)
- Expert-level implementation of RLHF pipelines, reward modeling (Bradley-Terry), DPO, and KTO
- Working knowledge of modern post-training and rollout-serving libraries (TRL, veRL, OpenRLHF, SkyRL)
- Experience building LLM-based agents: tool use, multi-turn reasoning, trajectory evaluation
- Strong Python and software engineering skills — comfortable building production pipelines, not just notebooks
- Deep expertise in MDPs, policy gradient methods (PPO, SAC), and temporal difference learning
- Hands-on experience with Gymnasium-based environments and reward engineering (sparse vs. dense)
Benefits
Comp & perks
- N/A
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
- machine learning
- artificial intelligence
- reinforcement learning
- reward engineering
- policy optimization
- LLM post-training
- reward modeling
- policy gradient methods
- temporal difference learning
- environment design
Soft Skills
- mentoring
- technical direction
- influence
- code review
- engineering standards
Certifications
- MS in Computer Science
- PhD in Machine Learning