Grupo Protege

Machine Learning Researcher – RL and Agentic Systems

Machine Learning Researcher specializing in RL and agentic systems, with a focus on evaluating AI datasets. The role works with cross-functional teams to improve data quality and model performance.

Posted 5/28/2026 • Full-time • Remote • 🇧🇷 Brazil • Mid-Level, Senior

About the role

Key responsibilities & impact
  • Design and build datasets, tasks, and environments for benchmarking agentic systems and multi-step model behavior.
  • Translate real-world workflows into structured tasks, interaction traces, trajectories, stateful environments, and verifiable outcomes that can be used to evaluate advanced AI systems.
  • Develop frameworks that assess diversity, realism, coverage, fidelity, informativeness, and downstream usefulness of datasets for agentic systems.
  • Build quality scorecards and evaluation methods that make dataset strengths, weaknesses, and failure modes legible across teams.
  • Evaluate planning, tool use, robustness, recovery from failure, task completion, and generalization behavior in RL-style or agentic environments.
  • Connect model failures back to concrete dataset, environment, or task-design gaps and recommend improvements grounded in empirical evidence.
  • Contribute to tools and systems that automate dataset validation, environment generation, rollout analysis, benchmark construction, and evaluation workflows.
  • Improve internal infrastructure for reproducible experimentation, benchmark management, and evaluation quality.
  • Collaborate closely with research and engineering teams to identify data bottlenecks, improve evaluation methodology, and shape internal best practices around task-grounded AI training data.
  • Represent DataLab’s perspective in cross-functional discussions around dataset quality, benchmark design, and frontier agentic-system evaluation.

Requirements

What you’ll need
  • PhD or equivalent Master’s degree, plus 4+ years of industry experience in machine learning, computer science, statistics, engineering, mathematics, economics, or a related quantitative field.
  • Strong understanding of AI model training pipelines, evaluation methodology, and the role of data in shaping model performance.
  • Experience working with large, unstructured, or semi-structured datasets used to train or evaluate ML systems.
  • Experience with reinforcement learning, sequential decision-making, agentic systems, tool-using models, or multi-step model evaluation.
  • Experience designing tasks, benchmarks, environments, simulations, or evaluation frameworks for real-world model behavior.
  • Strong intuition for realism, coverage, difficulty, fidelity, and meaningful outcome structure in datasets.
  • Strong experimental design, evaluation, benchmarking, and data-validation skills.
  • High ownership and ability to independently identify and solve high-impact problems.

Benefits

Comp & perks
  • Health insurance
  • 401(k) matching
  • Paid time off
  • Remote work options
