Machine Learning Researcher – RL and Agentic Systems

Grupo Protege

Machine Learning Researcher specializing in RL and agentic systems for evaluating AI datasets. Working with cross-functional teams to enhance data quality and model performance.

Posted 5/28/2026 • Full-time • Remote • 🇧🇷 Brazil • Mid-Level, Senior

ATS Keywords

Tailor your resume

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills

machine learning, reinforcement learning, experimental design, evaluation methodology, benchmarking, data validation, dataset design, task design, model evaluation, statistical analysis

Soft Skills

problem solving, collaboration, communication, ownership, intuition, critical thinking, independence, creativity, attention to detail, adaptability

Tools & Technologies

dataset management tools, evaluation frameworks, benchmark construction tools, data analysis software, simulation environments, automated validation systems, data processing platforms, AI training pipelines, cross-functional collaboration tools, infrastructure for reproducible experimentation

Certifications & Qualifications

PhD, Master's Degree

Industry Keywords

agentic systems, multi-step model behavior, real-world workflows, data bottlenecks, task-grounded AI, model performance, unstructured datasets, semi-structured datasets, evaluation quality, empirical evidence

About the role

Key responsibilities & impact

Design and build datasets, tasks, and environments for benchmarking agentic systems and multi-step model behavior.
Translate real-world workflows into structured tasks, interaction traces, trajectories, stateful environments, and verifiable outcomes that can be used to evaluate advanced AI systems.
Develop frameworks that assess diversity, realism, coverage, fidelity, informativeness, and downstream usefulness of datasets for agentic systems.
Build quality scorecards and evaluation methods that make dataset strengths, weaknesses, and failure modes legible across teams.
Evaluate planning, tool use, robustness, recovery from failure, task completion, and generalization behavior in RL-style or agentic environments.
Connect model failures back to concrete dataset, environment, or task-design gaps and recommend improvements grounded in empirical evidence.
Contribute to tools and systems that automate dataset validation, environment generation, rollout analysis, benchmark construction, and evaluation workflows.
Improve internal infrastructure for reproducible experimentation, benchmark management, and evaluation quality.
Collaborate closely with research and engineering teams to identify data bottlenecks, improve evaluation methodology, and shape internal best practices around task-grounded AI training data.
Represent DataLab’s perspective in cross-functional discussions around dataset quality, benchmark design, and frontier agentic-system evaluation.

Requirements

What you’ll need

PhD, or a Master’s degree plus 4+ years of industry experience, in machine learning, computer science, statistics, engineering, mathematics, economics, or a related quantitative field.
Strong understanding of AI model training pipelines, evaluation methodology, and the role of data in shaping model performance.
Experience working with large, unstructured, or semi-structured datasets used to train or evaluate ML systems.
Experience with reinforcement learning, sequential decision-making, agentic systems, tool-using models, or multi-step model evaluation.
Experience designing tasks, benchmarks, environments, simulations, or evaluation frameworks for real-world model behavior.
Strong intuition for realism, coverage, difficulty, fidelity, and meaningful outcome structure in datasets.
Strong experimental design, evaluation, benchmarking, and data-validation skills.
High ownership and ability to independently identify and solve high-impact problems.

Benefits

Comp & perks

Health insurance
401(k) matching
Paid time off
Remote work options