Researcher, Agent Post-Training, API & Power-Users

OpenAI

Researcher improving capabilities and reliability of OpenAI’s agentic models for power users. Collaborating with teams to enhance model behavior in real workflows and develop innovative solutions.

Posted 6/29/2026 • Full-time • San Francisco, California, 🇺🇸 United States • Mid-Level, Senior • 💰 $295,000 - $445,000 per year

About the role

Key responsibilities & impact

Design and run experiments that improve model behavior in API and power-user workflows: function calling, tool use, coding, planning, long-horizon execution, factuality, instruction following, error recovery, and calibrated reasoning.
Build evals, graders, and environments from real developer and power-user workflows, then turn observed failures into training data, model-behavior hypotheses, and shipped improvements.
Partner with API and power users to identify high-leverage behavior gaps and convert product signals into post-training interventions.
Improve how models behave when composed into systems: using tools reliably, respecting developer intent, handling partial failures, asking for clarification when appropriate, and maintaining coherence across multi-step tasks.
Own end-to-end model behavior projects, from qualitative failure analysis through data generation, training experiments, eval design, integration into major runs, and launch readiness.
Develop feedback loops that use power-user traces, API usage patterns, and production-like environments to discover the next frontier of agentic model failures and gaps.
Help decide which agentic capabilities, behavioral fixes, and partner-team integrations are ready for inclusion in major model runs.
Debug hard failures in shipped or near-shipped models by moving between traces, evals, training data, model outputs, and product context.
Work on early-training and alignment interventions, including data mixtures, objectives, synthetic data, and eval loops that shape downstream agent behavior.
Improve the machinery for large-scale training and launch: experiment velocity, reliability, observability, reproducibility, cost, latency, and production readiness.
Take on cross-functional projects that touch model training, product infrastructure, and the production agent harness, such as multi-agent systems or training directly against production-like environments.

Requirements

What you’ll need

Have strong technical fundamentals in ML, software engineering, systems, statistics, or applied research, and can quickly learn across unfamiliar parts of the stack
Have hands-on experience with LLMs, post-training, RL/RLHF/RLAIF, evals, graders, synthetic data, coding agents, tool-using agents, API products, or production ML systems
Have strong taste for model behavior: you can look at a transcript, trace, eval failure, or API interaction and form concrete hypotheses about what the model needs to learn
Are excited by ambiguous capability problems where the signal is noisy, the failures are qualitative, and the solution may involve data, training, evals, product changes, or all of the above
Deeply care about developer and expert-user experience, especially how models behave when embedded in real user workflows, API products, and agent harnesses
Are comfortable working across research, product, infrastructure, data, evals, and safety boundaries, and can communicate clearly with each group
Like building load-bearing systems and processes when that is what the team needs, even if the work is not glamorous
Want to train and ship the models that make agents genuinely useful for developers, enterprises, researchers, and everyday users

Benefits

Comp & perks

Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
401(k) retirement plan with employer match
Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
Mental health and wellness support
Employer-paid basic life and disability coverage
Annual learning and development stipend to fuel your professional growth
Daily meals in our offices, and meal delivery credits as eligible
Relocation support for eligible employees
Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Machine Learning, Software Engineering, Statistics, Experiment Design, Data Generation, Model Evaluation, Coding Agents, Tool-Using Agents, Synthetic Data, Production ML Systems

Soft Skills

Clear Communication, Problem-Solving, Collaboration, Adaptability, User Experience Focus