OpenAI

Researcher, Agent Post-Training, API & Power-Users

Researcher improving capabilities and reliability of OpenAI’s agentic models for power users. Collaborating with teams to enhance model behavior in real workflows and develop innovative solutions.

Posted 6/29/2026 • Full-time • San Francisco, California, 🇺🇸 United States • Mid-Level / Senior • 💰 $295,000 - $445,000 per year

About the role

Key responsibilities & impact
  • Design and run experiments that improve model behavior in API and power-user workflows: function calling, tool use, coding, planning, long-horizon execution, factuality, instruction following, error recovery, and calibrated reasoning.
  • Build evals, graders, and environments from real developer and power-user workflows, then turn observed failures into training data, model-behavior hypotheses, and shipped improvements.
  • Partner with API users and power users to identify high-leverage behavior gaps and convert product signals into post-training interventions.
  • Improve how models behave when composed into systems: using tools reliably, respecting developer intent, handling partial failures, asking for clarification when appropriate, and maintaining coherence across multi-step tasks.
  • Own end-to-end model behavior projects, from qualitative failure analysis through data generation, training experiments, eval design, integration into major runs, and launch readiness.
  • Develop feedback loops that use power-user traces, API usage patterns, and production-like environments to discover the next frontier of agentic model failures and gaps.
  • Help decide which agentic capabilities, behavioral fixes, and partner-team integrations are ready for inclusion in major model runs.
  • Debug hard failures in shipped or near-shipped models by moving between traces, evals, training data, model outputs, and product context.
  • Work on early-training and alignment interventions, including data mixtures, objectives, synthetic data, and eval loops that shape downstream agent behavior.
  • Improve the machinery for large-scale training and launch: experiment velocity, reliability, observability, reproducibility, cost, latency, and production readiness.
  • Take on cross-functional projects that touch model training, product infrastructure, and the production agent harness, such as multi-agent systems or training directly against production-like environments.

Requirements

What you’ll need
  • Have strong technical fundamentals in ML, software engineering, systems, statistics, or applied research, and can quickly learn across unfamiliar parts of the stack
  • Have hands-on experience with LLMs, post-training, RL/RLHF/RLAIF, evals, graders, synthetic data, coding agents, tool-using agents, API products, or production ML systems
  • Have strong taste for model behavior: you can look at a transcript, trace, eval failure, or API interaction and form concrete hypotheses about what the model needs to learn
  • Are excited by ambiguous capability problems where the signal is noisy, the failures are qualitative, and the solution may involve data, training, evals, product changes, or all of the above
  • Deeply care about developer and expert-user experience, especially how models behave when embedded in real user workflows, API products, and agent harnesses
  • Are comfortable working across research, product, infrastructure, data, evals, and safety boundaries, and can communicate clearly with each group
  • Like building load-bearing systems and processes when that is what the team needs, even if the work is not glamorous
  • Want to train and ship the models that make agents genuinely useful for developers, enterprises, researchers, and everyday users

Benefits

Comp & perks
  • Medical, dental, and vision insurance for you and your family, with employer contributions to Health Savings Accounts
  • Pre-tax accounts for Health FSA, Dependent Care FSA, and commuter expenses (parking and transit)
  • 401(k) retirement plan with employer match
  • Paid parental leave (up to 24 weeks for birth parents and 20 weeks for non-birthing parents), plus paid medical and caregiver leave (up to 8 weeks)
  • Paid time off: flexible PTO for exempt employees and up to 15 days annually for non-exempt employees
  • 13+ paid company holidays, and multiple paid coordinated company office closures throughout the year for focus and recharge, plus paid sick or safe time (1 hour per 30 hours worked, or more, as required by applicable state or local law)
  • Mental health and wellness support
  • Employer-paid basic life and disability coverage
  • Annual learning and development stipend to fuel your professional growth
  • Daily meals in our offices, and meal delivery credits as eligible
  • Relocation support for eligible employees
  • Additional taxable fringe benefits, such as charitable donation matching and wellness stipends, may also be provided.

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Machine Learning, Software Engineering, Statistics, Experiment Design, Data Generation, Model Evaluation, Coding Agents, Tool-Using Agents, Synthetic Data, Production ML Systems
Soft Skills
Clear Communication, Problem-Solving, Collaboration, Adaptability, User Experience Focus