Salary
💰 $255,000 - $325,000 per year
About the role
- Define the core evaluation signals that drive model improvement at OpenAI, turning vague product gaps into crisp, defensible measures of quality
- Design agents, harnesses, and eval pipelines that are reliable, reproducible, and extensible
- Prototype solutions on real workflows and convert what works into scalable feedback loops
- Connect evaluation signals directly to research and training systems so product improvements show up in what users experience
- Evaluate multi-turn and tool-using systems, design agent harnesses, and apply reinforcement learning and related methods in production settings
- Collaborate closely with research and product teams and work across the stack, from backend pipelines to user-facing interfaces
- Build reusable systems and tools that enable contributions from across the company and steadily raise the quality bar
- Operate like a founder or founding engineer: take initiative, move quickly, and create structure where none exists
Requirements
- 4+ years of experience in software engineering with strong fundamentals and a track record of shipping production systems end-to-end
- Experience building AI agents or applications, including designing evals and improving performance through prompting or scaffolding
- Familiarity with evaluation methods for LLMs and patterns like multi-agent workflows, tool use, or long context
- Familiarity with deep learning concepts or prior exposure to training models
- Strong communication skills with both technical and non-technical audiences
- Motivation to collaborate with research and product teams on high-impact work, and the ability to thrive in ambiguity
- Ability to work from our San Francisco office three days per week (hybrid)
- Willingness to prototype with users and build reusable systems and tools