OpenAI

Software Engineer, Data Infrastructure – Research

full-time

Origin: 🇺🇸 United States • California

Salary

💰 $250,000 - $380,000 per year

Job Level

Mid-Level, Senior

Tech Stack

Distributed Systems

About the role

  • Design and maintain standardized dataset APIs, including for multimodal data that cannot fit in memory
  • Build proactive testing and scale validation pipelines for dataset loading at GPU scale
  • Integrate datasets into training and inference pipelines, collaborating with multimodal researchers and infra teams
  • Document and maintain dataset interfaces for discoverability and consistent adoption
  • Establish safeguards and validation systems to ensure reproducibility of standardized datasets
  • Debug and resolve performance bottlenecks in distributed dataset loading (e.g., stragglers)
  • Provide visualization and inspection tools to surface errors, bugs, or bottlenecks
  • Work on LLM training and inference infrastructure to support massive-scale GPU/accelerator fleets

Requirements

  • Strong engineering fundamentals with experience in distributed systems, data pipelines, or infrastructure
  • Experience building APIs, modular code, and scalable abstractions with attention to UX
  • Comfortable debugging bottlenecks across large fleets of machines
  • Collaborative, humble, and able to own foundational ML infrastructure
  • Bonus: background in data math, probability, or distributed data theory
  • Bonus: experience with GPU-scale distributed systems or dataset scaling for real-time data

Moonvalley

Head of Data

Lead · full-time · 🇺🇸 United States
Posted: 32 days ago · Source: jobs.ashbyhq.com
Distributed Systems

Reddit, Inc.

Staff Software Engineer, ML Ranking Platform

Lead · full-time · $230k–$322k / year · 🇺🇸 United States
Posted: 18 days ago · Source: boards.greenhouse.io
Distributed Systems · Go · Python

Elevance Health

Lead AI Platform Engineer

Senior · full-time · $138k–$226k / year · California, Illinois, Tennessee, Washington · 🇺🇸 United States
Posted: 3 days ago · Source: elevancehealth.wd1.myworkdayjobs.com
Distributed Systems

Luma AI

Research Engineer - Evaluations

Mid · Senior · full-time · $220k–$280k / year · California · 🇺🇸 United States
Posted: 26 days ago · Source: jobs.ashbyhq.com
Distributed Systems · Python · PyTorch · TensorFlow

NVIDIA

Senior Software Engineer, AI Systems

Senior · full-time · $116k–$247k / year · 🇨🇦 Canada
Posted: 33 days ago · Source: nvidia.wd5.myworkdayjobs.com
AWS · Azure · Cloud · Distributed Systems · Docker · Google Cloud Platform · Kubernetes · Node.js · Python · PyTorch