Research Engineer – RL Infrastructure

Prime Intellect

full-time

Posted on: 3/27/2026

Location Type: Remote

About the role

Build and optimize the systems infrastructure behind large-scale RL and distributed training workloads.
Improve end-to-end training efficiency across compute, memory, networking, and scheduling layers.
Design and implement low-level performance optimizations, including kernels, communication paths, and runtime improvements.
Work on distributed training systems spanning data, tensor, and pipeline parallel workloads.
Help shape the architecture of our RL training stack, including async rollout and post-training systems.
Contribute to open-source libraries and internal infrastructure used for frontier-scale model training.
Collaborate closely with researchers and infrastructure engineers to translate bottlenecks into concrete systems improvements.
Stay at the frontier of training systems, inference systems, compiler/runtime tooling, and hardware-aware optimization techniques.

Requirements

Strong systems engineering experience in AI/ML infrastructure, especially around large-scale model training or inference.
Deep familiarity with PyTorch and distributed training and inference tooling such as PyTorch Distributed, DeepSpeed, FSDP, Megatron, vLLM, Ray, or related frameworks.
Experience optimizing training performance across kernels, memory movement, communication overhead, or parallelization strategy.
Hands-on experience with large-scale training techniques including data parallelism, tensor parallelism, and pipeline parallelism.
Strong understanding of GPU architecture, profiling, and performance debugging.
Ability to identify bottlenecks across the stack and drive improvements from first principles.
Comfort working in a fast-moving environment with ambiguous problems and high ownership.

Benefits

Competitive compensation, including equity.
Flexible work arrangements, with the option to work remotely or in person from our San Francisco office.
Visa sponsorship and relocation support for international candidates.
Quarterly team offsites, hackathons, conferences, and learning opportunities.
A deeply technical, high-agency team working on infrastructure for open superintelligence.

Applicant Tracking System Keywords

Hard Skills & Tools

systems engineering, AI/ML infrastructure, large-scale model training, inference, PyTorch, distributed training frameworks, DeepSpeed, FSDP, Megatron, vLLM

Soft Skills

collaboration, problem-solving, ownership, adaptability