Runway

Member of Technical Staff, Research Engineer – GPU Performance

Full-time

Location Type: Remote

Location: United States

Salary

💰 $270,000 - $370,000 per year

About the role

  • Help world models train faster and run more efficiently.
  • Profile, optimize, and rearchitect systems that turn research ideas into models that run at scale and in real time.
  • Optimize training throughput across large GPU clusters.
  • Design and maintain distributed training infrastructure.
  • Profile and accelerate inference pipelines for real-time multimodal generation.
  • Optimize and scale training infrastructure to improve efficiency and reliability.
  • Contribute to the entire stack, from low-level kernel optimizations to high-level model design.

Requirements

  • 4+ years of experience in systems engineering, ML infrastructure, or performance optimization for deep learning.
  • Familiarity with GPU kernel development (CUDA, Triton, CUTLASS) and distributed systems (NCCL, collective communication, model parallelism).
  • Experience with ML framework internals (PyTorch, JAX) and mixed-precision / low-precision techniques (FP8, INT8).
  • Experience building and operating large-scale training infrastructure, including fault tolerance and cluster orchestration.
  • Excitement about building AI that simulates the world, and about making it performant enough to run in real time.
  • Bonus: experience with PyTorch's torch.compile.

Benefits

  • Salary range based on competitive market rates for our size, stage, and industry.
  • Pay equity for our team.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
performance optimization, GPU kernel development, CUDA, Triton, CUTLASS, distributed systems, NCCL, PyTorch, JAX, mixed-precision techniques
Soft Skills
problem-solving, collaboration, communication, creativity, adaptability