
Member of Technical Staff, Research Engineer – GPU Performance
Runway
Full-time
Location Type: Remote
Location: United States
Salary
💰 $270,000 - $370,000 per year
About the role
- Help world models train faster and run more efficiently.
- Profile, optimize, and rearchitect systems that turn research ideas into models that run at scale and in real time.
- Optimize training throughput across large GPU clusters.
- Design and maintain distributed training infrastructure.
- Profile and accelerate inference pipelines for real-time multimodal generation.
- Optimize and scale training infrastructure to improve efficiency and reliability.
- Contribute to the entire stack, from low-level kernel optimizations to high-level model design.
Requirements
- 4+ years of experience in systems engineering, ML infrastructure, or performance optimization for deep learning.
- Familiarity with GPU kernel development (CUDA, Triton, CUTLASS) and distributed systems (NCCL, collective communication, model parallelism).
- Experience with ML framework internals (PyTorch, JAX) and mixed-precision / low-precision techniques (FP8, INT8).
- Experience building and operating large-scale training infrastructure, including fault tolerance and cluster orchestration.
- Excitement about building AI that simulates the world — and making it performant enough to run in real time.
- Bonus if you have experience with PyTorch's compilation feature, torch.compile.
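For reference, a minimal sketch of the torch.compile feature mentioned above (this is an illustration, not Runway's code; the small model and shapes are invented for the example):

```python
import torch

# A toy model standing in for a real network.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 128),
    torch.nn.GELU(),
    torch.nn.Linear(128, 64),
)

# torch.compile captures the model's computation graph so the backend can
# fuse and optimize it. The "eager" backend is used here so the sketch runs
# without a GPU or a C++ toolchain; production use would rely on the
# default inductor backend for actual kernel fusion.
compiled = torch.compile(model, backend="eager")

x = torch.randn(8, 64)
y = compiled(x)           # first call triggers graph capture
print(tuple(y.shape))     # (8, 64)
```

The compiled module is a drop-in replacement for the original: same inputs, same outputs, with optimization applied under the hood.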
Benefits
- Salary range based on competitive market rates for our size, stage, and industry.
- Pay equity for our team.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
performance optimization, GPU kernel development, CUDA, Triton, CUTLASS, distributed systems, NCCL, PyTorch, JAX, mixed-precision techniques
Soft Skills
problem-solving, collaboration, communication, creativity, adaptability