Runtime Engineer

Lemurian Labs

full-time

Posted on: 4/1/2026

Location Type: Hybrid

Location: San Francisco • California • United States

About the role

Design, develop, maintain and improve our multi-target runtime
Use the latest techniques in parallelization and partitioning to automate the generation of highly optimized kernels and put them to use
Rapidly prototype new ideas and explore them in a data-driven way
Benchmark and analyze the outputs produced by our optimizing compiler on target hardware
Work closely with our product team to understand the evolving needs of ML engineers and drive improvements in architecture
Build tools to collect and analyze performance bottlenecks

A deep understanding of asynchronous, concurrent programming.
4+ years of experience with C/C++ (C++14 or newer).
An understanding of HW architecture (vector vs scalar registers and instructions, memory hierarchies).
Knowledge of operating system kernel development or hypervisor development.
Experience developing or maintaining libraries like CUDA or ROCm.
Experience with GPU programming.
Experience with high-performance computing (HPC).
Master's or PhD degree in computer science, or equivalent practical experience.
Knowledge of DL frameworks such as PyTorch, JAX or Triton.
Experience with programming large compute clusters.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

C, C++, C++14, asynchronous programming, concurrent programming, GPU programming, high-performance computing, CUDA, ROCm, DL frameworks

Soft Skills

collaboration, communication, problem-solving, data-driven exploration, benchmarking, analysis

Certifications

Master's degree in computer science, PhD in computer science