
Senior Machine Learning Engineer – Applications, Compilers
NVIDIA
full-time
Location Type: Hybrid
Location: Cambridge • United Kingdom
About the role
- Build, develop, and maintain high-performance runtime and compiler components, focusing on end-to-end inference optimization.
- Define and implement mappings of large-scale inference workloads onto NVIDIA’s systems.
- Extend and integrate with NVIDIA’s software ecosystem, contributing to libraries, tooling, and interfaces that enable seamless deployment of models across platforms.
- Benchmark, profile, and monitor key performance and efficiency metrics to ensure the compiler generates efficient mappings of neural network graphs to our inference hardware.
- Collaborate closely with hardware architects and design teams to feed back software observations, influence future architectures, and co-design features that unlock new performance and efficiency points.
- Prototype and evaluate new compilation and runtime techniques, including graph transformations, scheduling strategies, and memory/layout optimizations tailored to spatial processors.
- Publish and present technical work on novel compilation approaches for inference and related spatial accelerators at top-tier ML, compiler, and computer architecture venues.
Requirements
- MS or PhD in Computer Science, Electrical/Computer Engineering, or a related field, or equivalent experience, with 5 years of relevant experience.
- Strong software engineering background with proficiency in systems-level programming (e.g., C/C++ and/or Rust) and solid CS fundamentals in data structures, algorithms, and concurrency.
- Hands-on experience with compiler or runtime development, including IR design, optimization passes, or code generation.
- Experience with LLVM and/or MLIR, including building custom passes, dialects, or integrations.
- Familiarity with deep learning frameworks such as TensorFlow and PyTorch, and experience working with portable graph formats such as ONNX.
- Solid understanding of parallel and heterogeneous compute architectures, such as GPUs, spatial accelerators, or other domain-specific processors.
- Strong analytical and debugging skills, with experience using profiling, tracing, and benchmarking tools to drive performance improvements.
- Excellent communication and collaboration skills, with the ability to work across hardware, systems, and software teams.
- Ideal candidates will have direct experience with MLIR-based compilers or other multi-level IR stacks, especially in the context of graph-based deep learning workloads.
Benefits
- Flexible work arrangements
- Professional development opportunities
Applicant Tracking System Keywords
Hard Skills & Tools
C, C++, Rust, compiler development, runtime development, IR design, optimization passes, code generation, LLVM, MLIR
Soft Skills
analytical skills, debugging skills, communication skills, collaboration skills
Certifications
MS in Computer Science, PhD in Computer Science, MS in Electrical Engineering, PhD in Electrical Engineering