Senior Performance Engineer – LLM Inference Frameworks

NVIDIA

Senior Performance Engineer on the TensorRT‑LLM team, optimizing LLM inference on NVIDIA GPUs by developing efficient pipelines and innovative techniques for high-performance inference infrastructure.

Posted 4/20/2026 • Full-time • Yokneam, Israel • Senior

Tech Stack

Tools & technologies

Python
PyTorch

About the role

Key responsibilities & impact

Design, implement, and optimize high‑performance inference pipelines for large language models running on GPUs
Profile and tune model execution across the stack, from scheduler design to kernel fusion and everything in between
Design and experiment with memory management strategies that improve memory bandwidth utilization and cache efficiency
Innovate and implement cutting-edge techniques such as speculative decoding, context caching, and FP8/INT4 quantization to push the boundaries of tokens per second per watt
Develop and maintain benchmarking and testing systems that quantify latency, utilization, and efficiency

Requirements

What you’ll need

Bachelor's, Master's, or higher degree in Computer Engineering, Computer Science, Applied Mathematics, or a related computing-focused field (or equivalent experience)
5+ years of relevant software development experience
Excellent Python programming, software design, and software engineering skills
Experience working with deep learning frameworks and libraries such as PyTorch and Hugging Face
Experience profiling and debugging performance at all levels: Python runtime, PyTorch internals, and GPU utilization metrics
Awareness of the latest developments in LLM architectures and LLM inference techniques
Proactive and able to work without supervision
Excellent written and oral communication skills in English

Benefits

Comp & perks

Competitive salaries
Comprehensive benefits package

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Python
deep learning
inference pipelines
memory management
benchmarking
performance profiling
FP8 quantization
INT4 quantization
scheduler design
kernel fusion

Soft Skills

proactive
communication
independent work