Lead the design, development, and roadmap of AI-Perf, defining benchmarking methodologies, performance metrics, and reproducible experimental workflows.
Build scalable and high-performance features to measure latency, throughput, and efficiency across AI models and distributed systems.
Partner with AI researchers, platform teams, and engineers to translate experimental challenges into robust, user-friendly performance tooling.
Integrate AI-Perf with the Dynamo Inference Stack, other NVIDIA inference stacks, and open-source inference frameworks, delivering end-to-end performance insights for researchers and production users.

Requirements

Bachelor’s, Master’s, or PhD in Computer Science, Computer Engineering, or a related field, or equivalent experience.
8+ years of experience in systems software, distributed performance engineering, or AI infrastructure research.
Expert-level Python skills, including profiling, optimization, automation, and debugging of complex systems.
Deep knowledge of distributed systems concepts, including scalability, concurrency, fault tolerance, and performance trade-offs.
Experience designing or maintaining performance benchmarking frameworks or tooling for AI/ML systems.
Hands-on experience with LLMs and with deep learning frameworks or inference runtimes such as PyTorch, TensorFlow, TensorRT, or ONNX Runtime.
Contributions to open-source or research projects in AI performance, infrastructure, or distributed systems.
Experience running large-scale inference experiments across cloud and on-prem environments (AWS, Azure, GCP, bare metal).
