NVIDIA

Senior Inference Engineer, AIConfigurator

Senior Inference Engineer optimizing large-scale LLM serving for NVIDIA AIConfigurator: building APIs, collaborating across teams, and improving model performance on NVIDIA platforms.

Posted 6/13/2026 • Full-time • Santa Clara, California, United States • Senior • $184,000–$356,500 per year

Tech Stack

Tools & technologies
Distributed Systems, Kubernetes, Python, Rust

About the role

Key responsibilities & impact
  • Build and evolve AIConfigurator's core optimization engine for LLM serving, including configuration search, SLA-aware ranking, efficiency and latency estimation, and Pareto frontier analysis.
  • Build production-quality Python/Rust APIs, CLIs, SDK surfaces, and web workflows that help users generate strong deployment configurations for NVIDIA GPU clusters.
  • Develop configuration generation systems that emit backend-specific artifacts for Dynamo, Kubernetes, TensorRT-LLM, vLLM, and SGLang deployments.
  • Collaborate with inference runtime, performance, benchmarking, and product groups to ensure simulated results match measured deployment performance on H100, H200, B200, GB200, and upcoming NVIDIA platforms.
  • Improve model, hardware, and backend support by integrating performance databases, profiling data, support matrices, and validation tools.
  • Drive software quality through maintainable architecture, schema development, tests, documentation, and automation suitable for open-source and production users.
  • Turn complex inference concepts such as prefill/decode disaggregation, tensor parallelism, pipeline parallelism, expert parallelism, batching, and KV cache behavior into dependable software abstractions.

Requirements

What you’ll need
  • BS, MS, or PhD in Computer Science, Computer Engineering, Electrical Engineering, Applied Math, or a related field, or equivalent experience.
  • 10+ years of relevant software engineering experience.
  • Strong Python/Rust engineering skills, including production APIs, CLI tools, packaging, testing, debugging, and maintainable software development.
  • Experience with GPU computing, distributed systems, ML infrastructure, or high-performance model serving.
  • Understanding of LLM inference concepts such as batching, latency, efficiency, memory constraints, parallelism strategies, and serving SLAs.
  • Experience with data-driven performance analysis, benchmarking, simulation, optimization, or resource planning.
  • Ability to collaborate across research, runtime, platform, and customer-facing engineering teams.
  • Strong written and verbal communication skills, with the ability to explain sophisticated technical tradeoffs clearly.

Benefits

Comp & perks
  • equity
  • benefits