Senior Inference Engineer, AIConfigurator
NVIDIA
Senior Inference Engineer optimizing large-scale LLM serving for NVIDIA AIConfigurator: building APIs, collaborating with teams, and enhancing model performance on NVIDIA platforms.
Posted 6/13/2026 • Full-time • Santa Clara, California, United States • Senior • $184,000 - $356,500 per year
Tech Stack
Tools & technologies: Distributed Systems, Kubernetes, Python, Rust
About the role
Key responsibilities & impact
- Build and evolve AIConfigurator's core optimization engine for LLM serving, including configuration search, SLA-aware ranking, efficiency and latency estimation, and Pareto frontier analysis.
- Build production-quality Python/Rust APIs, CLIs, SDK surfaces, and web workflows that help users generate strong deployment configurations for NVIDIA GPU clusters.
- Develop configuration generation systems that emit backend-specific artifacts for Dynamo, Kubernetes, TensorRT-LLM, vLLM, and SGLang deployments.
- Collaborate with inference runtime, performance, benchmarking, and product groups to ensure simulated results correspond with actual deployment performance on H100, H200, B200, GB200, and upcoming NVIDIA platforms.
- Improve model, hardware, and backend support by integrating performance databases, profiling data, support matrices, and validation tools.
- Drive software quality through maintainable architecture, schema development, tests, documentation, and automation suitable for open-source and production users.
- Convert intricate inference ideas like prefill/decode disaggregation, tensor parallelism, pipeline parallelism, expert parallelism, batching, and KV cache behavior into dependable software abstractions.
Requirements
What you’ll need
- BS, MS, or PhD in Computer Science, Computer Engineering, Electrical Engineering, Applied Math, or a related field, or equivalent experience.
- 10+ years of relevant software engineering experience.
- Strong Python/Rust engineering skills, including production APIs, CLI tools, packaging, testing, debugging, and maintainable software development.
- Experience with GPU computing, distributed systems, ML infrastructure, or high-performance model serving.
- Understanding of LLM inference concepts such as batching, latency, efficiency, memory constraints, parallelism strategies, and serving SLAs.
- Experience working with data-driven performance analysis, benchmarking, simulation, optimization, or managing resource needs.
- Ability to collaborate across research, runtime, platform, and customer-facing engineering teams.
- Strong written and verbal communication skills, with the ability to explain sophisticated technical tradeoffs clearly.
Benefits
Comp & perks
- equity
- benefits
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Python, Rust, APIs, CLIs, SDK, Dynamo, Kubernetes, TensorRT-LLM, vLLM, SGLang
Soft Skills
collaboration, communication, problem-solving, documentation, automation, software quality, maintainable architecture, testing, debugging, performance analysis