
Deep Learning Software Engineer, LLM Performance
NVIDIA
full-time
Location Type: Remote
Location: California • United States
Salary
💰 $124,000 - $195,500 per year
About the role
- Performance optimization, analysis, and tuning of LLM, VLM and GenAI models for DL inference, serving and deployment in NVIDIA/OSS LLM frameworks
- Scale LLM performance across different model architectures and types of NVIDIA accelerators
- Optimize for maximum throughput, minimum latency, and throughput under latency constraints
- Contribute features and code to NVIDIA/OSS LLM frameworks, inference benchmarking frameworks, TensorRT, and Triton
- Collaborate with cross-functional teams across generative AI, automotive, image understanding, and speech understanding to develop innovative solutions
Requirements
- Bachelor's, Master's, or PhD degree, or equivalent experience, in a relevant field (Computer Engineering, Computer Science, EECS, AI)
- 2+ years of relevant software development experience
- Excellent Python/C/C++ programming, software design and software engineering skills
- Experience with a DL framework such as PyTorch, JAX, or TensorFlow
- Prior experience with an LLM framework or a DL compiler in inference, deployment, algorithms, or implementation
- Prior experience with performance modeling, profiling, debugging, and code optimization of a DL/HPC/high-performance application
- Architectural knowledge of CPU and GPU
- GPU programming experience (CUDA or OpenCL)
Benefits
- Equity
- Benefits