
Software Engineer – Model Performance
Baseten
full-time
Location Type: Hybrid
Location: San Francisco • California • United States
Salary
💰 $180,000 - $360,000 per year
About the role
- Implement, refine, and productionize cutting-edge techniques (quantization, speculative decoding, KV cache reuse, chunked prefill, and LoRA) for ML model inference and infrastructure.
- Dive deep into the underlying codebases of TensorRT, PyTorch, TensorRT-LLM, vLLM, SGLang, CUDA, and other libraries to debug ML performance issues.
- Apply and scale optimization techniques across a wide range of ML models, particularly large language models.
- Collaborate with a diverse team to design and implement innovative solutions.
- Own projects from idea to production.
Requirements
- Bachelor's, Master's, or Ph.D. degree in Computer Science, Engineering, Mathematics, or related field.
- Experience with one or more general-purpose programming languages, such as Python or C++.
- Familiarity with LLM optimization techniques (e.g., quantization, speculative decoding, continuous batching).
- Strong familiarity with ML libraries, especially PyTorch, TensorRT, or TensorRT-LLM.
- Demonstrated interest and experience in LLMs.
- Deep understanding of GPU architecture.
- Proficiency in enhancing the performance of software systems, particularly in the context of large language models (LLMs) (Bonus).
- Experience with CUDA or similar technologies (Bonus).
- Deep understanding of software engineering principles and a proven track record of developing and deploying AI/ML inference solutions (Bonus).
- Experience with Docker and Kubernetes (Bonus).
Benefits
- Competitive compensation, including meaningful equity.
- 100% coverage of medical, dental, and vision insurance for employees and their dependents.
- Generous PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!).
- Paid parental leave.
- Company-facilitated 401(k).
- Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Python, C++, quantization, speculative decoding, ML model inference, LLM optimization techniques, PyTorch, TensorRT, CUDA, software engineering principles
Soft Skills
collaboration, project ownership, problem-solving