Baseten

Software Engineer – Model Performance

Full-time

Location Type: Hybrid

Location: San Francisco, California, United States

Salary

💰 $180,000 - $360,000 per year

About the role

  • Implement, refine, and productionize cutting-edge techniques (quantization, speculative decoding, KV cache reuse, chunked prefill, and LoRA) for ML model inference and infrastructure.
  • Dive deep into the underlying codebases of TensorRT, PyTorch, TensorRT-LLM, vLLM, SGLang, CUDA, and other libraries to debug ML performance issues.
  • Apply and scale optimization techniques across a wide range of ML models, particularly large language models.
  • Collaborate with a diverse team to design and implement innovative solutions.
  • Own projects from idea to production.

Requirements

  • Bachelor's, Master's, or Ph.D. degree in Computer Science, Engineering, Mathematics, or related field.
  • Experience with one or more general-purpose programming languages, such as Python or C++.
  • Familiarity with LLM optimization techniques (e.g., quantization, speculative decoding, continuous batching).
  • Strong familiarity with ML libraries, especially PyTorch, TensorRT, or TensorRT-LLM.
  • Demonstrated interest and experience in LLMs.
  • Deep understanding of GPU architecture.
  • Proficiency in improving the performance of software systems, particularly for large language models (LLMs) (Bonus).
  • Experience with CUDA or similar technologies (Bonus).
  • Deep understanding of software engineering principles and a proven track record of developing and deploying AI/ML inference solutions (Bonus).
  • Experience with Docker and Kubernetes (Bonus).

Benefits

  • Competitive compensation, including meaningful equity.
  • 100% coverage of medical, dental, and vision insurance for employees and their dependents.
  • Generous PTO policy, including a company-wide winter break (our offices are closed from Christmas Eve to New Year's Day!).
  • Paid parental leave.
  • Company-facilitated 401(k).
  • Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.