Baseten

Software Engineer, Model Performance Tooling

full-time

Location Type: Hybrid

Location: Vancouver, Canada

Salary

💰 CA$130,000 - CA$200,000 per year

About the role

  • Performance Benchmarking: Run and automate standard LLM quality benchmarks (GSM8K, MMLU) alongside custom performance suites for specific workloads (e.g., long-context window, KV cache reuse).
  • Infrastructure Validation: Create automated acceptance tests for new GPU clusters across x86 and ARM systems, measuring GPU memory bandwidth, networking throughput, and multi-node networking performance.
  • Model Dev Experience: Develop and maintain internal GPU-enabled development environments (similar to GitHub Codespaces). You will ensure the team has seamless, high-performance "dev machines" optimized for model experimentation.
  • Tool Development: Build and contribute to tools such as InferenceMAX and genai-bench to automate model evaluation and optimization.
  • Deep Hardware Profiling: Use PyTorch Profiler and NVIDIA Nsight Systems to collect performance profiles, identify bottlenecks, and debug the NVIDIA compute/networking stack.
  • Monitoring & Observability: Develop real-time dashboards and alerts to monitor system health, model startup times, and runtime performance.
  • Continuous Integration: Automate performance testing via CI/CD pipelines to catch regressions in model setups before they hit production.
  • Optimization Automation: Build tools to find the "Pareto frontier" for a given model and workload: the set of configurations where no objective (latency, cost, or quality) can be improved without worsening another.
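
As an illustration of the Pareto frontier idea in the last bullet, here is a minimal Python sketch. It is not Baseten's tooling: the Config fields, the configuration names, and every number are made up for the example. A configuration is kept only if no other candidate is at least as good on latency, cost, and quality and strictly better on at least one of them.

```python
from typing import NamedTuple


class Config(NamedTuple):
    """Hypothetical measurement record for one serving configuration."""
    name: str
    latency_ms: float  # lower is better
    cost_usd: float    # lower is better
    quality: float     # higher is better (e.g. a benchmark score)


def dominates(a: Config, b: Config) -> bool:
    """True if a is no worse than b on every objective and better on one."""
    no_worse = (a.latency_ms <= b.latency_ms
                and a.cost_usd <= b.cost_usd
                and a.quality >= b.quality)
    strictly_better = (a.latency_ms < b.latency_ms
                       or a.cost_usd < b.cost_usd
                       or a.quality > b.quality)
    return no_worse and strictly_better


def pareto_frontier(configs: list[Config]) -> list[Config]:
    """Keep every configuration that no other configuration dominates."""
    return [c for c in configs
            if not any(dominates(other, c) for other in configs)]


# Illustrative, invented measurements (not real benchmark results).
CANDIDATES = [
    Config("fp16-tp2", latency_ms=120.0, cost_usd=4.0, quality=0.82),
    Config("fp8-tp2", latency_ms=90.0, cost_usd=3.0, quality=0.81),
    Config("int4-tp1", latency_ms=70.0, cost_usd=1.5, quality=0.74),
    Config("fp16-tp1", latency_ms=200.0, cost_usd=4.5, quality=0.82),
]

if __name__ == "__main__":
    for cfg in pareto_frontier(CANDIDATES):
        print(cfg.name)
```

In this made-up data, fp16-tp1 drops out because fp16-tp2 matches its quality at lower latency and cost. The remaining three each trade one objective against another, so choosing among them depends on the workload's priorities.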

Requirements

  • A Love for Systems & Hardware: You aren’t just interested in the AI; you want to understand GPU memory subsystems, InfiniBand, and how data moves across a cluster.
  • An Automation Mindset: You believe that if a task has to be done twice, it should be scripted. You have a passion for stress testing and fuzz testing to find the "breaking point" of a system.
  • Mathematical Curiosity: A desire to understand the underlying math of Transformers and how it translates into FLOPs and memory requirements.
  • Interest in Optimization: You are excited to learn about (or already play with) quantization, speculative decoding, disaggregated serving, and kernel-level optimizations.
  • Technical Toolkit: Familiarity with Python, and an eagerness to master the NVIDIA software stack. C++ familiarity is good to have.

Benefits

  • Competitive compensation, including meaningful equity.
  • 100% coverage of medical, dental, and vision insurance for employees and dependents.
  • Generous PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!).
  • Paid parental leave.
  • Company-facilitated 401(k).
  • Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.