
Software Engineer, Model Performance Tooling
Baseten
full-time
Location Type: Hybrid
Location: Vancouver • Canada
Salary
💰 CA$130,000 - CA$200,000 per year
About the role
- Performance Benchmarking: Run and automate standard LLM quality benchmarks (GSM8K, MMLU) alongside custom performance suites for specific workloads (e.g., long context windows, KV cache reuse).
- Infrastructure Validation: Create automated acceptance tests for new GPU clusters across x86 and ARM systems, measuring GPU memory bandwidth, networking throughput, and multi-node networking performance.
- Model Dev Experience: Develop and maintain internal GPU-enabled development environments (similar to GitHub Codespaces). You will ensure the team has seamless, high-performance "dev machines" optimized for model experimentation.
- Tool Development: Build and contribute to tools such as InferenceMAX and genai-bench to automate model evaluation and optimization.
- Deep Hardware Profiling: Use PyTorch Profiler and NVIDIA Nsight Systems to collect performance profiles, identify bottlenecks, and debug the NVIDIA compute/networking stack.
- Monitoring & Observability: Develop real-time dashboards and alerts to monitor system health, model startup times, and runtime performance.
- Continuous Integration: Automate performance testing via CI/CD pipelines to catch regressions in model setups before they hit production.
- Optimization Automation: Build tools to find the "Pareto frontier" of configurations, identifying the best trade-offs (latency vs. cost vs. quality) for a given model and workload.
Requirements
- A Love for Systems & Hardware: You aren’t just interested in the AI; you want to understand GPU memory subsystems, InfiniBand, and how data moves across a cluster.
- An Automation Mindset: You believe that if a task has to be done twice, it should be scripted. You have a passion for stress testing and fuzz testing to find the "breaking point" of a system.
- Mathematical Curiosity: A desire to understand the underlying math of Transformers and how it translates into FLOPs and memory requirements.
- Interest in Optimization: You are excited to learn about (or already play with) quantization, speculative decoding, disaggregated serving, and kernel-level optimizations.
- Technical Toolkit: Familiarity with Python and an eagerness to master the NVIDIA software stack. Familiarity with C++ is a plus.
Benefits
- Competitive compensation, including meaningful equity.
- 100% coverage of medical, dental, and vision insurance for employees and dependents
- Generous PTO policy including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!)
- Paid parental leave
- Company-facilitated 401(k)
- Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.