Baseten

Software Engineer – GPU Networking, Distributed Systems

Full-time

Location Type: Hybrid

Location: San Francisco, California, United States

Salary

💰 $150,000 - $250,000 per year

About the role

  • Make RDMA First-Class: Integrate RDMA/RoCE/InfiniBand capabilities into our inference stack.
  • Optimize Distributed Inference: Implement and tune networking layers for Disaggregated KV Cache Offload and WideEP.
  • Enable Serverless-Grade Startup Speeds for LLMs: Work on checkpointing and storage to achieve sub-10-second model startup.
  • Deep-Dive into Hardware: Validate networking performance on bleeding-edge clusters and write acceptance tests.
  • Build Observability: Design tools to visualize packet flow and diagnose distributed system behaviors.
  • Optimize Kernels: Work with communication libraries (NCCL, NVSHMEM) and write custom kernels to overlap compute and data transfer.

Requirements

  • Deep experience with high-performance networking protocols (InfiniBand, RoCE v2) and an understanding of the physics of data movement.
  • Fluency in C++ or Python, with the ability to bridge the gap between high-level logic and hardware.
  • Deep understanding of the memory hierarchy in modern NVIDIA architectures (H100/Blackwell) and knowledge of how to optimize for it.
  • Experience with NCCL, NVSHMEM, and UCX is highly preferred.
  • Experience with GPUDirect Storage (GDS) or high-performance filesystems like Weka or 3FS.
  • Familiarity with TensorRT-LLM, vLLM, or SGLang is a plus.
  • Experience running low-level benchmarks to "qualify" new hardware clusters.

Benefits

  • Competitive compensation, including meaningful equity.
  • 100% coverage of medical, dental, and vision insurance for employees and dependents.
  • Generous PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!).
  • Paid parental leave.
  • Company-facilitated 401(k).
  • Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.