Parasail.ai

Senior Software Engineer, LLM Performance

full-time

Location Type: Hybrid

Location: San Mateo, California, United States

About the role

  • Add support for new LLMs, working across the stack from low-level GPU kernels to Kubernetes-based deployments.
  • Contribute to cutting-edge open-source LLM engines such as vLLM or SGLang to extend their capabilities and performance (e.g., improve their Python-based API servers or request schedulers).
  • Work close to the hardware, building and integrating solutions that boost performance and hardware utilization. For example, improve attention backends such as FlashAttention or FlashInfer by contributing to their development and optimization, or by integrating them into vLLM.
  • Improve LLM performance using advanced algorithmic techniques such as speculative decoding, quantization, or other state-of-the-art methods, and understand the impact of these techniques on model quality.

Requirements

  • Expertise in GPU computing, including platforms and frameworks such as CUDA, ROCm, XLA, PyTorch, and JAX.
  • Background in performance analysis and optimization of AI/HPC workloads (e.g., profiling or theoretical analysis of FLOPs and memory bandwidth).
  • Experience writing GPU kernels with technologies such as CUDA, CUTLASS, or Triton.
  • Strength in Python and C++.
  • Demonstrated contributions to open-source projects. Contributions to inference engines such as vLLM are a strong plus.
  • A production-oriented mindset emphasizing robust, scalable code suitable for enterprise-grade applications.
  • A relentless curiosity about cutting-edge AI technologies combined with a passion for solving complex problems.
Applicant Tracking System Keywords

Hard Skills & Tools
GPU computing, CUDA, ROCm, XLA, PyTorch, JAX, Python, C++, GPU kernels, performance optimization
Soft Skills
production-oriented mindset, curiosity, problem-solving