Salary
💰 $120,000 - $235,750 per year
Tech Stack
Distributed Systems · Python · PyTorch
About the role
- Write safe, scalable, modular, high-quality C++/Python code for our core LLM inference backend.
- Perform benchmarking, profiling, and system-level programming for GPU applications.
- Provide code reviews, design docs, and tutorials to facilitate collaboration among the team.
- Conduct unit tests and performance tests for different stages of the inference pipeline.
- Collaborate with teams working on resource orchestration, distributed systems, inference engine optimization, and high-performance GPU kernels.
- Architect and implement inference stacks to enable efficient, scalable, and accessible LLM inference.
Requirements
- Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent experience.
- Strong coding skills in Python and C/C++.
- 2+ years of industry experience in software engineering or equivalent research experience.
- Knowledgeable and passionate about machine learning and performance engineering.
- Proven project experience building software where performance is a core offering.
- Solid fundamentals in machine learning, deep learning, operating systems, computer architecture and parallel programming.
- Research experience in systems or machine learning.
- Project experience in modern DL software such as PyTorch, CUDA, vLLM, SGLang, and TensorRT-LLM.
- Experience with performance modeling, profiling, debugging, and code optimization, or architectural knowledge of CPUs and GPUs.
- We strongly encourage including sample projects (e.g., on GitHub) that demonstrate the qualifications above.