Salary
💰 $184,000 - $356,500 per year
About the role
- Design and develop a C++-based system to simplify and accelerate computing for unstructured sparsity in DL and HPC on NVIDIA GPUs
- Enable the system in languages and frameworks that are more commonly used in DL, such as Python and PyTorch
- Evaluate and improve the performance of the system on real-life applications
- Realize opportunities to improve library quality, performance and maintainability by writing effective and well-tested code for production use
- Work closely with product management and other internal and external partners to understand feature and performance requirements and contribute to technical roadmaps
Requirements
- BS, MS, or PhD in Computer Science, Applied Math, or a related field (or equivalent experience)
- 6+ years of overall experience in developing, debugging, and optimizing high-performance software using C++ and parallel programming, ideally for sparse linear algebra applications and using CUDA, MPI, OpenMP, or equivalent technologies
- Experience with domain-specific language design and compiler optimizations, in particular sparse compilers (MLIR or TACO)
- Excellent C++, Python, and CUDA programming skills
- Strong collaboration, communication, and documentation habits, and ideally experience working in a globally distributed organization