Salary
💰 $184,000 - $356,500 per year
Tech Stack
Linux, PyTorch, TensorFlow
About the role
- Lead, mentor, and grow your library engineering team, and own the planning and execution of projects as well as the quality and performance of your libraries.
- Participate in feature design and implementation as a technical leader.
- Interact with internal and external partners and researchers to understand their use cases and requirements.
- Collaborate with engineering teams, program and product management, and partners to define the product roadmap.
- Continuously review established processes, infrastructure, and practices, and identify opportunities to improve them so that the teams execute efficiently and transparently.
Requirements
- 10+ years of overall experience in the software industry, specializing in HPC networking or system software.
- 4+ years of management experience.
- BS, MS, or Ph.D. in CS, CE, EE, or a related technical field, or equivalent experience.
- Prior development experience in systems software, communication runtimes, or high-performance networking software, with a successful track record of taking several complex software features or products through the full product life cycle.
- Strong understanding of computer system architecture, operating system principles (systems software fundamentals), HW-SW interactions, and performance analysis and optimization.
- Excellent C/C++ programming and debugging skills on Linux.
- Experience balancing multiple projects with competing priorities.
- Flexibility to work and communicate effectively across different teams and time zones.
- Experience with parallel programming models (MPI, SHMEM) and at least one communication runtime (MPI, NCCL, NVSHMEM, OpenSHMEM, UCX, UCC).
- Programming experience with CUDA, MPI, OpenMP, OpenACC, and pthreads.
- Background in RDMA, high-performance networking technologies (InfiniBand, RoCE, Ethernet, EFA), network architecture, and network topologies.
- Knowledge of HPC and ML/DL fundamentals.
- Experience with deep learning frameworks such as PyTorch and TensorFlow.