
Senior HPC and AI Networking Performance Engineer
NVIDIA
Full-time
Location Type: Office
Location: Shanghai, China
About the role
- Profile and analyze AI workloads on large-scale GPU and CPU clusters for distributed deep learning, LLM training, and inference, focusing on communication patterns
- Benchmark, profile, and analyze performance to find bottlenecks and identify areas for improvement
- Implement performance analysis tools
- Collaborate with teams across hardware and software to provide performance analysis insights
- Define performance test plans and set performance expectations
Requirements
- B.Sc. in Computer Science or Software Engineering
- 8+ years of experience with high-performance networking (RDMA, MPI, NCCL)
- Demonstrated performance analysis skills and methodologies
- Experience with NVIDIA GPUs, CUDA libraries, and deep learning frameworks such as TensorFlow or PyTorch
- Fast self-learner with strong analytical and problem-solving skills
- Programming languages: Python, Bash, and C
- Experience with Linux distributions
- Team player with good communication and interpersonal skills
Benefits
- Health insurance
- Retirement plans
- Professional development