Senior HPC and AI Networking Performance Engineer

NVIDIA

Full-time

Posted on: 1/29/2026

Location Type: Office

Location: Shanghai • China

About the role

Profile and analyze AI workloads on large-scale GPU and CPU clusters for distributed deep learning LLM training and inference, focusing on communication patterns
Benchmark, profile, and analyze performance to find bottlenecks and identify areas for improvement
Implement performance analysis tools
Collaborate with teams across hardware and software to provide performance analysis insights
Define performance test plans and set performance expectations

Requirements

B.Sc. in Computer Science or Software Engineering
8+ years of experience with high-performance networking (RDMA, MPI, NCCL)
Demonstrated performance analysis skills and methodologies
Experience with NVIDIA GPUs, CUDA libraries, and deep learning frameworks such as TensorFlow or PyTorch
Fast self-learner with strong analytical and problem-solving skills
Programming languages: Python, Bash, and C
Experience with Linux distributions
Team player with good communication and interpersonal skills

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Performance Analysis, Deep Learning, NVIDIA GPUs, CUDA, TensorFlow, PyTorch, Python, Bash, C, High-performance Networking

Soft Skills

Analytical skills, Problem solving, Communication, Interpersonal skills, Team player, Self-learning

Certifications

B.Sc. in Computer Science, B.Sc. in Software Engineering