Tech Stack
Cloud, Grafana, Kubernetes, Linux, Node.js, Prometheus, Python, PyTorch, TensorFlow
About the role
- Design and execute performance benchmarks across AI, HPC, and storage platforms
- Run and tune AI inference workloads using frameworks such as PyTorch, TensorFlow, Triton, NVIDIA NIMs, and vector databases
- Benchmark large-scale RAG pipelines including data ingestion, retrieval, and inference performance
- Profile and optimize MPI and multi-node distributed applications
- Build and debug C/C++, Python, and CUDA code across heterogeneous systems
- Generate automated test scripts and benchmarking workflows (e.g., with Bash, Python, or Slurm job scripts)
- Analyze and visualize results using Excel, Jupyter, or reporting tools; create comparison graphs and KPIs
- Write clear, concise performance reports for both technical and non-technical stakeholders
- Present findings internally and externally, translating results into architectural guidance for field engineers and sales teams
- Collaborate with system engineers, product managers, and partners to tune and improve software/hardware stack performance
- Validate and tune performance on storage systems including parallel file systems (e.g., Lustre, GPFS), object storage, and NVMe over Fabrics
- Contribute to internal tooling to automate test cycles and performance regression tracking
Requirements
- 7+ years of experience in performance engineering, benchmarking, or HPC/AI systems
- Deep experience with AI/ML and deep learning frameworks (PyTorch, TensorFlow, ONNX, Triton)
- Familiarity with NVIDIA NIMs and containerized model serving stacks
- Proven expertise with MPI, OpenMP, and Slurm or similar schedulers in large-scale compute environments
- Solid understanding of file and storage systems (e.g., POSIX, Lustre, S3, NVMe-oF)
- Strong Linux skills (debugging, tuning, networking, storage stack)
- Proficiency in scripting (e.g., Bash, Python) for job orchestration and result parsing
- Ability to create clear Excel graphs and presentations from raw benchmark data
- Strong communication skills, including the ability to convey technical results and trade-offs to engineering and customer-facing teams
- Preferred: Experience with RAG pipelines and vector databases (e.g., FAISS, Milvus, Qdrant)
- Familiarity with Kubernetes and CSI-based persistent volume systems
- Understanding of GPU profiling tools (Nsight, nvprof, PyTorch Profiler)
- Knowledge of telemetry and monitoring frameworks (e.g., Prometheus, Grafana)
- Prior work publishing or presenting technical performance results