
Principal Engineer – Service Delivery
Metsi Technologies
Full-time
Location Type: Remote
Location: India
About the role
- Provide world-class delivery support to our customers
- Deploy, configure, and validate GPU-accelerated compute clusters for AI, ML, and HPC workloads using NVIDIA Base Command Manager
- Perform benchmarking with HPL (GPU), HPL-MxP, STREAM, NCCL, RCCL, and related tools
- Produce as-built documentation and performance reports, and share best practices across the team
- Configure and secure RHEL, Ubuntu, and Rocky Linux for GenAI or HPC workloads
- Continuously learn and work with the latest GenAI platforms and infrastructure
Requirements
- 7+ years of experience with HPC or GenAI clusters, GPU-based systems, AI infrastructure, or related fields
- Deep hands‑on experience with GPU deployment, configuration, and multi-node testing
- Proficiency with benchmarking tools: HPL, HPL-MxP, STREAM, NCCL, RCCL
- Red Hat certification (RHCSA/RHCE) or 7+ years of relevant experience with Red Hat-based distributions
- Experience with GenAI/HPC networking (InfiniBand and/or RoCE)
- Bachelor’s degree (desirable)
- Proven ability to lead sub-teams (desirable)
Benefits
- Health insurance
- Flexible working hours
- Professional development
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
GPU deployment, GPU configuration, multi-node testing, benchmarking tools, HPL, STREAM, NCCL, RCCL, GenAI, HPC
Soft Skills
leadership
Certifications
Red Hat certification, RHCSA, RHCE