
Senior GenAI, High Performance Computing Delivery Engineer
Metsi Technologies
Full-time
Location Type: Remote
Location: Texas • United States
Salary
💰 $153,850 - $199,100 per year
About the role
- Deploy, configure, and validate GPU-accelerated compute clusters for AI, ML, and HPC with NVIDIA Base Command Manager (knowledge of Warewulf and OpenHPC is a plus)
- Perform benchmarking with HPL GPU, HPL MxP, STREAM, NCCL, RCCL, OSU Microbenchmarks, and related tools
- Produce as-built documentation and performance reports, and share best practices with the team
- Configure and secure RHEL, Ubuntu, and Rocky Linux systems for GenAI or HPC workloads
- Work directly with customers onsite (travel both regionally and across the U.S.)
Requirements
- 7+ years of experience with HPC or GenAI clusters, GPU-based systems, AI infrastructure, or related fields
- Deep hands-on experience with GPU deployment, configuration, and multi-node testing using NVIDIA Base Command Manager
- Proficiency with benchmarking tools: HPL, HPL MxP, STREAM, NCCL, RCCL, OSU Microbenchmarks
- Red Hat certification (RHCSA/RHCE) or 7+ years of relevant experience with Red Hat-based distributions
- Experience with GenAI/HPC networking (InfiniBand and/or RoCE)
- Experience working in Linux-based parallel computing environments at scale
- Experience with containers, orchestration, and workload scheduling (Docker, Singularity/Apptainer, Kubernetes, Slurm)
- Ability to travel up to 70% of the time across the U.S. as needed for projects
- Strong customer-facing and communication skills
Benefits
- Health insurance
- Paid time off
- Flexible work arrangements
- Professional development