Metsi Technologies

Senior GenAI, High Performance Computing Delivery Engineer

Full-time

Location Type: Remote

Location: Texas, United States

Salary

💰 $153,850 - $199,100 per year

About the role

  • Deploy, configure, and validate GPU-accelerated compute clusters for AI, ML, and HPC with NVIDIA Base Command Manager (knowledge of Warewulf and OpenHPC is a plus)
  • Perform benchmarking with HPL GPU, HPL MxP, STREAM, NCCL, RCCL, OSU Microbenchmarks, and related tools
  • Produce as-built documentation and performance reports, and share best practices with the team
  • Configure and secure RHEL, Ubuntu, and Rocky Linux for GenAI or HPC workloads
  • Work directly with customers onsite (travel both regionally and across the U.S.)

Requirements

  • 7+ years of experience with HPC or GenAI clusters, GPU-based systems, AI infrastructure, or related fields
  • Deep hands-on experience with GPU deployment, configuration, and multi-node testing using NVIDIA Base Command Manager
  • Proficiency with benchmarking tools: HPL, HPL MxP, STREAM, NCCL, RCCL, OSU Microbenchmarks
  • Red Hat certification (RHCSA/RHCE) or 7+ years of relevant experience with Red Hat-based distributions
  • Experience with GenAI/HPC networking (InfiniBand and/or RoCE)
  • Experience working in Linux-based parallel computing environments at scale
  • Experience with containers/orchestration (Docker, Singularity/Apptainer, Kubernetes, Slurm)
  • Ability to travel up to 70% of the time across the U.S. as needed for projects
  • Strong customer-facing and communication skills

Benefits

  • Health insurance
  • Paid time off
  • Flexible work arrangements
  • Professional development

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
GPU deployment, configuration, multi-node testing, benchmarking tools, HPL, STREAM, NCCL, RCCL, Linux-based parallel computing, containers/orchestration
Soft Skills
customer-facing skills, communication skills
Certifications
Red Hat certification, RHCSA, RHCE