Metsi Technologies

HPC Solution Architect

Full-time

Location Type: Remote

Location: Massachusetts, Texas, United States

Salary

💰 $210,000 - $265,000 per year

About the role

  • Lead customer architecture & design, translating HPC/AI workload requirements into scalable cluster architectures
  • Deploy and operationalize clusters using Omnia or similar automation
  • Build and maintain provisioning workflows (OpenCHAMI-based or equivalent)
  • Serve as Tier-3 engineering escalation, troubleshooting complex provisioning, scheduling, GPU, networking, and performance issues
  • Contribute to open source and customer enablement through code contributions, documentation, workshops, runbooks, templates, and field readiness materials

Requirements

  • 8+ years engineering large-scale HPC and distributed infrastructure
  • Strong knowledge of cluster architecture, schedulers, and provisioning workflows
  • Deep experience with RHEL/Rocky/Ubuntu
  • Hands-on cluster deployments using open-source toolchains, Omnia, and OpenCHAMI
  • Production experience with Slurm and/or Kubernetes
  • Proficient with Docker/Podman, OpenTelemetry pipelines, and telemetry instrumentation
  • Strong skills in Ansible, Python, Bash
  • Expertise with Prometheus and Grafana dashboards

Benefits

  • Health insurance
  • Retirement plans
  • Professional development opportunities
  • Paid time off

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

HPC, distributed infrastructure, cluster architecture, schedulers, provisioning workflows, RHEL, Rocky, Ubuntu, Slurm, Kubernetes

Soft Skills

troubleshooting, engineering escalation, customer enablement, documentation, workshops, runbooks, field readiness