
HPC Solution Architect
Metsi Technologies
Full-time
Location Type: Remote
Location: Massachusetts • Texas • United States
Salary
💰 $210,000 - $265,000 per year
About the role
- Lead customer architecture & design, translating HPC/AI workload requirements into scalable cluster architectures
- Deploy and operationalize clusters using Omnia or similar automation
- Build and maintain provisioning workflows (OpenCHAMI-based or equivalent)
- Serve as Tier-3 engineering escalation, troubleshooting complex provisioning, scheduling, GPU, networking, and performance issues
- Contribute to open source and customer enablement through code contributions, documentation, workshops, runbooks, templates, and field readiness materials
Requirements
- 8+ years engineering large-scale HPC and distributed infrastructure
- Strong knowledge of cluster architecture, schedulers, and provisioning workflows
- Deep experience with RHEL/Rocky/Ubuntu
- Hands-on experience deploying clusters with open-source toolchains, including Omnia and OpenCHAMI
- Production experience with Slurm and/or Kubernetes
- Proficient with Docker/Podman, OpenTelemetry pipelines, and telemetry instrumentation
- Strong skills in Ansible, Python, Bash
- Expertise with Prometheus and Grafana dashboards
Benefits
- Health insurance
- Retirement plans
- Professional development opportunities
- Paid time off
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
HPC, distributed infrastructure, cluster architecture, schedulers, provisioning workflows, RHEL, Rocky, Ubuntu, Slurm, Kubernetes
Soft Skills
troubleshooting, engineering escalation, customer enablement, documentation, workshops, runbooks, field readiness