Tech Stack
Tools & technologies: Cloud, Grafana, Kubernetes, Linux, Prometheus, Python
About the role
Key responsibilities & impact
- Advise on and help maintain large-scale computational and AI infrastructure
- Provide consultative guidance and perform hands-on problem solving across the full stack
- Assess customer environments and recommend optimized, production-ready Kubernetes-based container platforms
- Serve as a key technical resource: develop, refine, and document standard methodologies and operational guidelines
- Support Research & Development activities and engage in POCs/POVs to validate new features
- Create and deliver high-quality documentation, including runbooks, onboarding materials, and best-practice guides
- Act as the technical leader for assigned customer accounts, providing strategic guidance on DevOps and platform architecture
Requirements
What you’ll need
- BS/MS/PhD in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, or related fields (or equivalent experience)
- 8+ years of professional experience in leading scalable cloud environments and automation engineering roles
- Demonstrated understanding of networking fundamentals and data center architectures, plus hands-on experience leading HPC/AI clusters
- Proven hands-on experience deploying, configuring, and optimizing NVIDIA GPU-accelerated infrastructure
- Extensive experience with Kubernetes for container orchestration, resource scheduling, scaling, and integration with GPU-accelerated and HPC environments
- Strong familiarity with HPC and AI technologies (CPUs, GPUs, high-speed interconnects) and supporting software stacks
- Deep knowledge of Linux (RedHat, Ubuntu), OS-level security, and protocols
- Proficiency in Python and Bash scripting, configuration management, and Infrastructure-as-Code tools
- Experience with observability stacks (Grafana, Loki, Prometheus)
- Strong background in crafting scalable solutions and providing consultative support to customers
Benefits
Comp & perks
- Professional development opportunities
- Paid time off
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Kubernetes, NVIDIA GPU, HPC, AI, Linux, Python, Bash scripting, Infrastructure-as-Code, configuration management, observability stacks
Soft Skills
consultative guidance, technical leadership, strategic guidance, documentation, customer support, problem-solving, communication, collaboration, organization, mentorship
