NVIDIA

Solutions Architect, DGX Cloud

full-time

Origin: 🇺🇸 United States • California

Salary

💰 $148,000 - $235,750 per year

Job Level

Mid-Level, Senior

Tech Stack

Ansible, AWS, Azure, Cloud, DNS, Docker, Google Cloud Platform, Grafana, Kubernetes, Linux, Microservices, Prometheus, Terraform

About the role

  • Do you want to be part of the team that brings emerging Artificial Intelligence (AI) technology to the field? We are looking for a hardworking Solutions Architect (SA) to join the DGX Cloud SA Segment Team.
  • The mission of the DGX Cloud Segment team is to guide and enable the successful adoption at scale of DGX Cloud and NVIDIA AI Enterprise Software in production. NVIDIA DGX Cloud is an AI platform for developers, researchers, and enterprises, optimized for the demands of Generative AI.
  • The DGX Cloud SA team is dedicated to shaping the future of DGX Cloud by actively gathering and incorporating partner feedback and product requirements. Our team will help optimize the onboarding process for NVIDIA Cloud Partners, ensuring fast time to insights and an exceptional user experience.
  • Work closely with DGX Cloud Partners, become their trusted technical advisor, advocate for their needs, and ensure they are successful in accomplishing their business goals with the platform.
  • Accelerate NVIDIA Cloud Partner onboarding time and improve cluster manageability and reliability.
  • Scale knowledge, reach, and opportunities by building and educating vertical teams and communities on DGX Cloud and NVIDIA Reference Architectures.
  • Communicate findings gathered from the field to our Reference Architecture teams.
  • Provide technical education and facilitate field product feedback to improve DGX Cloud.
  • Enable partners to participate in the DGX Cloud Ecosystem with the goal of end-user satisfaction and increased sales.

Requirements

  • Strong foundational expertise, from a BS, MS, or Ph.D. degree in Engineering, Mathematics, Physics, Computer Science, or Data Science (or equivalent experience).
  • 5+ years of proven experience with one or more Cloud Service Providers (AWS, Azure, GCP, or OCI), NVIDIA Cloud Partners (CoreWeave, Lambda Labs, Crusoe, etc.), and cloud-native architectures and software.
  • Demonstrated experience in technical leadership, strong understanding of NVIDIA technologies, and success in working with customers.
  • Expertise with parallel filesystems (e.g. Lustre, GPFS, BeeGFS, WekaIO) and high-speed interconnects (InfiniBand, Omni-Path, RoCE, and Gig-E).
  • Strong coding and debugging skills, and demonstrated expertise in one or more of the following areas: Machine Learning, Deep Learning, Slurm, Kubernetes, MPI, MLOps, LLMOps, Ansible, Terraform, and other high-performance AI cluster solutions.
  • Proficient in deploying GPU applications in Slurm and Kubernetes, using Docker, Helm, and registries.
  • Linux-based configuration management and monitoring solutions, system administration, OS installation, configuration, and troubleshooting.
  • Networking technologies (e.g. router, firewall, load balancer, DNS, VPN) for complex infrastructure configuration.