
HPC AI/ML Platform Manager
Ford Motor Company
full-time
Location Type: Remote
Location: Remote • Missouri • 🇺🇸 United States
Job Level
Mid-Level, Senior
Tech Stack
Go, Kubernetes, Linux, Python, PyTorch
About the role
- Manage the team responsible for the engineering and operations of the AI/ML infrastructure and middleware
- Scope includes CPU and GPU resources in both the HPC batch-based supercomputing environment and the HPC Kubernetes platform
- Do some hands-on work and serve as an after-hours escalation contact for the regular on-call team
- Oversee the integration of related HPC infrastructure (e.g., HPC storage, high-speed interconnects, directory services, and authentication)
- Manage the team's Jira backlog, including setting priorities that support application team deliverables
- Assist in setting infrastructure platform strategy
- Represent the service offering inside and outside the broader organization, including participating in status meetings with key customers
- Support application teams' needs and requests, including evaluating and supporting new components
- Manage team performance, including mentoring, performance reviews, and general coaching
Requirements
- Bachelor's Degree or equivalent professional experience
- 4 years of experience managing high-performance computing (HPC) and AI/ML infrastructure platforms, including Kubernetes and GPU batch clusters
- Proven ability to quickly learn and adapt to complex technologies
- Strong foundation in Kubernetes, Linux, networking, and containers
- Proficient with Git and able to code in Go or Python
- Demonstrated ability to manage a high-tech team
- Experience with Agile processes
- Self-starter
- Ability to develop and communicate a strong point of view (POV)
- Good people and communication skills
- Passion for, and energy from, working on infrastructure and middleware technologies
- Basic understanding of AI/ML model training frameworks (e.g., PyTorch)
- Familiarity with HPC supercomputing environments
- Knowledge of architecture frameworks, patterns, and reference architectures
- Ability to work with vendors to coordinate installations, resolve issues, and manage the purchase order (PO) process
- Ability to work on multiple projects simultaneously
- Ability to independently research a technology and provide insight to the project team
- Ability to present on technical topics
Benefits
- Immediate medical, dental, and prescription drug coverage
- Flexible family care, parental leave, new parent ramp-up programs, subsidized back-up child care and more
- Vehicle discount program for employees and family members, and management leases
- Tuition assistance
- Established and active employee resource groups
- Paid time off for individual and team community service
- A generous schedule of paid holidays, including the week between Christmas and New Year’s Day
- Paid time off and the option to purchase additional vacation time
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
high-performance computing (HPC), AI/ML infrastructure, Kubernetes, GPU batch clusters, Linux, networking, containers, Git, Go, Python
Soft skills
team management, mentoring, performance reviews, coaching, communication skills, self-starter, adaptability, problem-solving, project management, presentation skills
Certifications
Bachelor's Degree