HPC AI/ML Platform Manager

Ford Motor Company

full-time

Posted on: 11/15/2025

Location Type: Remote

Location: Remote • Missouri • 🇺🇸 United States

Mid-Level, Senior

Go, Kubernetes, Linux, Python, PyTorch

About the role

Managing the team responsible for the engineering and operations of the AI/ML infrastructure and middleware
Includes CPU and GPU resources in both the HPC batch-based supercomputing environment and the HPC Kubernetes platform
Some hands-on work is expected, as is serving as an after-hours escalation contact for the regular on-call team
Oversee the integration of related HPC infrastructure (e.g. HPC storage, high-speed interconnects, directory services, authentication)
Managing the team's Jira backlog including setting priorities that support application team deliverables
Assist in setting infrastructure platform strategy
Representing the service offering inside and outside of the broader organization including participation in status meetings with key customers
Supporting application teams' needs and requests, including evaluating and supporting new components
Manage team performance including mentoring, performance reviews, and general coaching

Bachelor's Degree or equivalent professional experience
4 years of experience managing high-performance computing (HPC) and AI/ML infrastructure platforms, including Kubernetes and GPU batch clusters
Proven ability to quickly learn and adapt to complex technologies
Strong foundation in Kubernetes, Linux, networking, and containers
Proficiency with Git and ability to code in Go or Python
Demonstrated ability to manage a high-tech team
Experience with Agile processes
Be a self-starter
Have the ability to develop and communicate a strong point of view (POV)
Good people and communication skills
Having a passion for and being energized by work on infrastructure and middleware technologies
Basic understanding of AI/ML model training frameworks (e.g. PyTorch)
Familiarity with HPC supercomputing environments
Knowledge of architecture frameworks, patterns, and reference architectures
Ability to work with vendors to coordinate installations, resolve issues, and manage the purchase order (PO) process
Capability to work on multiple projects simultaneously
Ability to independently research a technology and provide insight to the project team
Ability to present on technical topics

Benefits

Immediate medical, dental, and prescription drug coverage
Flexible family care, parental leave, new parent ramp-up programs, subsidized back-up child care and more
Vehicle discount program for employees and family members, and management leases
Tuition assistance
Established and active employee resource groups
Paid time off for individual and team community service
A generous schedule of paid holidays, including the week between Christmas and New Year’s Day
Paid time off and the option to purchase additional vacation time
