Ford Motor Company

HPC AI/ML Platform Manager

full-time

Location Type: Remote

Location: Remote • Missouri • 🇺🇸 United States


Job Level

Mid-Level, Senior

Tech Stack

Go, Kubernetes, Linux, Python, PyTorch

About the role

  • Manage the team responsible for the engineering and operations of the AI/ML infrastructure and middleware
  • Scope includes CPU and GPU resources in both the HPC batch-based supercomputing environment and the HPC Kubernetes platform
  • Do some hands-on work and serve as an after-hours escalation contact for the regular on-call team
  • Oversee the integration of related HPC infrastructure (e.g., HPC storage, high-speed interconnects, directory services, and authentication)
  • Manage the team's Jira backlog, including setting priorities that support application team deliverables
  • Assist in setting infrastructure platform strategy
  • Represent the service offering inside and outside the broader organization, including participating in status meetings with key customers
  • Support application teams' needs and requests, including evaluating and supporting new components
  • Manage team performance, including mentoring, performance reviews, and general coaching

Requirements

  • Bachelor's Degree or equivalent professional experience
  • 4 years of experience managing high-performance computing (HPC) and AI/ML infrastructure platforms, including Kubernetes and GPU batch clusters
  • Proven ability to quickly learn and adapt to complex technologies
  • Strong foundation in Kubernetes, Linux, networking, and containers
  • Proficiency with Git and the ability to code in Go or Python
  • Demonstrated ability to manage a high-tech team
  • Experience with Agile processes
  • Self-starter
  • Ability to develop and communicate a strong point of view (POV)
  • Good interpersonal and communication skills
  • Passion for, and energy from, working on infrastructure and middleware technologies
  • Basic understanding of AI/ML model training frameworks (e.g. PyTorch)
  • Familiarity with HPC supercomputing environments
  • Knowledge of architecture frameworks, patterns, and reference architectures
  • Ability to work with vendors to coordinate installations, resolve issues, and manage the purchase order (PO) process
  • Ability to work on multiple projects simultaneously
  • Ability to research a technology independently and provide insight to the project team
  • Ability to present on technical topics

Benefits

  • Immediate medical, dental, and prescription drug coverage
  • Flexible family care, parental leave, new parent ramp-up programs, subsidized back-up child care and more
  • Vehicle discount program for employees and family members, and management leases
  • Tuition assistance
  • Established and active employee resource groups
  • Paid time off for individual and team community service
  • A generous schedule of paid holidays, including the week between Christmas and New Year’s Day
  • Paid time off and the option to purchase additional vacation time

Applicant Tracking System Keywords

Hard skills
high-performance computing (HPC), AI/ML infrastructure, Kubernetes, GPU batch clusters, Linux, networking, containers, Git, Go, Python
Soft skills
team management, mentoring, performance reviews, coaching, communication skills, self-starter, adaptability, problem-solving, project management, presentation skills
Certifications
Bachelor's Degree