
Senior AI Infrastructure Engineer
BMW Group
full-time
Location Type: Office
Location: Munich • Germany
About the role
- Design, build and operate GPU-centric AI infrastructure (primarily NVIDIA) in on-premise and cloud environments, with a strong focus on performance, scalability and efficiency.
- Own the architecture and operation of high-performance compute environments for distributed training and optimized model execution.
- Optimize compute, memory, and high-performance networking (e.g., InfiniBand, NCCL) to enable large-scale AI workloads in industrial contexts.
- Develop and operate infrastructure components such as scheduling and resource management systems (e.g., SLURM, Ray, Run:ai) to ensure efficient utilization of shared GPU resources.
- Create and maintain automated, reproducible infrastructure using modern tools (e.g., Docker, Kubernetes, Terraform, Ansible, CI/CD).
- Contribute to BMW-specific AI use cases by providing reliable and scalable infrastructure.
- Take technical ownership of the AI infrastructure stack, define best practices, and mentor less experienced engineers.
Requirements
- University degree in Computer Science, Computer/Electrical Engineering, or a related field
- 8–10 years of professional industry experience building and operating AI and HPC infrastructure
- Solid hands-on experience with GPU systems (particularly NVIDIA), including drivers, CUDA, and performance optimization
- Experience with distributed systems and high-performance networking (e.g., InfiniBand, NCCL), as well as cloud environments (AWS, Azure) in addition to on-premise infrastructure
- Practical experience with resource scheduling and workload orchestration (e.g., SLURM, Ray, NVIDIA Run:ai)
- Extensive experience in infrastructure automation (e.g., Docker, Kubernetes, Terraform, Ansible, CI/CD) and proficiency in Python for infrastructure and system tooling
- Experience with training, fine-tuning, or deploying ML models in production, as well as exposure to industrial AI use cases (e.g., simulation, robotics, engineering), is a plus
Benefits
- Challenging projects
- Wide range of personal and professional development opportunities
- Attractive, fair, and performance-based compensation
- High job security
- Annual special payments such as holiday pay, Christmas bonus, and profit-sharing
- Flexible working hours, six weeks of annual leave, and overtime compensation
- Discounted BMW & MINI benefits
Applicant Tracking System Keywords
Hard Skills & Tools
GPU systems, NVIDIA, CUDA, performance optimization, distributed systems, high-performance networking, resource scheduling, workload orchestration, infrastructure automation, Python
Soft Skills
technical ownership, mentoring, best practices