
Technical Product Manager – Mission Control
Nebius Group
full-time
Location Type: Remote
Location: Netherlands
About the role
- Own reliability and performance opportunities across the Nebius stack: from bare metal to applications.
- Define product direction end-to-end: problem discovery → design → delivery → adoption.
- Drive cross-functional execution across compute, networking, storage, observability, platform, and hardware teams.
- Lead deep problem research using customer interviews, analytics, workload studies, and log investigations.
- Identify and prioritize bottlenecks affecting large-scale training/inference performance and stability.
- Translate advanced ML/infrastructure research into practical, scalable product capabilities.
- Define and operationalize product metrics for cluster experience (e.g. reliability, efficiency, latency-to-start, utilization, throughput).
Requirements
- 3–5+ years of experience in one or more of: product management, HPC, ML infrastructure/MLOps, distributed systems, SRE, cloud architecture, or GPU platforms.
- Strong technical foundation in distributed systems, cloud infrastructure, or ML platforms.
- Hands-on familiarity with ML orchestration environments (e.g. Slurm, Kubernetes, Ray, or similar).
- Experience delivering technically complex initiatives with multiple engineering teams.
- Strong communication skills and ability to influence engineering, research, and customer stakeholders.
- Experience using analytics and data to prioritize roadmap decisions.
- High ownership, learning speed, and comfort in fast-evolving AI infrastructure environments.
Benefits
- Competitive salary and comprehensive benefits package.
- Opportunities for professional growth within Nebius.
- Flexible working arrangements.
- A dynamic and collaborative work environment that values initiative and innovation.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
product management, HPC, ML infrastructure, MLOps, distributed systems, SRE, cloud architecture, GPU platforms, ML orchestration, analytics
Soft Skills
strong communication, influence, high ownership, learning speed, adaptability