Mirantis

Senior AI Infrastructure, Platform Operations Engineer

The Senior AI Infrastructure Engineer at Mirantis manages complex AI infrastructure environments powered by NVIDIA GPUs and Kubernetes, leading operational excellence and technical investigations in high-performance settings.

Posted 6/10/2026 • Full-time • Remote • 🇵🇱 Poland • Senior

Tech Stack

Tools & technologies
Cloud, Distributed Systems, Kubernetes, Linux

About the role

Key responsibilities & impact
  • Lead the investigation and resolution of complex infrastructure, networking, and platform-related incidents.
  • Act as a senior escalation point for operational teams during critical service-impacting events.
  • Support large-scale NVIDIA GPU infrastructure and high-performance networking environments.
  • Troubleshoot complex Linux, Kubernetes, networking, storage, and hardware-related issues.
  • Analyze platform performance, capacity, stability, and reliability trends to proactively identify risks.
  • Lead root cause analysis activities and drive long-term corrective actions.
  • Collaborate with engineering teams, hardware vendors, and datacenter personnel to resolve complex technical challenges.
  • Participate in major incident management and service restoration activities.
  • Provide technical leadership for Kubernetes platform operations and supporting infrastructure services.
  • Drive improvements in platform reliability, observability, monitoring, and operational processes.
  • Identify opportunities to automate repetitive operational activities and improve operational efficiency.
  • Contribute to operational readiness reviews, infrastructure changes, upgrades, and service introductions.
  • Support the adoption and operation of AI-powered infrastructure services and operational capabilities through k0rdent AI.
  • Evaluate emerging technologies and operational practices to improve service delivery and platform resilience.
  • Mentor and support AI Infrastructure & Platform Operations Engineers.
  • Share technical knowledge through documentation, training sessions, and operational reviews.
  • Develop and maintain operational standards, runbooks, troubleshooting guides, and best practices.
  • Help define operational processes, escalation paths, and service reliability standards.
  • Act as a trusted technical advisor during operational planning and service improvement initiatives.

Requirements

What you’ll need
  • 7+ years of experience in infrastructure operations, platform operations, site reliability engineering, network operations, cloud operations, datacenter operations, or related technical roles.
  • Expert-level Linux administration and troubleshooting skills.
  • Strong networking expertise, including experience diagnosing complex performance, connectivity, and reliability issues.
  • Strong experience operating Kubernetes in production environments.
  • Experience supporting large-scale production infrastructure and distributed systems.
  • Proven experience leading technical investigations and managing complex incidents.
  • Experience performing root cause analysis and driving long-term operational improvements.
  • Strong understanding of observability, monitoring, and service reliability practices.
  • Excellent troubleshooting and analytical skills across multiple infrastructure domains.
  • Strong communication, collaboration, and stakeholder management skills.

Benefits

Comp & perks
  • Operate some of the most advanced AI infrastructure environments in production today.
  • Work with the latest NVIDIA GPU technologies, Kubernetes platforms, and high-performance networking environments.
  • Help define operational standards and reliability practices for next-generation AI infrastructure services.
  • Influence the adoption of AI-powered operational capabilities through k0rdent AI.
  • Work alongside highly skilled engineers solving complex infrastructure and platform challenges at scale.
  • Join a growing organisation investing heavily in AI infrastructure, platform services, and operational innovation.

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Linux administration, Kubernetes, networking, troubleshooting, root cause analysis, observability, monitoring, AI infrastructure, distributed systems, performance analysis
Soft Skills
communication, collaboration, stakeholder management, analytical skills, technical leadership, mentoring, problem-solving, operational efficiency, incident management, process improvement