Senior HPC Cluster Engineer

Nebius Group

full-time

Posted on: 4/9/2026

Location Type: Remote

✨ AI Apply

About the role

Tuning the performance of GPU clusters and InfiniBand networks to ensure optimal operation in HPC and GPU-based environments.
Analyzing and troubleshooting the root cause of issues related to GPUs and InfiniBand networks, and proposing corrective actions.
Integrating new hardware into the existing infrastructure, including support for new GPU hardware through software stacks like Kubernetes, QEMU, and KVM.
Enhancing automation systems for proactive monitoring, detecting, and resolving issues in GPU and InfiniBand environments.
Configuring and managing GPU devices and InfiniBand fabrics, ensuring efficient and reliable operation.

5+ years of professional experience in system-level software development (focused on performance optimization, low-level programming).
3+ years of hands-on experience with Linux systems (administration, troubleshooting, and performance tuning).
In-depth understanding of server architecture, including PCIe devices, NICs, Linux OS/Kernel, and high-performance computing (HPC) systems.
Strong proficiency in one or more performance-oriented programming languages (C/C++, Go, Python).

Benefits

Competitive salary and comprehensive benefits package.
Opportunities for professional growth within Nebius.
Flexible working arrangements.
A dynamic and collaborative work environment that values initiative and innovation.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

performance optimizationlow-level programmingLinux systemsserver architecturePCIe devicesNICshigh-performance computingCC++Python