Lead Systems HPC Engineer
Nebius Group. Focus on understanding system behavior across multiple layers, identifying performance bottlenecks, and driving improvements that shape how our clusters are built, operated, tuned, and validated.
Tech Stack
Tools & technologies: Go, Linux, Python
About the role
Key responsibilities & impact
- Focus on understanding system behavior across multiple layers, identifying performance bottlenecks, and driving improvements that shape how our clusters are built, operated, tuned, and validated.
- Investigate and troubleshoot GPU cluster performance issues under real workloads (training and inference).
- Evaluate and integrate new hardware, system configurations, and tuning approaches across the software stack.
- Support complex performance-related escalations from internal teams and customers.
- Work closely with infrastructure, software engineering and hardware vendor teams (e.g. NVIDIA, Mellanox, Intel).
- Contribute to hardware and cluster qualification (acceptance), ensuring systems meet performance expectations.
Requirements
What you’ll need
- 5+ years of professional experience in system-level software development (focused on performance optimization and low-level programming).
- 3+ years of hands-on experience with Linux systems (administration, troubleshooting, and performance tuning).
- In-depth understanding of server architecture, including PCIe devices, NICs, Linux OS/Kernel, and high-performance computing (HPC) systems.
- Strong proficiency in one or more performance-oriented programming languages (C/C++, Go, Python).
Benefits
Comp & perks
- Health insurance: 100% company-paid medical, dental and vision coverage for employees and families.
- 401(k) plan: Up to 4% company match with immediate vesting.
- Parental leave: 20 weeks paid for primary caregivers, 12 weeks for secondary caregivers.
- Remote work reimbursement: Up to $85/month for mobile and internet.
- Disability & life insurance: Company-paid short-term, long-term and life insurance coverage.
ATS Keywords
Hard Skills & Tools
system-level software development, performance optimization, low-level programming, Linux systems administration, Linux troubleshooting, performance tuning, server architecture, performance-oriented programming languages, high-performance computing, GPU cluster performance
Soft Skills
problem-solving, collaboration, communication, troubleshooting, investigation