Salary
💰 $184,000 - $356,500 per year
Tech Stack
Cloud, Kubernetes, Linux, Python
About the role
- Design and develop software solutions for data center servers including Linux kernel modifications, device drivers, and system optimizations for GB200 and next-gen platforms.
- Lead hardware bring-up activities, BSP development, and hardware-software co-design for Cloud Service Provider deployments.
- Partner directly with CSPs to deliver technical solutions, co-develop and co-debug features and optimizations, and provide support during new product introductions.
- Collaborate with cross-functional teams in designing end-to-end solutions spanning firmware, OS, middleware, and applications with focus on AI/ML and HPC workloads.
- Perform advanced system debugging, root cause analysis, and performance optimization for large-scale data center environments.
- Collaborate with AE, FAE, and Solution Architect teams to deliver integrated customer solutions and technical documentation.
- Work at the intersection of hardware and software, driving technical solutions from concept through deployment.
Requirements
- Deep expertise in data center server architectures, HPC systems, and hardware-software co-design.
- Expert knowledge of Linux kernel internals, device drivers, and communication protocols (PCIe, USB, Ethernet).
- Deep understanding of computer architecture and microprocessor concepts, with expert knowledge of ARM (aarch64) and x86 architectures.
- Deep understanding of NUMA architectures including memory topology, processor-memory locality, and performance optimization for multi-CPU systems in data center environments.
- Strong programming skills in C/C++ and Python.
- Experience with virtualization, Kubernetes, and cloud-native architectures.
- Skilled in complex system-level debugging, performance analysis, and test design.
- BS or MS in Computer Engineering, Computer Science, or related field (or equivalent experience).
- 8-12 years of system software development experience.
- (Ways to stand out) Experience with GPU computing (CUDA) and deep learning workloads.
- (Ways to stand out) Expertise in out-of-band and in-band management architectures.
- (Ways to stand out) Knowledge of memory fabric and CXL architectures.