Tech Stack
Linux, Node.js, Python, PyTorch
About the role
- Own end-to-end performance for distributed AI workloads across multi-node clusters
- Benchmark and tune open-source & industry workloads on current and future hardware
- Design and optimize distributed serving topologies and lead validation efforts
- Build crisp proof points comparing Cornelis Omni-Path to competing interconnects
- Instrument and visualize performance and evangelize best practices
Requirements
- B.S. in CS/EE/CE/Math or related
- 5–7+ years running AI/ML at cluster scale
- Proven ability to set up, run, and analyze AI benchmarks
- Hands-on with distributed training beyond single-GPU
- Practical experience across AI stacks & comms: PyTorch, DeepSpeed, Megatron-LM, etc.
- Comfortable with compilers and MPI stacks; Python + shell power user
- Familiarity with network architectures and Linux systems
- Excellent written and verbal communication
Benefits
- Competitive compensation package including equity, cash, and incentives
- Health and retirement benefits
- Generous paid holidays
- 401(k) with company match
- Open Time Off (OTO) for regular full-time exempt employees
- Sick time, bonding leave, and pregnancy disability leave