Tech Stack
Tools & technologies
- Python
- PyTorch
- Rust
About the role
Key responsibilities & impact
- You’ll design, implement, and optimize large-scale machine learning systems for training.
- You’ll improve all aspects of performance, including GPU utilization, communication overhead, and memory efficiency.
- You’ll partner with research and modeling teams to align systems with algorithmic needs.
- You’ll evaluate and apply best practices for distributed training using industry-leading frameworks.
- You’ll dive deep into low-level optimization, including custom CUDA or Triton kernels.
- You’ll debug, profile, and fine-tune training workflows to unlock new levels of scalability.
Requirements
What you’ll need
- Strong background in LLMs, multimodal AI, or diffusion models.
- Proficiency in Python.
- Familiarity with a systems programming language (e.g., C++ or Rust) is a plus.
- Deep knowledge of PyTorch or JAX as well as libraries such as Megatron-LM, NeMo, or DeepSpeed.
- Familiarity with common optimization techniques such as FSDP/ZeRO, gradient checkpointing, or low-precision data types.
- Hands-on experience writing custom GPU kernels in CUDA or Triton.
- Excellent communication and problem-solving skills, including full proficiency in English.
Benefits
Comp & perks
- Employees can work remotely.
