Senior ML Engineer – Token Factory

Nebius Group

Senior ML Engineer working on a high-performance inference and fine-tuning platform at Nebius Cloud, driving production speedups and optimising GPU workloads for AI applications.

Posted 7/1/2026 • Full-time • Remote • 🇳🇱 Netherlands • Senior

Tech Stack

Tools & technologies
Cloud, Flash, Python, PyTorch

About the role

Key responsibilities & impact
  • Token Factory is part of Nebius Cloud, one of the world's largest GPU clouds, running tens of thousands of GPUs.
  • We are building a high-performance inference and fine-tuning platform designed to push foundation models to their hardware limits.
  • Our mission is to maximise throughput, minimise latency, and optimise cost-per-token across tens of thousands of GPUs.
  • Inference optimisation: Identify LLM inference bottlenecks to drive production speedups.
  • Squeeze maximum performance out of a wide range of LLM architectures at scale (e.g., GPT-OSS, Kimi K2.5, DeepSeek V3.1/V3.2, GLM-5).
  • Inference engine support: Implement novel speculative decoding architectures, optimise components of various LLM designs (dense/MoE, autoregressive/parallel), and contribute to open-source inference engines.
  • Low Precision Training & Inference: Design and productionise low-precision (FP8, NVFP4/MXFP4) training and inference pipelines with measurable gains in throughput and cost-efficiency.

Requirements

What you’ll need
  • A deep understanding of the theoretical foundations of machine learning and the transformer architecture
  • Experience profiling GPU workloads using Nsight, PyTorch profiler, or similar tools
  • Understanding of GPU memory hierarchy and compute/memory tradeoffs
  • Familiarity with important ideas in the LLM space, such as MHA, RoPE, KV-cache, Flash Attention, and quantisation
  • Understanding of the performance aspects of large neural network training (sharding strategies, custom kernels, hardware features, etc.)
  • Strong software engineering skills (we mostly use Python)
  • Deep experience with modern deep learning frameworks
  • Proficiency in contemporary software engineering approaches, including CI/CD, version control and unit testing
  • Strong communication and leadership abilities

Benefits

Comp & perks
  • Competitive compensation
  • Career growth and learning opportunities
  • Flexibility and ownership
  • Collaborative and innovative culture
  • Opportunity to work on impactful AI projects
  • International environment and talented teams