NVIDIA

Senior AI Engineer – World Foundation Models

full-time

Location Type: Remote

Location: California, United States

Salary

💰 $184,000 - $287,500 per year

About the role

  • Research, implement, and validate model architecture and algorithm changes that improve video generation fidelity, with emphasis on human-centric quality (identity preservation, anatomy, motion coherence, and interaction realism)
  • Explore and prototype improvements across spatial multimodal modeling, modality alignment, flow-based or diffusion-based video generation, and neural rendering-inspired representations to improve controllability and long-horizon consistency
  • Improve training and inference efficiency through architectural and post-training techniques (compute/memory optimizations, distillation, pruning, and compression)
  • Define model training objectives that improve sim-to-real and real-to-sim generalization, especially for human motion, contact, and interaction dynamics across real-world and synthetic/simulation data
  • Develop detailed, domain-specific benchmarks for evaluating world foundation models, especially generative and understanding-oriented world models that reason about video, simulation, and physical environments
  • Translate research results into robust implementations, such as training code, production-grade checkpoints, model integrations, and demos, that clearly showcase capability gains to other teams

Requirements

  • PhD in Computer Science, Graphics, Computer Engineering, or a closely related field (or equivalent experience)
  • 8+ years of applied research and/or industry experience in vision, graphics, or adjacent ML domains (or equivalent experience)
  • 4+ years of direct experience designing, training, and evaluating generative models for image/video/audio, with strong fundamentals in modern deep learning
  • Hands-on experience improving generative models with a focus on perceptual quality and temporal stability, especially for generating humans
  • Advanced proficiency in Python, PyTorch, C++, and CUDA with strong research-engineering practices (reproducibility, testing, profiling, experiment tracking)
  • Experience training and debugging large models in multi-GPU and/or multi-node environments, including distributed training workflows
  • Practical knowledge of inference/runtime bottlenecks and optimization techniques (e.g., batching, parallelism strategies, low-precision/quantization awareness, attention/KV-cache efficiency)
  • Strong “eye for quality” and interest in diagnosing visual artifacts (sharpness, texture detail, temporal stability, etc.) using perceptual metrics, human preference signals, or learned evaluators

Benefits

  • equity
  • comprehensive benefits package

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
model architecture, algorithm changes, video generation, generative models, deep learning, Python, PyTorch, C++, CUDA, optimization techniques
Soft Skills
research-engineering practices, eye for quality, diagnosing visual artifacts
Certifications
PhD in Computer Science, Computer Engineering