Cantina

AI Research Engineer, Computer Vision

Full-time

Location Type: Remote

Location: United States

Salary

💰 $170,000–$210,000 per year

About the role

  • Build and maintain end-to-end data pipelines for large-scale image and video datasets: collection, filtering, augmentation, conditioning alignment, and efficient storage/sampling.
  • Implement model architectures (diffusion, autoregressive, flow-based, diffusion transformers, etc.) and maintain high-throughput PyTorch training loops for large-scale image and video diffusion models.
  • Run and manage large-scale training experiments on multi-GPU and multi-node setups (DDP, FSDP, DeepSpeed). Debug training instabilities, loss spikes, and convergence issues.
  • Apply quantization, pruning, and knowledge distillation techniques to compress models without sacrificing quality.
  • Collaborate with researchers and translate state-of-the-art research papers into working implementations in our internal codebase (e.g., new attention mechanisms, sampling schedules, or conditioning methods).
  • Build and maintain evaluation pipelines that measure image quality, video consistency, and perceptual metrics.
  • Set up and maintain human annotation and evaluation pipelines using services such as Amazon SageMaker Ground Truth.
  • Profile and optimize training speed, GPU memory utilization, and iteration time. Implement inference optimizations to reduce latency and compute cost.
  • Work with acceleration toolchains such as torch.compile, Triton, TensorRT, or ONNX where appropriate.

Requirements

  • 2–5 years of hands-on experience building and training ML systems, with strong ownership of results.
  • Fluency in PyTorch: comfortable reading, writing, and debugging both training and inference code.
  • Experience training or fine-tuning generative models (diffusion models, transformers, VAEs, or similar) from scratch or near-scratch.
  • Solid understanding of distributed training workflows and practical debugging of large training runs.
  • Demonstrated ability to read and implement AI research papers in computer vision, plus familiarity with cutting-edge computer vision models and research literature in the image and video domains.
  • Experience building data pipelines for large-scale image or video datasets.
  • Strong debugging skills: comfortable diagnosing both engineering bugs and training failures.
  • Strong engineering mindset: writing clean, reliable, debuggable code; using profiling tools; and handling numerical issues at scale.

Benefits

  • Competitive salary and generous company equity
  • Medical, dental, and vision insurance – 99.99% of premiums covered by Cantina
  • 42 days of paid time off, including:
      • 15 PTO days
      • 10 sick days
      • 15 company holidays
      • 2 floating holidays
  • Generous parental leave & fertility support
  • 401(k) retirement savings plan
  • Lifestyle spending account – $500/month to use however you’d like
  • Complimentary lunch and snacks for in-office employees
  • One Medical membership, and more!