
AI Research Engineer, Computer Vision
Cantina
Full-time
Location Type: Remote
Location: United States
Salary
💰 $170,000 - $210,000 per year
About the role
- Build and maintain end-to-end data pipelines for large-scale image and video datasets: collection, filtering, augmentation, conditioning alignment, and efficient storage/sampling.
- Implement model architectures (diffusion, autoregressive, flow-based, diffusion transformers, etc.) and maintain high-throughput PyTorch training loops for large-scale image and video diffusion models.
- Run and manage large-scale training experiments on multi-GPU and multi-node setups (DDP, FSDP, DeepSpeed). Debug training instabilities, loss spikes, and convergence issues.
- Apply quantization, pruning, and knowledge distillation techniques to compress models without sacrificing quality.
- Collaborate with researchers and translate state-of-the-art research papers into working implementations in our internal codebase (e.g., new attention mechanisms, sampling schedules, or conditioning methods).
- Build and maintain evaluation pipelines for image quality, video consistency, and perceptual metrics.
- Set up and maintain human annotation and evaluation pipelines using services like Amazon SageMaker Ground Truth.
- Profile and optimize training speed, GPU memory utilization, and iteration time. Implement inference optimizations to reduce latency and compute cost.
- Work with acceleration toolchains such as torch.compile, Triton, TensorRT, or ONNX where appropriate.
Requirements
- 2–5 years of hands-on experience building and training ML systems, with strong ownership of results
- Fluency in PyTorch: comfortable reading, writing, and debugging both training and inference code.
- Experience training or fine-tuning generative models (diffusion models, transformers, VAEs, or similar) from scratch or near-scratch
- Solid understanding of distributed training workflows and practical debugging of large training runs
- Demonstrated ability to read and implement AI research papers in computer vision. Familiarity with cutting-edge computer vision models and research literature in the image and video domain.
- Experience building data pipelines for large-scale image or video datasets
- Strong debugging skills: comfortable diagnosing both engineering bugs and training failures
- Strong engineering mindset: writing clean, reliable, debuggable code; using profiling tools; handling numerical issues at scale.
Benefits
- Competitive salary and generous company equity
- Medical, dental, and vision insurance – 99.99% of premiums covered by Cantina
- 42 days of paid time off, including:
  - 15 PTO days
  - 10 sick days
  - 15 company holidays
  - 2 floating holidays
- Generous parental leave & fertility support
- 401(k) retirement savings plan
- Lifestyle spending account – $500/month to use however you’d like
- Complimentary lunch and snacks for in-office employees
- One Medical membership, and more!
Applicant Tracking System Keywords
Hard Skills & Tools
data pipelines, PyTorch, model architectures, quantization, pruning, knowledge distillation, distributed training, debugging, generative models, computer vision
Soft Skills
ownership of results, collaboration, strong debugging skills, engineering mindset