Computer Vision Researcher – VLM

Niantic Spatial, Inc.

Computer Vision Researcher at Niantic Spatial developing spatial intelligence through LLMs. Leads research in multimodal AI and mentors new researchers in the London R&D hub.

Posted 6/29/2026 • Full-time • London, 🇬🇧 United Kingdom • Mid-Level, Senior

Tech Stack

Tools & technologies

PyTorch

About the role

Key responsibilities & impact

Architect Semantic Grounding: Lead research into cross-modal grounding that connects 3D spatial features with language embeddings, enabling the LGM to "understand" object relationships and environmental context.
Scale "Understand" Capabilities: Develop and deploy algorithms for continuous semantics, allowing our 3D maps to evolve and improve their situational awareness as new ground-level and aerial data is ingested.
Agentic Frameworks: Build the "spatial brain" for Embodied AI, enabling robots, drones, and other machines to move beyond simple navigation to mission-level reasoning.
Multimodal Benchmarking: Define the standards for measuring "spatial common sense" in LLMs, creating evaluations that test a model’s ability to interpret and operate within complex 3D scenes.
Technical Mentorship: Serve as the technical anchor for the London R&D hub, resolving architectural disagreements and mentoring the next generation of researchers in the fusion of 3D CV and NLP.
Collaborative Innovation: Partner with Product leads to ensure the "Understand" API delivers high business value for enterprise customers in robotics, logistics, and field operations.

Requirements

What you’ll need

PhD (or equivalent) in Computer Vision, Machine Learning, or Robotics with a focus on Multimodal/Semantic understanding.
4+ years of experience in ML research, with a proven track record of shipping models that bridge 3D Vision and Language.
Expert knowledge of 3D Geometry (SfM, SLAM, VPS) and Transformer-based architectures (VLMs).
Multiple first-author publications at top-tier venues (CVPR, NeurIPS, ICLR) focusing on VLMs, scene understanding or semantic segmentation.
Ability to write production-quality research code in PyTorch or JAX and manage large-scale data pipelines.
Required In-Office Days: 3 days per week
Experience with Gaussian Splatting or NeRFs for semantic scene representation.
Background in robotics (ROS) or building agentic systems that interact with physical environments.
Experience with open-set recognition and zero-shot learning.

Benefits

Comp & perks

Health insurance
Flexible work arrangements
Professional development opportunities

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Machine Learning
3D Vision
Semantic Understanding
Transformer-Based Architectures
Scene Understanding
Semantic Segmentation
Open-Set Recognition
Zero-Shot Learning
Continuous Semantics
Data Pipeline Management

Soft Skills

Technical Mentorship
Collaborative Innovation