Optimize TwelveLabs' video foundation models for deployment on model inference platforms across public clouds (AWS, Azure, GCP, OCI) and data platforms (Databricks, Snowflake)
Conduct experiments to benchmark and optimize model performance across inference stacks — measuring latency, throughput, and cost across different accelerator and serving configurations
Collaborate with platform partner engineering teams as a peer to resolve inference-level technical challenges and inform how their infrastructure evolves to support multimodal workloads
Work closely with TwelveLabs' core ML research team to ensure model architecture decisions account for multi-platform deployment requirements

Requirements

8+ years building ML systems in production, with deep experience in model serving, inference optimization, capacity planning, and GPU compute
Deep understanding of the full model inference stack — from model weights and tensor operations through serving runtimes to accelerator hardware
Designed production services using Python, Postgres, FastAPI, SQLAlchemy, Pydantic (and friends)
Strong hands-on experience with cloud infrastructure (AWS, GCP, or Azure), Docker, Kubernetes, and distributed systems in real-world environments, specifically for ML inference and model hosting
Defined technical roadmap and prioritization for large, ambiguous, cross-functional projects, driving high-impact technical decisions

Benefits

Full health, dental, and vision benefits
Extremely flexible PTO and parental leave policy. Office closed the week of Christmas and New Year's.
Visa support where applicable

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

model serving, inference optimization, capacity planning, GPU compute, Python, Postgres, FastAPI, SQLAlchemy, Pydantic, distributed systems

Soft Skills

collaboration, problem-solving, technical decision-making, project prioritization