
AI Systems Engineer – Inference Frameworks
Adaption
Full-time
Location Type: Hybrid
Location: San Francisco • California • United States
About the role
- You’ll work directly with our founders to design and build the inference and optimization systems that power our core product.
- This role bridges research and production, combining deep exploration of inference techniques with hands-on ownership of scalable, high-performance serving infrastructure.
- You’ll own the full lifecycle of LLM inference, from experimentation and performance analysis to deployment and iteration in production, thriving in a zero-to-one environment and helping define the technical foundations of our inference stack.
- Design and build our LLM inference stack from zero to one, exploring and implementing advanced techniques for low-latency, high-throughput serving of language and multimodal models.
- Develop and optimize inference using modern frameworks (e.g., vLLM, SGLang, TensorRT-LLM), experimenting with batching strategies, KV-cache management, parallelism, and GPU utilization to push performance and cost efficiency.
- Collaborate closely with founders and model developers to analyze bottlenecks across the stack, co-optimizing model execution, infrastructure, and deployment pipelines.
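To give a flavor of the KV-cache management work mentioned above, here is a minimal, purely illustrative sketch (not company code, and much simpler than what frameworks like vLLM actually do) of the block-allocator bookkeeping behind paged KV-cache serving; all names and sizes are hypothetical:

```python
class KVBlockAllocator:
    """Toy paged KV-cache: hands out fixed-size blocks to sequences on demand."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size
        self.free = list(range(num_blocks))  # pool of free block ids
        self.blocks = {}   # seq_id -> list of block ids owned by that sequence
        self.tokens = {}   # seq_id -> number of tokens cached so far

    def append_token(self, seq_id: int) -> bool:
        """Reserve cache space for one new token; False means out of blocks."""
        n = self.tokens.get(seq_id, 0)
        if n % self.block_size == 0:  # current block is full (or first token)
            if not self.free:
                return False  # a real system would preempt or evict here
            self.blocks.setdefault(seq_id, []).append(self.free.pop())
        self.tokens[seq_id] = n + 1
        return True

    def release(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the free pool."""
        self.free.extend(self.blocks.pop(seq_id, []))
        self.tokens.pop(seq_id, None)


alloc = KVBlockAllocator(num_blocks=2, block_size=4)
for _ in range(5):              # 5 tokens for sequence 0 -> needs 2 blocks
    alloc.append_token(0)
blocked = alloc.append_token(1)  # pool exhausted: False
alloc.release(0)                 # sequence 0 finishes, blocks are reclaimed
ok = alloc.append_token(1)       # now succeeds
```

The point of the design is that freed blocks are immediately reusable by waiting sequences, which is what makes continuous batching memory-efficient.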
Requirements
- Strong experience building and optimizing LLM inference systems in production or research environments
- Hands-on expertise with inference frameworks such as vLLM, SGLang, TensorRT-LLM, or similar
- Deep performance mindset with experience in GPU-backed systems, latency/throughput optimization, and resource efficiency
- Solid understanding of transformer inference, serving architectures, and KV-cache–based execution
- Strong programming skills in Python; experience with CUDA, Triton, or C++ a plus
- Comfort working in ambiguous, zero-to-one environments and driving research ideas into production systems
- Nice to have: experience with model quantization or pruning, speculative decoding, multimodal inference, open-source contributions, or prior work in systems or ML research labs
Benefits
- Flexible work: In-person collaboration in the Bay Area, a distributed global-first team, and quarterly offsites.
- Adaption Passport: Annual travel stipend to explore a country you've never visited. We're building intelligence that evolves alongside you, so we encourage you to keep expanding your horizons.
- Lunch Stipend: Weekly meal allowance for take-out or grocery delivery.
- Well-Being: Comprehensive medical benefits and generous paid time off.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
LLM inference systems, performance analysis, latency optimization, throughput optimization, GPU utilization, Python, CUDA, Triton, C++, model quantization
Soft Skills
collaboration, problem-solving, adaptability, ownership, communication