
Technical Staff Member, Model Efficiency
Cohere
Full-time
Location Type: Remote
Location: New York • United States
About the role
- Work across the inference stack to improve core performance metrics
- Dive deep into model execution
- Identify bottlenecks and develop innovative optimizations
- Collaborate closely with modeling and systems teams
- Experiment, measure, and ship improvements that accelerate inference
- Build expertise in advanced performance techniques, including GPU/CUDA optimizations, kernel-level improvements, and model execution strategies for MoE and large-scale architectures
Requirements
- 5+ years of experience writing high-performance, production-quality code
- Strong programming skills in C++ or Python (Rust/Go also welcome)
- Experience working with large language models and familiarity with the LLM inference ecosystem (e.g., vLLM, SGLang)
- Ability to diagnose and resolve performance bottlenecks across the model execution stack
- A strong bias for action — you ship fast, measure impact, and iterate
- It’s a big plus if you have experience with GPU programming, CUDA, or low-level systems optimization
- It’s also a plus if you have experience with language modeling with transformers (MoE, speculative decoding, KV-cache optimizations)
- It’s also a plus if you have experience scaling performance-critical distributed systems (e.g., computation, search, storage)
Benefits
- An open and inclusive culture and work environment
- Work closely with a team on the cutting edge of AI research
- Weekly lunch stipend, in-office lunches & snacks
- Full health and dental benefits, including a separate budget to take care of your mental health
- 100% Parental Leave top-up for up to 6 months
- Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
- Remote-flexible, offices in Toronto, New York, San Francisco, London and Paris, as well as a co-working stipend
- 6 weeks of vacation (30 working days!)
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
C++, Python, Rust, Go, GPU programming, CUDA, performance optimization, large language models, transformers, distributed systems
Soft Skills
collaboration, problem-solving, action-oriented, measurement, iteration