Cohere

Technical Staff Member, Model Efficiency

Full-time

Location Type: Remote

Location: New York, United States

About the role

  • Work across the inference stack to improve core performance metrics
  • Dive deep into model execution
  • Identify bottlenecks and develop innovative optimizations
  • Collaborate closely with modeling and systems teams
  • Experiment, measure, and ship improvements that accelerate inference
  • Build expertise in advanced performance techniques, including GPU/CUDA optimizations, kernel-level improvements, and model execution strategies for MoE and large-scale architectures

Requirements

  • 5+ years of experience writing high-performance, production-quality code
  • Strong programming skills in C++ or Python (Rust/Go also welcome)
  • Experience working with large language models and familiarity with the LLM inference ecosystem (e.g., vLLM, SGLang)
  • Ability to diagnose and resolve performance bottlenecks across the model execution stack
  • A strong bias for action: you ship fast, measure impact, and iterate
  • It’s a big plus if you have experience with:
      • GPU programming, CUDA, or low-level systems optimization
      • Language modeling with transformers (MoE, speculative decoding, KV-cache optimizations)
      • Scaling performance-critical distributed systems (e.g., computation, search, storage)

Benefits

  • An open and inclusive culture and work environment
  • Work closely with a team on the cutting edge of AI research
  • Weekly lunch stipend, in-office lunches & snacks
  • Full health and dental benefits, including a separate budget to take care of your mental health
  • 100% parental leave top-up for up to 6 months
  • Personal enrichment benefits towards arts and culture, fitness and well-being, quality time, and workspace improvement
  • Remote-flexible, with offices in Toronto, New York, San Francisco, London, and Paris, as well as a co-working stipend
  • 6 weeks of vacation (30 working days!)