Cerence Inc.

Senior Principal Software Engineer

The Senior Principal Software Engineer at Cerence AI manages ML inference performance across multiple platforms and collaborates with other teams to improve the performance and deployment of AI technologies in the automotive sector.

Posted 6/29/2026 • Full-time • Remote • Massachusetts • 🇺🇸 United States • Senior • 💰 $141,400 - $226,300 per year

Tech Stack

Tools & technologies
C++

About the role

Key responsibilities & impact
  • Optimize and deploy high‑performance LLM inference pipelines
  • Own inference runtimes across data center, edge, and embedded platforms
  • Push model performance through quantization, kernel fusion, and cache optimization
  • Drive latency and throughput improvements that directly impact production products
  • Enable efficient, reliable deployment without external vendor dependency
  • Build deep expertise and ownership of: vLLM, TensorRT‑LLM, llama.cpp, QAIRT
  • Extend and tune inference engines using custom CUDA kernels
  • Adapt runtimes for constrained and embedded deployment environments
  • Implement and evaluate quantization strategies: INT8, INT4, FP4, FP8, mixed precision, AWQ, GPTQ
  • Balance accuracy, latency, memory footprint, and throughput
  • Optimize key–value cache performance through: paging, prefix caching, cache‑aware memory layout design
  • Design and tune: batching strategies, continuous batching, speculative decoding

Requirements

What you’ll need
  • Proven experience optimizing ML inference performance in production
  • Deep understanding of GPU architecture and memory hierarchies
  • Hands‑on experience with CUDA and low‑level performance tuning
  • Experience deploying models beyond research environments

Critical technical skills
  • Inference engines: vLLM, TensorRT‑LLM, llama.cpp, QAIRT
  • CUDA kernel development and profiling
  • Quantization techniques: INT8/INT4/FP4/FP8, AWQ, GPTQ
  • KV cache optimization and memory layout design
  • Latency optimization: batching, speculative decoding, continuous batching

Benefits

Comp & perks
  • Annual bonus opportunity
  • Insurance coverage (medical, dental, vision, life, and disability)
  • Paid time off
  • Paid holidays
  • Company contribution to the RRSP (Registered Retirement Savings Plan)
  • Equity awards for certain positions and levels
  • Remote and/or hybrid work available depending on the position

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
  • LLM Inference Optimization
  • CUDA Programming
  • Quantization: INT8, INT4, FP4, FP8
  • Kernel Fusion
  • Cache Optimization
  • Batching Strategies
  • Speculative Decoding
  • KV Cache Optimization
  • Memory Layout Design
  • Performance Tuning