Senior Principal Software Engineer

Cerence Inc.

Senior Principal Software Engineer at Cerence AI, managing ML inference performance across multiple platforms and collaborating to improve the performance and deployment of AI technologies in the automotive sector.

Posted 6/29/2026 • Full-time • Remote • Massachusetts • 🇺🇸 United States • Senior • 💰 $141,400 - $226,300 per year

Tech Stack

Tools & technologies

C++

About the role

Key responsibilities & impact

Optimize and deploy high-performance LLM inference pipelines
Own inference runtimes across data center, edge, and embedded platforms
Push model performance through quantization, kernel fusion, and cache optimization
Drive latency and throughput improvements that directly impact production products
Enable efficient, reliable deployment without external vendor dependency
Build deep expertise in and ownership of vLLM, TensorRT-LLM, llama.cpp, and QAIRT
Extend and tune inference engines using custom CUDA kernels
Adapt runtimes for constrained and embedded deployment environments
Implement and evaluate quantization strategies: INT8, INT4, FP4, FP8, mixed precision, AWQ, and GPTQ
Balance accuracy, latency, memory footprint, and throughput
Optimize key-value (KV) cache performance through paging, prefix caching, and cache-aware memory layout design
Design and tune batching strategies, continuous batching, and speculative decoding

Requirements

What you’ll need

Proven experience optimizing ML inference performance in production
Deep understanding of GPU architecture and memory hierarchies
Hands-on experience with CUDA and low-level performance tuning
Experience deploying models beyond research environments
Critical Technical Skills
Inference engines: vLLM, TensorRT-LLM, llama.cpp, QAIRT
CUDA kernel development and profiling
Quantization techniques: INT8/INT4/FP4/FP8, AWQ, GPTQ
KV cache optimization and memory layout design
Latency optimization: batching, continuous batching, speculative decoding

Benefits

Comp & perks

Annual bonus opportunity
Insurance coverage (medical, dental, vision, life, and disability)
Paid time off
Paid holidays
Company contribution to the RRSP (Registered Retirement Savings Plan)
Equity awards for certain positions and levels
Remote and/or hybrid work available depending on the position

ATS Keywords


Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

LLM Inference Optimization
CUDA Programming
Quantization: INT8, INT4, FP4, FP8
Kernel Fusion
Cache Optimization
Batching Strategies
Speculative Decoding
KV Cache Optimization
Memory Layout Design
Performance Tuning