Senior Principal Software Engineer
Cerence Inc.
Senior Principal Software Engineer at Cerence AI, managing ML inference performance across multiple platforms and collaborating to improve the performance and deployment of AI technologies in the automotive sector.
Posted 6/29/2026 • Full-time • Remote • Massachusetts • 🇺🇸 United States • Senior • 💰 $141,400 - $226,300 per year
Tech Stack
Tools & technologies
- C++
About the role
Key responsibilities & impact
- Optimize and deploy high-performance LLM inference pipelines
- Own inference runtimes across data center, edge, and embedded platforms
- Push model performance through quantization, kernel fusion, and cache optimization
- Drive latency and throughput improvements that directly impact production products
- Enable efficient, reliable deployment without external vendor dependency
- Build deep expertise and ownership of vLLM, TensorRT-LLM, llama.cpp, and QAIRT
- Extend and tune inference engines using custom CUDA kernels
- Adapt runtimes for constrained and embedded deployment environments
- Implement and evaluate quantization strategies: INT8, INT4, FP4, FP8, mixed precision, AWQ, GPTQ
- Balance accuracy, latency, memory footprint, and throughput
- Optimize key-value cache performance through paging, prefix caching, and cache-aware memory layout design
- Design and tune batching strategies, continuous batching, and speculative decoding
Requirements
What you’ll need
- Proven experience optimizing ML inference performance in production
- Deep understanding of GPU architecture and memory hierarchies
- Hands-on experience with CUDA and low-level performance tuning
- Experience deploying models beyond research environments
Critical Technical Skills
- Inference engines: vLLM, TensorRT-LLM, llama.cpp, QAIRT
- CUDA kernel development and profiling
- Quantization techniques: INT8/INT4/FP4/FP8, AWQ, GPTQ
- KV cache optimization and memory layout design
- Latency optimization: batching, speculative decoding, continuous batching
Benefits
Comp & perks
- Annual bonus opportunity
- Insurance coverage (medical, dental, vision, life, and disability)
- Paid time off
- Paid holidays
- Company contribution to the RRSP (Registered Retirement Savings Plan)
- Equity awards for certain positions and levels
- Remote and/or hybrid work available depending on the position