d-Matrix

Principal LLM Inference Engineer

The Principal LLM Inference Engineer optimizes generative AI architectures for d-Matrix. The role is responsible for end-to-end system deployment and collaboration with product teams.

Posted 7/1/2026 | full-time | Santa Clara • California • 🇺🇸 United States | Lead | 💰 $195,000 - $285,000 per year

Tech Stack

Tools & technologies
Python

About the role

Key responsibilities & impact
  • Identify and prototype emerging LLM inference use cases suited to heterogeneous hardware deployments.
  • Build compelling proof-of-concept systems that demonstrate d-Matrix capabilities to customers, partners, and internal stakeholders.
  • Develop and tune custom kernels and operator-level optimizations to maximize throughput and minimize latency.
  • Drive quantization, sparsity, and batching strategies tailored to d-Matrix's computational model.
  • Build and maintain inference runtimes, serving frameworks, and evaluation tooling.
  • Contribute to distributed inference systems: tensor/pipeline parallelism, disaggregated prefill/decode, KV-cache management.
  • Work closely with hardware architects to provide firmware and compiler teams with actionable inference workload insights.
  • Partner with product and business development to translate POCs into customer-facing demonstrations.
  • Contribute to technical publications, whitepapers, and open-source projects that advance d-Matrix's visibility.

Requirements

What you’ll need
  • Bachelor’s degree in Computer Science, Electrical Engineering, or a related field, and 10+ years of relevant engineering experience; or equivalent demonstrated experience.
  • Master’s or PhD in Computer Science, Electrical Engineering, or a related field preferred, with 6+ years of relevant industry experience.
  • Strong proficiency in Python and C/C++.
  • Hands-on experience optimizing LLM inference — attention kernels, KV cache, batching strategies, quantization (INT8/FP8/INT4).
  • Experience with at least one major inference framework (vLLM, SGLang, TensorRT-LLM, ONNX Runtime, or similar) at a contributor level.
  • Familiarity with GPU kernel programming (CUDA/Triton) and performance profiling tools.

Benefits

Comp & perks
  • Competitive compensation
  • Equity
  • Bonus

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
  • Custom Kernel Development
  • Operator-Level Optimization
  • Quantization Strategies
  • Sparsity Techniques
  • Batching Strategies
  • Distributed Inference Systems
  • Performance Profiling
  • Attention Kernels
  • KV Cache Management
  • Tensor/Pipeline Parallelism