Inference Optimization Engineer – Local, Edge Runtime

Intel Corporation

As an Inference Optimization Engineer at Intel, you will optimize inference engines for local and edge environments, with a focus on improving model performance and making efficient use of the hardware.

Posted: 6/16/2026
Employment type: Full-time
Location: Santa Clara • Arizona, California, Oregon • United States
Level: Mid-Level, Senior
Salary: $170,500 - $315,490 per year

ATS (Applicant Tracking System) Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills

C++, Python, LLM inference, KV cache, GPU programming, Vulkan, CUDA, SYCL, Metal, low-level debugging

Tools & Technologies

llama.cpp, vLLM, ggml, Linux, build systems, inference engines

Certifications & Qualifications

BS in Computer Science, MS in Computer Science, BS in Electrical Engineering, MS in Electrical Engineering, BS in Mathematics, MS in Mathematics

Industry Keywords

performance optimization, quantization strategy, edge environments, interactive agent workloads, benchmarking, open-source contributions

Tech Stack

Tools & technologies

C++, Linux, Python

About the role

Key responsibilities & impact

Optimize inference engines (llama.cpp, vLLM) for constrained local and edge environments
Profile and optimize local inference for latency, throughput, and memory on edge hardware
Tune KV cache, continuous batching, and scheduling for interactive agent workloads
Drive quantization strategy and validate its impact on model quality
Cut CPU overhead and improve engine startup time, model load time, and lifecycle management
Benchmark across hardware tiers and publish performance comparisons
Upstream fixes and patches to open-source engines

Requirements

What you’ll need

BS/MS in CS, EE, Math, or a related STEM field
5+ years of software development experience
Strong in C++ and/or Python; comfortable reading systems-level code
Understands how LLM inference works (attention, KV cache, decoding)
Has profiled and optimized real performance problems (CPU or GPU) and can prove the speedup
Linux, build systems, and low-level debugging expertise
Hands-on with llama.cpp, vLLM, ggml, or similar engines (preferred)
Experience with GPU / accelerator programming (Vulkan, CUDA, SYCL, Metal) or SIMD / CPU kernels (preferred)
Familiarity with quantization formats and their quality trade-offs (preferred)
Open-source contributions to inference engines (preferred)

Benefits

Comp & perks

Competitive pay
Stock bonuses
Health insurance
Retirement plans
Vacation