
Technical Staff Member – Edge Inference Engineer
Liquid AI
Full-time
Location Type: Hybrid
Location: San Francisco • California • United States
About the role
- Implement and optimize inference kernels for CPU, NPU, and GPU architectures across diverse edge hardware
- Develop quantization strategies (INT4, INT8, FP8) that maximize compression while preserving model quality under strict memory budgets
- Contribute to llama.cpp and other open-source inference frameworks, including new model architectures (audio, vision)
- Profile and optimize end-to-end inference pipelines to achieve sub-100ms time-to-first-token on target devices
- Collaborate with ML researchers to understand model architectures and identify optimization opportunities specific to Liquid Foundation Models
Requirements
- 5+ years of experience in systems programming with strong C++ proficiency
- Embedded software engineering experience or work on resource-constrained systems
- Understanding of ML fundamentals at the linear algebra level (how matrix operations, attention, and quantization work)
- Experience with hardware architecture concepts: cache hierarchies, memory bandwidth, SIMD/vectorization
- Contributions to llama.cpp, ExecuTorch, or similar inference frameworks (nice-to-have)
- Experience with Rust for systems programming (nice-to-have)
- Background in custom accelerator development (TPU, NPU) or work at companies like SambaNova, Cerebras, Groq, or Google/Amazon accelerator teams (nice-to-have)
- Quantitative degree (mathematics, physics, or similar) combined with engineering experience (nice-to-have)
Benefits
- Competitive base salary with equity in a unicorn-stage company
- We pay 100% of medical, dental, and vision premiums for employees and dependents
- 401(k) matching up to 4% of base pay
- Unlimited PTO plus company-wide Refill Days throughout the year
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
C++, Rust, quantization strategies, inference kernels, end-to-end inference pipelines, matrix operations, attention mechanisms, SIMD/vectorization, embedded software engineering, custom accelerator development