
Technical Staff Member – Edge Inference Engineer
Liquid AI
Full-time
Location Type: Hybrid
Location: San Francisco • California • United States
About the role
- Implement and optimize inference kernels for CPU, NPU, and GPU architectures across diverse edge hardware
- Develop quantization strategies (INT4, INT8, FP8) that maximize compression while preserving model quality under strict memory budgets
- Contribute to llama.cpp and other open-source inference frameworks, including new model architectures (audio, vision)
- Profile and optimize end-to-end inference pipelines to achieve sub-100ms time-to-first-token on target devices
- Collaborate with ML researchers to understand model architectures and identify optimization opportunities specific to Liquid Foundation Models
Requirements
- 5+ years of experience in systems programming with strong C++ proficiency
- Embedded software engineering experience or work on resource-constrained systems
- Understanding of ML fundamentals at the linear algebra level (how matrix operations, attention, and quantization work)
- Experience with hardware architecture concepts: cache hierarchies, memory bandwidth, SIMD/vectorization
- Contributions to llama.cpp, ExecuTorch, or similar inference frameworks (nice-to-have)
- Experience with Rust for systems programming (nice-to-have)
- Background in custom accelerator development (TPU, NPU) or work at companies like SambaNova, Cerebras, Groq, or Google/Amazon accelerator teams (nice-to-have)
- Quantitative degree (mathematics, physics, or similar) combined with engineering experience (nice-to-have)
Benefits
- Competitive base salary with equity in a unicorn-stage company
- We pay 100% of medical, dental, and vision premiums for employees and dependents
- 401(k) matching up to 4% of base pay
- Unlimited PTO plus company-wide Refill Days throughout the year
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
C++, Rust, quantization strategies, inference kernels, end-to-end inference pipelines, matrix operations, attention mechanisms, SIMD/vectorization, embedded software engineering, custom accelerator development