
On-Device Machine Learning Engineer
webAI™
Full-time
Location Type: Hybrid
Location: Austin, Texas, United States 🇺🇸
Job Level: Mid-Level / Senior
About the role
- On-device model optimization and deployment
- Convert, optimize, and deploy models to run efficiently on-device using Core ML and/or MLX.
- Implement quantization strategies (e.g., 8-bit / 4-bit where applicable), compression, pruning, distillation, and other techniques to meet performance targets.
- Profile and improve model execution across compute backends (CPU/GPU/Neural Engine where relevant), and reduce memory footprint.
- Local RAG + memory systems
- Build and optimize local retrieval pipelines (embeddings, indexing, caching, ranking) that work offline and under tight resource constraints.
- Implement local memory systems (short/long-term) with careful attention to privacy, durability, and performance.
- Collaborate with product/design to translate “memory” behavior into concrete technical architectures and measurable quality targets.
- Model lifecycle on consumer hardware
- Own the on-device model lifecycle: packaging, versioning, updates, rollback strategies, on-device A/B testing approaches, telemetry, and quality monitoring.
- Build robust evaluation and regression suites that reflect real device constraints and user workflows.
- Ensure models degrade gracefully (low-power mode, thermals, backgrounding, OS interruptions).
- Performance, reliability, and user experience
- Treat battery, thermal, and latency as first-class product requirements: instrument, benchmark, and optimize continuously.
- Design inference pipelines and scheduling strategies that respect app responsiveness, animations, and UI smoothness.
- Partner with platform engineers to integrate ML into production apps with clean APIs and stable runtime behavior.
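To make the quantization responsibility above concrete, here is a minimal sketch of post-training symmetric 8-bit weight quantization in pure Python. The helper names (`quantize_int8`, `dequantize_int8`) are hypothetical for illustration; in practice this role would rely on Core ML or MLX tooling rather than hand-rolled code, but the accuracy/performance tradeoff it demonstrates is the same.

```python
# Illustrative post-training 8-bit symmetric weight quantization.
# Hypothetical helpers; real deployments would use Core ML / MLX
# quantization tooling instead of hand-rolled code like this.

def quantize_int8(weights):
    """Map float weights onto int8 range [-127, 127] with a per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.02, -1.5, 0.73, 0.0, 1.5]
q, scale = quantize_int8(weights)
recovered = dequantize_int8(q, scale)
# Per-weight rounding error is bounded by half a quantization step (scale / 2),
# which is the kind of accuracy budget a calibration pass has to verify.
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
```

The key point the bullet alludes to: 8-bit storage cuts weight memory ~4x versus float32, at the cost of a bounded per-weight error that must be validated against accuracy targets.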
Requirements
- Strong experience shipping ML features into production, ideally including mobile / edge / consumer devices.
- Hands-on proficiency with Core ML and/or MLX, and the practical realities of running models locally.
- Solid understanding of quantization and optimization techniques for inference (accuracy/perf tradeoffs, calibration, benchmarking).
- Experience building or operating retrieval systems (embedding generation, vector search/indexing, caching strategies), especially under resource constraints.
- Fluency in performance engineering: profiling, latency breakdowns, memory analysis, and tuning on real devices.
- Strong software engineering fundamentals: maintainable code, testing, CI, and debugging across complex systems.
- Nice to Have:
- Experience with on-device LLMs, multimodal models, or real-time interactive ML features.
- Familiarity with Metal / GPU compute, or performance tuning of ML workloads on Apple platforms.
- Experience designing privacy-preserving personalization and memory (local-first data handling, encryption, retention policies).
- Experience building developer tooling for model packaging, benchmarking, and release management.
- Prior work on offline-first architectures, edge inference, or battery/thermal-aware scheduling.
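The retrieval-system requirement above can be sketched in a few lines: brute-force top-k cosine-similarity search over an in-memory index. The toy vectors and document IDs are invented for illustration; a production on-device system would use a real embedding model and an approximate-nearest-neighbor index to stay within memory and latency budgets.

```python
import math

# Minimal sketch of offline top-k retrieval over an in-memory index.
# Vectors and doc IDs are toy examples; a real pipeline would embed
# text with an on-device model and use an ANN index, not brute force.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, index, k=2):
    """Rank stored (doc_id, vector) pairs by similarity to the query."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]

index = [
    ("battery tips", [0.9, 0.1, 0.0]),
    ("thermal throttling", [0.8, 0.2, 0.1]),
    ("parental leave", [0.0, 0.1, 0.9]),
]
results = top_k([1.0, 0.0, 0.0], index, k=2)
```

Brute-force scan is O(n) per query, which is fine for small local corpora; the "under resource constraints" part of the requirement is about knowing when to move to quantized embeddings, caching, or ANN structures as the index grows.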
Benefits
- Competitive salary and performance-based incentives.
- Comprehensive health, dental, and vision benefits package.
- 401k Match (US-based only)
- $200/month Health and Wellness Stipend
- $400/year Continuing Education Credit
- $500/year Function Health subscription (US-based only)
- Free parking for in-office employees
- Unlimited Approved PTO
- Parental Leave for Eligible Employees
- Supplemental Life Insurance
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
on-device model optimization, Core ML, MLX, quantization, compression, pruning, distillation, embedding generation, vector search, performance engineering
Soft skills
collaboration, communication, problem-solving, attention to detail, adaptability