webAI™

On-Device Machine Learning Engineer

Full-time

Location Type: Hybrid

Location: Austin, Texas, United States


Job Level

Mid-Level / Senior

About the role

On-device model optimization and deployment

  • Convert, optimize, and deploy models to run efficiently on-device using Core ML and/or MLX.
  • Implement quantization strategies (e.g., 8-bit / 4-bit where applicable), compression, pruning, distillation, and other techniques to meet performance targets.
  • Profile and improve model execution across compute backends (CPU/GPU/Neural Engine where relevant), and reduce memory footprint.

Local RAG + memory systems

  • Build and optimize local retrieval pipelines (embeddings, indexing, caching, ranking) that work offline and under tight resource constraints.
  • Implement local memory systems (short- and long-term) with careful attention to privacy, durability, and performance.
  • Collaborate with product/design to translate “memory” behavior into concrete technical architectures and measurable quality targets.

Model lifecycle on consumer hardware

  • Own the on-device model lifecycle: packaging, versioning, updates, rollback strategies, on-device A/B testing approaches, telemetry, and quality monitoring.
  • Build robust evaluation and regression suites that reflect real device constraints and user workflows.
  • Ensure models degrade gracefully under low-power mode, thermal throttling, backgrounding, and OS interruptions.

Performance, reliability, and user experience

  • Treat battery, thermal, and latency as first-class product requirements: instrument, benchmark, and optimize continuously.
  • Design inference pipelines and scheduling strategies that respect app responsiveness, animations, and UI smoothness.
  • Partner with platform engineers to integrate ML into production apps with clean APIs and stable runtime behavior.
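To make the quantization responsibility above concrete: symmetric post-training 8-bit quantization maps float weights to int8 values plus one shared scale, trading a bounded accuracy loss for roughly a 4x smaller footprint. A minimal plain-Python sketch (illustrative only; in practice this is done per-channel with calibration data via Core ML Tools or MLX, and all names here are hypothetical):

```python
def quantize_int8(weights):
    # Symmetric per-tensor quantization: one float scale, int8 values in [-127, 127].
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # guard all-zero tensors
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Reconstruct approximate float weights from the int8 values.
    return [v * scale for v in q]

weights = [0.31, -1.27, 0.04, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
# max_err is bounded by scale / 2 (half a quantization step)
```

The accuracy/performance tradeoff named in the requirements falls out of this scheme: the error bound is half a quantization step, so outlier weights inflate the scale and hurt everything else, which is why per-channel scales and calibration matter on real models.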

Requirements

  • Strong experience shipping ML features into production, ideally including mobile / edge / consumer devices.
  • Hands-on proficiency with Core ML and/or MLX, and the practical realities of running models locally.
  • Solid understanding of quantization and optimization techniques for inference (accuracy/perf tradeoffs, calibration, benchmarking).
  • Experience building or operating retrieval systems (embedding generation, vector search/indexing, caching strategies)—especially under resource constraints.
  • Fluency in performance engineering: profiling, latency breakdowns, memory analysis, and tuning on real devices.
  • Strong software engineering fundamentals: maintainable code, testing, CI, and debugging across complex systems.

Nice to have:

  • Experience with on-device LLMs, multimodal models, or real-time interactive ML features.
  • Familiarity with Metal / GPU compute, or performance tuning of ML workloads on Apple platforms.
  • Experience designing privacy-preserving personalization and memory (local-first data handling, encryption, retention policies).
  • Experience building developer tooling for model packaging, benchmarking, and release management.
  • Prior work on offline-first architectures, edge inference, or battery/thermal-aware scheduling.
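
As context for the retrieval-systems requirement, the core of an offline vector search is small: embed documents once, then rank by cosine similarity against the query embedding. A brute-force sketch in plain Python (illustrative only; a production on-device index would add quantized embeddings, caching, and an approximate-nearest-neighbour structure; the document IDs and vectors below are made up):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query, index, k=2):
    # Brute-force scan over an in-memory index of (doc_id, embedding) pairs.
    scored = [(doc_id, cosine(query, emb)) for doc_id, emb in index]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]

index = [
    ("doc_a", [1.0, 0.0, 0.0]),
    ("doc_b", [0.9, 0.1, 0.0]),
    ("doc_c", [0.0, 1.0, 0.0]),
]
results = top_k([1.0, 0.05, 0.0], index, k=2)
# results: [("doc_a", ...), ("doc_b", ...)] ranked by similarity
```

The "tight resource constraints" the role mentions are exactly what pushes a real implementation past this sketch: a linear scan over float32 embeddings is fine for thousands of chunks but not for a large local corpus on battery power.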

Benefits

  • Competitive salary and performance-based incentives
  • Comprehensive health, dental, and vision benefits package
  • 401(k) match (US-based only)
  • $200/month health and wellness stipend
  • $400/year continuing-education credit
  • $500/year Function Health subscription (US-based only)
  • Free parking for in-office employees
  • Unlimited approved PTO
  • Parental leave for eligible employees
  • Supplemental life insurance

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
on-device model optimization, Core ML, MLX, quantization, compression, pruning, distillation, embedding generation, vector search, performance engineering
Soft skills
collaboration, communication, problem-solving, attention to detail, adaptability