Wizard

Senior Machine Learning Engineer – Inference Platform

The Senior Machine Learning Engineer manages production ML serving systems for an advanced AI shopping platform and collaborates with other teams to ensure model efficacy and scalability in real-time environments.

Posted 6/3/2026 • Full-time • Remote • 🇺🇸 United States • Senior

Tech Stack

Tools & technologies
  • AWS
  • Azure
  • Cloud
  • Google Cloud Platform
  • Python

About the role

Key responsibilities & impact
  • Own and evolve our multi-engine inference platform, supporting a variety of model types and serving requirements.
  • Build and improve production ML pipelines — taking models from experimentation to reliable, high-throughput serving.
  • Define and implement model versioning, rollout, rollback, and lifecycle management strategies that ensure reproducibility and operational reliability.
  • Define and enforce serving-layer SLAs, including latency, availability, GPU utilization, Time-to-First-Token (TTFT), and Inter-Token Latency (ITL).
  • Build observability, monitoring, alerting, and operational tooling for production inference systems.
  • Apply software engineering best practices, including testing, CI/CD integration, and reproducibility across ML workflows.
  • Optimize inference performance through efficient resource utilization, hardware-aware serving strategies, and cost-conscious infrastructure design.
  • Ensure ML serving systems are secure, scalable, and operationally resilient.
  • Partner with ML, Data, Product, and DevOps teams to turn ideas into production systems, driving the technical decisions on serving and scale.

Requirements

What you’ll need
  • Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related field, or equivalent practical experience.
  • 5–8+ years of experience in Software Engineering, ML Engineering, Platform Engineering, or Infrastructure Engineering, with direct ownership of production ML serving systems.
  • Hands-on experience running an LLM serving engine (vLLM, TGI, TensorRT-LLM, or SGLang) in production under real load — not just managed or hosted endpoints.
  • Strong Python skills and software engineering fundamentals, combined with deep systems and infrastructure knowledge.
  • Experience with cloud platforms such as AWS, GCP, or Azure, and familiarity with ML lifecycle tooling, experimentation platforms, and model registries.
  • Strong grasp of inference performance — continuous batching, KV-cache and GPU-memory behavior, quantization, and CPU-versus-GPU bottlenecks — with the instinct to profile before tuning.
  • Experience serving heterogeneous workloads, including LLMs, embedding models, and extraction models, each with distinct latency, throughput, and scaling requirements.
  • Demonstrated ability to balance latency, throughput, reliability, and infrastructure cost while operating production-scale ML systems.
  • Experience in high-growth startup environments and comfort operating in fast-moving, evolving technical landscapes.

Benefits

Comp & perks
  • Health insurance
  • Flexible work arrangements
  • Professional development opportunities
