Senior Machine Learning Engineer – Inference Platform

Wizard

This Senior Machine Learning Engineer role manages production ML serving systems for an advanced AI shopping platform, collaborating with other teams to ensure model efficacy and scalability in real-time environments.

Posted 6/3/2026 • Full-time • Remote • 🇺🇸 United States • Senior

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills

Python, ML pipelines, model versioning, CI/CD, inference performance, quantization, GPU utilization, high-throughput serving, production ML serving systems, infrastructure design

Soft Skills

collaboration, technical decision-making, problem-solving, adaptability, communication

Tools & Technologies

AWS, GCP, Azure, vLLM, TGI, TensorRT-LLM, SGLang, model registries, monitoring tools, alerting systems

Certifications & Qualifications

Bachelor's degree in Computer Science, Master's degree in Data Science, Engineering degree

Industry Keywords

ML lifecycle tooling, production-scale ML systems, latency, availability, operational reliability

Tech Stack

Tools & technologies

AWS, Azure, Cloud, Google Cloud Platform, Python

About the role

Key responsibilities & impact

Own and evolve our multi-engine inference platform, supporting a variety of model types and serving requirements.
Build and improve production ML pipelines — taking models from experimentation to reliable, high-throughput serving.
Define and implement model versioning, rollout, rollback, and lifecycle management strategies that ensure reproducibility and operational reliability.
Define and enforce serving-layer SLAs, including latency, availability, GPU utilization, Time-to-First-Token (TTFT), and Inter-Token Latency (ITL).
Build observability, monitoring, alerting, and operational tooling for production inference systems.
Apply software engineering best practices, including testing, CI/CD integration, and reproducibility across ML workflows.
Optimize inference performance through efficient resource utilization, hardware-aware serving strategies, and cost-conscious infrastructure design.
Ensure ML serving systems are secure, scalable, and operationally resilient.
Partner with ML, Data, Product, and DevOps teams to turn ideas into production systems, driving the technical decisions on serving and scale.

Requirements

What you’ll need

Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related field, or equivalent practical experience.
5–8+ years of experience in Software Engineering, ML Engineering, Platform Engineering, or Infrastructure Engineering, with direct ownership of production ML serving systems.
Hands-on experience running an LLM serving engine (vLLM, TGI, TensorRT-LLM, or SGLang) in production under real load — not just managed or hosted endpoints.
Strong Python skills and software engineering fundamentals, combined with deep systems and infrastructure knowledge.
Experience with cloud platforms such as AWS, GCP, or Azure, and familiarity with ML lifecycle tooling, experimentation platforms, and model registries.
Strong grasp of inference performance — continuous batching, KV-cache and GPU-memory behavior, quantization, and CPU-versus-GPU bottlenecks — with the instinct to profile before tuning.
Experience serving heterogeneous workloads, including LLMs, embedding models, and extraction models, each with distinct latency, throughput, and scaling requirements.
Demonstrated ability to balance latency, throughput, reliability, and infrastructure cost while operating production-scale ML systems.
Experience in high-growth startup environments and comfort operating in fast-moving, evolving technical landscapes.

Benefits

Comp & perks

Health insurance
Flexible work arrangements
Professional development opportunities