Senior Machine Learning Engineer – Inference Platform
Wizard
Senior Machine Learning Engineer managing production ML serving systems for an advanced AI shopping platform. Collaborating with teams to ensure model efficacy and scalability in real-time environments.
Tech Stack
Tools & technologies: AWS, Azure, Cloud, Google Cloud Platform, Python
About the role
Key responsibilities & impact
- Own and evolve our multi-engine inference platform, supporting a variety of model types and serving requirements.
- Build and improve production ML pipelines — taking models from experimentation to reliable, high-throughput serving.
- Define and implement model versioning, rollout, rollback, and lifecycle management strategies that ensure reproducibility and operational reliability.
- Define and enforce serving-layer SLAs, including latency, availability, GPU utilization, Time-to-First-Token (TTFT), and Inter-Token Latency (ITL).
- Build observability, monitoring, alerting, and operational tooling for production inference systems.
- Apply software engineering best practices, including testing, CI/CD integration, and reproducibility across ML workflows.
- Optimize inference performance through efficient resource utilization, hardware-aware serving strategies, and cost-conscious infrastructure design.
- Ensure ML serving systems are secure, scalable, and operationally resilient.
- Partner with ML, Data, Product, and DevOps teams to turn ideas into production systems, driving the technical decisions on serving and scale.
Requirements
What you’ll need
- Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related field, or equivalent practical experience.
- 5–8+ years of experience in Software Engineering, ML Engineering, Platform Engineering, or Infrastructure Engineering, with direct ownership of production ML serving systems.
- Hands-on experience running an LLM serving engine (vLLM, TGI, TensorRT-LLM, or SGLang) in production under real load — not just managed or hosted endpoints.
- Strong Python skills and software engineering fundamentals, combined with deep systems and infrastructure knowledge.
- Experience with cloud platforms such as AWS, GCP, or Azure, and familiarity with ML lifecycle tooling, experimentation platforms, and model registries.
- Strong grasp of inference performance — continuous batching, KV-cache and GPU-memory behavior, quantization, and CPU-versus-GPU bottlenecks — with the instinct to profile before tuning.
- Experience serving heterogeneous workloads, including LLMs, embedding models, and extraction models, each with distinct latency, throughput, and scaling requirements.
- Demonstrated ability to balance latency, throughput, reliability, and infrastructure cost while operating production-scale ML systems.
- Experience in high-growth startup environments and comfort operating in fast-moving, evolving technical landscapes.
Benefits
Comp & perks
- Health insurance
- Flexible work arrangements
- Professional development opportunities
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Python, ML pipelines, model versioning, CI/CD, inference performance, quantization, GPU utilization, high-throughput serving, production ML serving systems, infrastructure design
Soft Skills
collaboration, technical decision-making, problem-solving, adaptability, communication
Certifications
Bachelor's degree in Computer Science, Master's degree in Data Science, Engineering degree