
Senior ML Ops Engineer
Wizard
full-time
Location Type: Remote
Location: United States
Salary
$200,000 - $250,000 per year
About the role
- Build, maintain, and optimize production-grade ML pipelines, enabling seamless transitions from experimentation to production.
- Define and implement strategies for model versioning, rollout, rollback, and lifecycle management to ensure robust and reproducible ML systems.
- Define and enforce serving-layer SLAs covering latency, availability, GPU utilization, time to first token (TTFT), and inter-token latency (ITL), and build observability and alerting around them.
- Apply software engineering best practices including testing, CI/CD integration, and reproducibility to ML workflows, improving iteration speed for ML engineers without compromising reliability.
- Ensure ML systems are secure, cost-efficient, and scalable, partnering with DevOps on infrastructure standards while owning ML-specific operational concerns.
- Collaborate cross-functionally with ML, Data, Product, and DevOps teams to translate ML requirements into production-ready systems and influence technical planning and roadmap decisions.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Data Science, or a related field, or equivalent experience.
- 5-8+ years of experience in Software Engineering, ML Engineering, Platform Engineering, or Infrastructure Engineering with direct ownership of production ML serving systems.
- Hands-on experience deploying and maintaining LLMs and deep learning models in production environments.
- Strong Python skills and software engineering fundamentals with infrastructure depth. Familiarity with ML frameworks (PyTorch, TensorFlow, or similar) is preferred.
- Experience with cloud platforms such as AWS, GCP, or Azure, and familiarity with ML lifecycle tooling, including model registries and experimentation platforms.
- Familiarity with inference optimization at the hardware and systems level: batching strategies, memory management, quantization tradeoffs, and CPU/GPU interaction patterns.
- Demonstrated ability to reason about tradeoffs between latency, cost, throughput, and reliability at both the systems and operational levels.
- Experience in high-growth startup environments and an ability to thrive in a fast-paced, evolving technical landscape.
Benefits
- Equity in the form of stock options
- Medical, dental, and vision coverage
- 401(k) plan
- Flexible PTO and company holidays
- Fully remote work within the United States
- Periodic company offsites and team gatherings
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
machine learning pipelines, model versioning, CI/CD integration, Python, deep learning models, ML frameworks, inference optimization, latency management, cost efficiency, scalability
Soft Skills
collaboration, cross-functional teamwork, technical planning, problem-solving, adaptability
Certifications
Bachelor’s degree in Computer Science, Master’s degree in Data Science