Senior Machine Learning Operations Engineer
BetMGM
Senior MLOps Engineer at BetMGM, building the ML platform on AWS and operationalizing ML systems. The role designs batch scoring and real-time inference pipelines that meet cost and performance targets.
Tech Stack
Tools & technologies
AWS, Distributed Systems, Docker, Kubernetes, Python, Terraform
About the role
Key responsibilities & impact
- Stand up and operate BetMGM's ML platform on AWS (SageMaker Training, Model Registry, Pipelines, Endpoints, Batch Transform) and Snowflake (Snowpark ML, Cortex), with Terraform-managed infrastructure.
- Build self-service scaffolds that let data scientists ship a model end-to-end without a ticket queue — cookie-cutter project templates with CI, drift monitoring, alerting, IaC, and Snowflake connectivity pre-baked.
- Design and operate batch scoring pipelines — SageMaker Batch Transform, dbt-orchestrated scoring against Snowflake, Snowpark ML — with explicit freshness and cost SLAs.
- Design and operate real-time inference paths — SageMaker real-time endpoints, Lambda + Bedrock for GenAI, API Gateway — with stated latency budgets (typically sub-100ms) and graceful degradation under load.
- Own the feature store (SageMaker Feature Store, Tecton, or Feast) with guaranteed online/offline parity — training-serving skew is treated as an incident, not a tradeoff.
- Build CI/CD for ML — model registry, automated retraining triggers, model versioning, lineage from feature → training run → deployed model → live prediction.
- Implement champion/challenger, shadow deployments, and canary releases as platform primitives so individual model teams do not reinvent them per project.
- Stand up drift detection, data quality, and model performance monitoring (Evidently, Arize, or SageMaker Model Monitor — pick one and standardize) with paging that routes to humans who can fix it.
- Own MLOps incident response — production model failures are SEV events with postmortems.
- Right-size endpoints, batch caching, request batching, and autoscaling. State cost-per-prediction targets up front and meet them.
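As a rough illustration of what a cost-per-prediction target in the last item could look like in practice, here is a minimal sketch. The function names, instance price, and traffic figures are hypothetical and are not taken from BetMGM or from AWS pricing; the sketch also assumes instance-hours are the only cost, which ignores storage, data transfer, and idle capacity.

```python
def cost_per_prediction(hourly_instance_cost: float,
                        instance_count: int,
                        predictions_per_hour: float) -> float:
    """Average infrastructure cost of one prediction, in the same
    currency as hourly_instance_cost (hypothetical helper)."""
    if predictions_per_hour <= 0:
        raise ValueError("predictions_per_hour must be positive")
    return hourly_instance_cost * instance_count / predictions_per_hour


def meets_target(actual: float, target: float) -> bool:
    """True when the measured cost is at or below the stated target."""
    return actual <= target


# Assumed example: 2 instances at 0.50 per hour, serving 100 requests/second.
example_cost = cost_per_prediction(0.50, 2, 100 * 3600)
within_budget = meets_target(example_cost, target=5e-6)
```

Stating the target as a number up front makes right-sizing decisions testable: halving the instance count at the same traffic halves `example_cost`, provided the latency budget still holds.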
Requirements
What you’ll need
- BS or MS in Computer Science, Math, Statistics, Machine Learning, or another STEM field, or equivalent practical experience.
- 5+ years shipping software in production — Python, Docker, Kubernetes or ECS, CI/CD, distributed systems debugging — including time on-call.
- 3+ years operating ML in production — you have owned a model in prod that served real traffic, with stated latency and cost budgets and a runbook you wrote.
- AWS depth across the SageMaker surface (Training, Endpoints, Batch Transform, Model Registry, Pipelines) plus the supporting cast (IAM, Lambda, ECS, S3, Secrets Manager, VPC).
- Snowflake fluency — Snowpark ML, Cortex, dbt-orchestrated batch scoring, RBAC for ML workloads.
- IaC for ML — Terraform + SageMaker Pipelines or equivalent. No manual console deployments to production.
- Feature store experience — SageMaker Feature Store, Tecton, or Feast — with explicit ownership of online/offline parity.
- Champion/challenger, shadow, and canary deployment patterns as production muscle, not blog-post familiarity.
- Drift and model monitoring — Evidently, Arize, WhyLabs, or SageMaker Model Monitor — wired to a paging path.
- Software-engineering-first mindset — you treat ML systems as systems, not notebooks.
Benefits
Comp & perks
- Medical, Dental, Vision, Life, and Disability Insurance
- 401(k) with company match
- Pre-tax spending accounts including health care FSA and commuter savings
- Flexible paid time off
- Professional development reimbursement and ongoing skills training opportunities
- Employee resource groups
- Swag, ticket giveaways, and more!
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Python, Docker, Kubernetes, AWS, SageMaker, Terraform, Snowflake, CI/CD, MLOps, Machine Learning
Soft Skills
incident response, production ownership, debugging, collaboration, problem-solving, communication, project management, attention to detail, adaptability, critical thinking
Certifications
BS in Computer Science, MS in Computer Science, Machine Learning certification, AWS Certified Solutions Architect, AWS Certified Machine Learning