Modulate

Machine Learning Operations Engineer

The ML Operations Engineer is responsible for the reliability and efficiency of production systems at Modulate. The role involves scaling machine learning models and collaborating with engineering teams.

  • Posted 5/13/2026
  • Full-time
  • Somerville, Massachusetts, United States
  • Mid-Level, Senior
  • $150,000 - $200,000 per year

Tech Stack

Tools & technologies
  • AWS
  • Linux
  • Python
  • PyTorch
  • Terraform

About the role

Key responsibilities & impact
  • Own the reliability and performance of ML model inference systems in production
  • Ensure high availability of deployed models across APIs and enterprise products
  • Build systems to handle scaling, load variability, and production traffic growth
  • Reduce operational burden through better tooling, automation, and processes
  • Help define how Modulate runs ML systems at scale with reliability and efficiency
  • Deploy, monitor, and maintain production machine learning inference systems
  • Oversee fleets of inference machines and ensure system health and performance
  • Design monitoring, alerting, and incident response systems for ML workloads
  • Participate in on-call rotations and lead incident response and debugging
  • Build systems and processes for scaling inference infrastructure under variable load
  • Improve reliability and observability of production ML services
  • Collaborate on infrastructure-as-code for production deployments
  • Support or contribute to GPU-based training and inference infrastructure
  • Work closely with ML and engineering teams to ensure smooth model deployments
  • (Optional growth area) Optimize model inference performance and latency

Requirements

What you’ll need
  • Experience deploying and maintaining production software systems
  • Experience building monitoring and alerting systems for production environments
  • Experience with on-call rotations and incident response
  • Strong experience with AWS, Python, and Linux
  • Exposure to PyTorch or similar ML frameworks
  • Experience working with GPU-based applications and basic GPU tooling (drivers, runtime, monitoring)
  • Strong debugging and systems thinking skills
  • Ability to operate calmly in production incident environments

Nice to have
  • Experience with ML model serving systems or dedicated model servers
  • Experience monitoring GPU performance for inference workloads
  • Experience optimizing machine learning model inference
  • Familiarity with audio or multimedia data (codecs, streaming, real-time systems)
  • Experience with infrastructure-as-code (e.g., Terraform, CloudFormation)

Benefits

Comp & perks
  • Competitive salary + equity
  • Full health, dental, and vision coverage
  • Flexible PTO with a strong culture of taking it
  • Weekly team lunches with dietary accommodations
  • Hybrid work with core in-office days and flexible remote options
  • Leadership and technical learning sessions
  • Career development and continued learning support
  • Work-from-anywhere policy of up to 8 weeks
  • A deeply inclusive, human-centered culture

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
machine learning, model inference, AWS, Python, Linux, PyTorch, GPU-based applications, infrastructure-as-code, monitoring systems, incident response
Soft Skills
debugging, systems thinking, calmness in production incidents, collaboration