Serve Robotics

Senior DevOps Engineer – ML Infrastructure

full-time

Location Type: Remote

Location: Remote • 🇨🇦 Canada

Salary

💰 CA$155,000 - CA$195,000 per year

Job Level

Senior

Tech Stack

AWS, Azure, Cloud, Docker, Google Cloud Platform, Jenkins, Kubernetes, Python, SQL, Terraform

About the role

  • Deploy and maintain our ML training orchestration system that operates across multiple platforms.
  • Manage cloud and on-premise environments for large-scale distributed data processing and ML training/inference systems.
  • Automate deployment pipelines, monitoring, and alerting for ML and data services.
  • Collaborate closely with data scientists, ML engineers, and autonomy teams to streamline experimentation and model deployment.
  • Maintain and improve CI/CD systems to support rapid development and testing.
  • Implement best practices for system security, reliability, and observability.
  • Optimize infrastructure costs and ensure efficient resource utilization.
  • Support internal developer productivity through tooling, documentation, and direct assistance.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or equivalent experience.
  • 5+ years of experience as a DevOps, SRE, or Infrastructure Engineer, preferably supporting ML or data-intensive systems.
  • Strong experience with cloud platforms (AWS, GCP, or Azure) and container orchestration (Kubernetes, Docker).
  • Proficiency in infrastructure-as-code tools such as Terraform or Helm.
  • Solid understanding of CI/CD systems (GitLab CI, Jenkins, ArgoCD, etc.).
  • Experience with Python and SQL.
  • Experience with cloud security, IAM (Identity and Access Management), and access control.
  • Experience analyzing and optimizing hardware performance.
  • Experience with GPU cluster management.

Benefits

  • Offers equity

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
ML training orchestration, cloud environments, distributed data processing, automation, CI/CD systems, infrastructure-as-code, Python, SQL, GPU cluster management, cloud security
Soft skills
collaboration, streamlining experimentation, supporting internal developer productivity
Certifications
Bachelor’s degree in Computer Science, Master’s degree in Engineering