Serve Robotics

Lead Machine Learning Engineer

full-time

Location Type: Remote

Location: Canada

Salary

CA$177,000 - CA$215,000 per year

About the role

  • Design and maintain training systems that can process and learn from petabyte-scale multimodal datasets (e.g., video and point cloud data). This includes ensuring data is efficiently loaded, distributed, and processed across large GPU clusters.
  • Identify and resolve bottlenecks in the training pipeline, including data loading, preprocessing, model computation, and inter-node communication, to maximize GPU utilization and reduce training time.
  • Work with the ML team to develop and refine neural network architectures suitable for autonomy tasks, particularly those handling high-dimensional and sequential sensor data.
  • Create and adjust loss functions and training strategies that help the model learn effectively from complex multimodal inputs and improve autonomy performance.
  • Configure, monitor, and maintain large-scale distributed training jobs across multiple machines and GPUs, ensuring stability, fault tolerance, and efficient resource usage.
  • Implement scalable systems to preprocess, transform, and augment large robotics datasets so that they are suitable for model training.
  • Work closely with ML scientists and other engineers to integrate new models, experiments, and training approaches into the production training pipeline.
  • Analyze training metrics, model outputs, and experiment logs to assess model performance and guide improvements in architecture, data usage, or training strategies.
  • Develop tools and workflows that allow teams to run experiments, track results, and iterate quickly on new model ideas or training approaches.

Requirements

  • Master’s or PhD in Computer Science, Robotics, Electrical Engineering, Machine Learning, or a closely related technical discipline.
  • Minimum of 5 years of professional experience developing, training, and deploying machine learning models in production environments.
  • Hands-on experience training machine learning models across multiple GPUs or compute nodes, including familiarity with distributed training frameworks and large dataset handling.
  • Strong programming skills in Python for implementing machine learning models, data pipelines, and training workflows.
  • Solid knowledge of core concepts such as neural networks, optimization algorithms, loss functions, model evaluation, and training methodologies.

Benefits

  • Offers Equity