Canva

Senior Machine Learning Engineer – Evaluations, Design Generation

full-time

Location Type: Remote

Location: Remote • 🇦🇺 Australia


Job Level

Senior

Tech Stack

Distributed Systems, Python, SQL

About the role

  • Understand and optimize existing evaluation systems, including LLM-as-Judge frameworks, visual quality models, and multi-dimensional scoring approaches - analyzing their strengths, limitations, and trade-offs to identify coverage gaps and opportunities for improvement across brand adherence, visual appeal, layout quality, and functional correctness
  • Design and implement automated evaluation pipelines that score generated designs at scale, balancing accuracy with computational cost
  • Define evaluation strategies for different scenarios: pre-deployment validation, continuous monitoring, A/B experiment analysis, and model comparison
  • Curate high-quality evaluation datasets and benchmark suites that represent diverse use cases, edge cases, and quality dimensions
  • Integrate evaluation systems into continuous deployment pipelines, creating automated quality gates that catch regressions before production
  • Reduce evaluation cycle time to enable teams to iterate faster on model improvements and launch experiments earlier
  • Partner with research teams to understand evaluation needs for new model architectures and capabilities
  • Define the evaluation ecosystem strategy: how different evaluation tools and methods compose together for Design Generation
  • Guide teams on evaluation best practices, appropriate methodologies for their use cases, and interpretation of results

Requirements

  • Strong ML engineering fundamentals with experience building and maintaining production ML systems at scale
  • Proven ability to build robust, scalable infrastructure (not just models) - you're a platform engineer who speaks ML
  • Deep understanding of distributed systems, observability patterns, and monitoring best practices
  • Python proficiency with production-quality coding standards, code reviews, and testing practices
  • Experience with data pipelines, time-series data, and statistical analysis for detecting anomalies
  • SQL fluency for querying and analyzing large datasets across data warehouse and analytics systems
  • Track record of building self-service platforms or developer tooling that gets adoption
  • Excellent collaboration skills - this role requires working across teams to understand needs and deliver solutions
  • Experience with evaluation of Gen AI systems at scale (even better if that’s evaluation of systems with creative outputs!)

Benefits

  • Equity packages - we want our success to be yours too
  • Inclusive parental leave policy that supports all parents & carers
  • An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more
  • Flexible leave options that empower you to be a force for good and take time to recharge, and that support you personally

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
ML engineering, production ML systems, distributed systems, observability patterns, monitoring best practices, Python, data pipelines, time-series data, statistical analysis, SQL
Soft skills
collaboration, communication, problem-solving, teamwork, adaptability, guidance, interpretation of results, best practices