
Senior Machine Learning Engineer – Evaluations, Design Generation
Canva
full-time
Location Type: Remote
Location: Remote • 🇦🇺 Australia
Job Level
Senior
Tech Stack
Distributed Systems, Python, SQL
About the role
- Understand and optimize existing evaluation systems, including LLM-as-Judge frameworks, visual quality models, and multi-dimensional scoring approaches; analyze their strengths, limitations, and trade-offs to identify coverage gaps and opportunities for improvement across brand adherence, visual appeal, layout quality, and functional correctness
- Design and implement automated evaluation pipelines that score generated designs at scale, balancing accuracy with computational cost
- Define evaluation strategies for different scenarios: pre-deployment validation, continuous monitoring, A/B experiment analysis, and model comparison
- Curate high-quality evaluation datasets and benchmark suites that represent diverse use cases, edge cases, and quality dimensions
- Integrate evaluation systems into continuous deployment pipelines, creating automated quality gates that catch regressions before production
- Reduce evaluation cycle time to enable teams to iterate faster on model improvements and launch experiments earlier
- Partner with research teams to understand evaluation needs for new model architectures and capabilities
- Define the evaluation ecosystem strategy: how different evaluation tools and methods fit together across Design Generation
- Guide teams on evaluation best practices, appropriate methodologies for their use cases, and interpretation of results
Requirements
- Strong ML engineering fundamentals with experience building and maintaining production ML systems at scale
- Proven ability to build robust, scalable infrastructure (not just models): you're a platform engineer who speaks ML
- Deep understanding of distributed systems, observability patterns, and monitoring best practices
- Python proficiency with production-quality coding standards, code reviews, and testing practices
- Experience with data pipelines, time-series data, and statistical analysis for detecting anomalies
- SQL fluency for querying and analyzing large datasets across data warehouses and analytics systems
- Track record of building self-service platforms or developer tooling that teams actually adopt
- Excellent collaboration skills: this role requires working across teams to understand needs and deliver solutions
- Experience with evaluation of Gen AI systems at scale (even better if that’s evaluation of systems with creative outputs!)
Benefits
- Equity packages: we want our success to be yours too
- Inclusive parental leave policy that supports all parents & carers
- An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more
- Flexible leave options that empower you to be a force for good, take time to recharge, and look after yourself personally
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
ML engineering, production ML systems, distributed systems, observability patterns, monitoring best practices, Python, data pipelines, time-series data, statistical analysis, SQL
Soft skills
collaboration, communication, problem-solving, teamwork, adaptability, guidance, interpretation of results, best practices