Canva

Engineering Manager, AI Reliability

full-time

Location Type: Remote

Location: Australia

About the role

  • Building world-class AI infrastructure to support a 100+ person research team at the forefront of creative AI
  • Designing and scaling multi-cloud systems that support high-performance model training and inference
  • Partnering across AWS, GCP, Cloudflare and GCore to optimise GPU compute environments
  • Enhancing CI/CD pipelines and developer velocity within our AI platform teams
  • Improving monitoring, alerting and system observability for AI workloads
  • Driving alignment in DevOps best practices across the AI platform and CORE engineering teams
  • Leading a high-impact engineering team in a fast-paced, cutting-edge environment

Requirements

  • You’ve led DevOps or infrastructure teams, ideally in AI or high-performance computing environments
  • You’re experienced with AWS (ECS, EC2, S3, IAM) and multi-cloud environments like GCP, Cloudflare or GCore
  • You’ve worked with Kubernetes, SLURM, or similar distributed training infrastructure
  • You’re fluent in infrastructure as code tools like Terraform
  • You understand the lifecycle of AI models and how to support R&D at scale
  • You have a strong grasp of containerisation, Linux fundamentals, and cloud networking
  • You’re collaborative, curious, and passionate about enabling others to move fast and safely

Benefits

  • Equity packages - we want our success to be yours too
  • Inclusive parental leave policy that supports all parents & carers
  • An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more
  • Flexible leave options that empower you to be a force for good and take time to recharge, and that support you personally

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
AI infrastructure, multi-cloud systems, GPU compute environments, CI/CD pipelines, monitoring, alerting, system observability, DevOps best practices, Kubernetes, Terraform
Soft Skills
collaborative, curious, passionate, leadership, communication