Principal Production Engineer

Canva

Principal Production Engineer focusing on software engineering and production reliability solutions at Canva. Working in high-risk technical areas with product and infrastructure teams to enhance production performance.

Posted 6/29/2026 • Full-time • Remote • 🇦🇺 Australia • Lead

Tech Stack

Tools & technologies

AWS, Cloud, Distributed Systems, Go, Java, Kubernetes, Linux, Rust

About the role

Key responsibilities & impact

Join the team redefining how the world experiences design.
The Production Engineering team sits at the intersection of software engineering and the hardest reliability problems in Canva's infrastructure.
Writing software. Changing how production behaves. When it works, every team ships with more confidence and Canva gets faster and more resilient for the people who use it every day.
The strategic bet is a different model entirely. Canva's own take on what production reliability looks like, built for how we work.
Not operationalising systems. Not running alerts. Writing software that changes how production behaves.
Leading the hardest engagements: Taking personal ownership of the most technically complex areas (sharding, multi-region architecture, JVM performance at scale) while the team builds depth in adjacent domains.
Setting the technical bar: What it means to be a production engineer at Canva. The standard for technical credibility. The archetype that future hiring calibrates against.
Pairing strategy across the team: Deciding how staff and mid-level engineers are paired and what they should be learning from each engagement.
Building the measurement story: Incident severity and duration trending down. Feature launches going to production cleanly. You define what the metrics are and how they're tracked.

Requirements

What you’ll need

Experience
Production at scale: You've owned reliability in large-scale distributed systems. When things broke, you investigated why and shipped a fix that lasted.
Technical leadership in embedded models: You've led or helped shape a function where engineers work across team boundaries rather than within a single one. You know what makes that model work and what makes it fail.
Hands-on through seniority: You've stayed close to the code. At this level, you're the engineer others consult when the problem is genuinely hard.
Cross-org influence: You've shaped how teams outside your own make technical decisions because your technical judgement is trusted.
JVM or systems depth: You've built real things in Java, Go, Rust, C++, or a comparable systems language at production scale. Language matters less than depth.
Distributed systems in practice: You've navigated sharding, replication, failure modes, and consistency tradeoffs in production.
Technical knowledge
Linux internals: You can reason about process scheduling, memory, I/O, and the network stack when a system misbehaves.
Distributed systems: You're fluent in consistent hashing, leader election, consensus, backpressure, and circuit breakers.
Observability tooling: You've built the tracing, dashboards, and alerting that tell you what's wrong.
Containerisation and orchestration: Kubernetes in production, at the scheduler level.
Performance analysis: You've profiled JVM applications or systems-level processes and fixed what you found.
Cloud infrastructure: AWS in production, across the failure modes that matter at scale.
Incident response: You've been on-call and have opinions about what good looks like.
Nice to have
Enterprise SaaS background: You've done this specific kind of work at an org that's done it well. You know what 'production engineering' means when it's not just a job title.
JVM internals: You've tuned GC and profiled threads in production.
Multi-region or sharding experience: You've been involved in a data store migration or multi-region architecture where getting it wrong was not an option.

Benefits

Comp & perks

Equity packages — we want our success to be yours too
Inclusive parental leave policy that supports all parents & carers
An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more
Flexible leave options that empower you to be a force for good, take time to recharge, and support you personally

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Java, Go, Rust, C++, Linux Internals, Performance Analysis, Incident Response, Multi-Region Architecture, Sharding, Consistent Hashing

Soft Skills

Cross-Org Influence, Technical Judgement