
Principal Production Engineer
Canva
Principal Production Engineer role focused on software engineering and production reliability at Canva, working in high-risk technical areas alongside product and infrastructure teams to improve production performance.
Tech Stack
Tools & technologies: AWS, Cloud, Distributed Systems, Go, Java, Kubernetes, Linux, Rust
About the role
Key responsibilities & impact
- Join the team redefining how the world experiences design.
- The Production Engineering team sits at the intersection of software engineering and the hardest reliability problems in Canva's infrastructure.
- The work is writing software that changes how production behaves, not operationalising systems or running alerts. When it works, every team ships with more confidence, and Canva gets faster and more resilient for the people who use it every day.
- The strategic bet is a different model entirely: Canva's own take on what production reliability looks like, built for how we work.
- Leading the hardest engagements: Taking personal ownership of the most technically complex areas, such as sharding, multi-region architecture, and JVM performance at scale, while the team builds depth in adjacent domains.
- Setting the technical bar: Defining what it means to be a production engineer at Canva, the standard for technical credibility, and the archetype that future hiring calibrates against.
- Pairing strategy across the team: Deciding how staff and mid-level engineers are paired and what they should be learning from each engagement.
- Building the measurement story: Showing incident severity and duration trending down and feature launches reaching production cleanly. You define the metrics and how they are tracked.
Requirements
What you’ll need
Experience
- Production at scale: You've owned reliability in large-scale distributed systems. When things broke, you investigated why and shipped a fix that lasted.
- Technical leadership in embedded models: You've led or helped shape a function where engineers work across team boundaries rather than within a single one. You know what makes that model work and what makes it fail.
- Hands-on through seniority: You've stayed close to the code. At this level, you're the engineer others consult when the problem is genuinely hard.
- Cross-org influence: You've shaped how teams outside your own make technical decisions because your technical judgement is trusted.
- JVM or systems depth: You've built real things in Java, Go, Rust, C++, or a comparable systems language at production scale. Language matters less than depth.
- Distributed systems in practice: You've navigated sharding, replication, failure modes, and consistency tradeoffs in production.
Technical knowledge
- Linux internals: You can reason about process scheduling, memory, I/O, and the network stack when a system misbehaves.
- Distributed systems: Consistent hashing, leader election, consensus, backpressure, and circuit breakers.
- Observability tooling: You've built the tracing, dashboards, and alerting that tell you what's wrong.
- Containerisation and orchestration: Kubernetes in production, at the scheduler level.
- Performance analysis: You've profiled JVM applications or systems-level processes and fixed what you found.
- Cloud infrastructure: AWS in production, across the failure modes that matter at scale.
- Incident response: You've been on-call and have opinions about what good looks like.
Nice to have
- Enterprise SaaS background: You've done this specific kind of work at an org that has done it well. You know what 'production engineering' means when it's not just a job title.
- JVM internals: You've tuned GC and profiled threads in production.
- Multi-region or sharding experience: You've been involved in a data store migration or multi-region architecture where getting it wrong was not an option.
Benefits
Comp & perks
- Equity packages: we want our success to be yours too
- Inclusive parental leave policy that supports all parents & carers
- An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more
- Flexible leave options that empower you to be a force for good, take time to recharge, and support you personally