Tech Stack
Tools & technologies: AWS, Cloud, Distributed Systems, Go, Java, Kubernetes, Linux, Rust
About the role
Key responsibilities & impact
- Owning an engagement area: taking long-term accountability for one of Canva's highest-risk technical domains
- Writing production software: The work is code, not process
- Instrumenting, refactoring, rebuilding the pieces that cause problems at scale
- You're a software engineer first; the reliability outcome at scale is what you're optimising for
- Opportunity to pair, mentor and learn from fellow production engineers
- Striving for fewer incidents, faster recovery, lower severity, latency that bends in the right direction
- Taking pride in moving the metrics that positively impact the quality of the customer experience
Requirements
What you’ll need
- Owned reliability work within large-scale distributed systems
- Previously worked as an engineer embedded in or partnering closely with a product or feature team, not siloed in a platform org that throws tools over the fence
- You've built real things in Java, Go, Rust, C++, or a comparable systems language at production scale; commercial depth, not academic familiarity
- Navigated sharding, replication, failure modes, consistency tradeoffs in real systems
- Ability to parachute into an unfamiliar codebase, orient quickly, find where the problem actually lives, and fix it
- A proven record of making systems better through wisdom and trust
- You know the network stack and what traffic looks like at scale
- Enough kernel-level understanding to reason about what's actually happening when a system misbehaves: process scheduling, memory, I/O, network stack
- Consistent hashing, leader election, consensus, backpressure, circuit breakers
- You've instrumented systems for real: built the tracing, the dashboards, and the alerting that actually tells you what's wrong
- You've profiled JVM applications or systems-level processes, found the thing nobody was looking at, and fixed it in a way that lasted
- AWS services at meaningful depth, so you understand how they behave under load and at the edges
- You've been on-call in a serious production environment and have opinions about what good incident management actually looks like
Benefits
Comp & perks
- Equity packages: we want our success to be yours too
- Inclusive parental leave policy that supports all parents & carers
- An annual Vibe & Thrive allowance to support your wellbeing, social connection, office setup & more
- Flexible leave options that empower you to be a force for good and take time to recharge, and that support you personally
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Production Software Development, System Optimization, Codebase Navigation, Kernel-Level Understanding, Profiling JVM Applications, Consistent Hashing, Leader Election, Circuit Breakers, Failure Modes Analysis, Network Stack Knowledge
Soft Skills
Mentoring, Collaboration, Problem-Solving
