
Staff Software Engineer – Replication Foundations
Temporal Technologies
full-time
Location Type: Remote
Location: United States
Salary
💰 $180,000 - $304,000 per year
About the role
- Lead the design and implementation of core components of Temporal’s OSS replication stack, from initial design through rollout and long-term operational ownership.
- Design and evolve replication protocols that power high-availability namespaces, cross-cluster and cross-region replication, and migration between Temporal clusters (cloud ↔ self-hosted, cloud ↔ cloud).
- Build scalability and reliability capabilities such as multi-cell namespaces, protocols that let a single namespace span multiple clusters, and dynamic split/merge strategies based on usage, hot spots, and capacity needs.
- Reason deeply about correctness: consistency models, ordering guarantees, idempotency, failure recovery, and safe rollouts of protocol changes.
- Drive cross-team alignment with Cloud Enablement and other CGS teams to ensure OSS foundations support current and future cloud products.
- Author high-quality design docs that clarify invariants, trade-offs, failure modes, and operational playbooks for complex changes.
- Raise engineering standards through reviews, mentorship, and technical leadership—improving correctness testing, fault injection, and incident readiness.
- Participate in on-call/incident response related to replication and core system behavior, helping build durable fixes and prevention mechanisms.
Requirements
- 10+ years building production systems, including significant experience with distributed systems and correctness-critical infrastructure.
- Strong experience with replication, consistency, fault tolerance, and failure recovery in distributed environments.
- Demonstrated ability to design and implement concurrent, correctness-critical systems with clear invariants and safety guarantees.
- Proven track record of leading complex technical projects across teams—setting direction, driving execution, and landing changes safely in production.
- Hands-on experience debugging complex production issues involving race conditions, data consistency, partial failures, and performance degradation.
- Proficiency writing production-quality concurrent code, preferably in Go (Java/C++ or similar systems languages also welcome).
- Solid understanding of distributed systems fundamentals such as replication, sharding/partitioning, backpressure, failure detection, and durability mechanisms.
- Ability to operate with high ownership and minimal oversight, balancing deep technical rigor with pragmatic delivery.
- Curiosity and rigor in understanding how systems behave under stress, failure, and scale.
Benefits
- Unlimited PTO, 12 Holidays + 2 Floating Holidays
- 100% Premiums Coverage for Medical, Dental, and Vision
- AD&D, LT & ST Disability, and Life Insurance (Standard & Supplemental Available)
- Empower 401K Plan
- Additional Perks for Learning & Development, Lifestyle Spending, In-Home Office Setup, Professional Memberships, WFH Meals, Internet Stipend and more!
Applicant Tracking System Keywords
Hard Skills & Tools
Go, Java, C++, distributed systems, replication, fault tolerance, failure recovery, concurrent programming, correctness models, scalability
Soft Skills
technical leadership, mentorship, cross-team alignment, problem-solving, communication, curiosity, ownership, execution, design documentation, incident response