Senior Cloud Resilience Architect
Blink Health
Tech Stack
Tools & technologies
Ansible, AWS, Azure, CloudDNS, Google Cloud Platform, Kubernetes, Terraform
About the role
Key responsibilities & impact
- Evaluate and mature the organization’s disaster recovery posture, including recovery objectives (RTO/RPO), dependency mapping, and failure domain analysis across applications, data, and infrastructure.
- Define, document, and establish disaster recovery standards and best practices across cloud infrastructure, platforms, and application architectures.
- Partner with SRE, platform, security, and product engineering teams to design and implement resilient, fault-tolerant systems, progressing from backup-based recovery to multi-region and active-active architectures.
- Lead the disaster recovery roadmap, balancing technical feasibility, cost, risk, and business priorities.
- Design and recommend reference architectures for disaster recovery patterns, including pilot-light, warm standby, hot standby, and active-active.
- Drive adoption of active-active disaster recovery for critical systems, including traffic management, data replication, consistency models, and automated failover.
- Define and operationalize testing strategies for DR, including game days, chaos testing, and regular recovery exercises.
- Establish clear documentation, runbooks, and escalation paths to ensure recoverability is well understood and not dependent on individuals.
- Evaluate and recommend platform upgrades, cloud services, and tooling that improve resilience, recovery speed, and reliability.
- Serve as a technical authority and advisor on disaster recovery and resilience for leadership and engineering teams.
- Provide architectural guidance, design reviews, and mentorship to engineers implementing DR-related changes.
- Partner with security and compliance teams to ensure DR strategies meet regulatory, audit, and data protection requirements.
Requirements
What you’ll need
- Bachelor’s or Master’s degree in Computer Science or equivalent practical experience.
- 8+ years of experience in cloud infrastructure, platform engineering, SRE, or reliability-focused architecture roles.
- Deep understanding of disaster recovery concepts including RTO/RPO, blast radius reduction, failure domains, and dependency isolation.
- Proven experience designing and implementing multi-region and multi-availability zone architectures.
- Hands-on experience moving systems toward active-active or highly available architectures.
- Strong grasp of data replication strategies, consistency tradeoffs, and recovery patterns for databases and stateful systems.
- Extensive experience with major cloud providers (AWS preferred, GCP/Azure acceptable).
- Strong understanding of managed cloud services and their DR characteristics and limitations.
- Experience with Kubernetes-based platforms, including regional failover, workload portability, and cluster recovery strategies.
- Familiarity with global traffic management, DNS, load balancing, and service mesh patterns.
- Experience designing and maintaining Infrastructure as Code using tools such as Terraform, Pulumi, CloudFormation, or Ansible.
- Strong focus on automation for recovery workflows, failover testing, and environment provisioning.
- Ability to eliminate manual recovery steps and reduce time-to-recovery through software.
- Experience defining and running DR tests, game days, and failure simulations.
- Comfortable working across organizational boundaries to influence priorities and standards.
- Strong documentation and communication skills, with the ability to translate complex technical risk into business impact.
Benefits
Comp & perks
- Health insurance
- Remote work flexibility
- Professional development
- Paid time off
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
disaster recovery, RTO, RPO, dependency mapping, failure domain analysis, data replication, consistency models, Infrastructure as Code, Kubernetes, cloud architecture
Soft Skills
leadership, communication, documentation, mentorship, influence, collaboration, organizational skills, technical authority, problem-solving, strategic thinking