
Principal Engineer, Operational Excellence – Resilience
CrowdStrike
full-time
Location Type: Remote
Location: Remote • 🇺🇸 United States
Salary
💰 $145,000 - $220,000 per year
Job Level
Lead
Tech Stack
AWS, Azure, Cloud, Google Cloud Platform
About the role
- Facilitate coordination between stakeholders across IT, Product, Engineering, and business units, serving as the central point for technology resilience initiatives and ensuring alignment with business objectives
- Own and maintain enterprise-wide technology resilience standards, ensuring consistent implementation and reducing organizational drift from established frameworks across infrastructure, application, and product domains
- Drive comprehensive technical resilience architecture including infrastructure redundancy and fault tolerance, application resilience and graceful degradation strategies, and chaos engineering frameworks for continuous resilience validation
- Lead enterprise technical recovery strategy development and implementation, including backup and redundancy systems, recovery time/point objectives (RTO/RPO) for technical systems, and data recovery/restoration procedures
- Partner with stakeholders to define and implement resilience standards covering feature flagging, release, testing, multi-tenancy, and scalability frameworks to manage growth
- Provide technical oversight and aggregation of technology resilience risks across the enterprise, establishing and monitoring key performance indicators including system uptime
- Drive chaos engineering and resilience testing programs, establishing enterprise-wide practices for proactive resilience validation and continuous improvement
- Own shared resilience tooling strategy, evaluation, and implementation to support enterprise-wide capabilities including monitoring, testing, and recovery automation
- Build and maintain formal networks with key constituents across business units, engineering teams, and external partners
- Serve as senior technical advisor during major incident response, providing expertise on technical recovery strategies and coordinating cross-functional recovery efforts
- Drive innovation in resilience practices, identifying emerging technologies and methodologies to advance CrowdStrike's competitive resilience advantage
- Provide strategic guidance and expertise to junior team members and cross-functional partners on resilience engineering best practices
Requirements
- 10+ years of direct experience in technology resilience, disaster recovery, site reliability engineering, or related technical disciplines, with demonstrated expertise in enterprise-scale cloud-native environments
- Deep understanding of infrastructure redundancy patterns, application resilience design, chaos engineering principles, and enterprise disaster recovery strategies across hybrid cloud architectures
- Proven experience with feature management systems, progressive deployment strategies, multi-tenant architecture resilience, and scalability engineering practices
- Proven ability to drive strategic initiatives across large technology organizations, with experience influencing senior stakeholders and leading complex, cross-functional resilience programs
- Experience establishing and monitoring resilience KPIs, including system uptime, MTTR, RTO/RPO objectives, and deployment success metrics
- Advanced certifications in disaster recovery, cloud architecture, or site reliability disciplines (e.g., DRCS, CISSP, AWS/Azure/GCP architecture certifications)
- Exceptional written and oral communication skills, including experience developing and delivering strategic briefings to executive leadership and technical teams
- Advanced analytical and conceptual thinking abilities, with a proven track record of solving complex, ambiguous resilience challenges with enterprise-wide impact
- Demonstrated ability to build formal networks and influence stakeholders across engineering, product, and business organizations
- Bachelor's degree in Computer Science, Information Systems, Engineering, Risk/Resilience, or equivalent practical experience
Benefits
- Remote-friendly and flexible work culture
- Market leader in compensation and equity awards
- Comprehensive physical and mental wellness programs
- Competitive vacation and holidays to recharge
- Paid parental and adoption leave
- Professional development opportunities for all employees regardless of level or role
- Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections
- Vibrant office culture with world-class amenities
- Great Place to Work Certified™ across the globe
Applicant Tracking System Keywords
Hard skills
technology resilience, disaster recovery, site reliability engineering, cloud-native environments, infrastructure redundancy, application resilience design, chaos engineering, resilience KPIs, backup and redundancy systems, data recovery procedures
Soft skills
communication skills, analytical thinking, conceptual thinking, stakeholder influence, network building, strategic guidance, cross-functional leadership, innovation in practices, mentoring, problem-solving
Certifications
DRCS, CISSP, AWS architecture certification, Azure architecture certification, GCP architecture certification