
Global Head of SRE
Socure
full-time
Location Type: Remote
Location: Remote • 🇺🇸 United States
Salary
💰 $260,000 - $285,000 per year
Job Level
Lead
Tech Stack
AWS, Cloud
About the role
- Define the global reliability strategy and roadmap across availability, latency, durability, data integrity, cost efficiency, and safety—mapped to clear business outcomes and service level objectives.
- Architect multi‑region, multi‑zone resilience patterns with automated failover, graceful degradation, and progressive delivery; validate readiness through continuous game days and fault‑injection experiments.
- Build and lead a world‑class red‑team QA and chaos engineering program across infrastructure, data pipelines, and applications; codify attack playbooks and steady‑state guardrails to improve detection and recovery.
- Establish a unified observability practice: end‑to‑end tracing, high‑signal alerting, health and saturation indicators, user‑journey telemetry, and incident command protocols—standardized into a single, actionable operations view.
- Drive rigorous incident management: real‑time incident command, rapid mitigation, blameless post‑incident reviews, durable corrective actions, and automated safeguards.
- Ensure public sector readiness and continuous authorization: sustain FedRAMP Moderate posture, prove environmental parity between commercial and GovCloud, and strengthen controls for data residency, deletion, and audit evidence.
- Partner with product engineering to make reliability a product feature: embed reliability patterns into RiskOS workflows and make Identity Graph‑based decisions observable, explainable, and resilient by default.
- Lead developer tooling and release engineering: own CI/CD pipelines, test sandboxes and ephemeral environments, and the golden paths that make shipping changes safe, repeatable, and fast.
- Advance an AI‑first SRE strategy: deploy ML for anomaly detection, incident prediction, adaptive alerting, automated runbooks, incident summarization, and capacity forecasts; measure impact via concrete reliability and efficiency wins.
- Lead capacity planning and performance engineering across compute, storage, and networking—delivering consistently low‑latency decisions at peak volumes.
- Attract, grow, and retain exceptional reliability engineers and leaders across regions; run a humane, effective, continuously improving on‑call program.
Requirements
- Deep experience leading reliability for large‑scale, always‑on platforms with highly sensitive data—owning availability, latency, durability, and security across multiple product lines and regions.
- Mastery of modern cloud architecture (AWS), product‑aligned multi‑account patterns, real‑time observability, progressive delivery, and automated disaster recovery, with a track record of measurable reliability gains.
- Experience building red‑team and chaos engineering programs that surface systemic weaknesses, improve mean time to mitigate, and harden systems over time.
- Proven leadership of developer tooling at scale: CI/CD, release engineering, and ephemeral environment strategies that increase velocity while reducing risk.
- Strong partnership with product, data, and security; fluency in data lifecycle, retention and deletion, privacy, and governance for regulated industries and the public sector.
- A people‑first leadership style: you raise the bar on hiring and mentoring, set crisp principles, and build an ownership culture grounded in curiosity, accountability, and continuous learning.
Benefits
- Offers Equity
- Offers Bonus
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
reliability engineering, cloud architecture, CI/CD, chaos engineering, incident management, capacity planning, performance engineering, data integrity, automated disaster recovery, anomaly detection
Soft skills
leadership, mentoring, accountability, curiosity, continuous learning, partnership, communication, problem-solving, team building, strategic thinking
Certifications
FedRAMP Moderate, AWS Certified Solutions Architect, Certified Kubernetes Administrator, Certified Information Systems Security Professional (CISSP), Certified Reliability Engineer (CRE), ITIL Certification, Certified ScrumMaster (CSM), Google Cloud Professional Cloud Architect, Microsoft Certified: Azure Solutions Architect Expert, CompTIA Security+