
Director, Site Reliability Engineer
Cision
full-time
Location Type: Remote
Location: Kentucky • Utah • United States
About the role
- Provide strategic leadership and oversight for four SRE teams, setting clear direction, priorities, and expectations aligned to business and engineering objectives
- Lead, mentor, and develop SRE managers and senior engineers, fostering a culture of accountability, operational ownership, innovation, and psychological safety
- Define and own the SRE and Platform Engineering strategy and roadmap, ensuring alignment with cloud transformation initiatives and long-term organizational goals
- Serve as a key voice in architectural and platform decisions, influencing designs with a focus on scalability, reliability, automation, and operational efficiency
- Partner with executive leadership to communicate reliability posture, risks, and investment needs in clear business terms
- Establish and continuously evolve SRE principles and best practices, including SLIs, SLOs, error budgets, toil management, and reliability-driven prioritization
- Provide technical direction and governance across GCP (preferred) and AWS environments, ensuring consistent reliability and operational patterns
- Drive the evolution of Platform Engineering, enabling self-service infrastructure and guard-railed service delivery for application teams
- Own strategy and standards for Infrastructure-as-Code (IaC) and automation, leveraging tools such as Terraform or equivalent frameworks across cloud environments
- Ensure observability excellence through metrics, logging, tracing, alerting, and proactive capacity and performance management
- Provide executive leadership during large-scale or high-impact incidents, ensuring effective coordination, escalation, and stakeholder communication
- Define, refine, and scale incident management and on-call practices, emphasizing resilience, sustainability, and rapid recovery
- Champion blameless postmortems, ensuring root causes are addressed and learnings are translated into systemic improvements
- Partner with Security and Compliance teams to ensure systems meet security, privacy, and regulatory requirements without compromising reliability
- Own and report on reliability metrics, operational KPIs, and service health for leadership and executive stakeholders
- Drive continuous improvement through reliability reviews, retrospectives, and data-driven decision-making
- Balance reliability, velocity, and cost across platforms, applying error budgets and capacity planning to guide trade-offs
Requirements
- 10+ years of experience in SRE, infrastructure, platform, or systems engineering roles, with 5+ years leading managers and senior technical teams
- Direct, hands-on experience in Site Reliability Engineering, including operating production systems at scale
- Strong experience with Google Cloud Platform (GCP) or equivalent public cloud (AWS or Azure), including distributed, cloud-native architectures
- Proven expertise in Infrastructure-as-Code (IaC) and automation frameworks (e.g., Terraform or similar)
- Deep understanding of observability ecosystems (metrics, logging, tracing), CI/CD pipelines, and DevOps/SRE tooling
- Ability to communicate complex technical concepts clearly to both technical and non-technical stakeholders, influencing at all levels of the organization
Applicant Tracking System Keywords
Hard Skills & Tools
Site Reliability Engineering, Infrastructure-as-Code, automation frameworks, observability ecosystems, cloud-native architectures, capacity planning, error budgets, SLIs, SLOs, incident management
Soft Skills
strategic leadership, mentoring, communication, operational ownership, innovation, psychological safety, influencing, continuous improvement, stakeholder communication, resilience