Director, Site Reliability Engineer

Cision

full-time

Posted on: 4/7/2026

Location Type: Remote

Location: Kentucky • Utah • United States

Job Level

Lead

Tech Stack

AWS • Azure • Cloud • Google Cloud Platform • Terraform

About the role

Provide strategic leadership and oversight for four SRE teams, setting clear direction, priorities, and expectations aligned to business and engineering objectives
Lead, mentor, and develop SRE managers and senior engineers, fostering a culture of accountability, operational ownership, innovation, and psychological safety
Define and own the SRE and Platform Engineering strategy and roadmap, ensuring alignment with cloud transformation initiatives and long-term organizational goals
Serve as a key voice in architectural and platform decisions, influencing designs with a focus on scalability, reliability, automation, and operational efficiency
Partner with executive leadership to communicate reliability posture, risks, and investment needs in clear business terms
Establish and continuously evolve SRE principles and best practices, including SLIs, SLOs, error budgets, toil management, and reliability-driven prioritization
Provide technical direction and governance across GCP (preferred) and AWS environments, ensuring consistent reliability and operational patterns
Drive the evolution of Platform Engineering, enabling self-service infrastructure and guard-railed service delivery for application teams
Own strategy and standards for Infrastructure-as-Code (IaC) and automation, leveraging tools such as Terraform or equivalent frameworks across cloud environments
Ensure observability excellence through metrics, logging, tracing, alerting, and proactive capacity and performance management
Provide executive leadership during large-scale or high-impact incidents, ensuring effective coordination, escalation, and stakeholder communication
Define, refine, and scale incident management and on-call practices, emphasizing resilience, sustainability, and rapid recovery
Champion blameless postmortems, ensuring root causes are addressed and learnings are translated into systemic improvements
Partner with Security and Compliance teams to ensure systems meet security, privacy, and regulatory requirements without compromising reliability
Own and report on reliability metrics, operational KPIs, and service health for leadership and executive stakeholders
Drive continuous improvement through reliability reviews, retrospectives, and data-driven decision-making
Balance reliability, velocity, and cost across platforms, applying error budgets and capacity planning to guide trade-offs

Requirements

10+ years of experience in SRE, infrastructure, platform, or systems engineering roles, with 5+ years leading managers and senior technical teams
Direct, hands-on experience in Site Reliability Engineering, including operating production systems at scale
Strong experience with Google Cloud Platform (GCP) or equivalent public cloud (AWS or Azure), including distributed, cloud-native architectures
Proven expertise in Infrastructure-as-Code (IaC) and automation frameworks (e.g., Terraform or similar)
Deep understanding of observability ecosystems (metrics, logging, tracing), CI/CD pipelines, and DevOps/SRE tooling
Ability to communicate complex technical concepts clearly to both technical and non-technical stakeholders, influencing at all levels of the organization