Cision

Director, Site Reliability Engineer

Full-time

Location Type: Remote

Location: Kentucky, Utah, United States

About the role

  • Provide strategic leadership and oversight for four SRE teams, setting clear direction, priorities, and expectations aligned to business and engineering objectives
  • Lead, mentor, and develop SRE managers and senior engineers, fostering a culture of accountability, operational ownership, innovation, and psychological safety
  • Define and own the SRE and Platform Engineering strategy and roadmap, ensuring alignment with cloud transformation initiatives and long-term organizational goals
  • Serve as a key voice in architectural and platform decisions, influencing designs with a focus on scalability, reliability, automation, and operational efficiency
  • Partner with executive leadership to communicate reliability posture, risks, and investment needs in clear business terms
  • Establish and continuously evolve SRE principles and best practices, including SLIs, SLOs, error budgets, toil management, and reliability-driven prioritization
  • Provide technical direction and governance across GCP (preferred) and AWS environments, ensuring consistent reliability and operational patterns
  • Drive the evolution of Platform Engineering, enabling self-service infrastructure and guard-railed service delivery for application teams
  • Own strategy and standards for Infrastructure-as-Code (IaC) and automation, leveraging tools such as Terraform or equivalent frameworks across cloud environments
  • Ensure observability excellence through metrics, logging, tracing, alerting, and proactive capacity and performance management
  • Provide executive leadership during large-scale or high-impact incidents, ensuring effective coordination, escalation, and stakeholder communication
  • Define, refine, and scale incident management and on-call practices, emphasizing resilience, sustainability, and rapid recovery
  • Champion blameless postmortems, ensuring root causes are addressed and learnings are translated into systemic improvements
  • Partner with Security and Compliance teams to ensure systems meet security, privacy, and regulatory requirements without compromising reliability
  • Own and report on reliability metrics, operational KPIs, and service health for leadership and executive stakeholders
  • Drive continuous improvement through reliability reviews, retrospectives, and data-driven decision-making
  • Balance reliability, velocity, and cost across platforms, applying error budgets and capacity planning to guide trade-offs

Requirements

  • 10+ years of experience in SRE, infrastructure, platform, or systems engineering roles, with 5+ years leading managers and senior technical teams
  • Direct, hands-on experience in Site Reliability Engineering, including operating production systems at scale
  • Strong experience with Google Cloud Platform (GCP) or equivalent public cloud (AWS or Azure), including distributed, cloud-native architectures
  • Proven expertise in Infrastructure-as-Code (IaC) and automation frameworks (e.g., Terraform or similar)
  • Deep understanding of observability ecosystems (metrics, logging, tracing), CI/CD pipelines, and DevOps/SRE tooling
  • Ability to communicate complex technical concepts clearly to both technical and non-technical stakeholders, influencing at all levels of the organization

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Site Reliability Engineering, Infrastructure-as-Code, automation frameworks, observability ecosystems, cloud-native architectures, capacity planning, error budgets, SLIs, SLOs, incident management

Soft Skills

strategic leadership, mentoring, communication, operational ownership, innovation, psychological safety, influencing, continuous improvement, stakeholder communication, resilience