Docusign

Principal Product Manager - Site Reliability

full-time

Origin: 🇺🇸 United States

Salary

💰 $174,400 - $327,625 per year

Job Level

Lead

Tech Stack

Ansible, AWS, Azure, Cloud, Distributed Systems, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Terraform

About the role

  • Define and lead high-impact SRE programs, including incident management, observability, automation, and capacity planning, to ensure system reliability and performance
  • Partner with product, engineering, and executive teams to create and maintain a strategic roadmap for site reliability initiatives, balancing short-term needs with long-term scalability
  • Facilitate requirements definition across teams to create engineering-ready epics, user stories, and acceptance criteria, identifying dependencies and relative priority against other initiatives; work closely with software engineers, SREs, DevOps teams, and stakeholders to align on priorities, resolve blockers, and drive successful outcomes
  • Collaborate with senior executives to align SRE initiatives with company-wide goals, providing strategic insights and recommendations
  • Develop and deliver clear, compelling presentations to executive leadership, translating complex technical concepts into business-oriented outcomes and securing buy-in for initiatives
  • Oversee post-incident reviews, drive root cause analysis, and implement preventive measures to minimize downtime and improve system resilience
  • Champion the development and adoption of automation tools and frameworks to reduce toil and improve operational efficiency
  • Communicate program status, risks, and outcomes to senior leadership and stakeholders, translating technical details into business impact
  • Remote designation: the employee is not required to be in or near an office frequently and works from a designated remote work location for the majority of the time

Requirements

  • Basic: 15+ years of professional experience in the high-tech industry, including 12+ years in product management or site reliability engineering (SRE), managing, designing, and delivering world-class SRE platforms and infrastructure for SaaS products and services
  • Proven track record of leading complex, cross-functional programs in a fast-paced, technology-driven environment
  • Demonstrated ability to work effectively with executives, including presenting strategic plans and program updates to senior leadership
  • Experience with SRE principles, including observability, incident response, and infrastructure automation
  • Experience with distributed systems, cloud platforms (e.g. AWS, Azure, GCP), and container orchestration (e.g. Kubernetes)
  • Experience with CI/CD pipelines, infrastructure as code (e.g. Terraform, Ansible), and monitoring tools (e.g. Prometheus, Grafana)
  • Experience building dashboards and taking a data-driven approach to projects
  • Experience presenting to executive leadership, both internally and at external customers
  • Ability to thrive in ambiguous environments and manage competing priorities
  • Bachelor’s degree in Computer Science, Engineering, or a related technical field
  • Preferred: Advanced degree or equivalent experience
  • Exceptional communication and interpersonal skills, with the ability to influence and align diverse stakeholders, including C-level executives
  • Strong problem-solving and analytical skills, with a focus on driving measurable outcomes
  • Experience in defining and implementing SLOs/SLIs for large-scale systems
  • Understanding of setting OKRs and managing multiple deliverables
  • Background in managing incident response processes or chaos engineering programs
  • Familiarity with DevOps software development methodologies