HubSpot

Director, Reliability Engineering

full-time

Location Type: Remote

Location: Ireland

Salary

💰 €167,750 - €268,375 per year

About the role

  • Lead a team of ~20 reliability engineers, fostering a culture of operational excellence, continuous learning, and customer obsession
  • Attract, develop, and retain top talent; build career paths that keep engineers engaged and growing
  • Define and drive HubSpot's reliability roadmap, balancing proactive resilience investments with reactive incident reduction
  • Partner with Infrastructure leadership to prioritize reliability initiatives alongside cost, performance, and platform evolution
  • Set and evolve SLO standards that align engineering effort with customer experience
  • Lead the strategy for integrating AI and agentic approaches into incident detection, diagnosis, and mitigation, reducing time-to-resolution and human toil
  • Explore and implement AI-assisted tooling for pattern recognition across incidents, automated runbook execution, and predictive reliability insights
  • Build intelligent systems that learn from our operational history, proactively surface risks, and recommend (or execute) mitigation actions
  • Balance automation with human judgment, designing systems where AI augments engineers rather than creating blind spots
  • Own incident management end-to-end: response coordination, executive communication during major incidents, and blameless post-incident reviews that drive systemic improvement
  • Influence engineering culture across 100+ product teams, evangelizing reliability practices without compromising team autonomy
  • Identify systemic risks across the platform and drive cross-functional mitigation efforts
  • Serve as the voice of reliability in leadership forums, translating technical risk into business terms
  • Communicate transparently with customers and stakeholders during and after operational incidents
  • Partner with peer directors across Infrastructure, Product Engineering, and Security to align on shared priorities

Requirements

  • 10+ years of experience in software engineering, SRE, or infrastructure, with 5+ years leading teams
  • Track record of building and scaling reliability functions at companies with significant operational complexity
  • Deep technical fluency: you can dive into architecture discussions, incident analysis, and system design with credibility
  • Curiosity and vision for how AI/ML can transform operations; experience with or strong interest in AIOps, agentic automation, or ML-driven observability is a plus
  • Proven ability to drive cultural and process change across a large engineering organization without top-down mandates
  • Strong executive communication skills; comfortable leading incident bridges, presenting to leadership, and representing reliability externally
  • Experience with modern cloud infrastructure (AWS preferred), observability tooling, and incident management practices
  • A philosophy that balances reliability with velocity: you understand that the goal is sustainable speed, not gates

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
software engineering, site reliability engineering (SRE), infrastructure management, AI/ML integration, incident analysis, system design, observability, automation, incident management, reliability engineering
Soft Skills
leadership, communication, curiosity, vision, cultural change, process change, collaboration, customer obsession, strategic thinking, problem-solving