
Director, Reliability Engineering
HubSpot
full-time
Location Type: Remote
Location: Ireland
Salary
💰 €167,750 - €268,375 per year
About the role
- Lead a team of ~20 reliability engineers, fostering a culture of operational excellence, continuous learning, and customer obsession
- Attract, develop, and retain top talent; build career paths that keep engineers engaged and growing
- Define and drive HubSpot's reliability roadmap, balancing proactive resilience investments with reactive incident reduction
- Partner with Infrastructure leadership to prioritize reliability initiatives alongside cost, performance, and platform evolution
- Set and evolve SLO standards that align engineering effort with customer experience
- Lead the strategy for integrating AI and agentic approaches into incident detection, diagnosis, and mitigation, reducing time-to-resolution and human toil
- Explore and implement AI-assisted tooling for pattern recognition across incidents, automated runbook execution, and predictive reliability insights
- Build intelligent systems that learn from our operational history, proactively surface risks, and recommend (or execute) mitigation actions
- Balance automation with human judgment, designing systems where AI augments engineers rather than creating blind spots
- Own incident management end-to-end: response coordination, executive communication during major incidents, and blameless post-incident reviews that drive systemic improvement
- Influence engineering culture across 100+ product teams, evangelizing reliability practices without compromising team autonomy
- Identify systemic risks across the platform and drive cross-functional mitigation efforts
- Serve as the voice of reliability in leadership forums, translating technical risk into business terms
- Communicate transparently with customers and stakeholders during and after operational incidents
- Partner with peer directors across Infrastructure, Product Engineering, and Security to align on shared priorities
Requirements
- 10+ years of experience in software engineering, SRE, or infrastructure, with 5+ years leading teams
- Track record of building and scaling reliability functions at companies with significant operational complexity
- Deep technical fluency: you can dive into architecture discussions, incident analysis, and system design with credibility
- Curiosity and vision for how AI/ML can transform operations; experience with or strong interest in AIOps, agentic automation, or ML-driven observability is a plus
- Proven ability to drive cultural and process change across a large engineering organization without top-down mandates
- Strong executive communication skills; comfortable leading incident bridges, presenting to leadership, and representing reliability externally
- Experience with modern cloud infrastructure (AWS preferred), observability tooling, and incident management practices
- A philosophy that balances reliability with velocity: you understand that the goal is sustainable speed, not gates
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
software engineering, site reliability engineering (SRE), infrastructure management, AI/ML integration, incident analysis, system design, observability, automation, incident management, reliability engineering
Soft Skills
leadership, communication, curiosity, vision, cultural change, process change, collaboration, customer obsession, strategic thinking, problem-solving