Senior Site Reliability Engineer

Honeycomb.io

Senior Site Reliability Engineer scaling backend systems to support high-volume customers at Honeycomb. Working with AWS, Kubernetes, and various backend teams in a fully remote setting.

Posted 6/3/2026 • Full-time • Remote • 🇮🇪 Ireland • Senior • 💰 €140,590 - €165,400 per year

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills

AWS, Kubernetes, Helm, Terraform, CI/CD, Golang, Kafka, performance engineering, cost analysis, observability

Soft Skills

communication, feedback, project management, curiosity, collaboration, adaptability, bias for action, experimentation, trust building, tailoring communication

Industry Keywords

reliability engineering, data-driven decision making, high-volume distributed systems, incident commander, cross-Atlantic engineering culture, on-call rotation, SLOs, instrumentation, geographically distributed teams, organizational tradeoffs

Tech Stack

Tools & technologies

AWS, Go, Kafka, Kubernetes, Terraform

About the role

Key responsibilities & impact

Help Honeycomb scale our backend systems to support our highest-volume customers.
Build organizational trust through transparent communication, giving and receiving direct and kind feedback.
Work with other backend teams to dive deep into our stack to make sure we’re getting the most out of our infrastructure.
Train as an Incident Commander, serve in the role, and then train others to do the same.
Help SRE and Honeycomb develop a healthy cross-Atlantic engineering culture.
Participate in the team’s on-call rotation as the EU side of a new follow-the-sun rotation.
Help the organization navigate tradeoffs between reliability and its other goals and priorities.
Optional: act as an external ambassador through blog posts, conference talks, and presentations with support from our DevRel team.

Requirements

What you’ll need

Strong experience in AWS and Kubernetes.
Experience performing cost analysis and reduction.
Solid Helm, Terraform, and CI/CD experience.
Project management skills.
Software engineering experience (Golang is a plus, and so is performance engineering).
Experience with Kafka or another high-volume distributed system.
Excellent written and spoken communication skills, with the ability to tailor your communication for your audience and give direct feedback when you notice something wrong.
A curiosity to learn how people and systems work, and the willingness to make them partners in your initiatives.
Familiarity with observability concepts (SLOs, instrumentation) and data-driven decision making.
Comfort operating in ambiguity, with a bias for action and experimentation.
Interest in both the technical and human sides of reliability engineering.
Experience working in geographically distributed teams.

Benefits

Comp & perks

A stake in our success - generous equity with an employee-friendly stock program
It’s not about how strong a negotiator you are - our pay is based on transparent levels relative to experience
Time to recharge with unlimited PTO
A distributed-first mindset and culture (really!)
Home office, co-working, and internet stipend
Full benefits coverage for employees, with additional coverage available for dependents
Up to 16 weeks of paid parental leave, regardless of path to parenthood
Annual development allowance
And much more...