The Hartford

Staff Reliability Engineer

full-time

Location Type: Hybrid

Location: Columbus; Connecticut; Illinois; United States

Salary

💰 $127,600 - $191,400 per year

About the role

  • Lead the design, implementation, and optimization of reliable systems and infrastructure.
  • Collaborate with software engineering, operations, and product teams to ensure uptime and availability targets are met.
  • Develop and maintain monitoring, alerting, and incident response strategies to detect and resolve issues quickly.
  • Conduct root cause analysis of system failures and drive corrective actions to prevent recurrence.
  • Advocate for reliability best practices and foster a culture of proactive risk mitigation across the organization.
  • Mentor and provide technical guidance to other reliability engineers and cross-functional team members.
  • Develop automation tools to enhance efficiency in deployment, monitoring, and recovery processes.
  • Participate in capacity planning, performance testing, and disaster recovery exercises.
  • Stay current with industry trends, emerging technologies, and best practices in reliability engineering.

Requirements

  • 5+ years of experience in reliability engineering, site reliability engineering (SRE), or related roles.
  • Expertise in cloud platforms (e.g., AWS, Azure, Google Cloud) and container orchestration (e.g., Kubernetes).
  • Strong programming skills in one or more languages (e.g., Python, Java).
  • Proven experience with logging and monitoring tools (e.g., Splunk, Dynatrace, Datadog) and incident management frameworks (e.g., ServiceNow).
  • Excellent analytical, troubleshooting, and communication skills.
  • Ability to lead complex projects and influence stakeholders at all levels.

Benefits

  • Short-term or annual bonuses
  • Long-term incentives
  • On-the-spot recognition

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
reliability engineering, site reliability engineering, cloud platforms, container orchestration, programming, logging tools, monitoring tools, incident management frameworks, automation tools, capacity planning
Soft skills
analytical skills, troubleshooting skills, communication skills, leadership, mentoring, collaboration, influencing stakeholders, proactive risk mitigation, technical guidance, project management