Principal Reliability Engineer – EDS

The Hartford

The Principal Reliability Engineer oversees the performance and reliability of data platforms and cloud infrastructure at an insurance company, leading technical initiatives and embedding reliability principles across the data product lifecycle.

Posted 6/23/2026 | Full-time | Hartford • Connecticut, North Carolina • United States | Lead | $152,800 - $229,200 per year

Tech Stack

Tools & technologies

AWS, Cloud, Distributed Systems, Google Cloud Platform, Grafana, Hadoop, Kubernetes, Prometheus, Python, Spark, Splunk, Terraform

About the role

Key responsibilities & impact

Serve as the senior technical authority for reliability, resilience, availability, and performance of data platforms, cloud infrastructure, and data products
Define and implement Reliability Engineering practices, tooling, automation, and observability frameworks
Influence architectural direction and lead cross-organizational technical initiatives
Embed Reliability Engineering (RE) principles into the data product lifecycle
Establish long-term RE roadmaps and architectural patterns
Oversee reliability controls and fail-safe patterns for critical data systems
Develop AI-driven automation for operations and implement observability frameworks
Partner with engineering and product teams to define RE best practices and ensure data quality

Requirements

What you’ll need

10+ years in data, cloud, platform engineering, site reliability engineering, or large-scale distributed systems
Proficiency with data or cloud platforms, including architectural patterns for resilience, networking, and security
Experience supporting or engineering platforms such as Snowflake, EMR, Hadoop/Spark, Data Integration, and cloud-native data ecosystems
Scripting and programming (preferably Python) for automation, platform tooling, and reliability frameworks
Experience with Infrastructure-as-Code (Terraform, CloudFormation) and enterprise CI/CD
Experience in regulated or highly complex environments (financial services, insurance, healthcare)
Knowledge of data governance, metadata, lineage systems, and data quality engineering practices
Certifications in AWS, GCP, Kubernetes, or SRE/DevOps frameworks
Background applying AI and AIOps to operations for anomaly detection and automated remediation
Expertise with observability stacks like Prometheus, Grafana, Datadog, Splunk, Dynatrace, OpenTelemetry
Ability to lead technical strategy, influence engineering leaders, and mentor engineers

Benefits

Comp & perks

Health insurance
401(k) matching
Flexible work arrangements
Professional development opportunities
Short-term or annual bonuses
Long-term incentives
On-the-spot recognition

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Reliability Engineering, data platforms, cloud infrastructure, AI-driven automation, observability frameworks, scripting, programming, Infrastructure-as-Code, data governance, data quality engineering

Soft Skills

leadership, influence, mentoring, collaboration, strategic thinking

Certifications

AWS, GCP, Kubernetes, SRE, DevOps