Senior Site Reliability Engineer – Observability

Dimensional Fund Advisors

Full-time

Posted on: 2/27/2026

Location Type: Hybrid

Location: Austin • North Carolina • Texas • United States

Job Level

Senior

Tech Stack

Ansible, Chef, Cloud, Elasticsearch, Grafana, Linux, Logstash, Puppet, Python, Shell Scripting, Terraform

About the role

Serve as a primary escalation point for production support involving the ELK Stack, Grafana, and New Relic
Own platform health, capacity planning, and performance tuning for on-premises observability infrastructure – including Elasticsearch cluster management, index lifecycle policies, and retention strategies
Monitor and maintain SLOs for the observability platforms, ensuring the tools engineers depend on are highly available and performant
Support engineering teams in onboarding to observability platforms – helping teams instrument their applications, build dashboards, and define meaningful alerts
Manage patching, upgrades, and configuration management across the observability stack
Collaborate with security to harden platform configurations and manage software vulnerabilities
Contribute to on-call rotations and maintain runbooks and escalation procedures
Design and build tooling/automation to reduce toil and improve the experience for teams using observability platforms
Lead or contribute to platform modernization initiatives – e.g., improving ingestion pipelines, scaling platform capacity, standardizing Grafana dashboard and alerting patterns, or evaluating new capabilities within the existing stack
Develop and maintain infrastructure-as-code (Terraform, Helm, Ansible, etc.) for platform components
Build and enforce standards around logging, metrics, and alerting that help engineering teams adopt observability best practices at scale
Participate in design reviews and contribute to the overall platform roadmap

Requirements

Bachelor’s degree in a technical field or equivalent practical experience
5+ years of experience in SRE, DevOps, or platform engineering roles
Deep hands-on experience with the ELK Stack – Elasticsearch cluster operations, Logstash pipeline development, Kibana, and index lifecycle management
Strong experience with Grafana, including data source integrations, dashboard design, and alerting
Solid understanding of observability principles
Experience operating on-premises infrastructure, including capacity planning and server management, and an understanding of its operational tradeoffs relative to managed cloud services
Proficiency in Python for automation and tooling; familiarity with shell scripting
Strong Linux systems knowledge and comfort working with configuration management tools (e.g., Ansible, Chef, or Puppet)
Demonstrated ability to drive incidents to resolution and communicate clearly under pressure
A bias toward automation and a low tolerance for repetitive manual work

Benefits

Comprehensive benefits
Educational initiatives
Special celebrations of our history, culture, and growth

Applicant Tracking System Keywords

Hard Skills & Tools

ELK Stack, Elasticsearch, Logstash, Kibana, Grafana, Terraform, Helm, Ansible, Python, Linux

Soft Skills

Communication, incident resolution, collaboration, leadership, capacity planning, performance tuning, automation, problem-solving, organizational skills, adaptability

Certifications

Bachelor’s degree in a technical field