Dimensional Fund Advisors

Senior Site Reliability Engineer – Workflow Automation

Employment Type: Full-time

Location Type: Hybrid

Location: Austin; North Carolina; Texas; United States

About the role

  • Serve as a primary escalation point for production support involving Airflow and UC4: assisting with end-user inquiries, performing incident root cause analysis, and implementing go-forward solutions
  • Own and continuously improve SLOs, SLIs, and error budgets for orchestration platforms
  • Monitor platform health, capacity, and performance; proactively identify and remediate issues before they impact users
  • Partner with data engineering and application teams to troubleshoot DAG failures, job dependencies, and scheduling issues
  • Manage patching, upgrades, and configuration management for Airflow and UC4 environments
  • Collaborate with security to harden platform configurations and manage software vulnerabilities
  • Contribute to on-call rotations and maintain runbooks and escalation procedures
  • Design and build tooling and automation to reduce toil and improve developer experience for teams that depend on Airflow and UC4
  • Lead or contribute to platform modernization initiatives, such as migrating workloads, improving deployment pipelines, containerizing components, or adopting managed service offerings
  • Develop and maintain infrastructure-as-code (Terraform, Helm, Ansible, etc.) for platform components
  • Build observability solutions (e.g., dashboards, alerting, log aggregation) that give teams better visibility into their workflows
  • Build and enforce standards around platform use that help engineering teams adopt best practices at scale
  • Participate in design reviews and contribute to the overall platform roadmap

Requirements

  • Bachelor’s degree in a technical field or equivalent practical experience
  • 5+ years of experience in SRE, DevOps, or platform engineering roles
  • Deep hands-on experience with Apache Airflow – ideally including distributed executor configurations (Celery or Kubernetes), DAG authoring best practices, and multi-environment deployments
  • Experience operating enterprise job scheduling platforms (e.g., Automic/UC4, Control-M)
  • Strong Linux and Windows systems knowledge and comfort working in cloud environments (AWS preferred)
  • Proficiency in Python for automation and tooling; familiarity with shell scripting
  • Experience with containerization and orchestration (Docker, Kubernetes) and CI/CD pipelines
  • Solid understanding of observability principles – metrics, logging, tracing – and tools like ELK, Grafana, and Prometheus
  • Demonstrated ability to drive incidents to resolution and communicate clearly under pressure
  • A bias toward automation and a low tolerance for repetitive manual work

Benefits

  • Comprehensive benefits
  • Educational initiatives
  • Special celebrations of our history, culture, and growth