Director, Engineering Operations

Endeavor

Director of Engineering Operations at On Location overseeing operational excellence in core platforms. Focused on AI adoption and incident management for high-traffic events.

Posted 5/22/2026 • full-time • Austin • New York, Texas • 🇺🇸 United States • Lead • 💰 $150,000 - $200,000 per year

Tech Stack

Tools & technologies

SDLC

About the role

Key responsibilities & impact

Own end-to-end engineering operations across RTB: intake, triage, prioritization, change/release governance, incident response, and post-mortems
Drive AI-enabled operational efficiency and automation across the SDLC, STLC, and SSDLC
Establish comprehensive observability with golden signals, SLIs/SLOs, anomaly detection, auto-remediation, and cost/capacity insights
Define and uphold SLOs for critical domains and guest journeys (checkout, inventory sync, fulfillment, payments)
Standardize Datadog logs, metrics, traces, and RUM/synthetics to accelerate detection and root-cause analysis
Continuously measure and improve delivery performance through DORA metrics
Enforce release discipline: balanced planned vs. unplanned releases, readiness criteria, rollback playbooks, and event blackout windows
Support major events with elevated operational rigor: dry runs, performance testing, strict change controls, enhanced monitoring, and clear comms protocols
Partner with Business Operations, Technical Product, and Solutions Architecture to maintain a single, aligned view of priorities, dependencies, and SLAs
Lead post-event and incident post-mortems to drive continuous improvement of SOPs, runbooks, response protocols, and reliability
Mature incident and security response in close partnership with TechOps and Security & Compliance (IRP/SIRP)
Continuously reduce technical debt across performance, security, and maintainability
Foster a blameless learning culture with KPI/OKR-driven improvements and transparent communication
Publish clear weekly and monthly operational health and stability reporting

Requirements

What you’ll need

10+ years in software engineering operations, site reliability engineering, or platform/DevOps leadership supporting 24x7 systems
Experience leading teams and improving their performance as measured against DORA metrics
Proven track record leading incident response and postmortems, with measurable reductions in MTTD, MTTI, and MTTR and increases in MTBF
Hands-on experience implementing observability and SLO/SLI frameworks
Strong background with CI/CD, trunk-based development, automated testing strategies, and release orchestration
Security-by-design mindset, experience with IRP/SIRP operations and DevSecOps practices
Excellent stakeholder management; effective and concise communication skills with both technical and non-technical audiences
Ability to lead and execute through ambiguity and high-demand, high-stakes events

Benefits

Comp & perks

health care
retirement
vacation and other paid time off

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

software engineering operations, site reliability engineering, DevOps leadership, DORA metrics, incident response, observability, SLO frameworks, CI/CD, automated testing strategies, release orchestration

Soft Skills

stakeholder management, effective communication, leadership, problem-solving, continuous improvement, blameless culture, transparency, adaptability, team performance improvement, execution under ambiguity