Lead Site Reliability Engineer

DraftKings Inc.

Posted 4/2/2026 • Full-time • Remote • 🇺🇸 United States • Senior • 💰 $148,000 - $185,000 per year

Tech Stack

Tools & technologies

Ansible, AWS, Chef, Cloud, Docker, Elixir, Go, Google Cloud Platform, IoT, Java, Kubernetes, Linux, .NET, Python, Ruby, Terraform

About the role

Key responsibilities & impact

Lead SRE initiatives across multiple projects and products, collaborating with cross-functional teams to shape platform and infrastructure engineering efforts across the organization.
Drive technical excellence by mentoring and guiding engineers, fostering a culture of continuous learning and innovation.
Architect and automate self-healing, fault-tolerant infrastructure with declarative configurations, GitOps, and event-driven automation for scalable deployments across public clouds and on-premises environments.
Design, develop, and maintain software-driven infrastructure automation to build internal tools and eliminate repetitive operational tasks.
Own and drive decisions on product deployment, performance tuning, monitoring, and alerting to ensure high availability and system efficiency in production.
Define key metrics and SLAs for new web services built to support our rapid traffic growth.
Design and implement monitoring and alerting strategies to enforce application SLAs.

Requirements

What you’ll need

At least 6 years of experience managing distributed cloud environments (GCP, AWS, vSphere, Nutanix) and platform automation at scale.
Deep expertise in container orchestration (Kubernetes) and container runtimes (Docker, containerd), with the ability to design, scale, and troubleshoot complex workloads.
Expert-level understanding of networking and web concepts, with the ability to debug issues down to the packet level.
Strong experience developing software for automation and infrastructure tooling (Go, Python).
Strong understanding of Linux-based operating systems, including performance tuning, bootloaders, storage, partitioning, kernel debugging, and low-level system optimizations.
Experience with Infrastructure as Code (IaC) and configuration management tools (Terraform, Ansible, Chef, etc.), ensuring scalable and repeatable infrastructure provisioning.
Understanding of applications written in various programming languages (C#/.NET, Java, Elixir, Ruby, etc.).
Experience in AWS Greengrass IoT management and A/B booting.

Benefits

Comp & perks

bonus
equity
benefits as applicable

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

GitOps, event-driven automation, infrastructure automation, Kubernetes, Docker, Go, Python, Terraform, Ansible, Chef

Soft Skills

mentoring, collaboration, continuous learning, innovation, decision making