The Walt Disney Company

Senior Systems Reliability Engineer

full-time

Origin: 🇺🇸 United States • California

Salary

💰 $152,100 - $203,900 per year

Job Level

Senior

Tech Stack

AWS, Cloud, Distributed Systems, Docker, EC2, Flux, Go, Kubernetes, Packer, Python, PyTorch, Splunk, TensorFlow, Terraform, VMware

About the role

  • Design, manage and maintain critical infrastructure for both software development and deployed global production resources.
  • Collaborate on the provisioning of cloud infrastructure in AWS using Terraform to ensure consistency and scalability.
  • Maintain and manage multiple Kubernetes clusters across both cloud and on-premises environments.
  • Implement and enforce best practices for secure software development and deployment in alignment with industry standards.
  • Monitor, troubleshoot, and optimize build and deployment processes to maximize efficiency and minimize downtime.
  • Collaborate with cross-functional teams, including developers and security experts, to ensure systems meet operational requirements.
  • Develop, maintain, and enhance CI/CD pipelines using GitLab to support build automation, unit testing, and integration testing.
  • Continuously evaluate and implement tools and technologies to improve workflows and platform reliability.

Requirements

  • BS degree in Computer Science.
  • 5+ years of experience in DevOps, Site Reliability Engineering, or a related field.
  • Extensive AWS knowledge: EC2, ECS/EKS, Lambda, ELB, ASGs, Route 53, KMS, SSM, IAM, S3, ACM, VPC, RDS, ElastiCache.
  • Proficiency with modern observability practices: application monitoring, tracing, and profiling tools (e.g. Datadog, New Relic, OpenTelemetry, Splunk).
  • Proficiency with GitLab CI, Terraform, Helm, and Packer.
  • Demonstrated experience designing and managing CI/CD pipelines for complex software platforms.
  • In-depth knowledge of containers and container orchestration technologies: Docker, Kubernetes.
  • Experience with Terraform or other infrastructure as code tooling.
  • Strong scripting skills in Python, Bash, or similar languages.
  • Familiarity with modern security practices for protecting sensitive assets in distributed systems.
  • Exceptional problem-solving skills, with a proactive and collaborative mindset.
  • Preferred: Experience working with media and entertainment pipelines or pre-release content workflows.
  • Preferred: Proficiency with Golang, Python, or C++.
  • Preferred: Experience with modern AI/ML frameworks (e.g., TensorFlow, PyTorch, Hugging Face) and their integration into operational workflows.
  • Preferred: Knowledge of container security tools and systems, such as Falco or Aqua Security.
  • Preferred: Experience with emerging deployment systems like ArgoCD or Flux for GitOps workflows.
  • Preferred: Familiarity with serverless computing paradigms and technologies such as AWS Lambda or Google Cloud Run/Functions.
  • Preferred: Understanding of high-performance computing systems in cloud environments.
  • Preferred: Experience administering VMware vSphere clusters.
Rescale

HPC Engineer, R&D

Rescale
Mid · Senior • full-time • $100k–$150k / year • 🇺🇸 United States
Posted: 16 days ago • Source: jobs.ashbyhq.com
AWS, Azure, Cloud, Linux, Python, Terraform, Unix
Samsara

Senior Data Platform Engineer

Samsara
Senior • full-time • $126k–$169k / year • 🇺🇸 United States
Posted: 12 hours ago • Source: boards.greenhouse.io
AWS, Cloud, EC2, ETL, IoT, Python, RDBMS, SQL, Terraform, Unity
Truelogic Software

Staff DevOps Engineer, AWS – Health Care

Truelogic Software
Lead • full-time • 🇨🇴 Colombia
Posted: 7 days ago • Source: jobs.ashbyhq.com
AWS, Cloud, Docker, EC2, Go, Grafana, Jenkins, Kubernetes, Linux, Prometheus, Python, Terraform
Truelogic Software

Staff DevOps Engineer, AWS – Health Care

Truelogic Software
Lead • full-time • 🇩🇴 Dominican Republic
Posted: 7 days ago • Source: jobs.ashbyhq.com
AWS, Cloud, Docker, EC2, Go, Grafana, Jenkins, Kubernetes, Linux, Prometheus, Python, Terraform
Truelogic Software

Staff DevOps Engineer, AWS – Health Care

Truelogic Software
Lead • full-time • 🇲🇽 Mexico
Posted: 7 days ago • Source: jobs.ashbyhq.com
AWS, Cloud, Docker, EC2, Go, Grafana, Jenkins, Kubernetes, Linux, Prometheus, Python, Terraform