Senior Software Engineer – Site Reliability Engineering

The Home Depot

Senior Software Engineer for Site Reliability Engineering at The Home Depot, building and operating internal platforms that support the reliability and observability of store systems.

Posted 4/29/2026 • Full-time • Remote • 🇺🇸 United States • Senior • 💰 $90,000 - $180,000 per year

Tech Stack

Tools & technologies

BigQuery, Cloud, Go, Google Cloud Platform, JavaScript, Kubernetes, Python, Selenium, Spinnaker, Terraform, TypeScript

About the role

Key responsibilities & impact

Develops, tests, deploys, and maintains software for internal platforms
Designs, develops, and maintains tools for reliability engineering teams
Extends internal reliability tools using Kubernetes and Terraform on Google Cloud Platform
Deploys and maintains production logging, tracing, and profiling systems
Identifies and automates repetitive operational tasks
Maintains and extends SLO and Critical User Journey platforms
Participates in on-call rotation and contributes to incident response

Requirements

What you’ll need

3-5 years of experience in Site Reliability Engineering, Platform Engineering, DevOps, or Infrastructure Engineering.
Hands-on experience with Google Cloud Platform (GCP), including GKE, GCS, BigQuery, Cloud Pub/Sub, Cloud Logging, IAM, and Workload Identity.
Strong Kubernetes experience: deploying and managing workloads on GKE or similar managed Kubernetes services, writing and debugging Helm charts, managing namespaces, RBAC, and service accounts, and troubleshooting cluster and workload issues.
Experience with infrastructure-as-code tools, particularly Terraform for cloud resource management.
Proficiency in one or more of: Go, Python, JavaScript/TypeScript, YAML.
Experience with observability platforms: deploying, configuring, or operating log aggregation, distributed tracing, metrics, dashboarding, or continuous profiling.
Practical understanding of SLOs, SLIs, and error budgets.
Experience with synthetic monitoring or performance testing frameworks (k6, Playwright, Selenium, Locust, or similar).
Familiarity with incident management and on-call practices: blameless post-mortems, runbook development, and incident communication.
Experience with CI/CD pipelines using GitHub Actions, Spinnaker, ArgoCD, or similar.
Understanding of deployment strategies (blue/green, canary, rolling).

Benefits

Comp & perks

Health insurance
401(k) matching
Flexible work hours
Paid time off
Remote work options

ATS Keywords


Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Kubernetes, Terraform, Google Cloud Platform, Go, Python, JavaScript, TypeScript, YAML, observability platforms, synthetic monitoring

Soft Skills

incident response, on-call practices, communication, blameless post-mortems, runbook development