Tech Stack
Cloud, Docker, Go, Grafana, Kubernetes, Prometheus, Python, Terraform
About the role
- Reporting to the Sr Director of IT and DevOps, lead infrastructure and delivery operations across multiple brands and product lines
- Lead a team of DevOps engineers supporting multiple cloud environments, CI/CD pipelines, and incident response processes
- Mentor engineers on technical skills and communication/teamwork dynamics
- Lead agile ceremonies for core DevOps engineering teams, including sprint planning, commitments, velocity tracking, and retrospectives
- Identify and drive investments in automation to improve efficiency
- Collaborate with product and engineering leaders to align deployment and SRE practices with agile methodologies
- Partner with engineering teams to improve deployment velocity, rollback safety, and environment parity
- Participate in incident response reviews and postmortems; establish a culture of reliability and learning
- Support documentation, runbooks, and internal knowledge sharing via Jira and Confluence
- Guide and evolve infrastructure-as-code practices using Terraform, Helm, and related tooling
- Drive containerization and Kubernetes orchestration best practices across brands and platforms
- Oversee observability tooling and alerting, including Grafana, Prometheus, Loki, PagerDuty, and Opsgenie
- Own operational standards for CI/CD platforms such as GitHub Actions and CircleCI
- Maintain a hands-on focus on tooling, environments, automation, and observability
Requirements
- 5+ years in DevOps, SRE, or infrastructure engineering roles
- 2+ years in a lead or managerial capacity
- Proven experience managing and deploying Kubernetes workloads in production
- Deep knowledge of infrastructure as code (Terraform, Helm, or equivalent)
- Hands-on experience with containerization (Docker)
- Experience with CI/CD platforms (GitHub Actions, CircleCI)
- Proficiency with observability stacks (Grafana, Prometheus, Loki)
- Experience with alerting and on-call tools (PagerDuty, Opsgenie)
- Familiarity with GitHub, Jira, and Confluence in engineering workflows
- Ability to lead agile DevOps engineering processes and ceremonies (Scrum/Kanban)
- Ability to schedule recurring operational activities (upgrades, maintenance windows, health checks)
- Experience defining on-call schedules and incident response processes
- Strong communication and collaboration skills
- Comfortable scripting in Bash, Python, or Go for automation tasks
- Exposure to cloud-native security practices, RBAC, and secrets management (preferred)
- Familiarity with cost optimization and cloud spend reporting (preferred)
- Experience with incident command or formalized incident response frameworks (preferred)
- Prior involvement in DevOps maturity assessments, tooling selection, or team scaling (preferred)