Tech Stack
Ansible, Cloud, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Terraform
About the role
- Act as the architect and defender of infrastructure, guaranteeing maximum uptime and performance for millions of players
- Ensure smooth, highly available operation of Kubernetes clusters; scale, maintain, and optimize cluster environments
- Design and manage Google Cloud (GCP) infrastructure (GKE, Compute Engine, Networking, IAM)
- Define infrastructure as code with Terraform to build automated, reproducible environments
- Automate configuration management and deployments using Ansible
- Develop and maintain monitoring, logging, and alerting systems (e.g., Prometheus & Grafana) to detect problems proactively
- Collaborate with teams to maintain platform stability, resilience, and performance
Requirements
- Deep, hands-on experience operating, managing, and troubleshooting Kubernetes clusters in production
- Strong knowledge of Google Cloud Platform (GCP) core services, especially GKE, Compute Engine, Networking, and IAM
- Fluent in Terraform, with proven experience designing and managing complex infrastructure
- Strong expertise in Ansible for configuration management and automation
- Experience developing and maintaining monitoring, logging, and alerting systems (e.g., Prometheus & Grafana)
- Familiarity with SLAs, SLOs, and error budgets; focus on high availability and resilience
- Understanding of players' expectations (Gamer Mindset)