Tech Stack
Airflow, Cloud, Google Cloud Platform, Grafana, Jenkins, Kubernetes, Prometheus, Python, SQL, Terraform
About the role
- Lead, mentor, and grow a team of DevOps engineers, fostering a culture of ownership, excellence, and continuous improvement
- Architect, maintain, and optimize cloud infrastructure on Google Cloud Platform (GCP) for scalability, performance, and security
- Oversee Kubernetes and Terraform environments in production, ensuring high availability and efficient deployments
- Drive automation in CI/CD pipelines, release processes, and infrastructure management
- Implement and maintain robust monitoring, alerting, and logging systems (Prometheus, EFK, GCP Monitoring) to ensure proactive incident detection and resolution
- Collaborate with development, data, and product teams to design and deliver infrastructure that meets evolving product and customer needs
- Establish infrastructure as code (IaC) standards and ensure consistent adoption across the team
- Manage and improve internal tooling for infrastructure, deployments, and developer productivity
- Stay up to date with emerging technologies and evaluate their potential impact on the platform
Requirements
- 6+ years in DevOps, Site Reliability Engineering, or Software Configuration Management roles
- 2+ years in a team leadership or management position, with proven ability to mentor and guide engineers
- Strong experience managing Kubernetes in production and writing/maintaining complex Helm charts
- Proven expertise in Terraform and Infrastructure as Code principles
- Proficiency in Bash, Python, and at least one additional scripting or programming language
- Hands-on experience with GCP services and deployments
- Experience with CI/CD tools and pipelines (ArgoCD, Jenkins, GitLab CI, etc.)
- Solid background in monitoring and alerting with Prometheus, Grafana, and GCP Monitoring
- Strong problem-solving skills and the ability to handle production incidents calmly and effectively
- Excellent communication skills, with the ability to work cross-functionally and present technical concepts to both technical and non-technical audiences
- Nice to have: Experience with ArgoCD or Kubernetes operator development
- Nice to have: Exposure to Big Data or ML Ops environments
- Nice to have: Familiarity with Airflow, Kubeflow, or MLFlow
- Nice to have: DBA experience and strong SQL skills
- Nice to have: Experience in Agile/Scrum environments