Fetcherr

DevOps Team Leader

full-time

Origin: 🇺🇸 United States • Florida

Job Level

Senior

Tech Stack

Airflow, Cloud, Google Cloud Platform, Grafana, Jenkins, Kubernetes, Prometheus, Python, SQL, Terraform

About the role

  • Lead, mentor, and grow a team of DevOps engineers, fostering a culture of ownership, excellence, and continuous improvement
  • Architect, maintain, and optimize cloud infrastructure on Google Cloud Platform (GCP) for scalability, performance, and security
  • Oversee Kubernetes and Terraform environments in production, ensuring high availability and efficient deployments
  • Drive automation in CI/CD pipelines, release processes, and infrastructure management
  • Implement and maintain robust monitoring, alerting, and logging systems (Prometheus, EFK, GCP Monitoring) to ensure proactive incident detection and resolution
  • Collaborate with development, data, and product teams to design and deliver infrastructure that meets evolving product and customer needs
  • Establish infrastructure as code (IaC) standards and ensure consistent adoption across the team
  • Manage and improve internal tooling for infrastructure, deployments, and developer productivity
  • Stay up to date with emerging technologies and evaluate their potential impact on the platform

Requirements

  • 6+ years in DevOps, Site Reliability Engineering, or Software Configuration Management roles
  • 2+ years in a team leadership or management position, with proven ability to mentor and guide engineers
  • Strong experience managing Kubernetes in production and writing/maintaining complex Helm charts
  • Proven expertise in Terraform and Infrastructure as Code principles
  • Proficiency in Bash, Python, and at least one additional scripting or programming language
  • Hands-on experience with GCP services and deployments
  • Experience with CI/CD tools and pipelines (ArgoCD, Jenkins, GitLab CI, etc.)
  • Solid monitoring and alerting background with Prometheus, Grafana, and GCP Monitoring
  • Strong problem-solving skills and the ability to handle production incidents calmly and effectively
  • Excellent communication skills, with the ability to work cross-functionally and present technical concepts to both technical and non-technical audiences
  • Nice to have: Experience with ArgoCD or Kubernetes operator development
  • Nice to have: Exposure to Big Data or MLOps environments
  • Nice to have: Familiarity with Airflow, Kubeflow, or MLflow
  • Nice to have: DBA experience and strong SQL skills
  • Nice to have: Experience in Agile/Scrum environments