Salary
💰 $6,000 - $7,000 per month
Tech Stack
Cloud, Elasticsearch, Google Cloud Platform, Grafana, Kubernetes, Microservices, Postgres, Terraform
About the role
- Lead a team of 4 DevOps engineers, setting priorities and mentoring team members
- Define and maintain best practices for cloud-native infrastructure
- Build and manage GitLab CI/CD pipelines with ArgoCD using GitOps workflows
- Administer and scale GKE clusters for secure, cost-efficient, and high-availability microservices
- Manage GCP infrastructure with Terraform (Infrastructure as Code)
- Implement observability with OpenTelemetry, Grafana, Datadog, and Sentry
- Oversee PostgreSQL, Elasticsearch, and ClickHouse database performance and reliability
- Develop and execute disaster recovery and incident response strategies
- Collaborate with backend, frontend, and QA teams to improve deployment reliability
- Lead incident management, escalation, and post-incident reviews
Requirements
- 5+ years of DevOps or SRE experience, including 3+ years in a leadership capacity.
- Expert in Kubernetes (GKE) and GitOps workflows with ArgoCD.
- Deep experience in GCP infrastructure and Terraform-based IaC.
- Hands-on experience with OpenTelemetry for distributed tracing and system instrumentation.
- Strong expertise in Grafana, Datadog, and Sentry.
- Operational experience with PostgreSQL, Elasticsearch, and ClickHouse in production environments.
- Strong debugging, incident management, and communication skills.
- Proven track record of leading DevOps initiatives in fast-paced environments.
Nice to have
- Helm, Kustomize, and OPA/Gatekeeper experience.
- Cloud cost optimization experience on GCP.
- Familiarity with secret management solutions.
- Experience with blue-green or canary deployments.
Benefits
- 15 days of paid time off, plus Christmas Day and New Year's Day as paid holidays.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
DevOps, SRE, Kubernetes, GKE, GitOps, Terraform, OpenTelemetry, Grafana, Datadog, PostgreSQL
Soft skills
leadership, mentoring, communication, incident management, debugging