Gridware

Senior Site Reliability Engineer

full-time

Location Type: Hybrid

Location: San Francisco • California • 🇺🇸 United States

Job Level

Senior

Tech Stack

AWS, Distributed Systems, EC2, Grafana, Kafka, Kubernetes, Prometheus, Terraform

About the role

  • Design, build, and maintain scalable, secure, and highly available infrastructure on AWS (EKS, EC2, RDS, MSK, S3, VPC, …).
  • Manage and optimize Kubernetes clusters (EKS) and deploy applications using ArgoCD with GitOps best practices.
  • Implement and maintain CI/CD pipelines using GitHub Actions (GHA), ensuring fast, reliable, and automated software delivery.
  • Build and support Kafka-based event streaming platforms using Amazon MSK for high-throughput, low-latency data pipelines.
  • Manage identity and access across platforms with IdP integration (Okta, Auth0, or similar).
  • Define and manage Infrastructure as Code with Terraform.
  • Monitor, troubleshoot, and optimize system performance, cost, and reliability using observability tools like Grafana and Loki.

Requirements

  • 5+ years in DevOps/SRE/Platform Engineering, with production experience in AWS infrastructure management.
  • Deep knowledge of Kubernetes administration and GitOps tools like ArgoCD.
  • Proficiency in Infrastructure as Code using Terraform.
  • Hands-on experience with CI/CD automation and pipelines (preferably GitHub Actions).
  • Expertise in running and maintaining distributed systems such as Kafka on MSK and relational databases (RDS).
  • Strong understanding of networking, security best practices, and IdP-driven access control.
  • Experience with monitoring and logging solutions (Grafana, Loki, Prometheus, or similar).
  • Ability to debug complex production issues across infrastructure, deployment, and networking layers.

Benefits

  • Health, Dental & Vision (Gold and Platinum plans, with some providers' plans fully covered)
  • Paid parental leave
  • Alternating day off (every other Monday)
  • “Off the Grid”: a paid two-week break each year for all employees
  • Commuter allowance
  • Company-paid training
