Galaxy

VP, Site Reliability Engineer

full-time

Location Type: Remote

Location: Remote • 🇺🇸 United States

Job Level

Lead

Tech Stack

AWS, Cloud, Grafana, Kubernetes, Prometheus, Terraform

About the role

  • Architect, deploy, and maintain robust, scalable, secure AWS-based infrastructure.
  • Drive adoption and optimization of EKS and Kubernetes for containerized workloads.
  • Support migration initiatives, moving workloads from legacy VMs to containers in AWS.
  • Implement and fine-tune SLOs, SLAs, and error budgets to balance innovation and stability.
  • Collaborate on best practices with Security and Engineering teams for workload reliability.
  • Build Infrastructure as Code (IaC) with Terraform; maintain compliant, repeatable environments.
  • Enhance CI/CD pipelines for efficient, secure, and reliable cloud delivery.
  • Develop and refine automated solutions for autoscaling, failover, and disaster recovery.
  • Design and implement metrics, logging, and tracing tools (Datadog, OpenTelemetry).
  • Set up robust monitoring and alerting to proactively detect and address failures.
  • Lead incident analysis and post-mortems; drive improvements in operational playbooks.
  • Serve as a subject matter expert for AWS, EKS, and cloud-native tooling within the SRE team.
  • Optimize AWS resources, cost management, and resiliency best practices.
  • Ensure secure key management and regulatory compliance for decentralized workloads.

Requirements

  • 8+ years in SRE, DevOps, or Infrastructure Engineering (individual contributor capacity preferred).
  • Deep hands-on expertise in AWS, Kubernetes/EKS, and containerization.
  • Extensive IaC experience (Terraform) and cloud-native automation.
  • Proven track record migrating VM-based workloads to containers in AWS at scale.
  • Strong experience with observability stacks (Datadog, Prometheus, Grafana, OpenTelemetry).
  • Excellent analytical, problem-solving, and incident management abilities.
  • Clear communicator who thrives in team environments, collaborating cross-functionally.

Benefits

  • Galaxy respects diversity and seeks to provide equal employment opportunities to all employees and job applicants.
  • We will endeavor to make a reasonable accommodation to the known limitations of a qualified applicant with a disability.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
AWS, Kubernetes, EKS, Terraform, IaC, CI/CD, containerization, observability, automation, disaster recovery
Soft skills
analytical skills, problem-solving, incident management, communication, collaboration