
VP, Site Reliability Engineer
Galaxy
Full-time
Location Type: Remote
Location: Remote • 🇺🇸 United States
Job Level
Lead
Tech Stack
AWS, Cloud, Grafana, Kubernetes, Prometheus, Terraform
About the role
- Architect, deploy, and maintain robust, scalable, secure AWS-based infrastructure.
- Drive adoption and optimization of EKS and Kubernetes for containerized workloads.
- Support migration initiatives, moving workloads from legacy VMs to containers in AWS.
- Implement and fine-tune SLOs, SLAs, and error budgets to balance innovation and stability.
- Collaborate with Security and Engineering teams on best practices for workload reliability.
- Build Infrastructure as Code (IaC) with Terraform; maintain compliant, repeatable environments.
- Enhance CI/CD pipelines for efficient, secure, and reliable cloud delivery.
- Develop and refine automated solutions for autoscaling, failover, and disaster recovery.
- Design and implement metrics, logging, and tracing using tools such as Datadog and OpenTelemetry.
- Set up robust monitoring and alerting to proactively detect and address failures.
- Lead incident analysis and post-mortems; drive improvements in operational playbooks.
- Serve as a subject matter expert for AWS, EKS, and cloud-native tooling within the SRE team.
- Optimize AWS resource usage and costs, and apply resiliency best practices.
- Ensure secure key management and regulatory compliance for decentralized workloads.
Requirements
- 8+ years in SRE, DevOps, or Infrastructure Engineering (IC capacity preferred).
- Deep hands-on expertise in AWS, Kubernetes/EKS, and containerization.
- Extensive IaC experience (Terraform) and cloud-native automation.
- Proven track record migrating VM-based workloads to containers in AWS at scale.
- Strong experience with observability stacks (Datadog, Prometheus, Grafana, OpenTelemetry).
- Excellent analytical, problem-solving, and incident management abilities.
- Clear communicator who thrives in team environments, collaborating cross-functionally.
Equal Opportunity
- Galaxy respects diversity and seeks to provide equal employment opportunities to all employees and job applicants for employment.
- We will endeavor to make a reasonable accommodation to the known limitations of a qualified applicant with a disability.