Tech Stack
AWS, Cloud, Distributed Systems, Google Cloud Platform, Kubernetes, Terraform
About the role
- Design, build, and maintain cloud infrastructure for our distributed build acceleration platform
- Automate everything, from deployment pipelines to monitoring and recovery
- Manage scalability and reliability for high-throughput, low-latency systems
- Implement and maintain observability: logging, metrics, tracing, and alerting
- Work closely with product and engineering teams to embed reliability into every feature
- Diagnose and resolve production incidents quickly, and feed learnings back into systems design
- Optimize cost, performance, and resilience across multi-cloud environments
Requirements
- 4+ years in SRE, DevOps, or Production Engineering roles
- Experience managing Kubernetes in production
- Strong background in cloud infrastructure (GCP or AWS) and IaC (Terraform preferred)
- Solid knowledge of networking, security, and distributed systems
- Track record of improving system availability and developer productivity
- A knack for debugging complex, cross-system issues under pressure