Site Reliability Engineer

EngFlow

Full-time

Posted on: 9/4/2025

Location: 🇺🇸 United States

Mid-Level, Senior

AWS, Cloud, Distributed Systems, Google Cloud Platform, Kubernetes, Terraform

About the role

Design, build, and maintain cloud infrastructure for our distributed build acceleration platform
Automate everything, from deployment pipelines to monitoring and recovery
Manage scalability and reliability for high-throughput, low-latency systems
Implement and maintain observability: logging, metrics, tracing, and alerting
Work closely with product and engineering teams to embed reliability into every feature
Diagnose and resolve production incidents quickly, and feed the lessons learned back into system design
Optimize cost, performance, and resilience across multi-cloud environments

Requirements

4+ years in SRE, DevOps, or Production Engineering roles
Experience managing Kubernetes in production
Strong background in cloud infrastructure (GCP or AWS) and IaC (Terraform preferred)
Solid knowledge of networking, security, and distributed systems
Track record of improving system availability and developer productivity
A knack for debugging complex, cross-system issues under pressure