Salary
💰 $170,000 - $240,000 per year
Tech Stack
ApacheAWSCloudDistributed SystemsDockerEC2ElasticSearchGoGrafanaJenkinsKubernetesOpen SourcePrometheusPythonTerraform
About the role
- Lead and own scaling AWS infrastructure to support thousands of servers and high-growth workloads.
- Design and implement automation frameworks for provisioning, monitoring/logging, scaling, and recovery to minimize manual operations.
- Continuously evaluate and tune systems for latency, throughput, and cost efficiency.
- Build resilient, self-healing, and observable systems using SLOs, error budgets, and reliability best practices.
- Partner closely with Development, QA, and Product Engineering teams to deliver highly available and performant systems.
- Own on-call processes, lead incident management and root-cause analysis, and implement preventive measures.
- Mentor engineers, act as technical leader, and set standards for best practices.
Requirements
- 7+ years in Site Reliability, DevOps, or Infrastructure Engineering roles.
- Startup experience and track record of scaling infrastructure to thousands of servers.
- Hands-on mastery of AWS services (EC2, EKS, RDS, S3, CloudFront, VPC, IAM).
- Proficiency in Infrastructure as Code (Terraform, CloudFormation, or similar tools).
- Strong automation and scripting skills in Python, Go, or similar languages (beyond basic scripting).
- Expertise with monitoring & observability tools (Prometheus, Grafana, Loki, ELK/EFK, Datadog).
- Experience with CI/CD and containers (Docker, Kubernetes, Jenkins or GitHub Actions).
- Performance engineering experience: identify bottlenecks and optimize systems for scalability and efficiency.
- Proven problem solving diagnosing complex production issues at scale.
- Experience designing, deploying, and managing multi-region, highly available AWS architectures in production.
- Experience owning end-to-end observability and leading production incident response/root-cause analysis.
- Legal authorization to work in the United States (E-Verify and application questions indicate requirement).