Collaborate with product engineering teams to design and build the infrastructure their services run on.
Keep our Kubernetes clusters on AWS EKS stable, secure, and ready to scale.
Design and deliver resilience strategies that cover multi-region architecture, backups, disaster recovery, and failover.
Automate infrastructure with Terraform and other Infrastructure-as-Code practices, reducing manual effort and human error.
Help teams ship faster by improving CI/CD pipelines and deployment practices.
Monitor performance and reliability using modern observability tools.
Support on-call rotations and lead incident response with a focus on long-term fixes.
Requirements
You code to solve problems and are comfortable in at least one language such as Python, Bash, Go, or Java.
You have strong experience with AWS (RDS, CloudFront, IAM, VPCs), Terraform, and Kubernetes.
You are resilience-focused, with experience designing and running systems that stay dependable during failures and recover seamlessly.
You have hands-on experience improving and operating CI/CD pipelines (e.g., CircleCI, GitHub Actions, or similar) to help teams ship faster with confidence.
You stay calm under pressure, bringing incident response expertise and strong root-cause analysis skills.
Most importantly, you are a team player who brings clear communication, strong collaboration, and a mindset of continuous improvement.