Salary
💰 $165,000 - $185,000 per year
Tech Stack
AWS, Cloud, Go, Grafana, Kubernetes, Prometheus, Python, Terraform, TypeScript
About the role
- Serve as the go-to AWS expert, setting technical direction and raising the bar for operational excellence across the platform
- Design and operate multi-region, fault-tolerant systems that keep InStride's learning platform highly available
- Deliver Infrastructure as Code libraries, CI/CD pipelines, and self-service capabilities to reduce operational toil
- Implement defense-in-depth strategies, policy-as-code guardrails, and proactive monitoring for security and compliance
- Define and enforce SLIs, SLOs, and error-budget policies, and build monitoring frameworks that inform release readiness
- Deploy and manage service mesh solutions to secure, monitor, and optimize service-to-service communication across Kubernetes workloads
- Partner with engineering and security stakeholders to shape InStride’s AWS strategy for scalability, resilience, and cost efficiency
- Mentor and uplift engineers, lead design reviews, and guide teams toward modern DevOps and SRE practices
Requirements
- 10+ years of experience in SRE, DevOps, or Platform Engineering roles operating production AWS workloads
- Hands-on expertise with Amazon EKS, Kubernetes networking, Helm, autoscaling tools (Karpenter or Cluster Autoscaler), serverless architectures, and Amazon API Gateway
- Proven delivery of service mesh solutions (Istio, Linkerd, or AWS App Mesh)
- Proficiency with Infrastructure as Code (IaC) using AWS CDK (TypeScript preferred, or Python), Terraform, or CloudFormation
- Strong programming and automation skills in Go, Python, or TypeScript, plus proficiency in Bash
- Demonstrated experience implementing policy-as-code with OPA/Rego or similar tooling integrated into CI/CD pipelines
- Solid understanding of SLI/SLO/error-budget methodologies and hands-on experience with Prometheus, Grafana, CloudWatch, and Groundcover
- Deep knowledge of AWS security best practices, including IAM policies, encryption, OS hardening, and compliance enforcement
- Excellent communication skills, with the ability to translate reliability metrics into business impact and lead incident and post-mortem discussions
- Experience mentoring engineers and influencing enterprise-wide AWS and DevOps strategy without direct management authority
- Familiarity with Internal Developer Portals (Backstage, Port, Cortex) and self-service automation is a strong plus
- Candidates must reside in one of the following states to be eligible for employment: AZ, CA, CO, CT, FL, GA, IL, IN, KS, LA, MD, MA, MI, MO, NV, NH, NJ, NY, OH, OR, PA, TX, VA, WA, WI