InStride

Principal Site Reliability Engineer, SRE

full-time

Origin: 🇺🇸 United States • Arizona, California, Colorado

Salary

💰 $165,000 - $185,000 per year

Job Level

Lead

Tech Stack

AWS, Cloud, Go, Grafana, Kubernetes, Prometheus, Python, Terraform, TypeScript

About the role

  • Serve as the go-to AWS expert, setting technical direction and raising the bar for operational excellence across the platform
  • Design and operate multi-region, fault-tolerant systems to ensure the availability of InStride’s learning platform
  • Deliver Infrastructure as Code libraries, CI/CD pipelines, and self-service capabilities to reduce operational toil
  • Implement defense-in-depth strategies, policy-as-code guardrails, and proactive monitoring for security and compliance
  • Define and enforce SLIs, SLOs, and error-budget policies, and build monitoring frameworks that inform release readiness
  • Deploy and manage service mesh solutions to secure, monitor, and optimize service-to-service communication across Kubernetes workloads
  • Partner with engineering and security stakeholders to shape InStride’s AWS strategy for scalability, resilience, and cost efficiency
  • Mentor and uplift engineers, lead design reviews, and guide teams toward modern DevOps and SRE practices

Requirements

  • 10+ years of experience in SRE, DevOps, or Platform Engineering roles operating production AWS workloads
  • Hands-on expertise with AWS EKS, Kubernetes networking, Helm, autoscaling frameworks (Karpenter/Cluster Autoscaler), serverless architectures, and API Gateways
  • Proven delivery of service mesh solutions (Istio, Linkerd, or AWS App Mesh)
  • Proficiency with Infrastructure as Code (IaC) using AWS CDK (TypeScript preferred, or Python), Terraform, or CloudFormation
  • Strong programming and automation skills in Go, Python, or TypeScript, with additional proficiency in Bash
  • Demonstrated experience implementing policy-as-code with OPA/Rego or similar tooling integrated into CI/CD pipelines
  • Solid understanding of SLI/SLO/error-budget methodologies and hands-on experience with Prometheus, Grafana, CloudWatch, and Groundcover
  • Deep knowledge of AWS security best practices, including IAM policies, encryption, OS hardening, and compliance enforcement
  • Excellent communication skills, with the ability to translate reliability metrics into business impact and guide incident and post-mortem discussions
  • Experience mentoring engineers and influencing enterprise AWS and DevOps strategies without direct management responsibilities
  • Familiarity with Internal Developer Portals (Backstage, Port, Cortex) and self-service automation is a strong plus
  • Candidates must be located in one of the following states to be considered eligible for employment: AZ, CA, CO, CT, FL, GA, IL, IN, KS, LA, MD, MA, MI, MO, NV, NH, NJ, NY, PA, OH, OR, TX, VA, WA, WI