Akuity

Senior Site Reliability Engineer

Senior SRE responsible for platform reliability at Akuity, including Kubernetes and AWS performance. The role collaborates with engineering teams on incident response and reliability improvements while maintaining critical SLAs.

Posted 7/3/2026 • Full-time • Remote • 🇺🇸 United States • Senior

Tech Stack

Tools & technologies
AWS, EC2, Go, Grafana, Kubernetes, Prometheus, Python

About the role

Key responsibilities & impact
  • Own SLI/SLO/SLA definitions for the Akuity SaaS platform and drive continuous improvement against them
  • Design, instrument, and maintain observability systems (metrics, logs, traces) across multi-region AWS infrastructure
  • Identify reliability gaps, lead blameless post-mortems, and close the loop with permanent fixes
  • Partner with engineering teams to build reliability into new features before they ship to production
  • Participate in an on-call rotation and act as incident commander for high-severity production events
  • Build and maintain runbooks, escalation paths, and incident playbooks that keep mean time to resolution low
  • Drive improvements to alerting fidelity: reduce noise, increase signal, eliminate toil
  • Lead post-incident reviews with clear timelines, root cause analysis, and follow-through on action items

Requirements

What you’ll need
  • 5+ years of SRE, platform engineering, or production operations experience in a SaaS environment
  • Deep hands-on Kubernetes expertise; you understand the scheduler, networking, storage, and autoscaling at a level where you can debug anything
  • Strong AWS fundamentals across compute (EC2, EKS), networking (VPC, NLB, Route53), storage (S3, RDS), and IAM
  • Experience defining and operating against SLOs in production; you've written error budgets, not just read about them
  • Proficiency with observability tooling (Prometheus, Grafana, OpenTelemetry, Datadog, or equivalent)
  • Solid scripting and automation skills in Go, Python, Bash, or similar; you automate what you touch
  • Strong written communication: clear runbooks, sharp incident reports, thoughtful post-mortems
  • Residence within US time zones (Pacific through Eastern), including Canada and other regions in those time zones

Benefits

Comp & perks
  • Competitive compensation, commensurate with experience
  • Equity participation in a well-funded, growing company
  • Fully remote: work from anywhere within US time zones (Pacific through Eastern), including Canada and other regions
  • Home office stipend and equipment budget
  • Flexible time off and a culture that respects it
  • Work directly with the engineers who built Argo CD and Kargo; you'll learn a lot here
  • US-based employees receive full benefits, including comprehensive health, dental, and vision coverage. Candidates based outside the US will be engaged as contractors.

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
SRE, Kubernetes, AWS (EC2, EKS, VPC, NLB, Route53, S3, RDS, IAM), SLO Definition and Operation, Prometheus, Grafana, OpenTelemetry, Datadog, Scripting (Go, Python, Bash), Automation
Soft Skills
Strong Written Communication