Tech Stack
Tools & technologies: AWS, EC2, Go, Grafana, Kubernetes, Prometheus, Python
About the role
Key responsibilities & impact
- Own SLI/SLO/SLA definitions for the Akuity SaaS platform and drive continuous improvement against them
- Design, instrument, and maintain observability systems (metrics, logs, traces) across multi-region AWS infrastructure
- Identify reliability gaps, lead blameless post-mortems, and close the loop with permanent fixes
- Partner with engineering teams to build reliability into new features before they ship to production
- Participate in an on-call rotation and act as incident commander for high-severity production events
- Build and maintain runbooks, escalation paths, and incident playbooks that keep mean time to resolution low
- Drive improvements to alerting fidelity; reduce noise, increase signal, eliminate toil
- Lead post-incident reviews with clear timelines, root cause analysis, and follow-through on action items
Requirements
What you’ll need
- 5+ years of SRE, platform engineering, or production operations experience in a SaaS environment
- Deep hands-on Kubernetes expertise; you understand the scheduler, networking, storage, and autoscaling at a level where you can debug anything
- Strong AWS fundamentals across compute (EC2, EKS), networking (VPC, NLB, Route53), storage (S3, RDS), and IAM
- Experience defining and operating against SLOs in production; you've written error budgets, not just read about them
- Proficiency with observability tooling (Prometheus, Grafana, OpenTelemetry, Datadog, or equivalent)
- Solid scripting and automation skills; Go, Python, Bash, or similar; you automate what you touch
- Strong written communication: clear runbooks, sharp incident reports, thoughtful post-mortems
- Live in a location within US time zones (Pacific through Eastern); this includes Canada and other regions in those time zones
Benefits
Comp & perks
- Competitive compensation, commensurate with experience
- Equity participation in a well-funded, growing company
- Fully remote: work from anywhere within US time zones (Pacific through Eastern), including Canada and other regions
- Home office stipend and equipment budget
- Flexible time off and a culture that respects it
- Work directly with the engineers who built Argo CD and Kargo; you'll learn a lot here
- US-based employees receive full benefits, including comprehensive health, dental, and vision coverage. Candidates based outside the US will be engaged as contractors.
