SRE – Infra

PostHog

full-time

Posted on: 4/9/2026

Location Type: Remote

Location: United States

Job Level

Mid-Level, Senior

Tech Stack

AWS, Cloud, Kubernetes, Linux, Node.js, Terraform

About the role

You won’t be in a typical “keep the lights on” SRE role. The work is about turning a fast-growing, stateful system into a predictable, well-automated platform.
Operating EKS clusters across several environments with Karpenter autoscaling, Cilium networking, and ArgoCD-driven GitOps deployments
Managing and evolving a multi-account AWS organization: provisioning, networking, access control, and cross-account connectivity
Maintaining the Terraform/Terragrunt IaC platform: modules, automated plan-on-PR / apply-on-merge pipelines, and safe patterns for shared infrastructure
Improving operational tooling around deploys, schema changes, backups, restores, and incident response
Reducing operational load by identifying repeat pain points and eliminating them through code and self-healing automation
Optimizing cloud spend as you go
Participating in on-call and incident response, with a strong focus on making incidents rarer over time

Requirements

Deep hands-on experience with Kubernetes in production (EKS preferred). You've debugged node pressure, networking issues, and deployment failures at scale (thousands of nodes).
Strong experience operating production infrastructure on AWS: not just a single account, but an understanding of organizational boundaries, IAM, and networking across many accounts
Experience automating infrastructure using Terraform or Terragrunt at scale, including module design and state management
Solid understanding of Linux systems (disk, memory, networking, failure modes)
Experience supporting stateful systems (databases, queues, storage systems, etc.)
Ability to debug and reason about performance and reliability issues in production
You're comfortable owning systems end-to-end, including on-call responsibilities.

Benefits

Transparency: Everyone can read about our roadmap, how we pay (or even let go of) people, our strategy, and how we work, in our public company handbook. Internally, we share revenue, notes and slides from board meetings, and fundraising plans, so everyone has the context they need to make good decisions.
Autonomy: We don’t tell anyone what to do. Everyone chooses what to work on next based on what's going to have the biggest impact on our customers, and what they find interesting and motivating to work on.
Shipping fast: Why not now? We want to build a lot of products, and we can't do that by shipping at a normal pace. We prioritize heads-down building time over perfect coordination. This will be the most productive job you've ever had.
Time for building: Nothing gets shipped in a meeting. We're a natively remote company. We default to async communication – PRs > Issues > Slack. Tuesdays and Thursdays are meeting-free days.
Ambition: We want to solve big problems. We strongly believe that aiming for the best possible upside, and sometimes missing, is better than never trying. We're optimistic about what's possible and our ability to get there.
Being weird: Doing weird stuff is a competitive advantage. And it's fun.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Kubernetes, EKS, Terraform, Terragrunt, AWS, Linux, GitOps, Cilium, ArgoCD, automation

Soft Skills

problem-solving, debugging, incident response, ownership, performance analysis, reliability analysis, communication, collaboration, organizational skills, self-healing automation