Senior Site Reliability Engineer

Parallel Domain

Parallel Domain is hiring a Senior Site Reliability Engineer to manage AWS infrastructure and Kubernetes for autonomous systems testing, collaborating across teams to ensure system reliability and security.

Posted 4/29/2026 • Full-time • Remote • Oregon, Washington • 🇺🇸 United States • Senior • CA$145,000 - CA$185,000 per year

Tech Stack

Tools & technologies

AWS, Cloud, DNS, Grafana, Kubernetes, Linux, Node.js, Prometheus, Python, Terraform

About the role

Key responsibilities & impact

Design, build, and maintain multi-region AWS infrastructure using Terraform.
Operate and scale EKS clusters across production regions: autoscaling, node lifecycle, workload health.
Manage networking across environments: VPC design, DNS, load balancing, and cross-region connectivity.
Support infrastructure changes, migrations, and expansions into new regions.
Help build and run incident management processes: severity definitions, escalation paths, on-call practices.
Lead incident response, debugging, and root-cause analysis.
Write postmortems and drive systemic reliability improvements from what they surface.
Improve observability across metrics, logging, tracing, and dashboards.
Provide security-conscious feedback on platform architecture decisions.
Own cloud IAM governance: roles, policies, and access boundaries across accounts and services.
Improve CI/CD pipelines and infrastructure validation.
Support engineers with infrastructure debugging, environment setup, and performance issues.
Contribute to tooling and automation in Python and Bash.

Requirements

What you’ll need

5+ years in SRE, DevOps, or infrastructure engineering roles, with a track record of operating production systems across multiple regions.
Terraform experience: Modules, state management, and multi-environment patterns.
AWS depth: Solid experience across VPC, IAM, EKS, S3, and CloudWatch.
Kubernetes expertise: Cluster operations, autoscaling, RBAC, and Helm.
CI/CD and GitOps: Experience with GitHub Actions, ArgoCD, or similar workflows.
Networking fundamentals: CIDR, DNS, load balancing, VPN, and cross-region connectivity.
Observability: Experience with tooling such as Prometheus and Grafana.
Scripting: Comfort with Python and Bash for tooling and automation.
Cross-platform familiarity: Working knowledge of both Linux and Windows environments. Operational experience supporting Windows-based workloads is a meaningful advantage.
Pragmatism and ownership: Comfortable in a fast-moving startup with evolving priorities. You take ownership of systems while collaborating closely with other teams, and you're pragmatic about tradeoffs between speed, reliability, and complexity.

Benefits

Comp & perks

equity
full health/dental/vision coverage
learning stipend
generous vacation

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

AWS, Terraform, EKS, Kubernetes, CI/CD, GitOps, Python, Bash, Networking, Observability

Soft Skills

pragmatism, ownership, collaboration, incident management, debugging, root-cause analysis, systemic reliability improvement