Site Reliability Engineer II

Zigsaw

Site Reliability Engineer enhancing AWS-based platform reliability at Pinterest and scaling Kubernetes workloads. Operating and improving cloud-native infrastructure with a focus on automation and resilience.

Posted 6/18/2026 • Full-time • Remote • California • 🇺🇸 United States • Mid-Level, Senior • 💰 $114,297 - $235,319 per year

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills

Kubernetes, GitOps, ArgoCD, Helm, Terraform, Terragrunt, CI/CD, Bash, Python, AWS

Soft Skills

troubleshooting, collaboration, communication, ownership mindset, critical evaluation, data validation, peer review, integrity, accountability, process automation

Tools & Technologies

GitHub Actions, production infrastructure, monitoring, alerting, observability, IAM, multi-tenancy, dashboards, logs, metrics

Industry Keywords

Site Reliability Engineering, DevOps, Platform Engineering, Cloud Infrastructure, incident response, root cause analysis, production environments, workload reliability, environment management, sensitive data protection

Tech Stack

Tools & technologies

AWS, Cloud, Distributed Systems, Kubernetes, Linux, Python, Terraform

About the role

Key responsibilities & impact

Ensuring the reliability, availability, and performance of production infrastructure and platform services
Operating and scaling Kubernetes platforms, including governance and support for multi-tenant workloads
Managing GitOps-based deployment workflows using ArgoCD and Helm
Supporting infrastructure provisioning and change management through Terraform/Terragrunt
Building and supporting CI/CD automation and deployment workflows using GitHub Actions
Participating in incident response, root cause analysis, and post-incident improvement initiatives
Reducing operational toil through scripting, tooling, and process automation
Advancing observability practices across logs, metrics, traces, dashboards, and alerting
Supporting secure secrets integration, IAM-aware operations, and platform guardrails
Partnering closely with application, security, and platform teams to improve reliability and delivery outcomes

Requirements

What you’ll need

4+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or Cloud Infrastructure
Strong hands-on experience operating AWS in production environments
Solid expertise in Kubernetes, including cluster operations, troubleshooting, workload reliability, and platform administration
Experience with Kubernetes multi-tenancy, including namespaces, RBAC, quotas, policies, and tenant isolation patterns
Experience implementing and operating ArgoCD within a GitOps delivery model
Strong hands-on experience with Helm
Experience with Terraform/Terragrunt for infrastructure provisioning and environment management
Solid scripting and automation skills using Bash and/or Python
Experience building, maintaining, or supporting CI/CD pipelines, ideally using GitHub Actions
Strong troubleshooting skills across Linux, containers, IAM, networking, and distributed systems
Experience with monitoring, alerting, and observability in production environments
Demonstrated ownership mindset with experience handling incidents and resolving production issues
Strong collaboration and communication skills, with the ability to work effectively across engineering, security, and platform teams
Bachelor’s degree in computer science, engineering, or a related field, or equivalent experience
Demonstrated ability to use AI to improve the speed and quality of your day-to-day work
Strong track record of critical evaluation and verification of AI-assisted work (e.g., testing, source-checking, data validation, peer review)
High integrity and ownership: you protect sensitive data, avoid over-reliance on AI, and remain accountable for final decisions and deliverables

Benefits

Comp & perks

Information regarding the culture at Pinterest and benefits available for this position can be found [here](https://www.pinterestcareers.com/pinterest-life/)