Zscaler

Staff Site Reliability Engineer

The Staff Site Reliability Engineer at Zscaler ensures the reliability of cloud services, collaborates on large-scale systems, and drives operational improvements for a secure AI platform.

Posted 6/1/2026 • Full-time • San Jose, California, 🇺🇸 United States • Lead • 💰 $119,000 - $170,000 per year

Tech Stack

Tools & technologies
Cloud, Distributed Systems, DNS, Go, Kubernetes, Linux, Python, TCP/IP, Unix

About the role

Key responsibilities & impact
  • Own the reliability of a large-scale cloud service (Linux/BSD, bare metal, Kubernetes, custom load balancing, SD-WAN) by partnering with Engineering and Network teams to define requirements early, conduct operability reviews, and contribute code/design docs for platform resilience
  • Develop and operate end-to-end observability (metrics/logs/traces, dashboards, alerting) and incident tooling to manage SLOs/error budgets, reduce alert noise, and improve the detection and diagnosis of system issues
  • Participate in an on-call rotation to lead full-cycle incident response; perform deep cross-stack troubleshooting (OS, networking, distributed systems, packet captures, core dumps) to drive permanent software fixes and codify learnings into runbooks and tests
  • Build and maintain everything-as-code for fleet and service lifecycle, driving provisioning, configuration, release automation, canary deployments, and complex rollout/rollback workflows
  • Continuously improve platform hygiene through consistent OS/app upgrades, dependency/vulnerability patching, capacity and performance tuning, and strict CI/CD validation prior to production rollouts

Requirements

What you’ll need
  • US citizenship is required (due to the nature of assigned customers), along with 5+ years of industry experience in software engineering, infrastructure software, and/or platform engineering
  • Proficiency in at least one programming language (such as Python, Bash, or Go) with demonstrated ability to write production-quality code (testing, code reviews, CI, maintainable design, scripting for diagnostics)
  • Strong Linux/Unix systems fundamentals (process/memory, filesystems, networking stack basics, debugging/perf troubleshooting) and solid understanding of networking protocols and components (e.g., HTTP, DNS, TCP/IP, ICMP, OSI model, subnetting, and load balancing/traffic concepts)
  • Proven experience operating production services (including incident response, troubleshooting, reducing toil) and ability to participate in on-call rotations and support occasional after-hours or weekend deployments
  • Experience managing BSD in production, with a focus on driving systemic fixes through platform engineering

Benefits

Comp & perks
  • Various health plans
  • Time off plans for vacation and sick time
  • Parental leave options
  • Retirement options
  • Education reimbursement
  • In-office perks, and more!

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Python, Bash, Go, Linux, BSD, Kubernetes, SD-WAN, observability, CI/CD, networking
Soft Skills
incident response, troubleshooting, collaboration, communication, problem-solving, leadership, organization, attention to detail, adaptability, critical thinking