
Senior Site Reliability Engineer – Platform Focus
F5
full-time
Location Type: Hybrid
Location: San Jose • California • Washington • United States
Salary
💰 $176,800 - $265,200 per year
About the role
- Design, build, and operate foundational systems that power our products.
- Blend deep infrastructure expertise with software engineering discipline to create scalable, resilient, and developer‑friendly platforms.
- Partner closely with engineering teams to evolve our platform architecture, improve reliability, and accelerate delivery through automation, observability, and thoughtful system design.
- Build and evolve the platform across multiple Kubernetes distributions.
- Build automation and tooling to streamline deployments, configuration, and environment management.
- Drive reliability practices such as SLOs, error budgets, incident response, and post‑incident reviews.
- Develop golden paths for service onboarding, CI/CD, and platform usage across all K8s variants.
- Implement observability systems including metrics, logging, tracing, and alerting.
- Collaborate with product and engineering teams to ensure platform capabilities meet evolving needs.
- Optimize performance and capacity across compute, storage, and networking layers.
- Champion infrastructure-as-code and modern cloud‑native patterns.
- Drive automation-first operations using IaC and GitOps.
- Lead incident response, root cause analysis (RCA), and post‑incident learning, and improve on-call health.
- Partner with security teams to enforce platform guardrails, policy, and secure defaults.
- Lead complex troubleshooting efforts across distributed systems and production environments.
- Mentor engineers and contribute to a culture of operational excellence.
Requirements
- 8+ years in SRE, DevOps, or platform engineering with hands‑on ownership of production systems.
- Expertise in Kubernetes and container orchestration at scale.
- Proficiency with IaC tools such as Terraform, Ansible, and CloudFormation.
- Solid programming and scripting skills in languages such as Go, Python, or Bash.
- Deep understanding of distributed systems, networking, and Linux internals.
- Experience building CI/CD pipelines using tools like GitLab Runner, GitHub Actions, or Jenkins.
- Strong observability background with Prometheus, Grafana, OpenTelemetry, or similar.
- Proven track record of incident management and improving system reliability.
Benefits
- Incentive compensation
- Bonus
- Restricted stock units
- Benefits