F5

Senior Site Reliability Engineer – Platform Focus

full-time

Location Type: Hybrid

Location: San Jose, California, Washington, United States

Salary

💰 $176,800 - $265,200 per year

About the role

  • Design, build, and operate foundational systems that power our products.
  • Blend deep infrastructure expertise with software engineering discipline to create scalable, resilient, and developer‑friendly platforms.
  • Partner closely with engineering teams to evolve our platform architecture, improve reliability, and accelerate delivery through automation, observability, and thoughtful system design.
  • Build and evolve multiple distributions of the Kubernetes platform.
  • Build automation and tooling to streamline deployments, configuration, and environment management.
  • Drive reliability practices such as SLOs, error budgets, incident response, and post‑incident reviews.
  • Develop golden paths for service onboarding, CI/CD, and platform usage across all K8s variants.
  • Implement observability systems including metrics, logging, tracing, and alerting.
  • Collaborate with product and engineering teams to ensure platform capabilities meet evolving needs.
  • Optimize performance and capacity across compute, storage, and networking layers.
  • Champion infrastructure-as-code and modern cloud‑native patterns.
  • Drive automation-first operations using IaC and GitOps.
  • Lead incident response, root cause analysis (RCA), and post-incident learning, and improve on-call health.
  • Partner with security teams to enforce platform guardrails, policy, and secure defaults.
  • Lead complex troubleshooting efforts across distributed systems and production environments.
  • Mentor engineers and contribute to a culture of operational excellence.

Requirements

  • 8+ years in SRE, DevOps, or platform engineering with hands‑on ownership of production systems.
  • Expertise in Kubernetes and container orchestration at scale.
  • Proficiency with IaC tools such as Terraform, Ansible, and CloudFormation.
  • Solid programming skills in languages such as Go, Python, or Bash.
  • Deep understanding of distributed systems, networking, and Linux internals.
  • Experience building CI/CD pipelines using tools like GitLab runners, GitHub Actions, or Jenkins.
  • Strong observability background with Prometheus, Grafana, OpenTelemetry, or similar.
  • Proven track record of incident management and improving system reliability.

Benefits

  • Incentive compensation
  • Bonus
  • Restricted stock units
  • Benefits