
Senior Site Reliability Engineer
Onebrief
Full-time
Location Type: Hybrid
Location: Tacoma • Washington • 🇺🇸 United States
Salary
💰 $180,000 - $220,000 per year
Job Level
Senior
Tech Stack
Ansible, AWS, Cloud, Go, Grafana, Jenkins, Kubernetes, Prometheus, Python, Terraform
About the role
- You'll own the reliability, scalability, and security of the production application and/or platform.
- Implementing a World-Class Observability Platform: Design, implement, and manage our monitoring, logging, and alerting stack (e.g., Prometheus, Loki, Alloy, and Grafana).
- Defining and Upholding Reliability: Define, measure, and own alerting that feeds into our Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
- Leading Incident Response: Act as an incident responder and, when needed, as incident commander during critical incidents.
- Automating for Scale and Security: Partner with platform engineers to design, build, and manage secure, resilient Kubernetes clusters and cloud/on-prem environments using Infrastructure-as-Code.
- Eliminating Toil and Scaling the Team: Proactively identify and eliminate operational toil by building automation.
Requirements
- 5+ years in Platform, DevOps, or Site Reliability Engineering with an infrastructure and operations focus.
- Proven partner to DevOps/Platform and application teams; collaborates well across functions and shares context openly.
- A deep understanding of incident response processes, with experience conducting thorough root cause analyses and driving continuous improvement.
- Infrastructure as Code: technical expertise with Terraform (or CloudFormation) and Ansible.
- Containers and orchestration: Kubernetes design, deployment, and operations.
- CI/CD: experience building and maintaining pipelines (GitLab CI/CD, Jenkins, GitHub Actions).
- Scripting: proficiency with at least one of Python, Go, or Bash.
- Cloud: familiarity with AWS or AWS GovCloud.
- Observability: experience with the Grafana stack, the ELK stack, or Datadog.
- Networking fundamentals: core protocols and secure configurations.
- Active Security+ or another DoD 8570.01-approved security credential, or the ability to obtain a qualifying credential within 3 months of employment.
- Active Secret clearance required.
Benefits
- Relocation assistance
Applicant Tracking System Keywords
Hard skills
Infrastructure as Code, Terraform, Ansible, Kubernetes, CI/CD, GitLab CI/CD, Jenkins, GitHub Actions, Python, Go
Soft skills
collaboration, incident response, root cause analysis, continuous improvement, communication, leadership
Certifications
Security+, DoD 8570.01-approved security credential