Onebrief

Senior Site Reliability Engineer

Site Reliability Engineer at Onebrief focusing on reliability and scalability of mission-critical applications in DoD environments and AWS cloud.

Posted 5/19/2026 • full-time • Colorado Springs, Colorado, 🇺🇸 United States • Senior • 💰 $180,000 - $220,000 per year

Tech Stack

Tools & technologies
Ansible, AWS, Cloud, Go, Grafana, Jenkins, Kubernetes, Prometheus, Python, Terraform

About the role

Key responsibilities & impact
  • You'll own the reliability, scalability, and security of the production application and/or platform.
  • Implementing a World-Class Observability Platform: Design, implement, and manage our monitoring, logging, and alerting stack (e.g., Prometheus, Loki, Alloy, and Grafana).
  • Defining and Upholding Reliability: Define, measure, and own alerting that feeds into our Service Level Indicators (SLIs) and Service Level Objectives (SLOs).
  • Leading Incident Response: Act as the incident responder and potentially incident commander during critical incidents.
  • Automating for Scale and Security: Partner with platform engineers to design, build, and manage secure, resilient Kubernetes clusters and cloud/on-prem environments.
  • Eliminating Toil and Scaling the Team: Proactively identify and eliminate operational toil by building automation.

Requirements

What you’ll need
  • An active Top Secret clearance
  • 5+ years in Platform, DevOps, or Site Reliability Engineering with an infrastructure and operations focus.
  • Proven partner to DevOps/Platform and application teams; collaborates well across functions and shares context openly.
  • A deep understanding of incident response processes, with experience conducting thorough root cause analyses and driving continuous improvement.
  • Infrastructure as Code: Terraform (or CloudFormation), Ansible.
  • Containers and orchestration: Kubernetes design, deployment, and operations.
  • CI/CD: experience building and maintaining pipelines (GitLab CI/CD, Jenkins, GitHub Actions).
  • Scripting: proficiency with at least one of Python, Go, or Bash.
  • Cloud: familiarity with AWS or AWS GovCloud.
  • Observability: Grafana stack, ELK stack, or Datadog.
  • Networking fundamentals: core protocols and secure configurations.

Benefits

Comp & perks
  • Equity: Share in the company's success.
  • Flexible Work Environment: Remote-first organization with flexible work hours and unlimited PTO. (Note that some roles are in-person, on-site with customers.)
  • Comprehensive Health Coverage: Health, dental, vision, and life insurance.
  • Retirement Plan: 401(k) plan with company match to secure your future.
  • Parental Leave: 8 weeks at 100% regardless of state.
  • Company Retreats: Annual company summit trips.
  • Home Office Budget: $1,000 per year for home office improvements.

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Kubernetes, Terraform, Ansible, Python, Go, Bash, GitLab CI/CD, Jenkins, GitHub Actions, AWS
Soft Skills
collaboration, incident response, root cause analysis, continuous improvement
Certifications
Top Secret clearance