CentralReach

Senior Site Reliability Engineer

full-time

Location: Florida • 🇺🇸 United States

Salary

💰 $140,000 - $180,000 per year

Job Level

Senior

Tech Stack

Ansible, AWS, Chef, Cloud, Go, Grafana, Java, Kubernetes, Linux, Prometheus, Python, Splunk, Terraform

About the role

  • Responsible for availability, latency, performance, efficiency, monitoring/observability, emergency response, and capacity planning
  • Set and maintain SLOs, SLIs, and Error Budgets, and create dashboards
  • Analyze, troubleshoot, and resolve operational challenges to help meet defined SLOs
  • Manage site stability, performance, reliability, and maintain uptime for production environments
  • Develop a fully automated multi-environment observability stack and extend it to predict capacity needs
  • Automate to reduce toil and increase development velocity
  • Provide application-specific production support, incident management, change management, problem management, RCAs, and service restoration
  • Identify architecture changes for reliability, performance, and availability using a data-driven approach
  • Document run books and standard operating procedures
  • Collaborate with software development teams on release management and operational readiness
  • Implement reliability and observability tools (New Relic, Prometheus, Grafana, etc.)

Requirements

  • Strong background as an SRE supporting a 24x7 highly available production environment for a SaaS or cloud service provider
  • Strong experience with AWS and Infrastructure as Code (Terraform, CloudFormation)
  • Understanding of High Availability best practices in AWS
  • Solid experience with Monitoring/APM/Observability tools (Splunk, New Relic)
  • Experience with Prometheus and Grafana; implementing observability plans around logs, metrics, and traces
  • Extensive experience with Kubernetes, Helm, CI/CD, and config management tools such as Ansible and Chef
  • Experience with release automation, system administration, and configuration management
  • Programming experience in Java, Python, or Go (or similar)
  • Scripting experience with Bash and PowerShell
  • Strong understanding of Linux, Windows, software development, systems, networking, and cloud concepts
  • Experience with SLOs, SLIs, Error Budgets, dashboards, incident management, RCA, and change management
Circle

Senior Site Reliability Engineer

Circle
Senior · full-time · $130k–$140k / year · California · 🇺🇸 United States
Posted: 1 hour ago · Source: boards.greenhouse.io
AWS, Kubernetes, MySQL, Postgres, Redis
General Dynamics Information Technology

Azure DevOps Engineer – FedRAMP Healthcare Modernization

General Dynamics Information Technology
Mid · Senior · full-time · $73k–$95k / year · 🇺🇸 United States
Posted: 2 hours ago · Source: gdit.wd5.myworkdayjobs.com
Azure, Cyber Security, Python, Terraform
Coates Group

Senior DevOps Engineer

Coates Group
Senior · full-time · $125k–$140k / year · Illinois · 🇺🇸 United States
Posted: 8 hours ago · Source: jobs.lever.co
AWS, Cloud, Docker, IoT, Linux, Microservices, Python
Eduphoria! Inc.

AWS DevOps Engineer

Eduphoria! Inc.
Mid · Senior · full-time · $110k–$125k / year · Florida, Illinois, Kansas, Maryland, North Carolina, Ohio, Tennessee, Texas, Virginia · 🇺🇸 United States
Posted: 22 hours ago · Source: eduphoria.applytojob.com
AWS, Azure, Cloud, EC2, Linux, MySQL, .NET, SQL, Terraform
GEICO

DevOps Engineer II – FinTech Commissions, Substantiation

GEICO
Mid · Senior · full-time · $75k–$160k / year · District of Columbia, Maryland, Texas, Virginia · 🇺🇸 United States
Posted: 23 hours ago · Source: geico.wd1.myworkdayjobs.com
AWS, Azure, Cloud, Distributed Systems, Java, .NET, NoSQL, Python, SQL