CentralReach

Senior Site Reliability Engineer, Security

Full-time

Location Type: Remote

Location: United States

Salary

$160,000 to $180,000 per year

About the role

  • Own availability, latency, performance, efficiency, monitoring and observability, emergency response, and capacity planning; define and maintain SLOs, SLIs, and error budgets; and build dashboards.
  • Analyze, troubleshoot, and resolve operational issues that affect the defined SLOs.
  • Manage site stability, performance, and reliability, and maintain uptime for production environments.
  • Develop a fully automated, multi-environment observability stack based on the existing system, and extend it to predict capacity needs from usage patterns.
  • Strive for automation to reduce toil and increase development velocity.
  • Perform application-specific production support, incident management, change management, problem management, RCAs, and service restoration as needed.
  • Use a data-driven approach to identify product architecture changes that improve reliability, performance, and availability.
  • Document resolution runbooks and standard operating procedures.
  • Actively look for opportunities to improve the availability and performance of the system by applying the learnings from monitoring and observation.
  • Collaborate with software development teams on release management, help shape the future roadmap, and establish strong operational readiness across teams.
  • Implement reliability and observability tools (e.g., New Relic, Prometheus, Grafana).
  • Collaborate with the Security team and other platform engineering teams to build reliable, maintainable, and scalable solutions that improve our security posture.

Requirements

  • Strong background as an SRE supporting a 24x7, highly available production environment for a SaaS or cloud service provider.
  • Solid experience with monitoring, APM, and observability tools (e.g., Splunk, New Relic).
  • Experience implementing observability plans around logs, metrics, and traces.
  • Experience developing software as part of an agile development team.
  • Experience with cloud infrastructure, preferably AWS, and infrastructure as code (Terraform, CloudFormation).
  • Extensive experience with Docker, Kubernetes, Helm, CI/CD, and configuration management tools such as Ansible and Chef.
  • Strong experience with containerization technology and/or Kubernetes.
  • Experience with release automation, system administration, and configuration management.
  • Experience with programming languages such as Java, Python, or Go.
  • Strong understanding of Linux, Windows, software development, systems, networking, and cloud concepts.
  • Strong interpersonal and teamwork skills, including the ability to set and enforce processes and to influence engineers who are not direct reports.
  • Strong analytical and programming skills (Python, Go, Java, etc.).
  • Deep understanding of best practices for modern cloud security.
  • Proven experience building observability for security concerns, such as privilege escalations and bot detection.

Benefits

  • Comprehensive health benefits
  • Generous PTO
  • 401(k) matching
  • Paid parental leave
  • Hybrid work schedules
  • Career development support
  • Wellness programs
  • Opportunities to give back through CR Cares™