
Senior Site Reliability Engineer, Security
CentralReach
full-time
Location Type: Remote
Location: United States
Salary
💰 $160,000 - $180,000 per year
About the role
- Own availability, latency, performance, efficiency, monitoring/observability, emergency response, and capacity planning, including setting and maintaining SLOs, SLIs, and error budgets and creating dashboards.
- Analyze, troubleshoot, and resolve operational challenges contributing to defined SLOs.
- Manage site stability, performance, reliability, and maintain uptime for production environments.
- Develop a fully automated multi-environment observability stack based on the existing system, and extend it to predict capacity needs from usage patterns.
- Strive for automation to reduce toil and increase development velocity.
- Perform application-specific production support, incident management, change management, problem management, RCAs, and service restoration as needed.
- Identify changes to the product architecture from a reliability, performance, and availability perspective, using a data-driven approach.
- Document resolution run books and standard operating procedures.
- Actively look for opportunities to improve the availability and performance of the system by applying the learnings from monitoring and observation.
- Collaborate with software development teams on release management, help shape the future roadmap, and establish strong operational readiness across teams.
- Implement reliability and observability tools (such as New Relic, Prometheus, and Grafana).
- Collaborate with the Security team and other platform engineering teams to build reliable, maintainable, and scalable solutions that improve our security posture.
Requirements
- Strong background as an SRE supporting a 24x7 highly available production environment for a SaaS or cloud service provider.
- Solid experience with monitoring/APM/observability tools (Splunk, New Relic, etc.).
- Experience implementing observability plans around logs, metrics, and traces.
- Experience in an agile development team developing software.
- Experience with cloud infrastructure environments, preferably AWS, and infrastructure as code (Terraform, CloudFormation).
- Extensive experience with Docker, Kubernetes, Helm, CI/CD, and configuration management tools such as Ansible and Chef.
- Strong experience with containerization technology and/or Kubernetes.
- Experience with release automation, system administration, and configuration management.
- Experience with programming languages (Java, Python, Go, etc.).
- Strong understanding of Linux, Windows, software development, systems, networking, and cloud concepts.
- Strong interpersonal and teaming skills, including the ability to set and enforce process and to influence engineers who are not direct reports.
- Strong analytical and programming skills (Python, Go, Java, etc.).
- Deep understanding of best practices for modern cloud security.
- Proven experience building observability for security concerns, such as privilege escalations and bot detection.
Benefits
- Comprehensive health benefits
- Generous PTO
- 401(k) matching
- Paid parental leave
- Hybrid work schedules
- Career development support
- Wellness programs
- Opportunities to give back through CR Cares™
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
SRE, monitoring, observability, incident management, change management, problem management, application support, cloud infrastructure, programming languages, containerization
Soft Skills
interpersonal skills, teaming skills, analytical skills, influence, process enforcement