Senior Site Reliability Engineer

Recorded Future

full-time

Posted on: 9/28/2025

Location Type: Office

Location: Boston • Massachusetts • 🇺🇸 United States

Job Level

Senior

Tech Stack

Apache, AWS, Chef, Distributed Systems, Elasticsearch, Go, Grafana, Kafka, Kubernetes, Linux, Logstash, Microservices, MongoDB, NoSQL, Prometheus, Python, RabbitMQ, Terraform

About the role

Ensure the reliability, scalability, and performance of critical systems and infrastructure.
Design, build, and maintain scalable, reliable infrastructure on AWS, implementing automation and Infrastructure as Code.
Develop and manage observability solutions using Grafana, ELK (Elasticsearch, Logstash, Kibana), and Prometheus to monitor system health and performance.
Automate infrastructure provisioning and configuration using Terraform and Chef.
Participate in a 24/7 on-call rotation to respond to and resolve production incidents.
Collaborate with engineering teams to ensure applications are designed for high availability and resilience.
Proactively identify and address performance bottlenecks and potential issues.
Drive continuous improvement through automation, process optimization, and post-incident reviews.
Work closely with development teams to build and maintain robust infrastructure and foster a culture of operational excellence.

Requirements

2+ years of experience in a Site Reliability Engineer, DevOps Engineer, or similar role.
Extensive hands-on experience with Amazon Web Services (AWS), including a deep understanding of networking concepts within AWS.
Ability to grasp complex architectures and perform multi-step troubleshooting.
Advanced Linux skills (engineering fundamentals, networking, storage, operating systems).
Development experience with Go or Python.
Experience managing and optimizing observability suites (e.g., Grafana, ELK Stack).
Strong proficiency in Terraform and Chef.
A strong preference for automating tasks and implementing solutions via Infrastructure as Code rather than manual changes.
Spectacular collaborator and communicator.
A self-motivated team player.
Knowledge of and experience with Kubernetes (preferred).
Familiarity with message brokers such as RabbitMQ and Apache Kafka (preferred).
Experience with NoSQL databases, particularly MongoDB and Elasticsearch (preferred).
Familiarity with OpenTelemetry (preferred).
Experience with large distributed systems and microservices architecture (preferred).
Experience with CI/CD pipelines (preferred).

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills

AWS, Terraform, Chef, Go, Python, Linux, Grafana, ELK, Prometheus, Kubernetes

Soft skills

collaboration, communication, self-motivation, team player, problem-solving, process optimization, continuous improvement, troubleshooting, operational excellence, resilience