Recorded Future

Senior Site Reliability Engineer

Full-time

Location Type: Office

Location: Boston, Massachusetts, United States

Job Level

Senior

Tech Stack

Apache, AWS, Chef, Distributed Systems, Elasticsearch, Go, Grafana, Kafka, Kubernetes, Linux, Logstash, Microservices, MongoDB, NoSQL, Prometheus, Python, RabbitMQ, Terraform

About the role

  • Ensure the reliability, scalability, and performance of critical systems and infrastructure.
  • Design, build, and maintain scalable, reliable infrastructure on AWS, using automation and Infrastructure as Code.
  • Develop and manage observability solutions using Grafana, ELK (Elasticsearch, Logstash, Kibana), and Prometheus to monitor system health and performance.
  • Automate infrastructure provisioning and configuration using Terraform and Chef.
  • Participate in a 24/7 on-call rotation to respond to and resolve production incidents.
  • Collaborate with engineering teams to ensure applications are designed for high availability and resilience.
  • Proactively identify and address performance bottlenecks and potential reliability issues.
  • Drive continuous improvement through automation, process optimization, and post-incident reviews.
  • Work closely with development teams to foster a culture of operational excellence.

Requirements

  • 2+ years of experience in a Site Reliability Engineer, DevOps Engineer, or similar role.
  • Extensive hands-on experience with Amazon Web Services (AWS), including a deep understanding of networking concepts within AWS.
  • Ability to grasp complex architectures and perform multi-step troubleshooting.
  • Advanced Linux skills (engineering fundamentals, networking, storage, operating systems).
  • Development experience with Go or Python.
  • Exposure to managing and optimizing observability suites (e.g., Grafana, ELK Stack).
  • Strong proficiency in Terraform and Chef.
  • A strong preference for automating tasks and implementing solutions via Infrastructure as Code rather than manual changes.
  • Spectacular collaborator and communicator.
  • A team player who is also self-motivated.
  • Knowledge of and experience with Kubernetes (preferred).
  • Familiarity with message brokers such as RabbitMQ and Apache Kafka (preferred).
  • Experience with NoSQL databases, particularly MongoDB and Elasticsearch (preferred).
  • Familiarity with OpenTelemetry (preferred).
  • Experience with large distributed systems and microservices architecture (preferred).
  • Experience with CI/CD pipelines (preferred).
