
Senior Site Reliability Engineer
Recorded Future
full-time
Location Type: Office
Location: Boston • Massachusetts • 🇺🇸 United States
Job Level
Senior
Tech Stack
Apache, AWS, Chef, Distributed Systems, Elasticsearch, Go, Grafana, Kafka, Kubernetes, Linux, Logstash, Microservices, MongoDB, NoSQL, Prometheus, Python, RabbitMQ, Terraform
About the role
- Ensure the reliability, scalability, and performance of critical systems and infrastructure.
- Build and maintain robust infrastructure on AWS, implementing automation and Infrastructure as Code.
- Design, implement, and maintain scalable and reliable infrastructure on AWS.
- Develop and manage observability solutions using Grafana, ELK (Elasticsearch, Logstash, Kibana), and Prometheus to monitor system health and performance.
- Automate infrastructure provisioning and configuration using Terraform and Chef.
- Participate in a 24/7 on-call rotation to respond to and resolve production incidents.
- Collaborate with engineering teams to ensure applications are designed for high availability and resilience.
- Proactively identify and address performance bottlenecks and potential issues.
- Drive continuous improvement through automation, process optimization, and post-incident reviews.
- Work closely with development teams to build and maintain robust infrastructure and foster a culture of operational excellence.
Requirements
- 2+ years of experience in a Site Reliability Engineer, DevOps Engineer, or similar role.
- Extensive hands-on experience with Amazon Web Services (AWS), including a deep understanding of networking concepts within AWS.
- Ability to grasp complex architectures and perform multi-step troubleshooting.
- Advanced Linux skills (engineering fundamentals, networking, storage, operating systems).
- Development experience with Go or Python.
- Experience managing and optimizing observability suites (e.g., Grafana, ELK Stack).
- Strong proficiency in Terraform and Chef.
- A strong preference for automating tasks and implementing solutions via Infrastructure as Code rather than manual changes.
- Spectacular collaborator and communicator.
- A team player who is also self-motivated.
- Knowledge of and experience with Kubernetes (preferred).
- Familiarity with message brokers such as RabbitMQ and Apache Kafka (preferred).
- Experience with NoSQL databases, particularly MongoDB and Elasticsearch (preferred).
- Familiarity with OpenTelemetry (preferred).
- Experience with large distributed systems and microservices architecture (preferred).
- Experience with CI/CD pipelines (preferred).
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
AWS, Terraform, Chef, Go, Python, Linux, Grafana, ELK, Prometheus, Kubernetes
Soft skills
collaboration, communication, self-motivation, team player, problem-solving, process optimization, continuous improvement, troubleshooting, operational excellence, resilience