Ensure the reliability, scalability, and performance of critical systems and infrastructure.
Build and maintain robust infrastructure on AWS, implementing automation and Infrastructure as Code.
Design, implement, and maintain scalable and reliable infrastructure on AWS.
Develop and manage observability solutions using Grafana, ELK (Elasticsearch, Logstash, Kibana), and Prometheus to monitor system health and performance.
Automate infrastructure provisioning and configuration using Terraform and Chef.
Participate in a 24/7 on-call rotation to respond to and resolve production incidents.
Collaborate with engineering teams to ensure applications are designed for high availability and resilience.
Proactively identify and address performance bottlenecks and potential issues.
Drive continuous improvement through automation, process optimization, and post-incident reviews.
Work closely with development teams to build and maintain robust infrastructure and foster a culture of operational excellence.
Requirements
2+ years of experience in a Site Reliability Engineer, DevOps Engineer, or similar role.
Extensive hands-on experience with Amazon Web Services (AWS), including a deep understanding of networking concepts within AWS.
Ability to grasp complex architectures and perform multi-step troubleshooting.
Advanced Linux skills (engineering fundamentals, networking, storage, operating systems).
Development experience with Go or Python.
Experience managing and optimizing observability suites (e.g., Grafana, ELK Stack).
Strong proficiency in Terraform and Chef.
A strong preference for automating tasks and implementing solutions via Infrastructure as Code rather than manual changes.
Spectacular collaborator and communicator.
A team player who is also self-motivated.
Knowledge of and experience with Kubernetes (preferred).
Familiarity with message brokers such as RabbitMQ and Apache Kafka (preferred).
Experience with NoSQL databases, particularly MongoDB and Elasticsearch (preferred).
Familiarity with OpenTelemetry (preferred).
Experience with large distributed systems and microservices architecture (preferred).
Experience with CI/CD pipelines (preferred).