Tech Stack
Ansible, Azure, Cloud, Docker, Elasticsearch, Go, Google Cloud Platform, Grafana, Kubernetes, Logstash, Prometheus, Python, SDLC, Terraform
About the role
- Implement and enhance system reliability, availability, scalability, performance, and efficiency by leveraging monitoring, alerting, and automation tools on public cloud platforms like Azure and GCP
- Participate in capacity planning, analyze software performance, and fine-tune systems to ensure optimal operation
- Develop and enhance our CI/CD process and toolset to streamline software delivery and deployment
- Define and monitor key metrics to assess and enhance system reliability
- Collaborate closely with the engineering team to improve reliability and operational efficiency at every software development life cycle (SDLC) stage
- Troubleshoot and optimize infrastructure, and automate repetitive tasks to increase efficiency and effectiveness
Requirements
- Proficiency in programming languages such as Bash, Python, or Go
- Advanced knowledge of monitoring solutions such as Prometheus, Grafana, and the ELK stack (Elasticsearch, Logstash, Kibana)
- Strong expertise and experience in cloud technologies, specifically Azure and GCP
- Experience across the complete software development life cycle (SDLC)
- In-depth understanding of networking concepts, with a particular focus on security
- Hands-on experience implementing CI/CD processes (for example, using GitLab CI)
- Proficiency in automation platforms like Ansible and Terraform
- Knowledge of orchestration tools like Kubernetes
- Familiarity with container technologies like Docker
- Experience with Git for source code version control
- Strong problem-solving skills with a systematic approach, effective communication abilities, and a self-driven attitude