Tech Stack
Ansible, AWS, Cloud, DynamoDB, EC2, Grafana, Jenkins, Postgres, Prometheus, RabbitMQ, Terraform
About the role
- Design, implement, and manage AWS infrastructure using Terraform
- Automate provisioning and configuration management with Ansible
- Maintain idempotent and reusable IaC code across dev, staging, and prod
- Implement infrastructure changes via GitOps workflows (pull requests and approvals)
- Optimize cloud resources to balance cost and performance
- Set up and maintain AWS services: EC2, EKS, DynamoDB, RDS (PostgreSQL), S3, VPC, IAM, Route53
- Design secure networking (VPC, subnets, security groups, VPN, Route53)
- Manage IAM roles, permissions, and policies with least privilege
- Design and maintain CI/CD pipelines using GitLab CI or Jenkins; ensure zero-downtime deployments
- Deploy, configure, and optimize RDS PostgreSQL and DynamoDB; implement backups and failover
- Fine-tune database query performance, caching, and connection pooling
- Manage RabbitMQ clusters, queues, and message durability
- Set up and manage Prometheus, Grafana, and Loki for monitoring and logging; create dashboards and alerts
- Use distributed tracing tools (e.g., OpenTelemetry) for debugging
- Implement security best practices: IAM, network security, WAF, private endpoints, secrets management
- Ensure automated security patching and perform security audits and vulnerability scans
- Set up automated backups; implement and test disaster recovery plans against RPO/RTO targets
Requirements
- Expertise in Terraform for designing, implementing, and managing AWS infrastructure
- Experience with Ansible for provisioning and configuration management
- Ability to maintain idempotent, reusable IaC code across dev, staging, and prod environments
- Experience implementing infrastructure changes via GitOps workflows (pull requests and approvals)
- Strong AWS skills: EC2, EKS, DynamoDB, RDS (PostgreSQL), S3, VPC, IAM, Route53
- Experience designing and managing secure networking (VPC, subnets, security groups, VPN, Route53)
- Experience managing IAM roles, permissions, and policies following the least-privilege principle
- Knowledge of AWS cost optimization (rightsizing, auto-scaling, Reserved Instances, Savings Plans)
- Experience designing and maintaining CI/CD pipelines with GitLab CI or Jenkins, including zero-downtime deployments
- Experience deploying, configuring, and optimizing RDS PostgreSQL and DynamoDB; backups and failover strategies
- Database performance tuning, caching, and connection pooling experience
- Experience managing RabbitMQ clusters, queues, and message durability
- Monitoring and logging experience with Prometheus, Grafana, and Loki; creating dashboards and alerts
- Familiarity with distributed tracing tools (e.g., OpenTelemetry)
- Security best practices: IAM, network security, secrets management (AWS Secrets Manager), automated patching, vulnerability scans, security audits
- Backup and disaster recovery planning (RPO/RTO) and periodic DR testing
- Senior-level DevOps experience