Tech Stack
AWS, Docker, DynamoDB, EC2, Grafana, Jenkins, Kubernetes, Prometheus, Terraform, Vault
About the role
- Design, provision, and maintain AWS resources using Infrastructure-as-Code tools such as Terraform or CloudFormation.
- Manage compute, storage, databases, networking, and IAM policies, including EC2, ECS/EKS, S3, EFS, RDS, DynamoDB, VPCs, and load balancers.
- Build, optimize, and maintain Docker images and registries.
- Operate container clusters (ECS, EKS, Kubernetes) and manage deployments using Helm or similar tools.
- Architect, maintain, and optimize end-to-end CI/CD pipelines (GitHub Actions, Jenkins, GitLab CI, Bitbucket Pipelines).
- Automate build, test, security scanning, and deployment processes, including blue/green and canary releases.
- Enforce branching and release policies, code reviews, and pipeline governance.
- Implement and tune observability and monitoring systems (CloudWatch, Prometheus, Grafana, ELK/EFK, Datadog, Sentry).
- Define and track SLIs/SLOs, error budgets, and key operational metrics.
- Participate in on-call rotations, incident response, and post-incident reviews; maintain runbooks and operational documentation.
- Manage secrets and credentials using AWS Secrets Manager, HashiCorp Vault, or Parameter Store.
- Apply security best practices to networks, hosts, and containers.
- Collaborate with development teams to resolve production issues and optimize deployment and feedback processes.
- Document architecture, operational procedures, and standard operating practices.
Requirements
- Minimum of 4 years of experience in Infrastructure Engineering, DevOps, or Site Reliability Engineering.
- Proven hands-on experience with AWS production workloads, including EC2, RDS, S3, VPC, and IAM.
- Strong expertise in Docker and container orchestration platforms (ECS, EKS, Kubernetes).
- Proficiency with CI/CD tools and pipelines (GitHub Actions, GitLab CI, Jenkins, Bitbucket Pipelines).
- Experience with Infrastructure-as-Code tools (Terraform, CloudFormation, Pulumi).
- Familiarity with monitoring, logging, and observability systems (CloudWatch, Grafana, Prometheus, ELK, Datadog, Sentry).
- Strong security awareness, including IAM policies, secrets management, and vulnerability mitigation.
- Excellent written and verbal communication skills with the ability to clearly articulate complex technical concepts.
- Demonstrated ownership of infrastructure projects, from design through post-incident analysis.