Tech Stack
AWS, Cloud, Distributed Systems, Docker, DynamoDB, EC2, Go, Grafana, Jenkins, Kubernetes, Linux, MongoDB, Prometheus, Python, Terraform
About the role
- Own uptime, scalability, and reliability of mission-critical production systems.
- Monitor, troubleshoot, and resolve production issues, ensuring minimal downtime and rapid incident response.
- Build observability into systems (metrics, logging, tracing) using tools such as CloudWatch, Prometheus, Grafana, or Datadog.
- Architect, build, and optimize cloud infrastructure using AWS services (EC2, ECS/EKS, Lambda, RDS, S3, CloudFront, VPC, IAM, etc.).
- Implement and manage autoscaling, load balancing, and high availability solutions.
- Work with containerized environments (Docker, Kubernetes/EKS, or ECS Fargate).
- Develop Infrastructure as Code (IaC) using Terraform, AWS CDK, or CloudFormation.
- Automate CI/CD pipelines with tools like GitHub Actions, Jenkins, or GitLab CI.
- Implement release management strategies to ensure safe and fast deployments.
- Implement AWS security best practices across IAM, networking, encryption, and monitoring.
- Partner with security teams to ensure compliance with regulatory and industry standards.
- Partner with developers to optimize services for production readiness.
- Mentor junior engineers on cloud best practices, automation, and troubleshooting.
- Contribute to incident postmortems, root cause analyses, and reliability improvement initiatives.
Requirements
- 7+ years of experience as a Production Engineer, Site Reliability Engineer (SRE), or DevOps Engineer.
- Bachelor’s or Master’s degree in Computer Science or a related field, or equivalent education; candidates with a B.Tech/M.Tech from a top Tier-1 institute are strongly preferred.
- Strong expertise with AWS services in a production environment.
- Proficiency with Linux systems administration and networking fundamentals.
- Hands-on experience with containers (Docker, Kubernetes/EKS, ECS Fargate).
- Proficiency in Infrastructure as Code (Terraform, AWS CDK, CloudFormation).
- Experience with CI/CD pipelines and Git-based workflows.
- Strong knowledge of monitoring, logging, and alerting systems.
- Solid scripting/programming skills (Python, Bash, or Go).
Preferred Qualifications
- AWS Certified Solutions Architect or DevOps Engineer certification.
- Experience managing large-scale distributed systems.
- Exposure to multi-region architectures and disaster recovery strategies.
- Familiarity with modern databases (RDS, Aurora, DynamoDB, MongoDB).
- Prior experience in a fast-paced SaaS or fintech environment.