Tech Stack
AWS, Cloud, Docker, EC2, Go, Grafana, Java, Jenkins, Kubernetes, MySQL, Postgres, Prometheus, Python, Terraform
About the role
- Design, build, and maintain core infrastructure using Infrastructure as Code (IaC) principles
- Evolve CI/CD pipelines to ensure safe, rapid, and reliable releases
- Identify and address performance bottlenecks, single points of failure, and scalability limits
- Define and monitor Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
- Implement and manage monitoring, logging, and alerting systems (e.g., Prometheus, Grafana, ELK Stack)
- Participate in the on-call rotation; lead incident response and blameless post-mortems
- Collaborate with software engineering teams to improve resilience, observability, and developer experience
- Implement and maintain security best practices across cloud infrastructure
Requirements
- 5+ years of hands-on experience with a major cloud provider, preferably AWS (EC2, S3, RDS, VPC, IAM, etc.)
- Deep proficiency with IaC tools such as Terraform or CloudFormation for managing infrastructure declaratively
- Strong experience with Docker and container orchestration systems like Kubernetes (EKS) or ECS
- Proven ability to build, optimize, and manage CI/CD pipelines using tools like GitLab CI, Jenkins, or CircleCI
- Hands-on experience with modern monitoring and logging tools (e.g., Prometheus, Grafana, Loki, Alertmanager, ELK Stack)
- Proficiency in at least one programming language, such as Go, Python, or Bash, for automation and tooling
- Excellent written and verbal communication skills, with a proven ability to work effectively and asynchronously in a distributed team environment
- Preferred: Experience in the payments or FinTech industry
- Preferred: Familiarity with service mesh technologies like Istio or Linkerd
- Preferred: Experience with database administration (e.g., PostgreSQL, MySQL)
- Preferred: Knowledge of networking, security principles, and compliance standards (e.g., PCI DSS)