Tech Stack
Ansible, Cloud, Elasticsearch, Grafana, Kubernetes, Postgres, Prometheus, Redis, Terraform
About the role
- Build and maintain the infrastructure backbone to enable secure, reliable financial services at scale
- Ensure zero-downtime operations for millions of transactions
- Design, operate, and optimize Kubernetes clusters (hybrid cloud and on-premises)
- Manage and optimize PostgreSQL, Redis, and Elasticsearch clusters
- Implement Infrastructure as Code (Terraform/Ansible) and GitOps (ArgoCD)
- Deploy monitoring, observability, and alerting solutions (Prometheus, Grafana, ELK)
- Design and implement disaster recovery architecture with short recovery time objectives (RTOs)
- Automate provisioning and reduce manual processes by building plugins for the self-service engineering portal
- Collaborate with the Engineering Manager and the Security and Development teams
- Participate in on-call rotation and 24/7 operational support
Requirements
- Experience designing and operating Kubernetes clusters in hybrid cloud/on-premises environments
- Experience managing PostgreSQL databases at scale, including replication, backup, and performance optimization
- Experience implementing and maintaining Redis and Elasticsearch clusters for high availability
- Experience developing automated workflows and building Infrastructure as Code with Terraform/Ansible
- Experience implementing GitOps workflows with ArgoCD for continuous delivery
- Experience creating monitoring and observability solutions with Prometheus, Grafana, and the ELK Stack
- Experience designing fault-tolerant architectures across cloud and on-premises infrastructure
- Ability to troubleshoot complex system issues under pressure
- Experience implementing security best practices for network and infrastructure components
- Experience working in 24/7 operational environments with an on-call rotation
- Cultural competencies: First Principles Thinking; Speed Meets Quality; 200% Ownership