Tech Stack
Ansible, AWS, Chef, Cloud, Docker, EC2, Flux, Grafana, Jenkins, Kubernetes, Postgres, Prometheus, Puppet, Python, Redis, Splunk, Terraform, Vault
About the role
- Design and build complete CI/CD pipelines from scratch using industry best practices
- Architect and maintain highly available, scalable AWS cloud infrastructure
- Implement Infrastructure as Code (IaC) using tools like Terraform, CloudFormation, or Pulumi
- Manage Kubernetes clusters and containerized applications with focus on security and performance
- Serve as a site reliability engineer, ensuring system uptime, monitoring, and incident response
- Implement comprehensive monitoring, logging, and alerting solutions
- Automate deployment processes and infrastructure provisioning
- Collaborate closely with development teams to optimize application performance and deployment workflows
- Establish and maintain security best practices across all infrastructure components
- Design disaster recovery and backup strategies for critical payment systems
- Optimize cloud costs while maintaining performance and reliability standards
- Take ownership of production incidents and lead root cause analysis
- Mentor development teams on DevOps best practices and cloud-native architectures
- Implement GitOps workflows and own configuration management
- Ensure compliance with security standards and payment industry regulations (PCI DSS)
- Create and maintain comprehensive infrastructure documentation
Requirements
- Senior-level experience (5+ years) in DevOps/SRE roles
- Proven experience building complete CI/CD pipelines from the ground up
- Strong understanding of site reliability engineering principles and practices
- Experience with incident management, on-call responsibilities, and production support
- Cloud & Infrastructure
- AWS certification highly preferred (Solutions Architect, DevOps Engineer, or SysOps Administrator)
- Hands-on experience with core AWS services (EC2, ECS/EKS, RDS, S3, VPC, IAM, CloudWatch, etc.)
- Infrastructure as Code expertise (Terraform, CloudFormation, or Pulumi)
- Experience with cloud cost optimization and resource management
- Container Orchestration
- Kubernetes certification is a huge plus (CKA, CKAD, or CKS)
- Extensive experience with Kubernetes cluster management and operations
- Proficiency with container technologies (Docker, containerd)
- Experience with Helm charts and Kubernetes package management
- Knowledge of service mesh technologies (Istio, Linkerd) is preferred
- CI/CD & Automation
- Expert-level experience with CI/CD tools (Jenkins, GitLab CI, GitHub Actions, AWS CodePipeline)
- Proficiency in scripting languages (Python, Bash, PowerShell)
- Experience with configuration management tools (Ansible, Chef, Puppet)
- GitOps implementation experience (ArgoCD, Flux)
- Monitoring & Observability
- Experience with monitoring solutions (Prometheus, Grafana, Datadog, New Relic)
- Log management and analysis (ELK Stack, Splunk, AWS CloudWatch)
- Application Performance Monitoring (APM) implementation
- Distributed tracing and observability best practices
- Security & Compliance
- Understanding of security best practices in cloud environments
- Experience with secrets management (AWS Secrets Manager, HashiCorp Vault)
- Knowledge of compliance frameworks relevant to payment systems
- Network security and VPC configuration expertise
- Site Reliability Engineering
- Experience with SLA/SLO/SLI definition and monitoring
- Capacity planning and performance optimization
- Disaster recovery planning and implementation
- Incident response and post-mortem processes
- Preferred Qualifications
- Experience in fintech, payment processing, or financial services industry
- Multiple AWS certifications (Solutions Architect Professional, DevOps Engineer Professional)
- Kubernetes certifications (CKA, CKAD, CKS)
- Experience with database operations and management (PostgreSQL, Redis)
- Knowledge of payment gateway integrations and PCI compliance requirements
- Experience with chaos engineering and reliability testing
- Familiarity with service mesh architectures
- Background in network engineering and security
- Experience with multi-region and multi-cloud deployments