DevOps Engineer

• Monitor and maintain the reliability and performance of Mobile Banking and Internet Banking applications using Prometheus and Grafana dashboards
• Manage and support OpenShift/Kubernetes infrastructure for containerized banking applications on on-premise servers
• Respond to and resolve production incidents, minimizing mean time to resolution (MTTR)
• Implement and maintain centralized logging solutions using ELK Stack (Elasticsearch, Logstash, Kibana) for application troubleshooting
• Develop and execute runbooks and automation scripts to reduce manual operational toil in OpenShift environments
• Provide 24/7 production support and participate in the on-call rotation for critical banking services
• Analyze logs and metrics from Prometheus and the ELK Stack to identify performance bottlenecks and reliability issues
• Conduct root cause analysis (RCA) on incidents and implement preventive measures
• Optimize Kubernetes/OpenShift deployments, pod management, and resource allocation on-premise
• Implement alerting strategies and threshold management in Prometheus and Grafana
• Support infrastructure scaling, capacity planning, and load balancing in production environments
• Implement security best practices and compliance requirements for financial systems in containerized environments
• Manage on-premise data center infrastructure and server resources
• Document operational procedures, write troubleshooting guides, and create knowledge base articles

Site Reliability Engineer – Mobile and Internet Platform

Job Level

Tech Stack

About the role

Requirements

Applicant Tracking System Keywords

Hard skills

Soft skills