
Site Reliability Engineer – Mobile and Internet Platform
Xenon Seven
full-time
Posted on:
Location Type: Remote
Location: Remote • 🇩🇪 Germany
Job Level
Entry Level
Tech Stack
Elasticsearch, Go, Grafana, Kubernetes, Linux, Logstash, NoSQL, OpenShift, Prometheus, Python, SQL, Unix
About the role
- Monitor and maintain the reliability and performance of Mobile Banking and Internet Banking applications using Prometheus and Grafana dashboards
- Manage and support the OpenShift/Kubernetes infrastructure that runs containerized banking applications on on-premise servers
- Respond to and resolve production incidents quickly, keeping mean time to resolution (MTTR) low
- Implement and maintain centralized logging solutions using ELK Stack (Elasticsearch, Logstash, Kibana) for application troubleshooting
- Develop and execute runbooks and automation scripts to reduce manual operational toil in OpenShift environments
- Provide 24/7 production support and take part in the on-call rotation for critical banking services
- Analyze logs and metrics from Prometheus and EFK to identify performance bottlenecks and reliability issues
- Conduct root cause analysis (RCA) on incidents and implement preventive measures
- Optimize Kubernetes/OpenShift deployments, pod management, and resource allocation on-premise
- Implement alerting strategies and threshold management in Prometheus and Grafana
- Support infrastructure scaling, capacity planning, and load balancing in production environments
- Implement security best practices and compliance requirements for financial systems in containerized environments
- Manage on-premise data center infrastructure and server resources
- Document operational procedures and troubleshooting guides, and create knowledge base articles
Requirements
- BSc in Computer Science, Information Technology, Software Engineering, or related field
- 2+ years of hands-on experience in SRE, DevOps, or Production Engineering roles
- Hands-on experience supporting production applications in Kubernetes/OpenShift environments
- Strong experience administering and troubleshooting OpenShift Container Platform on on-premise infrastructure
- Proficiency with Prometheus for metrics collection and monitoring
- Proficiency with Grafana for dashboard creation and visualization
- Experience with ELK Stack (Elasticsearch, Logstash, Kibana) for centralized logging
- Strong understanding of Linux/Unix operating systems and networking fundamentals
- Practical experience with CI/CD tools and automation frameworks
- Proficiency in at least one programming/scripting language (Python, Go, or Bash)
- Experience managing SQL and NoSQL databases on-premise
- Excellent troubleshooting and analytical skills for production support
- Strong communication skills and ability to work in cross-functional teams
- Experience in 24/7 production support environments
- Experience with on-premise data center infrastructure management
- Previous experience in financial services or banking sector is a plus
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Kubernetes, OpenShift, Prometheus, Grafana, ELK Stack, Linux, CI/CD, Python, Go, Bash
Soft skills
troubleshooting, analytical skills, communication skills, cross-functional teamwork, incident response, root cause analysis, capacity planning, load balancing, automation, documentation