Xenon Seven

Senior Site Reliability Engineer – SRE

full-time

Location Type: Remote

Location: Remote • 🇩🇪 Germany

Job Level

Senior

Tech Stack

Elasticsearch, Grafana, Kubernetes, Linux, Logstash, OpenShift, Prometheus, Unix

About the role

  • Design and architect highly available and scalable OpenShift/Kubernetes infrastructure for banking applications hosted on-premise
  • Lead and implement comprehensive monitoring and observability strategy using Prometheus and Grafana
  • Design and oversee centralized logging infrastructure using ELK Stack (Elasticsearch, Logstash, Kibana)
  • Lead SRE best practices implementation and adoption of production support standards across teams
  • Mentor and coach junior SRE and DevOps engineers on OpenShift, Kubernetes, monitoring, and production support
  • Define and implement Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Service Level Agreements (SLAs) with measurable metrics
  • Lead incident response strategy, post-incident reviews, and drive continuous improvement in production stability
  • Architect and implement advanced alerting, monitoring dashboards, and visualization strategies using Prometheus and Grafana
  • Design automation frameworks and tools to reduce operational toil and improve production efficiency
  • Lead OpenShift/Kubernetes cluster upgrades, security patches, and infrastructure modernization on-premise
  • Establish production support procedures, on-call rotation policies, and escalation frameworks
  • Optimize system performance, cost, and resource utilization across containerized on-premise infrastructure
  • Conduct capacity planning, performance optimization, and infrastructure scaling initiatives
  • Lead technical architecture reviews and infrastructure design decisions for banking applications
  • Manage on-premise data center resources and infrastructure planning
  • Participate in 24/7 on-call rotation and escalation for critical production incidents
  • Ensure compliance, security hardening, and disaster recovery procedures for financial systems

Requirements

  • BSc in Computer Science, Information Technology, Software Engineering, or related field
  • 5+ years of hands-on SRE, DevOps, or Production Engineering experience
  • 3+ years of experience leading SRE teams or managing production support operations
  • 3+ years of hands-on experience managing on-premise OpenShift and Kubernetes infrastructure
  • Expert-level experience with Prometheus for monitoring and alerting in production
  • Expert-level experience with Grafana for creating comprehensive monitoring dashboards
  • Advanced experience with ELK Stack (Elasticsearch, Logstash, Kibana) for logging and log analysis
  • Proven experience designing and scaling production systems for high-traffic banking applications
  • Deep expertise in Linux/Unix system administration and container networking
  • Advanced knowledge of CI/CD automation and deployment strategies
  • Hands-on experience with database management, tuning, and optimization on-premises
  • Strong experience with infrastructure automation and Infrastructure as Code
  • Proven 24/7 production support experience in mission-critical environments
  • Experience managing on-premise data center infrastructure
  • Proven leadership skills and ability to mentor junior engineers
  • Excellent communication skills and ability to present to executive stakeholders
  • Experience in the financial services or banking sector is highly preferred

Applicant Tracking System Keywords

Hard skills
OpenShift, Kubernetes, Prometheus, Grafana, ELK Stack, Linux/Unix administration, CI/CD automation, Infrastructure as Code, database management, production support
Soft skills
leadership, mentoring, communication, incident response, continuous improvement, capacity planning, team collaboration, problem-solving, presentation skills, organizational skills