T3 Operations & Support Specialist – Compute & OS

Interval Group

T3 Operations & Support Specialist for a cloud-native platform supporting a major energy transmission operator in Germany. Responsible for Compute & OS services within Local Production, handling complex incidents and ensuring compute/OS readiness for releases and changes.

Posted 6/13/2026 • Contract • Remote • 🇩🇪 Germany • Senior • Lead

ATS Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills

incident management, problem management, change management, release management, VMware 8, Red Hat Enterprise Linux, Ubuntu, monitoring, logging, automation

Soft Skills

troubleshooting, analytical skills, communication, collaboration, leadership

Tools & Technologies

Jira Service Management, Jira, Confluence, Prometheus, Grafana, Datadog, Mimir, Loki, Satellite, IPA

Industry Keywords

IT operations, service delivery, platform operations, SRE concepts, multi-tenant environments, audit compliance, security scans, vulnerabilities, performance metrics, runbooks

Tech Stack

Tools & technologies

Grafana, ITSM, Kubernetes, Linux, Prometheus, VMware

About the role

Key responsibilities & impact

Providing T3 operational ownership for Compute & OS services: handling complex incidents, troubleshooting and root cause analysis (RCA), and driving permanent fixes and preventive measures
Ensuring compute/OS readiness for releases and changes: monitoring/alerting coverage, performance baselines, hardening, patch strategy, rollback and recovery procedures, and runbooks
Executing and improving standard operational procedures through automation to reduce toil, shorten MTTR and improve stability
Coordinating with Kubernetes, Data, Network and Storage SMEs to resolve cross-domain production issues
Validating deployment artefacts from an operations perspective and enforcing quality assurance measures
Monitoring system health, performance metrics and service availability across multi-tenant environments
Identifying, analysing and resolving incidents to minimise service disruption, and triggering RCA and corrective actions
Implementing monitoring and logging strategies to support audit and compliance requirements
Performing routine security scans and remediating identified vulnerabilities

Requirements

What you’ll need

5 to 10+ years of experience in IT operations, service delivery, or platform operations
Proven experience implementing and leading Incident, Problem, Change and Release governance in production
Hands-on experience with VMware 8 virtualisation
Operating Systems: Red Hat Enterprise Linux and Ubuntu
OS tooling: Satellite, IPA, Certificate Server
ITSM/collaboration tooling: Jira Service Management, Jira, Confluence
Fundamental understanding of core operations processes (Incident, Change, Problem management, ITSM) and SRE concepts
Experience gathering operational insights from monitoring and observability data, including SLI/SLO/SLA management and tracking
Hands-on experience documenting procedures and enforcing the use of clear runbooks and playbooks
Hands-on experience with monitoring and logging tools (e.g. Prometheus, Grafana, Datadog, Mimir, Loki)
Understanding of modern platform operations (Kubernetes/containers, automation, observability) sufficient to govern specialists
Fluent English and German (C1 minimum in both)

Benefits

Comp & perks

Flexible working hours
Freedom to choose projects
Access to exciting projects in various industries
Support in advancing your career
Competitive pay
Dedicated team for assistance