CD Operations Engineer

Interval Group

Site Reliability Engineer role managing and scaling a production Kubernetes platform for innovative companies, with a focus on automation, CI/CD pipelines, and operational excellence.

Posted 5/9/2026 • Contract • Remote • 🇩🇪 Germany • Mid-Level, Senior

Tech Stack

Tools & technologies

Grafana, ITSM, Jenkins, Kubernetes, Prometheus

About the role

Key responsibilities & impact

Maintain and optimise CI/CD pipelines to ensure deployment readiness and validate all deployment artifacts from an operational perspective.
Define and enforce quality assurance measures, including standard operating procedures and successful test reporting.
Implement rollback strategies and comprehensive operational monitoring for all production deployments.
Manage monitoring, incident, problem, and change management within a multi-tenant managed Kubernetes environment.
Monitor system health, performance metrics, and service availability, resolving incidents to minimise service disruption.
Perform root cause analysis and implement corrective and preventive actions to enhance platform stability.
Automate recurring operational tasks and critical processes to reduce toil and improve service reliability.
Validate automated procedures through the full software development lifecycle, including staging and testing.
Implement logging and monitoring strategies to adhere to security and audit compliance standards.
Conduct routine security scans and remediate vulnerabilities across the platform.

Requirements

What you’ll need

Professional proficiency in both English and German (C1 level minimum)
At least 3 years of hands-on operational experience with self-managed Kubernetes clusters and production applications in on-premises environments
Deep understanding of networking concepts, including protocols, load balancing, and security
Extensive experience with CI/CD processes and tooling, such as GitLab, Jenkins, Tekton, or ArgoCD
Fundamental understanding of core operations processes including incident, change, and problem management (ITSM) alongside SRE concepts
Experience gathering operational insights from monitoring and observability tools, including managing SLIs, SLOs, and SLAs
Proven ability to document procedures and enforce clear runbooks or playbooks
Practical experience with monitoring and logging stacks such as Prometheus, Grafana, Mimir, or Loki

Benefits

Comp & perks

Flexible working hours
Freedom to choose your own projects
Access to exciting projects in various industries
Competitive pay
Dedicated team support

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

CI/CD pipelines, Kubernetes, root cause analysis, networking concepts, incident management, change management, problem management, monitoring, observability, security compliance

Soft Skills

documentation, communication, problem-solving, analytical thinking, collaboration, attention to detail, proactive mindset, adaptability, organizational skills, critical thinking