Site Reliability Engineer

Deel

full-time

Posted on: 8/30/2025

Origin: 🇪🇺 Anywhere in Europe

Mid-Level, Senior

AWS, Cloud, Docker, Grafana, JavaScript, Kafka, Kubernetes, Node.js, Prometheus, RabbitMQ, Terraform

About the role

Maintain uptime and reliability across critical systems, focusing on scalability, observability, and incident prevention
Design and manage cloud infrastructure using Terraform, Kubernetes, and CI/CD pipelines
Automate operations for routine tasks, monitoring, deployment, and disaster recovery
Support and improve on-call processes, including incident response, retrospectives, and tooling
Collaborate with platform, security, and product teams to implement best practices and ship reliable software
Build systems for visibility: develop dashboards, alerts, and documentation to monitor and report on system health
Contribute to infrastructure projects that improve security, performance, and developer velocity

Deel is an all-in-one payroll and HR platform that supports global teams in 150+ countries, covering payroll, HRIS, compliance, benefits, performance, and equipment management.

Hands-on experience operating cloud-based systems (AWS preferred)
Proficiency with Kubernetes, Helm, Docker
Familiarity with CI/CD tooling and deployment pipelines
Strong understanding of observability tools (Datadog, Grafana, Prometheus, etc.)
Ability to troubleshoot issues quickly and communicate clearly
Solid scripting or programming fundamentals (Node.js experience is a plus)
Good instincts around systems design, incident management, and reliability practices
Comfortable working in high-speed, high-scale environments
Experience with messaging systems like RabbitMQ, Kafka, or NATS (nice to have)
Exposure to internal developer platforms or tooling (nice to have)
Prior experience in platform, DevOps, or infrastructure teams (nice to have)
Previous experience supporting sandbox, staging, or demo environments (nice to have)