Tech Stack
AWS, Cloud, Docker, Grafana, JavaScript, Kafka, Kubernetes, Node.js, Prometheus, RabbitMQ, Terraform
About the role
- Maintain uptime and reliability across critical systems, focusing on scalability, observability, and incident prevention
- Design and manage cloud infrastructure using Terraform, Kubernetes, and CI/CD pipelines
- Automate operations for routine tasks, monitoring, deployment, and disaster recovery
- Support and improve on-call processes, including incident response, retrospectives, and tooling
- Collaborate with platform, security, and product teams to implement best practices and ship reliable software
- Build systems for visibility: develop dashboards, alerts, and documentation to monitor and report on system health
- Contribute to infrastructure projects that improve security, performance, and developer velocity
- Deel is an all-in-one payroll and HR platform supporting global teams in 150+ countries, enabling payroll, HRIS, compliance, benefits, performance, and equipment management
Requirements
- Hands-on experience operating cloud-based systems (AWS preferred)
- Proficiency with Kubernetes, Helm, Docker
- Familiarity with CI/CD tooling and deployment pipelines
- Strong understanding of observability tools (Datadog, Grafana, Prometheus, etc.)
- Ability to troubleshoot issues quickly and communicate clearly
- Solid scripting or programming fundamentals (Node.js experience is a plus)
- Good instincts around systems design, incident management, and reliability practices
- Comfortable working in high-speed, high-scale environments
- Experience with messaging systems like RabbitMQ, Kafka, or NATS (nice to have)
- Exposure to internal developer platforms or tooling (nice to have)
- Prior experience in platform, DevOps, or infrastructure teams (nice to have)
- Previous experience supporting sandbox, staging, or demo environments (nice to have)