SiGMA World

Site Reliability Engineer – SRE

full-time

Location Type: Hybrid

Location: Belgrade, Serbia

About the role

  • Ensures the availability, performance, and resilience of production systems supporting live events and iGaming platforms.
  • Builds and maintains monitoring, alerting, and observability systems to detect issues before they impact users.
  • Conducts capacity planning, load testing, and performance tuning to support traffic spikes during major events.
  • Leads incident response, root‑cause analysis, and post‑mortem processes.
  • Develops automation for deployments, scaling, configuration management, and routine operational tasks.
  • Implements Infrastructure‑as‑Code (IaC) to standardise and automate environment provisioning.
  • Improves CI/CD pipelines to ensure fast, reliable, and repeatable releases.
  • Reduces manual toil through scripting, tooling, and process optimisation.
  • Introduces AI‑powered tools to enhance reliability and operational efficiency.
  • Supports the deployment and scaling of AI‑enabled products and services across event and iGaming platforms.
  • Collaborates with AI and data teams to ensure infrastructure supports model training, inference, and real‑time AI workloads.
  • Manages cloud infrastructure (compute, storage, networking) with a focus on scalability and cost efficiency.
  • Implements best practices for security, resilience, and compliance across cloud environments.
  • Ensures systems are event‑ready, with robust failover, redundancy, and real‑time monitoring.
  • Supports event operations teams with technical readiness, live‑event monitoring, and rapid issue resolution.
  • Builds systems capable of handling unpredictable traffic patterns common in iGaming and live events.
  • Implements secure‑by‑design principles across infrastructure and operations.
  • Ensures compliance with data‑privacy regulations and responsible‑gaming requirements where applicable.
  • Identifies and mitigates operational risks, vulnerabilities, and single points of failure.
  • Works closely with engineering, product, data, and platform teams to ensure reliability is embedded throughout the development lifecycle.
  • Provides guidance on best practices for performance, scalability, and operational readiness.
  • Communicates system health, risks, and improvements to stakeholders.

Requirements

  • Strong proficiency in cloud platforms, Linux systems, and distributed architectures
  • Experience with monitoring tools (e.g., Prometheus, Grafana, Datadog, New Relic)
  • Strong scripting and automation skills (Python, Bash, Go, or similar)
  • Familiarity with AI‑assisted operations and emerging intelligent‑monitoring tools
  • Experience with CI/CD, containerisation, and orchestration
  • Strong problem‑solving and analytical skills
  • Ability to thrive in fast‑paced, event‑driven environments
  • Excellent communication and collaboration skills
  • Educated to degree level in a numerate or technical discipline (Master's preferred)
  • 5–7+ years of technical experience in SRE, DevOps, platform engineering, or systems engineering
  • 1–2+ years of management or mentorship experience, such as leading incident response, guiding junior engineers, or owning reliability initiatives
  • Experience supporting high‑availability, high‑traffic systems in production
  • Background working with event‑driven architectures or iGaming platforms
  • Proven track record of implementing automation and reliability improvements

Benefits

  • Free iGaming Academy access - Learn the ins and outs of the industry through its courses.
  • Travel perks - Visit our international offices and attend industry events worldwide.
  • Performance rewards - High performers are recognised and fast-tracked through annual reviews and bi-yearly performance check-ins.
  • Interest-free car loan after probation (T&Cs apply)

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
cloud platforms, Linux systems, distributed architectures, monitoring tools, scripting, automation, CI/CD, containerisation, orchestration, Infrastructure-as-Code
Soft Skills
problem-solving, analytical skills, communication, collaboration, leadership, mentorship, adaptability, technical readiness, incident response, stakeholder communication
Certifications
Degree in a numerate or technical discipline, Master's preferred