
Site Reliability Engineer – SRE
SiGMA World
full-time
Location Type: Hybrid
Location: Belgrade • Serbia
About the role
- Ensures the availability, performance, and resilience of production systems supporting live events and iGaming platforms.
- Builds and maintains monitoring, alerting, and observability systems to detect issues before they impact users.
- Conducts capacity planning, load testing, and performance tuning to support traffic spikes during major events.
- Leads incident response, root‑cause analysis, and post‑mortem processes.
- Develops automation for deployments, scaling, configuration management, and routine operational tasks.
- Implements Infrastructure‑as‑Code (IaC) to standardise and automate environment provisioning.
- Improves CI/CD pipelines to ensure fast, reliable, and repeatable releases.
- Reduces manual toil through scripting, tooling, and process optimisation.
- Introduces AI‑powered tools to enhance reliability and operational efficiency.
- Supports the deployment and scaling of AI‑enabled products and services across event and iGaming platforms.
- Collaborates with AI and data teams to ensure infrastructure supports model training, inference, and real‑time AI workloads.
- Manages cloud infrastructure (compute, storage, networking) with a focus on scalability and cost efficiency.
- Implements best practices for security, resilience, and compliance across cloud environments.
- Ensures systems are event‑ready, with robust failover, redundancy, and real‑time monitoring.
- Supports event operations teams with technical readiness, live‑event monitoring, and rapid issue resolution.
- Builds systems capable of handling unpredictable traffic patterns common in iGaming and live events.
- Implements secure‑by‑design principles across infrastructure and operations.
- Ensures compliance with data‑privacy regulations and responsible‑gaming requirements where applicable.
- Identifies and mitigates operational risks, vulnerabilities, and single points of failure.
- Works closely with engineering, product, data, and platform teams to ensure reliability is embedded throughout the development lifecycle.
- Provides guidance on best practices for performance, scalability, and operational readiness.
- Communicates system health, risks, and improvements to stakeholders.
Requirements
- Strong proficiency in cloud platforms, Linux systems, and distributed architectures
- Experience with monitoring tools (e.g., Prometheus, Grafana, Datadog, New Relic)
- Strong scripting and automation skills (Python, Bash, Go, or similar)
- Familiarity with AI‑assisted operations and emerging intelligent‑monitoring tools
- Experience with CI/CD, containerisation, and orchestration
- Strong problem‑solving and analytical skills
- Ability to thrive in fast‑paced, event‑driven environments
- Excellent communication and collaboration skills
- Educated to degree level in a numerate or technical discipline; Master's preferred
- 5–7+ years of technical experience in SRE, DevOps, platform engineering, or systems engineering
- 1–2+ years of management or mentorship experience, such as leading incident response, guiding junior engineers, or owning reliability initiatives
- Experience supporting high‑availability, high‑traffic systems in production
- Background working with event‑driven architectures or iGaming platforms
- Proven track record of implementing automation and reliability improvements.
Benefits
- Free iGaming Academy access - Learn the ins and outs of the industry with access to courses.
- Travel perks - Visit our international offices and attend industry events worldwide.
- Performance rewards - High performers are recognized and fast-tracked with annual reviews and bi-yearly performance check-ins.
- Interest-free car loan after probation (T&Cs apply)
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
cloud platforms, Linux systems, distributed architectures, monitoring tools, scripting, automation, CI/CD, containerisation, orchestration, Infrastructure-as-Code
Soft Skills
problem-solving, analytical skills, communication, collaboration, leadership, mentorship, adaptability, technical readiness, incident response, stakeholder communication
Certifications
degree in numerate or technical discipline, Master's preferred