MTN Uganda

Manager, Customer Reliability Engineering

full-time

Origin: 🇿🇦 South Africa


Job Level

Senior, Lead

Tech Stack

Ansible, AWS, Azure, Cloud, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Splunk, Terraform

About the role

  • Achieve measurable improvements in system uptime and performance by implementing robust reliability engineering practices and leading incident prevention initiatives.
  • Reduce Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR) through streamlined incident response protocols and team readiness.
  • Build, lead, and develop a skilled team of Customer Reliability Engineers with a strong focus on ownership, collaboration, and continuous learning.
  • Ensure that reliability is embedded into service design, development, deployment, and operations by partnering with engineering, product, and operations teams.
  • Deliver clear and actionable reporting on reliability metrics to support leadership decision-making and continuous improvement.
  • Align reliability goals with customer expectations by addressing root causes of service degradation and championing seamless user experiences.
  • Identify and address potential reliability risks before they impact customers by implementing observability tools, runbooks, and automated responses.
  • Drive reliability improvements that reduce operational costs by eliminating manual processes, optimizing resource usage, and reducing reactive work.
  • Oversee timely incident response, root cause analysis, and implementation of long-term fixes to prevent recurring issues and improve service resilience.
  • Work closely with software engineering, DevOps, product, and support teams to embed reliability into the end-to-end service lifecycle.
  • Ensure effective monitoring systems, dashboards, and alerts are in place to detect, respond to, and analyze system performance and failures.
  • Define and drive the implementation of a reliability roadmap aligned with business objectives, system scalability, and customer needs.
  • Translate system performance into customer impact metrics (e.g., NPS, downtime minutes) and continuously enhance the end-user experience.
  • Track and report on key reliability metrics such as uptime, latency, error rates, and incident frequency to support transparency and data-driven decisions.
  • Proactively identify technical and operational risks, ensuring mitigation strategies are in place and aligned with compliance standards.
  • Foster a culture of experimentation and improvement by exploring automation, new tools, and process enhancements to strengthen reliability practices.

Requirements

  • Bachelor's Degree in Computer Science, Software Engineering, Information Technology, or a related technical discipline.
  • Certifications in relevant areas such as Site Reliability Engineering (SRE), DevOps, ITIL, or Cloud Infrastructure (e.g., AWS, Azure, GCP) are highly desirable.
  • A Master's Degree in Technology Management, Engineering, or Business Administration is an added advantage.
  • 7–10 years of experience in IT operations, systems engineering, or reliability engineering within a technology-driven environment.
  • At least 3–5 years in a leadership or managerial role, with proven experience leading reliability or DevOps teams.
  • Hands-on experience implementing and managing observability platforms, monitoring tools (e.g., Prometheus, Grafana, Splunk), and automation frameworks.
  • Demonstrated ability to lead incident response efforts, conduct root cause analysis, and implement sustainable, long-term service reliability improvements.
  • Experience working in agile environments and with cross-functional teams, including software development, infrastructure, product, and support.
  • Strong understanding of cloud-native technologies, container orchestration (e.g., Kubernetes), CI/CD pipelines, and infrastructure as code (e.g., Terraform, Ansible).
C-Serv

Senior Cloud Edge Engineer

Senior | full-time | 🇷🇴 Romania
Posted: 3 hours ago | Source: apply.workable.com
Ansible, AWS, Azure, Cloud, Docker, Google Cloud Platform, Grafana, IoT, Kubernetes, Linux, Prometheus, Terraform

C-Serv

Lead Cloud Edge Engineer

Senior | full-time | 🇮🇪 Ireland
Posted: 1 day ago | Source: apply.workable.com
Ansible, AWS, Azure, Cloud, Docker, Google Cloud Platform, Grafana, IoT, Kubernetes, Linux, Prometheus, Terraform

Docusign

Principal Product Manager - Site Reliability

Lead | full-time | $174k–$328k / year | 🇺🇸 United States
Posted: 34 days ago | Source: uscareers-docusign.icims.com
Ansible, AWS, Azure, Cloud, Distributed Systems, Google Cloud Platform, Grafana, Kubernetes, Prometheus, Terraform

Procurement Sciences AI

Sr. DevOps Engineer

Senior | full-time | District of Columbia, Utah, Washington · 🇺🇸 United States
Posted: 34 days ago | Source: jobs.ashbyhq.com
AWS, Azure, Cloud, Docker, Google Cloud Platform, Grafana, Kubernetes, Microservices, Prometheus, Python, SaltStack, Terraform, +1 more

Sinch

Senior Site Reliability Engineer

Senior | full-time | $143k–$179k / year | Colorado, Illinois · 🇺🇸 United States
Posted: 15 days ago | Source: apply.workable.com
Ansible, AWS, Cassandra, Cloud, Distributed Systems, ElasticSearch, Go, Google Cloud Platform, Grafana, Linux, Microservices, Prometheus, +2 more