Senior Azure Site Reliability Engineer

Manila Recruitment

full-time

Posted on: 1/27/2026

Location Type: Remote

Location: Philippines

Job Level

Senior

Tech Stack

Ansible Azure Cloud Grafana JavaScript Kubernetes Linux NGINX Prometheus Python RabbitMQ Redis Splunk SQL

About the role

You will be responsible for provisioning and managing cloud infrastructure on the Azure public cloud to support organizational needs.
The Site Reliability Engineer (SRE) is responsible for ensuring the reliability, availability, and performance of cloud-based infrastructure and applications deployed on Microsoft Azure.
This role involves automating operations, monitoring system health, optimizing performance, and troubleshooting complex issues to maintain a highly available and secure cloud environment.
The SRE will work closely with development, security, and IT operations teams to enhance cloud solutions, implement best practices, and support scalable and resilient systems.
Deploy and manage Azure cloud services including Virtual Machines, Storage, Redis, Azure SQL databases, virtual networks, and AKS clusters (Azure Kubernetes Service).
Automate provisioning, configuration, and deployments using PowerShell, Bash, and Ansible.
Deliver and deploy Azure infrastructure using Infrastructure as Code (IaC), specifically Azure Bicep.
Review, configure, and implement monitoring to give Level 1 support teams the best possible visibility and transparency.
Implement and troubleshoot CI/CD pipelines for application deployments in Azure DevOps, TeamCity, and Octopus Deploy.
Maintain system reliability using Azure Monitor, Application Insights, Log Analytics, Prometheus/Grafana, Splunk, Opsgenie, and Slack.
Optimize performance and cost efficiency of Azure resources.
Train junior team members to deliver best-of-breed solutions on the Azure public cloud.
Review, manage, and troubleshoot Azure Kubernetes Service (AKS) clusters.
Review and manage cloud and on-premises servers, including AKS, covering OS and RabbitMQ (RMQ) upgrades, security patches, and application service support.
Respond to system alerts, failures, and security incidents.
Perform root cause analysis (RCA) and implement preventive measures.
Provide Level 2 support in an on-call capacity based on a pre-approved schedule (including weekends).
Review the network and security design for all infrastructure and applications hosted in Azure.
Continuously promote better ways to deliver Infrastructure solutions on Azure cloud.
Propose adoption of new approaches, patterns, techniques, and ideas recommended by industry standards and industry trends.
Work closely with software development and network teams to enhance platform reliability and identify better approaches.
Administer and optimize Linux-based systems used for application hosting, ensuring stability, security, and performance in production and non-production environments.
Troubleshoot issues in Linux operating systems, services, and middleware components to support application availability.

Requirements

At least 3 years of proven experience in delivering infrastructure solutions on Azure cloud.
5+ years of hands-on experience with infrastructure design and deployment utilizing PaaS, SaaS and IaaS cloud offerings.
At least 2 years of experience with Windows Server
Experience with either Azure ARM templates or Azure Bicep
At least 3 years of experience in Linux administration and managing Linux-based operating systems and applications
At least 2 years of hands-on experience designing, building, and deploying containerized runtime environments based on Azure Kubernetes Service (AKS)
1+ years of proven experience administering RabbitMQ clusters and Nginx
Proven experience with scripting languages such as PowerShell, Python, JavaScript, and Bash
Experience with Splunk, Grafana, and Opsgenie is an asset
Advantageous skills:
- Relevant certifications

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Azure cloud, Infrastructure as Code (IaC), PowerShell, Bash, Ansible, Azure Kubernetes Service (AKS), Linux Administration, RabbitMQ, Nginx, CI/CD pipelines

Soft Skills

troubleshooting, automation, monitoring, performance optimization, team collaboration, training, root cause analysis, problem-solving, communication, best practices implementation