Manila Recruitment

Senior Azure Site Reliability Engineer

full-time

Location Type: Remote

Location: Philippines

About the role

  • You will be responsible for provisioning and managing cloud infrastructure on the Azure public cloud to support organizational needs.
  • The SRE is responsible for ensuring the reliability, availability, and performance of cloud-based infrastructure and applications deployed on Microsoft Azure.
  • This role involves automating operations, monitoring system health, optimizing performance, and troubleshooting complex issues to maintain a highly available and secure cloud environment.
  • The SRE will work closely with development, security, and IT operations teams to enhance cloud solutions, implement best practices, and support scalable and resilient systems.
  • Deploy and manage Azure cloud services including Virtual Machines, Storage, Redis, Azure SQL databases, virtual networks, and AKS clusters (Azure Kubernetes Service).
  • Automate provisioning, configuration, and deployments using PowerShell, Bash, and Ansible.
  • Deliver and deploy Azure infrastructure using Infrastructure as Code (IaC), specifically Azure Bicep.
  • Review, configure, and implement monitoring to give Level 1 support teams the best possible visibility and transparency.
  • Implement and troubleshoot CI/CD pipelines for application deployments in Azure DevOps, TeamCity, and Octopus Deploy.
  • Maintain system reliability using Azure Monitor, Application Insights, Log Analytics, Prometheus/Grafana, Splunk, Opsgenie, and Slack.
  • Optimize performance and cost efficiency of Azure resources.
  • Train junior members of the team to deliver best-of-breed solutions on the Azure public cloud.
  • Review, manage, and troubleshoot Azure Kubernetes Service (AKS) clusters.
  • Review and manage cloud and on-premises servers, including AKS, covering OS and RabbitMQ (RMQ) upgrades, security patches, and application service support.
  • Respond to system alerts, failures, and security incidents.
  • Perform root cause analysis (RCA) and implement preventive measures.
  • Provide Level 2 support in an on-call capacity based on a pre-approved schedule (including weekends).
  • Review the network and security design for all infrastructure and applications hosted in Azure.
  • Continuously promote better ways to deliver infrastructure solutions on the Azure cloud.
  • Propose the adoption of new approaches, patterns, techniques, and ideas in line with industry standards and trends.
  • Work closely with software development and network teams to enhance platform reliability and identify better approaches.
  • Administer and optimize Linux-based systems used for application hosting, ensuring stability, security, and performance in production and non-production environments.
  • Troubleshoot issues in Linux operating systems, services, and middleware components to support application availability.

Requirements

  • At least 3 years of proven experience in delivering infrastructure solutions on Azure cloud.
  • 5+ years of hands-on experience with infrastructure design and deployment utilizing PaaS, SaaS and IaaS cloud offerings.
  • At least 2 years of experience with Windows Server.
  • Experience with either Azure ARM templates or Azure Bicep.
  • At least 3 years of experience in Linux administration, including managing Linux-based operating systems and applications.
  • At least 2 years of hands-on experience designing, building, and deploying containerized runtime environments based on Azure Kubernetes Service (AKS).
  • 1+ years of proven experience administering RabbitMQ clusters and Nginx
  • Proven experience with scripting languages such as PowerShell, Python, JavaScript, and Bash.
  • Experience using Splunk, Grafana, and Opsgenie is an asset.

Advantageous skills

  • Relevant certifications

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Azure cloud, Infrastructure as Code (IaC), PowerShell, Bash, Ansible, Azure Kubernetes Service (AKS), Linux Administration, RabbitMQ, Nginx, CI/CD pipelines
Soft Skills
troubleshooting, automation, monitoring, performance optimization, team collaboration, training, root cause analysis, problem-solving, communication, best practices implementation