Site Reliability Engineer

DOCOsoft

Senior Azure Site Reliability Engineer focused on automation and continuous improvement of the Vew SaaS platform, collaborating with Engineering and DevOps to enhance system reliability as the platform scales.

Posted 7/2/2026 • Full-time • Dublin • 🇮🇪 Ireland • Mid-Level • Senior

Tech Stack

Tools & technologies

Azure, Cloud, Distributed Systems, Docker, Grafana, Kubernetes, Prometheus, Terraform

About the role

Key responsibilities & impact

Design, implement, and operate highly available, scalable, and fault-tolerant systems on Microsoft Azure.
Define, track, and improve reliability metrics, including service health indicators and operational performance reporting.
Develop and maintain Infrastructure as Code using Bicep, ARM, Terraform, or similar tooling to ensure consistent and reproducible environments.
Build automation for provisioning, deployment, scaling, and operational workflows, reducing manual intervention and operational toil.
Enhance CI/CD pipelines in collaboration with DevOps to improve deployment safety, reliability, and efficiency.
Implement and maintain monitoring, logging, tracing, and alerting solutions to ensure real-time visibility and rapid issue detection.
Define meaningful alerting strategies that reduce noise and improve response effectiveness.
Lead incident response activities, including structured troubleshooting, stakeholder communication, root cause analysis, and post-incident reviews.
Strengthen incident and problem management processes to improve SLA adherence and customer impact mitigation.
Implement systemic improvements to prevent repeat incidents rather than applying short-term workarounds.
Embed security and compliance best practices across infrastructure, including access control, encryption, and policy enforcement.
Drive continuous service improvement initiatives to enhance performance, reliability, efficiency, and operational maturity.
Collaborate closely with Development and QA teams to improve application resilience and supportability.

Requirements

What you’ll need

Proven experience as a Site Reliability Engineer or in a similar reliability-focused role within a SaaS or cloud-native environment.
Strong hands-on experience operating production workloads on Microsoft Azure across compute, networking, storage, and monitoring services.
Infrastructure as Code expertise using Bicep, ARM, Terraform, or similar tools.
Strong automation and scripting capability (PowerShell essential; additional scripting languages advantageous).
Experience working with containerised environments (Docker) and orchestration platforms such as Kubernetes.
Practical experience with observability tooling such as Azure Monitor, Grafana, Prometheus, Datadog, or OpenTelemetry.
Strong understanding of structured incident response, root cause analysis, SLA/SLO concepts, and reliability engineering practices.
Knowledge of security best practices and compliance standards such as ISO 27001, SOC 2, and GDPR.
Strong problem-solving capability with the ability to troubleshoot complex, distributed systems.
Effective communication skills and ability to collaborate across engineering, operations, and business stakeholders.
Azure certifications (e.g., Azure Administrator Associate, Azure Solutions Architect Expert) are desirable.

Benefits

Comp & perks

25 days Annual Leave
Private pension
Bonus scheme
Private health
Life assurance

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Site Reliability Engineering, Production Workloads Management, Containerization (Docker), Kubernetes Orchestration, Reliability Engineering Practices

Soft Skills

Effective Communication, Collaboration Across Teams, Problem-Solving Capability

Certifications

Azure Administrator Associate, Azure Solutions Architect Expert