finova

Senior Site Reliability Engineer

full-time

Location Type: Hybrid

Location: London • 🇬🇧 United Kingdom


Job Level

Senior

Tech Stack

Ansible, AWS, Azure, Cloud, Grafana, Kubernetes, .NET, Prometheus, Python, Terraform

About the role

  • Spearhead the Site Reliability Engineering function to ensure availability, scalability, and performance of core systems
  • Take responsibility for monitoring .NET applications deployed in AKS, EKS, App Services, and VMs
  • Design, implement, and maintain robust monitoring and alerting systems
  • Analyse system performance metrics, establish baselines, identify bottlenecks, and implement improvements for scalability and efficiency
  • Set up, configure, and optimise observability tools (Prometheus, Grafana, Datadog) to monitor metrics, logs, and traces
  • Ensure high availability and disaster recovery for critical systems; lead incident response and post-incident analysis
  • Develop and maintain SLOs, SLIs, and error budgets to meet reliability targets
  • Automate routine tasks and use infrastructure-as-code (Terraform, Ansible, Bicep) to manage cloud resources
  • Collaborate with DevOps/CloudOps and product development teams to build and deploy infrastructure via CI/CD (Azure DevOps, GitLab CI)
  • Mentor junior SREs and drive best practices across the engineering organisation
  • Identify areas for continuous improvement and stay up to date with industry trends, tools, and technologies

Requirements

  • 5+ years of experience in Site Reliability Engineering, DevOps, or Systems Engineering, with a strong focus on monitoring, alerting, and incident management
  • Hands-on experience monitoring .NET applications in production (Grafana, Datadog, Azure Monitor)
  • Extensive experience with AKS, EKS, App Services, and VMs in cloud environments (AWS, Azure)
  • Strong proficiency in cloud platforms (AWS, Azure) and container orchestration (Kubernetes, AKS, EKS)
  • Proficiency in infrastructure-as-code tools (Terraform, Azure Resource Manager, Bicep, Ansible)
  • Experience with monitoring and observability tools (Prometheus, Grafana, Datadog)
  • Strong scripting skills (PowerShell, Bash, Python)
  • Proven ability to work independently and manage multiple projects in a fast-paced environment
  • Excellent verbal and written communication skills and strong problem-solving abilities
  • Preferred: experience with monitoring and maintaining financial services or FinOps platforms
  • Preferred: certifications in cloud platforms (AWS Certified Solutions Architect, Azure DevOps, Certified Kubernetes Administrator)
  • Preferred: experience scaling and maintaining high-performance systems with large data throughput

Benefits

  • Hybrid working: most teams around three days a week in the office
  • 25 days holiday plus bank holidays
  • Bank holiday trading and holiday purchase options
  • Opportunity to work from anywhere in the world for up to 4 weeks per year
  • Life Assurance
  • Group Income Protection
  • Private Medical Insurance
  • Pension scheme via Salary Exchange
  • Employee Assistance Programme
  • Access to a Virtual GP
  • Enhanced maternity and paternity pay
  • Paid time off for fertility treatments and pregnancy loss
  • Cycle to Work Scheme
  • Discounts on shops, restaurants, and gym memberships
  • Free fresh fruit daily
  • Colleague networks and social groups
  • One paid volunteering day annually
  • Give-As-You-Earn charitable scheme
