Input Output (IOHK)

Site Reliability Engineer – IOE, Cardano

full-time

Location Type: Remote

Location: Remote • 🇬🇧 United Kingdom

Job Level

Mid-Level, Senior

Tech Stack

AWS, Grafana, Kubernetes, Postgres, Prometheus, Python, Terraform

About the role

  • As a Site Reliability Engineer (SRE) you are an integral part of our open-source project, ensuring the reliability, availability, and performance of our production systems.
  • This role combines service operation, systems engineering and software engineering principles to operate and monitor services as well as create or maintain tools, automations, and infrastructure code that bolster the efficiency and resilience of our platform.
  • Design, write, and deliver tools and software primarily using Python, Bash, Terraform or Nix to improve the availability, scalability, and efficiency of our services.
  • Engage in and refine the whole lifecycle of services, from inception and design, through deployment, operation, and continuous improvement.
  • Practice sustainable incident response and promote blameless postmortems.
  • Collaborate with the development teams to ensure that solutions are designed with customer experience, scalability, and performance in mind.
  • Analyze system performance and reliability, offering recommendations for enhancement.
  • Develop and uphold service-level objectives (SLOs), service-level indicators (SLIs), and error budgets for our services.
  • Participate in on-call rotations, responding to and mitigating service interruptions and technical challenges.

Requirements

  • Proficiency in Python, Bash, Terraform, and Nix for DevOps services.
  • Extensive experience with AWS, specifically with services like EKS and RDS.
  • Familiarity with container orchestration (e.g., Kubernetes) is essential.
  • Hands-on experience with PostgreSQL and its deployment on RDS.
  • Knowledge of monitoring tools (e.g., Prometheus, Grafana, Loki).
  • Solid troubleshooting and performance tuning capabilities.
  • Exceptional communication skills and team collaboration ethic.
  • Experience with CI/CD (e.g., GitHub Actions, Hydra, Earthly).
  • Strong analytical and troubleshooting skills.
  • Excellent communication skills to collaborate with development teams, operations, and other stakeholders.
  • Ability to quickly learn new technologies and adapt to changing environments.
  • High attention to detail to ensure system reliability and performance.

Benefits

  • Remote work
  • Laptop reimbursement
  • New starter package to buy hardware essentials (headphones, monitor, etc.)
  • Learning & Development opportunities
  • Competitive PTO
