Fixify

Senior Site Reliability Engineer

full-time

Location Type: Remote

Location: Ireland

About the role

  • Design and maintain scalable, fault-tolerant infrastructure that supports our SaaS platform and keeps pace with business growth.
  • Instrument observability best practices—embracing tracing-first approaches, meaningful metrics, and monitoring that actually helps during incidents.
  • Define, document, and maintain SLIs, SLOs, and SLAs in partnership with product engineering, translating business commitments into technical guardrails.
  • Build automation that eliminates manual intervention across CI/CD, deployments, configuration management, and recovery—because your time is better spent on strategic problems.
  • Lead incident response with steady judgment, facilitate blameless postmortems, and drive remediation efforts that prevent recurrence.
  • Partner with engineering and product teams during design reviews to ensure new features are production-ready and operationally scalable.
  • Optimize infrastructure costs through performance tuning, capacity planning, and smart use of cloud resources.
  • Mentor engineers on operational best practices and champion reliability thinking across the organization.
  • Document infrastructure architecture clearly and maintain the kind of runbooks that your future self will thank you for.

Requirements

  • 4+ years of experience in SRE, DevOps, or infrastructure engineering roles, with demonstrated experience supporting SaaS platforms in production.
  • Expert-level knowledge of an infrastructure-as-code framework (Pulumi, Terraform, CDK)—you should be the kind of person who thinks "if it's not in code, it doesn't exist."
  • Strong working knowledge of AWS (or equivalent cloud platforms), including designing for availability, scalability, and security.
  • Proficiency in TypeScript or Python for infrastructure automation and tooling.
  • Experience with containerization and orchestration (ECS Fargate, Kubernetes, or similar).
  • Deep familiarity with observability tools and practices (OpenTelemetry, CloudWatch, Honeycomb)—bonus points if you embrace a tracing-first philosophy.
  • Solid understanding of networking, load balancing, and distributed systems concepts.
  • Experience with CI/CD tooling (GitHub Actions, CodeBuild, or equivalent).
  • The ability to communicate complex operational issues clearly to both technical and non-technical stakeholders.
  • Calm effectiveness during high-pressure incidents and the judgment to balance competing priorities like performance, cost, and reliability.
  • A collaborative spirit and the ability to build strong relationships with engineering, product, and operations teams.
  • Prior experience working closely with product engineering teams is a strong plus—this role thrives on cross-disciplinary understanding.
  • A commitment to continuous learning and improving team practices, systems, and culture.

Benefits

  • Ownership over infrastructure that powers a globally used platform, with clear visibility into how your work drives collaboration and productivity.
  • Meaningful opportunities to learn and grow, whether that's diving deeper into distributed systems, exploring new observability paradigms, or mastering the latest cloud-native technologies.
  • A team that values blameless postmortems, continuous improvement, and the kind of operational culture where everyone learns from every incident.
  • Transparency about the "why" behind architectural decisions, and a voice in shaping Fixify's reliability engineering principles as we scale.
  • Direct contact with product engineers and users, so you see firsthand how reliable infrastructure translates into delighted customers.
  • Work across a hybrid container and serverless infrastructure environment, using what works best and leaning into each service's strengths.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
infrastructure-as-code, Pulumi, Terraform, CDK, AWS, TypeScript, Python, ECS Fargate, Kubernetes, CI/CD
Soft Skills
communication, calm effectiveness, judgment, collaborative spirit, mentoring, incident response, blameless postmortems, strategic problem solving, relationship building, continuous learning