
Senior Site Reliability Engineer
Fixify
full-time
Location Type: Remote
Location: Ireland
About the role
- Design and maintain scalable, fault-tolerant infrastructure that supports our SaaS platform and keeps pace with business growth.
- Implement observability best practices, embracing tracing-first approaches, meaningful metrics, and monitoring that actually helps during incidents.
- Define, document, and maintain SLIs, SLOs, and SLAs in partnership with product engineering, translating business commitments into technical guardrails.
- Build automation that eliminates manual intervention across CI/CD, deployments, configuration management, and recovery—because your time is better spent on strategic problems.
- Lead incident response with steady judgment, facilitate blameless postmortems, and drive remediation efforts that prevent recurrence.
- Partner with engineering and product teams during design reviews to ensure new features are production-ready and operationally scalable.
- Optimize infrastructure costs through performance tuning, capacity planning, and smart use of cloud resources.
- Mentor engineers on operational best practices and champion reliability thinking across the organization.
- Document infrastructure architecture clearly and maintain the kind of runbooks that your future self will thank you for.
Requirements
- 4+ years of experience in SRE, DevOps, or infrastructure engineering roles, with demonstrated experience supporting SaaS platforms in production.
- Expert-level knowledge of an infrastructure-as-code framework (Pulumi, Terraform, CDK)—you should be the kind of person who thinks "if it's not in code, it doesn't exist."
- Strong working knowledge of AWS (or equivalent cloud platforms), including designing for availability, scalability, and security.
- Proficiency in TypeScript or Python for infrastructure automation and tooling.
- Experience with containerization and orchestration (ECS Fargate, Kubernetes, or similar).
- Deep familiarity with observability tools and practices (OpenTelemetry, CloudWatch, Honeycomb)—bonus points if you embrace a tracing-first philosophy.
- Solid understanding of networking, load balancing, and distributed systems concepts.
- Experience with CI/CD tooling (GitHub Actions, CodeBuild, or equivalent).
- The ability to communicate complex operational issues clearly to both technical and non-technical stakeholders.
- Calm effectiveness during high-pressure incidents and the judgment to balance competing priorities like performance, cost, and reliability.
- A collaborative spirit and the ability to build strong relationships with engineering, product, and operations teams.
- Prior experience working closely with product engineering teams is a strong plus—this role thrives on cross-disciplinary understanding.
- A commitment to continuous learning and improving team practices, systems, and culture.
Benefits
- Give you ownership over infrastructure that powers a globally used platform, with clear visibility into how your work drives collaboration and productivity.
- Provide meaningful opportunities to learn and grow, whether that's diving deeper into distributed systems, exploring new observability paradigms, or mastering the latest cloud-native technologies.
- Surround you with a team that values blameless postmortems, continuous improvement, and the kind of operational culture where everyone learns from every incident.
- Share the "why" behind architectural decisions and give you a voice in shaping Fixify's reliability engineering principles as we scale.
- Connect you directly with product engineers and users, so you see firsthand how reliable infrastructure translates into delighted customers.
- Let you work across a hybrid container and serverless infrastructure environment, using what works best and leaning into each service’s strengths.
Applicant Tracking System Keywords
Hard Skills & Tools
infrastructure-as-code, Pulumi, Terraform, CDK, AWS, TypeScript, Python, ECS Fargate, Kubernetes, CI/CD
Soft Skills
communication, calm effectiveness, judgment, collaborative spirit, mentoring, incident response, blameless postmortems, strategic problem solving, relationship building, continuous learning