Hyperlayer

Senior Sustaining Engineer

Full-time

Location Type: Remote

Location: Canada

About the role

  • Act as a primary responder in a 24x7 on-call rotation for high-priority incidents, meeting mean-time-to-acknowledge (MTTA) targets and resolving issues quickly to minimize customer impact on our event-driven fintech platform.
  • Conduct root-cause analysis (RCA) for complex issues, delivering RCA reports within 5 business days for Sev1/Sev2 incidents and collaborating closely with development teams to implement robust fixes.
  • Lead the development and deployment of small, customer-facing features and improvements, ensuring alignment with business needs and system requirements while maintaining a change success rate of at least 99%.
  • Mentor mid- and junior-level engineers on incident response, troubleshooting best practices, and coding standards across a global on-call rota, including shift handovers and knowledge sharing through tools such as Rootly.
  • Own software maintainability initiatives, identifying and implementing optimizations and improving system performance to help achieve ≥99.99% availability (four nines).
  • Participate in regular post-incident reviews (blameless retros), documenting lessons learned and suggesting improvements to incident response processes and runbooks for our technology stack.
  • Collaborate with the infrastructure team to monitor system health and proactively identify areas for improvement in stability and efficiency using tools like Datadog, Rootly, and CloudWatch/AppDynamics.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or a related field.
  • 5+ years of experience in sustaining engineering, DevOps, or software engineering, with a focus on incident response and system reliability in fintech or other regulated environments.
  • Advanced troubleshooting skills and experience with Golang (preferred), Java, or similar languages, plus familiarity with event-driven architectures (e.g., NATS/JetStream, Redis clustering).
  • Strong familiarity with monitoring and incident response tools (e.g., Datadog, Rootly) and experience implementing improvements in similar systems to meet SLA targets for metrics such as MTTA and MTTR.
  • Proven ability to conduct in-depth root-cause analysis and implement long-term fixes in compliance-aware settings (e.g., GDPR/FCA-aligned).
  • Experience mentoring or guiding mid-level engineers, with a focus on knowledge sharing and process improvements in geo-distributed teams.
  • Awareness of ITIL 4 practices (e.g., incident and change management) and of tools such as Rootly for unified incident workflows.
  • Strong communication skills and the ability to work collaboratively with both technical and non-technical teams across time zones.

Benefits

  • Out-of-hours on-call rotation with additional compensation
  • Equity, diversity, and inclusion initiatives