The Hashgraph Association

Systems Engineer, Monitoring

Employment Type: Full-time

Location Type: Remote

Location: Morocco

About the role

  • Design, deploy, and maintain monitoring solutions (Prometheus, Grafana) for DLT-specific metrics (consensus finality, node health, on-chain activity)
  • Build custom exporters and dashboards for real-time, actionable insights
  • Distinguish between infrastructure and protocol health to ensure meaningful alerts
  • Integrate and manage PagerDuty for rapid, automated incident response
  • Implement DORA-compliant processes, including automated “kill switches” and regular disaster recovery drills
  • Maintain clear, actionable runbooks for support teams
  • Deploy and manage Mirror Nodes and RPC relays using Terraform/Ansible across AWS/GCP
  • Build CI/CD pipelines for support tooling and state proof verification
  • Automate critical response actions for rapid threat mitigation
  • Serve as the L3 escalation point for complex incidents (“ghost transactions,” API anomalies)
  • Perform root cause analysis using logs (Splunk, Datadog) and collaborate with cross-functional teams

Requirements

  • 4+ years in DevOps, SRE, or NOC roles (with 1–2 years in Web3/Blockchain environments)
  • Deep expertise in Prometheus/Grafana, Linux, Docker/Kubernetes, and scripting (Python, Go, Bash)
  • Proven experience with cloud platforms (AWS/GCP) and IaC tools (Terraform)
  • Strong understanding of Hedera Hashgraph or EVM-based chains, and ability to interpret ledger APIs
  • Familiarity with ITIL/ITSM, DORA, SOC 2, or ISO 27001 frameworks