Tech Stack
AWS, Cassandra, Distributed Systems, Kubernetes, Rust, TypeScript
About the role
- Own reliability end-to-end: design, measure, and improve service availability, latency, and performance across Cryptio’s platform
- Enhance observability: expand and refine metrics, logs, and traces to provide deep insight into our Rust and TypeScript services
- Lead incident management: define playbooks, improve response workflows, and foster a blameless postmortem culture
- Strengthen infrastructure: optimise AWS configurations, CI/CD pipelines, autoscaling, and networking for reliability and cost efficiency
- Collaborate across teams: work with product and engineering leads to ensure reliability is considered at every design stage
- Drive continuous improvement: identify systemic weaknesses, automate recovery where possible, and reduce MTTR across the stack
- Champion SRE best practices: guide teams on capacity planning, runbooks, and resilience testing
Requirements
- 5+ years of experience in Site Reliability, DevOps, or Infrastructure Engineering roles
- Deep understanding of distributed systems and debugging at the network, application, and database layers
- Hands-on experience with AWS, container orchestration (Kubernetes, ECS), and Infrastructure-as-Code tools (Pulumi or similar)
- Comfortable tracing through Rust and TypeScript code to diagnose complex performance or reliability issues
- Experience with (or willingness to learn) Cassandra and ClickHouse in production
- Strong collaborator with excellent communication skills
- Systematic, analytical, and passionate about building reliable systems at scale
- Interest in (or curiosity about) crypto, finance, or large-scale data systems
Benefits
- Competitive salary and full benefits package
- Freedom to experiment and improve observability, alerting, and recovery pipelines end-to-end
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Site Reliability Engineering, DevOps, Infrastructure Engineering, distributed systems, debugging, AWS, Kubernetes, ECS, Infrastructure-as-Code, Rust
Soft skills
collaboration, communication, analytical, systematic, passionate