Salary
💰 $140,000 - $180,000 per year
Tech Stack
AWS, Cloud, Distributed Systems, Go, Grafana, Java, JavaScript, Kubernetes, Node.js, Prometheus, Python, Terraform, TypeScript
About the role
- Ensure stability, scalability, and security of systems powering OnePay's financial products for millions of customers
- Design, build, and maintain scalable infrastructure and tooling to improve reliability, performance, and availability across the platform
- Contribute to the evolution of the observability stack, platform libraries, cloud architecture, and CI/CD pipelines
- Develop automation and monitoring systems to detect, prevent, and remediate incidents before they impact customers
- Partner closely with product and platform engineering teams to embed reliability best practices in design, development, and deployment
- Lead root cause analysis and postmortems, driving long-term improvements in resiliency and fault tolerance
Requirements
- 5+ years of experience as a Software Engineer focused on building and running reliable, large-scale, distributed systems in production
- 5+ years of operational experience with observability tooling and libraries (metrics, logging, tracing); experience using Datadog or similar tools (Prometheus, Grafana)
- Proficiency in at least one programming language (Python, Go, Java, or Node.js preferred) for automation and tooling
- Proficiency in incident management, on-call response, and writing postmortem reports
- Excellent collaboration skills with the ability to influence and educate product engineering teams on reliability and observability best practices
- Hands-on experience with cloud platforms (AWS preferred), container orchestration (Kubernetes), and infrastructure-as-code (IaC) tools (Terraform, Pulumi)
- Strong drive and initiative, with a builder-and-executor mindset
- Familiarity with functional programming concepts and fp-ts/TypeScript is a plus
- Authorization to work in the United States (application asks about work authorization and sponsorship)