DevOps/Site Reliability Engineer

The Sports Market LLC

Full-time

Posted on: 12/16/2025

Location Type: Remote

Location: Remote • 🌎 Anywhere in the World

Job Level

Mid-Level, Senior

Tech Stack

AWS, Cloud, Distributed Systems, Docker, Grafana, Kubernetes, Node.js, Prometheus, Terraform

About the role

Build and maintain AWS infrastructure using Terraform (VPC, EKS, networking, IAM, Secrets Manager, Route53, ALBs/NLBs).
Operate and optimize production-grade EKS clusters: node groups, autoscaling, RBAC, OIDC integration.
Implement TLS, certificates, ingress controllers, and network policies.
Ensure secure, consistent multi-environment deployments across staging and production.
Deploy and manage workloads for integrations, adapters, backend services, ledger components, and payment orchestration.
Configure Helm charts/manifests, resource limits, autoscaling (HPA/VPA), and pod governance.
Support distributed ledger components (via Catalyst Blockchain Manager), including Canton participants and sequencer nodes.
Maintain operational reliability for critical workloads: event ingestion, trading integrations, settlement flows, payment orchestration, and automations.
Build and maintain CI/CD pipelines (GitLab → ArgoCD) for automated deployments and infrastructure provisioning.
Implement GitOps patterns and progressive delivery strategies (blue/green, canary).
Automate secrets management, configuration flows, and cluster operations.
Expand platform observability using Datadog, Prometheus/Grafana, and log aggregation pipelines.
Build dashboards and alerts for Kubernetes, ledger nodes, integrations, payment workflows, and API workloads.
Establish SLIs/SLOs and ensure system reliability targets are consistently met.
Investigate incidents, identify root causes, and implement long-term reliability improvements.
Improve resiliency through redundancy, autoscaling, and failure recovery strategies.
Maintain deployment safety, rollback strategies, and operational runbooks.
Implement IAM least-privilege policies, encryption, secrets management, and secure network segmentation.
Maintain secure ingress patterns for third-party services (payments, KYC, trading).
Ensure operational readiness and compliance alignment with platform standards.
Work closely with backend and full-stack teams to ensure smooth deployments and runtime reliability.
Support teams during platform migration efforts and environment transitions.
Participate in incident response and observability improvements, and follow DevOps best practices.

Requirements

Bachelor’s degree in Computer Science or Engineering, or equivalent experience.
5+ years of DevOps/SRE experience operating production-grade systems.
Strong hands-on experience with:
- Kubernetes operations
- AWS services (EKS, VPC, IAM, load balancers, Secrets Manager, Route53)
- Terraform (IaC)
- GitOps tooling (ArgoCD)
- CI/CD pipelines (GitLab preferred)
- Docker & containerized systems
- Datadog (APM, logs, dashboards)

Benefits

100% remote workforce
Modern cloud-native architecture
High ownership, fast-moving environment
Direct influence on the next generation of our platform

Applicant Tracking System Keywords

Hard skills

AWS, Terraform, Kubernetes, GitOps, CI/CD, Docker, Datadog, TLS, RBAC, network policies

Soft skills

operational reliability, incident response, collaboration, problem-solving, communication

Certifications

Bachelor’s degree in Computer Science, Bachelor’s degree in Engineering