DevOps/Site Reliability Engineer

The Sports Market LLC

full-time

Location Type: Remote

Location: Remote • 🌎 Anywhere in the World

Job Level

Mid-Level, Senior

Tech Stack

AWS, Cloud, Distributed Systems, Docker, Grafana, Kubernetes, Node.js, Prometheus, Terraform

About the role

  • Build and maintain AWS infrastructure using Terraform (VPC, EKS, networking, IAM, Secrets Manager, Route53, ALBs/NLBs).
  • Operate and optimize production-grade EKS clusters: node groups, autoscaling, RBAC, OIDC integration.
  • Implement TLS, certificates, ingress controllers, and network policies.
  • Ensure secure, consistent multi-environment deployments across staging and production.
  • Deploy and manage workloads for integrations, adapters, backend services, ledger components, and payment orchestration.
  • Configure Helm charts/manifests, resource limits, autoscaling (HPA/VPA), and pod governance.
  • Support distributed ledger components (via Catalyst Blockchain Manager), including Canton participants and sequencer nodes.
  • Maintain operational reliability for critical workloads: event ingestion, trading integrations, settlement flows, payment orchestration, and automations.
  • Build and maintain CI/CD pipelines (GitLab → ArgoCD) for automated deployments and infrastructure provisioning.
  • Implement GitOps patterns and progressive delivery strategies (blue/green, canary).
  • Automate secrets management, configuration flows, and cluster operations.
  • Expand platform observability using Datadog, Prometheus/Grafana, and log aggregation pipelines.
  • Build dashboards and alerts for Kubernetes, ledger nodes, integrations, payment workflows, and API workloads.
  • Establish SLIs/SLOs and ensure system reliability targets are consistently met.
  • Investigate incidents, identify root causes, and implement long-term reliability improvements.
  • Improve resiliency through redundancy, autoscaling, and failure recovery strategies.
  • Maintain deployment safety, rollback strategies, and operational runbooks.
  • Implement IAM least-privilege policies, encryption, secrets management, and secure network segmentation.
  • Maintain secure ingress patterns for third-party services (payments, KYC, trading).
  • Ensure operational readiness and compliance alignment with platform standards.
  • Work closely with backend and full-stack teams to ensure smooth deployments and runtime reliability.
  • Support teams during platform migration efforts and environment transitions.
  • Participate in incident response and observability improvements, and promote DevOps best practices.

Requirements

  • Bachelor’s degree in Computer Science or Engineering, or equivalent experience.
  • 5+ years of DevOps/SRE experience operating production-grade systems.
  • Strong hands-on experience with:
      - Kubernetes operations
      - AWS services (EKS, VPC, IAM, LB, Secrets Manager, Route53)
      - Terraform (IaC)
      - GitOps tooling (ArgoCD)
      - CI/CD pipelines (GitLab preferred)
      - Docker & containerized systems
      - Datadog (APM, logs, dashboards)

Benefits

  • 100% remote workforce
  • Modern cloud-native architecture
  • High ownership, fast-moving environment
  • Direct influence on the next generation of our platform

Applicant Tracking System Keywords

Hard skills
AWS, Terraform, Kubernetes, GitOps, CI/CD, Docker, Datadog, TLS, RBAC, network policies
Soft skills
operational reliability, incident response, collaboration, problem-solving, communication
Certifications
Bachelor’s degree in Computer Science, Bachelor’s degree in Engineering