DevOps/Site Reliability Engineer
The Sports Market LLC
Full-time
Location Type: Remote
Location: Remote • 🌎 Anywhere in the World
Job Level
Mid-Level, Senior
Tech Stack
AWS, Cloud, Distributed Systems, Docker, Grafana, Kubernetes, Node.js, Prometheus, Terraform
About the role
- Build and maintain AWS infrastructure using Terraform (VPC, EKS, networking, IAM, Secrets Manager, Route53, ALBs/NLBs).
- Operate and optimize production-grade EKS clusters: node groups, autoscaling, RBAC, OIDC integration.
- Implement TLS, certificates, ingress controllers, and network policies.
- Ensure secure, consistent multi-environment deployments across staging and production.
- Deploy and manage workloads for integrations, adapters, backend services, ledger components, and payment orchestration.
- Configure Helm charts/manifests, resource limits, autoscaling (HPA/VPA), and pod governance.
- Support distributed ledger components (via Catalyst Blockchain Manager), including Canton participants and sequencer nodes.
- Maintain operational reliability for critical workloads: event ingestion, trading integrations, settlement flows, payment orchestration, and automations.
- Build and maintain CI/CD pipelines (GitLab → ArgoCD) for automated deployments and infrastructure provisioning.
- Implement GitOps patterns and progressive delivery strategies (blue/green, canary).
- Automate secrets management, configuration flows, and cluster operations.
- Expand platform observability using Datadog, Prometheus/Grafana, and log aggregation pipelines.
- Build dashboards and alerts for Kubernetes, ledger nodes, integrations, payment workflows, and API workloads.
- Establish SLIs/SLOs and ensure system reliability targets are consistently met.
- Investigate incidents, identify root causes, and implement long-term reliability improvements.
- Improve resiliency through redundancy, autoscaling, and failure recovery strategies.
- Maintain deployment safety, rollback strategies, and operational runbooks.
- Implement IAM least-privilege policies, encryption, secrets management, and secure network segmentation.
- Maintain secure ingress patterns for third-party services (payments, KYC, trading).
- Ensure operational readiness and compliance alignment with platform standards.
- Work closely with backend and full-stack teams to ensure smooth deployments and runtime reliability.
- Support teams during platform migration efforts and environment transitions.
- Participate in incident response and observability improvements, and promote DevOps best practices.
Requirements
- Bachelor’s degree in Computer Science or Engineering, or equivalent experience.
- 5+ years of DevOps/SRE experience operating production-grade systems.
- Strong hands-on experience with:
  - Kubernetes operations
  - AWS services (EKS, VPC, IAM, LB, Secrets Manager, Route53)
  - Terraform (IaC)
  - GitOps tooling (ArgoCD)
  - CI/CD pipelines (GitLab preferred)
  - Docker & containerized systems
  - Datadog (APM, logs, dashboards)
Benefits
- 100% remote workforce
- Modern cloud-native architecture
- High ownership, fast-moving environment
- Direct influence on the next generation of our platform
Applicant Tracking System Keywords
Hard skills
AWS, Terraform, Kubernetes, GitOps, CI/CD, Docker, Datadog, TLS, RBAC, network policies
Soft skills
operational reliability, incident response, collaboration, problem-solving, communication
Certifications
Bachelor’s degree in Computer Science, Bachelor’s degree in Engineering