Chainlink Labs

Senior Site Reliability Engineer, Node Platform

full-time

Location Type: Remote

Location: United States

About the role

  • You will design and build the infrastructure primitives that define how Chainlink Decentralized Oracle Networks (DONs) scale across internal systems and the decentralized ecosystem.
  • You will help create the CRE (Kubernetes-based) control plane that enables:
      • Deterministic horizontal scaling of DONs
      • Safe and repeatable infrastructure expansion
      • Improved operational efficiency and scalability
  • You will develop the core infrastructure components, including Kubernetes Operators and scaling automation, that Product teams will adopt and that may later be distributed to external node operators to improve decentralized scaling.

Requirements

  • 6–9+ years in SRE / Platform / Infrastructure Engineering
  • Proven experience scaling Kubernetes in high-throughput production environments
  • Deep knowledge of:
      • Scheduler behavior
      • StatefulSets & persistent workloads
      • Autoscaling strategies (HPA, VPA, KEDA, custom scaling)
      • Resource management & performance tuning
      • Multi-cluster and multi-region architectures
  • Experience diagnosing production failures at cluster scale
  • Strong Terraform or Crossplane experience
  • GitOps workflows (ArgoCD / Flux) experience
  • CI/CD reliability experience
  • Automation-first mindset
  • AWS production experience
  • Proficiency in Go (strongly preferred) or an equivalent systems language

Benefits

  • All roles with Chainlink Labs are global and remote-based.
  • We carefully review all applications and aim to provide a response to every candidate within two weeks after the job posting closes.
  • Commitment to Equal Opportunity

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Kubernetes, Terraform, Crossplane, Go, GitOps, CI/CD, Autoscaling, Resource management, Performance tuning, Multi-cluster architectures

Soft Skills

Automation-first mindset, Operational efficiency, Problem-solving