
Senior Site Reliability Engineer, Node Platform
Chainlink Labs
full-time
Location Type: Remote
Location: United States
About the role
- You will design and build the infrastructure primitives that define how Chainlink Decentralized Oracle Networks (DONs) scale across internal systems and the decentralized ecosystem.
- You will help create the CRE (Kubernetes-based) control plane that enables:
  - Deterministic horizontal scaling of DONs
  - Safe and repeatable infrastructure expansion
  - Improved operational efficiency and scalability
- You will develop the core infrastructure components, including Kubernetes Operators and scaling automation, that Product teams will adopt and that may later be distributed to external node operators to improve decentralized scaling.
Requirements
- 6–9+ years in SRE / Platform / Infrastructure Engineering
- Proven experience scaling Kubernetes in high-throughput production environments
- Deep knowledge of:
  - Scheduler behavior
  - StatefulSets & persistent workloads
  - Autoscaling strategies (HPA, VPA, KEDA, custom scaling)
  - Resource management & performance tuning
  - Multi-cluster and multi-region architectures
- Experience diagnosing production failures at cluster scale
- Strong Terraform or Crossplane experience
- GitOps workflows (ArgoCD / Flux) experience
- CI/CD reliability experience
- Automation-first mindset
- AWS production experience
- Proficiency in Go (strongly preferred) or an equivalent systems language
Benefits
- All roles with Chainlink Labs are global and remote-based.
- We carefully review all applications and aim to provide a response to every candidate within two weeks after the job posting closes.
- Commitment to Equal Opportunity
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Kubernetes, Terraform, Crossplane, Go, GitOps, CI/CD, Autoscaling, Resource management, Performance tuning, Multi-cluster architectures
Soft Skills
Automation-first mindset, Operational efficiency, Problem-solving