Chess.com

Senior SRE – Distributed Systems, Cloud Infrastructure

full-time

Location Type: Remote

Location: Remote • 🌎 Anywhere in the World

Job Level

Senior

Tech Stack

Cloud, Distributed Systems, Go, Kubernetes, Terraform, TypeScript

About the role

  • Lead the design and optimization of cloud-native services using Kubernetes, Terraform, and GitOps tools like ArgoCD
  • Develop high-performance integration patterns and manage scalable, distributed systems handling extensive data volumes
  • Dive into Golang and TypeScript codebases to identify and resolve performance bottlenecks at scale
  • Optimize infrastructure and application code to achieve aggressive performance and reliability targets, with a focus on bit-level chess programming
  • Work closely with development teams to refine cloud service integration architectures and implement best practices
  • Monitor and enhance system reliability and performance through effective collaboration and innovative solutions
  • Participate in incident response for critical infrastructure issues, ensuring rapid resolution and minimal downtime
  • Drive improvements in infrastructure reliability, scalability, and operational efficiency
  • Utilize Terraform and Kubernetes to manage and scale our cloud infrastructure, ensuring robust, automated deployment processes

Requirements

  • 5+ years of experience managing and scaling large-scale, cloud-native distributed systems
  • Deep understanding of Kubernetes, Terraform, and GitOps practices
  • Expertise in observability practices and the ability to support incident response and on-call duty
  • Extensive experience in high-performance service development with Golang
  • Proven ability to profile and optimize applications for high throughput and reliable operation
  • Strong knowledge of distributed systems design, failure modes, and robust architectural principles
  • Experience with data modeling and indexing strategies to support efficient service operations
  • Demonstrated experience improving system reliability and performance through deep code-level and architectural analysis
  • Excellent written and verbal communication skills
  • Experience working in globally distributed teams

Benefits

  • 100% remote (work from anywhere!)