
Senior SRE – Distributed Systems, Cloud Infrastructure
Chess.com
Full-time
Location Type: Remote
Location: Remote • 🌎 Anywhere in the World
Job Level
Senior
Tech Stack
Cloud, Distributed Systems, Go, Kubernetes, Terraform, TypeScript
About the role
- Lead the design and optimization of cloud-native services using Kubernetes, Terraform, and GitOps tools like ArgoCD
- Develop high-performance integration patterns and manage scalable, distributed systems handling extensive data volumes
- Dive into Golang and TypeScript codebases to identify and resolve performance bottlenecks at scale
- Optimize infrastructure and application code to meet aggressive performance and reliability targets, with a focus on bit-level chess programming
- Work closely with development teams to refine cloud service integration architectures and implement best practices
- Monitor and enhance system reliability and performance through effective collaboration and innovative solutions
- Participate in incident response for critical infrastructure issues, ensuring rapid resolution and minimal downtime
- Drive improvements in infrastructure reliability, scalability, and operational efficiency
- Utilize Terraform and Kubernetes to manage and scale our cloud infrastructure, ensuring robust, automated deployment processes
Requirements
- 5+ years of experience managing and scaling large-scale, cloud-native distributed systems
- Deep understanding of Kubernetes, Terraform, and GitOps practices
- Expertise in observability practices and the ability to support incident response and on-call work
- Extensive experience in high-performance service development with Golang
- Proven ability to profile and optimize applications for high throughput and reliable operation
- Strong knowledge of distributed systems design, failure modes, and robust architectural principles
- Experience with data modeling and indexing strategies to support efficient service operations
- Demonstrated experience improving system reliability and performance through deep code-level and architectural analysis
- Excellent written and verbal communication skills
- Experience working in globally distributed teams
Benefits
- 100% remote (work from anywhere!)
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Kubernetes, Terraform, GitOps, Golang, TypeScript, cloud-native services, high-performance service development, data modeling, indexing strategies, system reliability
Soft skills
communication skills, collaboration, incident response, problem-solving, leadership