Tech Stack
AWS, Azure, Cloud, Distributed Systems, EC2, Go, Google Cloud Platform, Java, Kubernetes, SQL
About the role
- Benchmark system performance, analyze database performance, and perform capacity sizing and optimization.
- Troubleshoot and debug application and server errors using logs, and triage issues accordingly.
- Recommend configuration tuning/optimizations for performance bottlenecks.
- Work closely with the ClickHouse core development, cloud, and security teams to improve the performance of ClickHouse Cloud.
- Plan, enable, and drive chaos engineering initiatives across engineering teams based on internal priorities.
- Develop, deploy and manage tools to systematically run chaos experiments and measure impact.
- Enjoy working on, and gaining a deep understanding of, large-scale distributed systems.
- Study problems in software resilience, operations, and delivery.
- Extend our entire backend to support chaos engineering techniques.
- Observe running systems, and identify and prioritize innovative ways to disrupt them.
Requirements
- 6+ years of relevant industry experience in software development, building and operating scalable, fault-tolerant distributed systems.
- Proven track record of understanding the performance limits of different distributed databases and creating tools for measuring the performance and scalability of complex systems.
- Experience with database benchmarking, test automation, system engineering, performance analysis, and capacity management.
- Software development experience in Go, C/C++, Java, or similar.
- Experience with concurrency, multithreading, and the deployment of distributed system architectures.
- Experience developing cloud infrastructure services, preferably with Kubernetes.
- Experience leading and shipping large-scope technical projects in collaboration with multiple experienced engineers.
- Expertise with a public cloud provider (AWS, GCP, Azure) and its infrastructure-as-a-service (IaaS) offerings (e.g., EC2).
- Excellent communication skills and the ability to work well within a team and across engineering teams.
- Strong problem solver with solid production debugging skills.
- Passionate about efficiency, availability, scalability and data governance.
- Thrive in a fast-paced environment and see yourself as a partner to the business.
- High level of responsibility, ownership, and accountability.