Tech Stack
Distributed Systems, Docker, Go, Grafana, GraphQL, Kafka, Kubernetes, Linux, Prometheus, SaltStack, SQL, Terraform
About the role
- Design, develop, and maintain scalable and reliable distributed systems across the entire data lifecycle.
- Build and operate a distributed data delivery pipeline for high-throughput, low-latency ingestion, processing, and routing of massive volumes of data.
- Contribute to the analytical database platform (ClickHouse) to extend its functionality, improve its performance, and scale database clusters.
- Develop and enhance customer-facing GraphQL APIs, log delivery, and alerting solutions, focusing on performance, reliability, and user experience.
- Identify and remove bottlenecks across data platforms, from streamlining data ingestion to optimizing query performance.
- Collaborate with other teams across Cloudflare to understand data needs and build solutions that empower data-driven decisions.
- Collaborate with the ClickHouse open-source community and participate in the development of next-generation data platforms, researching and evaluating new technologies.
Requirements
- 3+ years of software development experience covering distributed systems and databases.
- Strong programming skills (Golang preferred).
- Hands-on experience with modern observability stacks, including Prometheus and Grafana, and a strong understanding of handling high-cardinality metrics at scale.
- Strong knowledge of SQL and database internals, including experience with database design, optimization, and performance tuning.
- A solid foundation in computer science, including algorithms, data structures, distributed systems, and concurrency.
- Strong analytical and problem-solving skills, with a willingness to debug, troubleshoot, and learn about complex problems at scale.
- Ability to work collaboratively in a team environment and communicate effectively with other teams across Cloudflare.
- Experience with ClickHouse is a plus.
- Experience with data streaming technologies (e.g., Kafka, Flink) is a plus.
- Experience developing and scaling APIs, particularly GraphQL, is a plus.
- Experience with Infrastructure as Code tools such as SaltStack or Terraform is a plus.
- Experience with Linux container technologies, such as Docker and Kubernetes, is a plus.
- Flexibility to be on-call outside of standard working hours to address technical issues as needed.