Tech Stack
AWS, Azure, Cloud, Distributed Systems, EC2, Go, Google Cloud Platform, Kafka, Kubernetes, NumPy, Pandas, Python, Spark, SQL
About the role
- Build a cutting-edge Cloud Native platform on top of the public cloud.
- Improve the metrics pipeline and build algorithms to generate better autoscaling statistics and recommendations.
- Work on the autoscaler and Kubernetes operator to support seamless vertical and horizontal auto-scaling.
- Partner with our ClickHouse core development team and other data plane teams to support auto-scaling use cases and other internal infrastructure improvements.
- Architect and build a robust, scalable, and highly available distributed infrastructure.
Requirements
- 5+ years of relevant software development industry experience building and operating scalable, fault-tolerant, distributed systems.
- Experience building Kubernetes operators with controller-runtime.
- Production experience with programming languages such as Go or C++.
- You are no stranger to PagerDuty on-call rotations or debugging in production, and you are a strong problem-solver.
- Expertise with a public cloud provider (AWS, GCP, Azure) and its infrastructure-as-a-service offerings (e.g., EC2).
- Experience with Data Storage, Ingestion, and Transformation (Spark, Kafka or similar tools).
- You are passionate about solving data problems at scale.
- Experience with Python tooling (uv, rye, FastAPI) and data science libraries (Pandas, NumPy, etc.) is good to have.
- You have excellent communication skills and the ability to work well within and across engineering teams.