Tech Stack
Airflow, AWS, Cloud, Kubernetes, Open Source, Python, SQL
About the role
- This remote position requires overlap with the US Pacific time zone.
- Build data pipelines that scale to 100K+ jobs and handle petabytes of data per day
- Design and implement cloud-native data infrastructure on AWS using Kubernetes and Airflow
- Design and develop SQL intelligence systems for query optimization, dynamic pipeline generation, and data lineage tracking
- Contribute to open-source initiatives
- Work with a team of engineers to integrate AI into data operations
Requirements
- 8+ years of experience in data engineering, with a focus on building scalable data pipelines and systems
- Strong proficiency in Python and SQL
- Extensive experience with SQL query profiling, optimization, and performance tuning, preferably with Snowflake
- Deep understanding of SQL Abstract Syntax Tree (AST) and experience working with SQL parsers (e.g., sqlglot) for generating column-level lineage and dynamic ETLs
- Experience in building data pipelines using Airflow or dbt
- [Optional] Solid understanding of cloud platforms, particularly AWS
- [Optional] Familiarity with Kubernetes (K8s) for containerized deployments