Tech Stack
Airflow, Apache, AWS, Cloud, Node.js, Postgres, PySpark, Python, SQL, TypeScript
About the role
- Build, optimize, and scale data pipelines and infrastructure using Python, TypeScript, Apache Airflow, PySpark, AWS Glue, and Snowflake.
- Design, operationalize, and monitor ingestion and transformation workflows: DAGs, alerting, retries, SLAs, lineage, and cost controls (see the sketch after this list).
- Collaborate with platform and AI/ML teams to automate ingestion, validation, and real-time compute workflows; work toward a feature store.
- Integrate pipeline health and performance metrics into engineering dashboards for end-to-end observability.
- Model data and implement efficient, scalable transformations in Snowflake and PostgreSQL.
- Build reusable frameworks and connectors to standardize internal data publishing and consumption.
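To give a concrete sense of the orchestration work above, here is a minimal sketch of a scheduled ingest DAG with retries, an SLA, and a failure-alerting callback. It assumes Airflow 2.4+ and uses hypothetical DAG, task, and function names, not an actual pipeline from this role.

```python
# Minimal sketch of an ingest DAG: retries, SLA, and failure alerting.
# Assumes Airflow 2.4+; all names below are hypothetical placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def notify_on_failure(context):
    # Placeholder alerting hook; in practice this might post to Slack or PagerDuty.
    print(f"Task {context['task_instance'].task_id} failed")


def extract(**_):
    print("pull raw data from the source system")


def transform(**_):
    print("kick off the PySpark / Glue transformation job")


with DAG(
    dag_id="example_ingest_pipeline",          # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
    default_args={
        "retries": 3,                           # automatic retries on task failure
        "retry_delay": timedelta(minutes=5),
        "sla": timedelta(minutes=30),           # flag tasks that run past the SLA
        "on_failure_callback": notify_on_failure,
    },
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task
```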
Requirements
- 4+ years of production data engineering experience.
- Deep, hands-on experience with Apache Airflow, AWS Glue, PySpark, and Python-based data pipelines.
- Strong SQL skills and experience operating PostgreSQL in live environments.
- Solid understanding of cloud-native data workflows (AWS preferred) and pipeline observability (metrics, logging, tracing, alerting).
- Proven experience owning pipelines end-to-end: design, implementation, testing, deployment, monitoring, and iteration.
- Experience with Snowflake performance tuning and cost optimization (warehouses, partitions, clustering, query profiling) (preferred).
- Real-time or near-real-time processing experience (streaming ingestion, incremental models, CDC) (preferred).
- Hands-on experience with a backend TypeScript framework (e.g., NestJS) is a strong plus.
- Experience with data quality frameworks, contract testing, or schema management (e.g., Great Expectations, dbt tests, OpenAPI/Protobuf/Avro); see the sketch after this list.
- Background in building internal developer platforms or data platform components (connectors, SDKs, CI/CD for data) (preferred).
- Work hours aligned with Eastern (ET) or Pacific (PT) time zones.
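As an illustration of the data quality and contract-testing requirement, here is a dependency-free sketch of the kind of batch check that frameworks such as Great Expectations or dbt tests formalize. The table contract, column names, and sample rows are hypothetical.

```python
# Lightweight data-contract check: column presence, type, and nullability.
# All names below are hypothetical; real pipelines would use a framework instead.
from dataclasses import dataclass


@dataclass
class ColumnContract:
    name: str
    dtype: type
    nullable: bool = False


ORDERS_CONTRACT = [
    ColumnContract("order_id", int),
    ColumnContract("customer_id", int),
    ColumnContract("amount", float),
    ColumnContract("note", str, nullable=True),
]


def validate_rows(rows: list[dict], contract: list[ColumnContract]) -> list[str]:
    """Return a list of violations; an empty list means the batch passes."""
    errors = []
    for i, row in enumerate(rows):
        for col in contract:
            value = row.get(col.name)
            if value is None:
                if not col.nullable:
                    errors.append(f"row {i}: {col.name} is null")
            elif not isinstance(value, col.dtype):
                errors.append(f"row {i}: {col.name} has type {type(value).__name__}")
    return errors


if __name__ == "__main__":
    batch = [{"order_id": 1, "customer_id": 7, "amount": 19.99, "note": None}]
    print(validate_rows(batch, ORDERS_CONTRACT) or "batch passed")
```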