Data Engineer (Python, PySpark, AWS Glue, Amazon Athena, SQL, Apache Airflow)

Wizdaa

full-time

Origin: 🇵🇰 Pakistan

Job Level

Mid-Level, Senior

Tech Stack

Airflow, Apache, AWS, Cloud, Node.js, Postgres, PySpark, Python, SQL, TypeScript

About the role

  • Build, optimize, and scale data pipelines and infrastructure using Python, TypeScript, Apache Airflow, PySpark, AWS Glue, and Snowflake.
  • Design, operationalize, and monitor ingestion and transformation workflows: DAGs, alerting, retries, SLAs, lineage, and cost controls (see the sketch after this list).
  • Collaborate with platform and AI/ML teams to automate ingestion, validation, and real-time compute workflows; work toward a feature store.
  • Integrate pipeline health and metrics into engineering dashboards for full visibility and observability.
  • Model data and implement efficient, scalable transformations in Snowflake and PostgreSQL.
  • Build reusable frameworks and connectors to standardize internal data publishing and consumption.
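
To make the second bullet concrete, here is a minimal sketch of an Airflow DAG wiring up retries, an SLA, and a failure-alerting callback. All names (ingest_events_daily, notify_oncall, the ingestion stub) are hypothetical, not from the posting, and the schedule parameter assumes Airflow 2.4+ (older versions use schedule_interval).

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def notify_oncall(context):
    # Alerting hook: in production this would page Slack/PagerDuty;
    # here it just logs the failed task.
    print(f"Task failed: {context['task_instance'].task_id}")


def ingest_events():
    # Stub for the actual ingestion step (e.g., triggering a Glue job).
    print("ingesting...")


with DAG(
    dag_id="ingest_events_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args={
        "retries": 3,                          # automatic retries on failure
        "retry_delay": timedelta(minutes=5),
        "sla": timedelta(hours=1),             # flags runs exceeding 1 hour
        "on_failure_callback": notify_oncall,  # alerting hook
    },
) as dag:
    PythonOperator(task_id="ingest", python_callable=ingest_events)
```

In a real deployment the callback would post to an incident channel, and SLA misses would feed the engineering dashboards mentioned above.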

Requirements

  • 4+ years of production data engineering experience.
  • Deep, hands-on experience with Apache Airflow, AWS Glue, PySpark, and Python-based data pipelines.
  • Strong SQL skills and experience operating PostgreSQL in production environments.
  • Solid understanding of cloud-native data workflows (AWS preferred) and pipeline observability (metrics, logging, tracing, alerting).
  • Proven experience owning pipelines end-to-end: design, implementation, testing, deployment, monitoring, and iteration.
  • Experience with Snowflake performance tuning and cost optimization: warehouses, partitions, clustering, and query profiling (preferred).
  • Real-time or near-real-time processing experience: streaming ingestion, incremental models, CDC (preferred).
  • Hands-on experience with a backend TypeScript framework (e.g., NestJS) is a strong plus.
  • Experience with data quality frameworks, contract testing, or schema management (e.g., Great Expectations, dbt tests, OpenAPI/Protobuf/Avro); a minimal hand-rolled example follows this list.
  • Background in building internal developer platforms or data platform components: connectors, SDKs, CI/CD for data (preferred).
  • Work hours aligned with the Eastern (EST) or Pacific (PT) time zones.
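
As a sketch of the data-quality expectations above, here is a hand-rolled PySpark gate of the kind that frameworks like Great Expectations or dbt tests would formalize. The bucket path and column name (event_id) are hypothetical placeholders.

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("dq-gate").getOrCreate()

# Hypothetical input; swap in the real table or path.
df = spark.read.parquet("s3://example-bucket/events/")

# Contract-style checks: the primary key must be present and unique.
null_keys = df.filter(F.col("event_id").isNull()).count()
duplicates = df.count() - df.dropDuplicates(["event_id"]).count()

if null_keys or duplicates:
    # Failing loudly lets the orchestrator retry or alert.
    raise ValueError(
        f"DQ failure: {null_keys} null keys, {duplicates} duplicate keys"
    )

spark.stop()
```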