Tech Stack
Azure Cloud, Node.js, PySpark, Python
About the role
- Build and maintain Python Notebooks to ingest data from third-party APIs (see the ingestion sketch after this list)
- Design and implement Medallion layer architecture (Bronze, Silver, Gold) for structured data organization and progressive data refinement
- Store and manage data within Microsoft Fabric's Data Lake and Warehouse using the Delta Parquet file format
- Set up data pipelines and sync key datasets to Azure Synapse Analytics
- Develop PySpark-based data transformation processes across the Bronze, Silver, and Gold layers (see the PySpark sketch after this list)
- Collaborate with developers, analysts, and stakeholders to ensure data availability and accuracy
- Monitor, test, and optimize data flows for reliability and performance
- Document processes and contribute to best practices for data ingestion and transformation
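To make the ingestion responsibility concrete, here is a minimal sketch of the kind of notebook code involved in pulling records from a third-party API into a Bronze landing zone. The endpoint URL, auth header, pagination scheme, and landing filename are illustrative assumptions, not details from this role.

```python
# Minimal sketch of third-party API ingestion into a Bronze layer.
# The endpoint URL, auth header, and landing path are hypothetical.
import json
from datetime import datetime, timezone

import requests

API_URL = "https://api.example.com/v1/records"  # hypothetical endpoint
HEADERS = {"Authorization": "Bearer <token>"}   # hypothetical auth

def fetch_all(page_size: int = 500) -> list[dict]:
    """Pull every page from the (assumed) offset-paginated endpoint."""
    records, offset = [], 0
    while True:
        resp = requests.get(
            API_URL,
            headers=HEADERS,
            params={"limit": page_size, "offset": offset},
            timeout=30,
        )
        resp.raise_for_status()
        page = resp.json().get("data", [])
        if not page:
            return records
        records.extend(page)
        offset += page_size

if __name__ == "__main__":
    rows = fetch_all()
    # Land raw JSON with an ingestion timestamp so the Bronze layer
    # stays an untouched, replayable copy of the source.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    with open(f"bronze_records_{stamp}.json", "w") as f:
        json.dump(rows, f)
```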
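Likewise, a sketch of the Bronze-to-Silver refinement step the Medallion bullets describe, written in PySpark against Delta tables. The table paths, key column, and timestamp column are assumed for illustration; in Fabric notebooks the Delta format is the default for Lakehouse tables.

```python
# Sketch of a Bronze -> Silver refinement step in PySpark, writing Delta.
# Table paths and column names are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bronze_to_silver").getOrCreate()

# Bronze: raw, append-only landing data.
bronze = spark.read.format("delta").load("Tables/bronze_records")

# Silver: deduplicated, typed, and lightly cleaned.
silver = (
    bronze
    .dropDuplicates(["record_id"])                    # assumed key column
    .withColumn("event_date", F.to_date("event_ts"))  # assumed timestamp column
    .filter(F.col("record_id").isNotNull())
)

(
    silver.write.format("delta")
    .mode("overwrite")
    .save("Tables/silver_records")
)
```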
Requirements
- Strong experience with Python for data ingestion and transformation
- Proficiency with PySpark for large-scale data processing
- Proficiency in working with RESTful APIs and handling large datasets
- Experience with Microsoft Fabric or similar modern data platforms
- Understanding of Medallion architecture (Bronze, Silver, Gold) and data lakehouse concepts
- Experience working with Delta Lake and Parquet file formats (an upsert sketch follows this list)
- Understanding of data warehousing concepts and performance tuning
- Familiarity with cloud-based workflows, especially within the Azure ecosystem
- Nice to have: Experience with marketing APIs such as Google Ads or Google Analytics 4
- Nice to have: Familiarity with Azure Synapse and Data Factory pipeline design
- Nice to have: Understanding of data modeling for analytics and reporting use cases
- Nice to have: Experience with AI coding tools
- Nice to have: Experience with ELT tools such as Fivetran, Airbyte, or Rivery
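As a taste of the Delta Lake experience this list asks for, below is a small idempotent-upsert sketch using delta-spark's merge API. The table path, key column, and sample rows are assumptions for illustration.

```python
# Sketch of an idempotent upsert into a Silver Delta table with delta-spark.
# The path, key column, and sample rows are illustrative assumptions.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("silver_upsert").getOrCreate()

# A batch of freshly ingested rows (assumed schema: record_id, value).
updates = spark.createDataFrame(
    [(1, "new"), (2, "changed")], ["record_id", "value"]
)

target = DeltaTable.forPath(spark, "Tables/silver_records")  # assumed path
(
    target.alias("t")
    .merge(updates.alias("s"), "t.record_id = s.record_id")
    .whenMatchedUpdateAll()      # refresh rows that already exist
    .whenNotMatchedInsertAll()   # add rows seen for the first time
    .execute()
)
```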