Architect, develop, and maintain scalable, efficient, and fault-tolerant data pipelines using Python and PySpark.
Design pipeline workflows for batch and real-time data processing using orchestration tools like Apache Airflow or Azure Data Factory (an Airflow sketch follows this list).
Implement automated data ingestion frameworks to extract data from structured, semi-structured, and unstructured sources such as APIs, FTP, and data streams.
Architect and optimize scalable Data Warehouse and Data Lake solutions using Snowflake, Azure Data Lake, or AWS S3.
Implement partitioning, bucketing, and indexing strategies for efficient querying and data storage management (a partitioning and bucketing sketch follows this list).
Develop ETL/ELT pipelines using tools like Azure Data Factory or Snowflake to handle complex data transformations and business logic.
Integrate DBT to automate data transformations, ensuring modularity and testability.
Ensure pipelines are optimized for cost-efficiency and high performance.
Write, optimize, and troubleshoot complex SQL queries for data manipulation, aggregation, and reporting.
Design and implement dimensional and normalized data models (star and snowflake schemas) for analytics use cases.
Deploy and manage data workflows on cloud platforms using services like AWS Glue, Azure Synapse Analytics, or Databricks.
Monitor resource usage and costs, implementing cost-saving measures such as data lifecycle management and auto-scaling.
Implement data quality frameworks to validate, clean, and enrich datasets.
Build self-healing mechanisms to minimize downtime and ensure reliability of critical pipelines.
Optimize Spark workflows by tuning executor memory and partitioning (a tuning sketch follows this list).
Conduct profiling and debugging of data workflows to identify and resolve bottlenecks.
Collaborate with data analysts, scientists, and stakeholders to define requirements and deliver usable datasets.
Maintain clear documentation for pipelines, workflows, and architectural decisions.
Conduct code reviews to ensure best practices in coding and performance optimization.
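Below is a minimal sketch of the kind of Airflow orchestration referenced above, assuming a hypothetical daily batch job; the DAG id, schedule, and task callables are illustrative placeholders, not part of any existing pipeline.

```python
# Minimal Apache Airflow DAG sketch for a daily batch pipeline (Airflow 2.x).
# The DAG id, schedule, and callables are hypothetical examples.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_from_source(**context):
    # Placeholder: pull raw records from an upstream API or FTP drop.
    print("extracting raw data")


def transform_and_load(**context):
    # Placeholder: apply business logic and load into the warehouse or lake.
    print("transforming and loading")


with DAG(
    dag_id="daily_batch_pipeline",   # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_from_source)
    load = PythonOperator(task_id="transform_and_load", python_callable=transform_and_load)

    extract >> load  # run ingestion before transformation
```

The same dependency shape maps directly onto Azure Data Factory activities if that is the orchestrator in use.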
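A hedged PySpark sketch of the partitioning and bucketing strategies mentioned above; the storage paths, table name, and column names are hypothetical.

```python
# PySpark sketch: partitioned and bucketed storage layouts for efficient querying.
# Paths, table names, and columns are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("storage-layout-sketch").getOrCreate()

events = spark.read.parquet("s3a://example-bucket/raw/events/")

# Partition on a low-cardinality column so queries can prune whole directories.
(events.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3a://example-bucket/curated/events/"))

# Bucket on a frequent join key to reduce shuffles; bucketing requires saveAsTable.
(events.write
    .mode("overwrite")
    .bucketBy(64, "customer_id")
    .sortBy("customer_id")
    .saveAsTable("curated.events_bucketed"))
```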
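A short sketch of the Spark tuning levers named above, executor memory and partitioning; the values are illustrative starting points, not recommendations for any particular workload.

```python
# Sketch of Spark tuning knobs: executor memory, cores, and shuffle partitioning.
# All values and paths are illustrative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("tuned-job")
    .config("spark.executor.memory", "8g")          # memory per executor
    .config("spark.executor.cores", "4")            # cores per executor
    .config("spark.sql.shuffle.partitions", "400")  # partitions created by shuffles
    .config("spark.sql.adaptive.enabled", "true")   # let AQE coalesce small partitions
    .getOrCreate()
)

df = spark.read.parquet("s3a://example-bucket/curated/events/")

# Repartition on the join key to even out skew before a heavy aggregation.
df = df.repartition(400, "customer_id")
```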
Requirements
Advanced skills in Python and PySpark for high-performance distributed data processing.
Proficient in creating data pipelines with orchestration frameworks like Apache Airflow or Azure Data Factory.
Strong experience with Snowflake and with SQL Data Warehouse and Data Lake architectures.
Ability to write, optimize, and troubleshoot complex SQL queries and stored procedures.
Deep understanding of building and managing ETL/ELT workflows using tools such as DBT, Snowflake, or Azure Data Factory.
Hands-on experience with cloud platforms such as Azure or AWS, including services like S3, Lambda, Glue, or Azure Blob Storage.
Proficient in designing and implementing data models, including star and snowflake schemas (a star-schema sketch follows this list).
Familiarity with distributed processing systems and concepts such as Spark, Hadoop, or Databricks.
Experience with real-time data processing frameworks such as Kafka or Kinesis (a streaming ingestion sketch follows this list).
Certifications in Snowflake (good to have).
Cloud certifications in Azure, AWS, or GCP (good to have).
Knowledge of data visualization platforms such as Power BI, Tableau, or Looker.
Strong teamwork, communication skills, and intellectual curiosity.
Ability to identify, troubleshoot, and resolve complex data issues effectively.
Willingness to embrace new tools, technologies, and methodologies.
Innovative thinker with a proactive approach to overcoming challenges.
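A small PySpark sketch of the star-schema shape referenced in the data-modeling requirement: one fact table joined to its dimensions on surrogate keys. Table and column names are hypothetical.

```python
# Star-schema sketch: a fact table joined to conformed dimensions on surrogate keys.
# Table and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("star-schema-sketch").getOrCreate()

fact_sales = spark.table("analytics.fact_sales")      # grain: one row per order line
dim_customer = spark.table("analytics.dim_customer")  # customer attributes
dim_date = spark.table("analytics.dim_date")          # calendar attributes

monthly_sales = (
    fact_sales
    .join(dim_customer, "customer_key")  # surrogate-key joins keep the fact table narrow
    .join(dim_date, "date_key")
    .groupBy("region", "calendar_month")
    .sum("net_amount")
)
```

A snowflake schema simply normalizes the dimensions further (for example, splitting region out of the customer dimension), trading extra joins for reduced redundancy.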
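A hedged sketch of real-time ingestion using Spark Structured Streaming with a Kafka source; the broker address, topic, and paths are hypothetical, and the job assumes the spark-sql-kafka connector package is available on the cluster.

```python
# Structured Streaming sketch: read from Kafka, parse the payload, append to Parquet.
# Broker, topic, and paths are hypothetical; requires the spark-sql-kafka connector.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("streaming-ingest-sketch").getOrCreate()

stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "events")                     # hypothetical topic
    .load()
)

# Kafka delivers key/value as binary; cast the payload before transforming it.
parsed = stream.select(col("value").cast("string").alias("payload"))

query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "s3a://example-bucket/streaming/events/")
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/events/")
    .outputMode("append")
    .start()
)
```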