Tech Stack
Airflow, Amazon Redshift, AWS, ETL, Hadoop, Java, Python, Scala, Spark, SQL, Terraform
About the role
- Lead a team of data engineers in designing, building, and maintaining highly scalable, robust, and reliable data pipelines within an AWS environment.
- Drive the strategic direction for data warehousing, data modeling, data processing frameworks, and big data initiatives.
- Mentor team members and collaborate with cross-functional stakeholders to align engineering efforts with organizational data goals.
- Design, develop, and optimize ETL/ELT pipelines using AWS Glue, Redshift, and related AWS services.
- Establish best practices for data ingestion, transformations, orchestration, and scheduling.
- Oversee development of inbound and outbound data feeds for integrations with third-party services and downstream applications.
- Create and maintain scalable data models in Redshift; ensure architecture adheres to industry best practices for performance optimization.
- Develop and maintain big data solutions (Spark, EMR, Hadoop) for large-scale data processing.
- Implement and manage data lake solutions with AWS S3 and associated ingestion strategies.
- Leverage Infrastructure-as-Code (CloudFormation, Terraform) to automate resource provisioning and configuration.
- Analyze project requirements to accurately estimate effort and resource needs; develop timelines and monitor ongoing projects.
- Partner with analytics, product, business, and reporting teams to gather data requirements and translate them into technical solutions; conduct design reviews.
- Implement data governance best practices to ensure data quality, integrity, and security; collaborate with Security and Compliance teams.
- Establish and monitor metrics for ETL performance; optimize Redshift cluster configurations, workload management (WLM), and query performance.
- Oversee release planning and deployment of data platform enhancements; ensure testing, documentation, and rollback strategies.
- Manage data job orchestration with Airflow (or Step Functions); optimize ETL job performance and apply DAG design best practices (a minimal illustrative DAG is sketched after this list).
- Support the Data Ops team in resolving high-priority issues and act as the escalation point for complex data engineering challenges.
- Implement and enforce code review best practices and promote effective unit testing frameworks and methodologies.
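For orientation, here is a minimal sketch of the kind of Airflow DAG this role oversees, assuming Airflow 2.x and a simple daily extract-transform-load flow. It is illustrative only; every DAG, task, bucket, and table name is a hypothetical placeholder, and production pipelines would typically use Glue jobs, Redshift COPY commands, retries, and alerting rather than bare print statements.

```python
# Illustrative sketch only (assumes Airflow 2.x); all names are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_from_s3(**context):
    # Placeholder: pull the day's raw files from an S3 landing prefix.
    print("extracting s3://example-raw-bucket/orders/ ...")


def transform(**context):
    # Placeholder: clean and conform the extracted records.
    print("transforming batch ...")


def load_to_redshift(**context):
    # Placeholder: COPY the conformed data into a Redshift staging table.
    print("loading into analytics.orders_staging ...")


with DAG(
    dag_id="orders_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_from_s3", python_callable=extract_from_s3)
    conform = PythonOperator(task_id="transform", python_callable=transform)
    load = PythonOperator(task_id="load_to_redshift", python_callable=load_to_redshift)

    # Linear dependency chain: extract, then transform, then load.
    extract >> conform >> load
```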
Requirements
- AWS Services: Deep hands-on experience with Redshift, Glue, S3, IAM, Lambda, and related data pipeline services.
- Proficiency in Airflow (preferred) or similar workflow orchestration tools.
- Familiarity with scheduling, logging, alerting, and error-handling best practices.
- Strong knowledge of schema design, dimensional modeling, and normalization best practices.
- Experience with Spark, EMR, Hadoop, or other large-scale data processing frameworks.
- Proficiency in Python and SQL; familiarity with Java or Scala is a plus.
- Familiarity with Git, automated testing, and continuous integration practices (a brief unit-test sketch follows this list).
- Demonstrated experience designing and implementing external data integrations (APIs, file-based, streaming) and delivering data feeds to downstream applications or third-party services.
- Knowledge of security and compliance best practices for external data exchange (e.g., authentication, encryption, data masking).
- Demonstrated experience leading high-performing technical teams, including mentorship, performance management, and capacity planning.
- Excellent communication skills to translate complex data requirements into actionable engineering plans, and to communicate engineering constraints back to business stakeholders.
- Proven ability to collaborate effectively with cross-functional stakeholders (Product, Analytics, Reporting, BI, etc.).
- Experience managing multiple data initiatives simultaneously, prioritizing tasks, and meeting deadlines.
- Proven track record of accurately estimating project effort, mitigating risks, and delivering within budget.
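As a small example of the testing and CI practices mentioned above, the sketch below shows a pytest-style unit test for a pure transformation function of the kind ETL code is built from. The function and field names are hypothetical and illustrative only.

```python
# Illustrative sketch only; function and field names are hypothetical.
import pytest


def normalize_order(record: dict) -> dict:
    """Hypothetical transform: trim identifiers, convert the amount to cents, default the currency."""
    return {
        "order_id": record["order_id"].strip(),
        "amount_cents": int(round(float(record["amount"]) * 100)),
        "currency": record.get("currency", "USD").upper(),
    }


def test_normalize_order_trims_and_converts():
    raw = {"order_id": "  A-123 ", "amount": "19.99", "currency": "usd"}
    assert normalize_order(raw) == {
        "order_id": "A-123",
        "amount_cents": 1999,
        "currency": "USD",
    }


def test_normalize_order_requires_an_amount():
    # Missing required fields should fail loudly rather than load bad data.
    with pytest.raises(KeyError):
        normalize_order({"order_id": "A-124"})
```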