Tech Stack
Airflow, Amazon Redshift, AWS, Cloud, EC2, ETL, NoSQL, PySpark, Python, Spark, SQL
About the role
- Design and implement modern ETL/ELT architectures using cloud-based technologies.
- Develop and maintain data pipelines to preprocess and load data into databases for BI and application needs.
- Administer and manage databases to ensure high performance and availability.
- Work with the BI team to optimize the user experience in BI tools by aligning data preprocessing, database schema design, and data modeling.
- Monitor and optimize data storage, processing, and serving costs, and proactively suggest cost-effective improvements.
- Build data pipelines by integrating with REST APIs.
- Design and implement real-time data pipeline architectures for critical workflows.
- Develop a robust alerting system to detect and resolve data discrepancies efficiently (a minimal sketch combining REST ingestion with such a check follows this list).
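To make the responsibilities above concrete, here is a minimal sketch of the kind of pipeline the role involves: a daily Airflow DAG that pulls records from a REST API and runs a simple discrepancy check that fails loudly when the data looks wrong. The endpoint URL, DAG and task names, and the empty-extract check are hypothetical placeholders, not a prescribed design.

```python
# Minimal sketch (hypothetical endpoint and names): a daily Airflow DAG that
# ingests from a REST API and alerts on an obvious data discrepancy.
from datetime import datetime, timedelta

import requests
from airflow import DAG
from airflow.operators.python import PythonOperator

API_URL = "https://api.example.com/v1/orders"  # hypothetical endpoint


def extract_orders(**context):
    # Pull one logical day's records from the REST API.
    resp = requests.get(API_URL, params={"date": context["ds"]}, timeout=30)
    resp.raise_for_status()
    return resp.json()  # pushed to XCom for the downstream check


def check_row_count(**context):
    # Simplest possible discrepancy check: fail the task (triggering
    # Airflow's retry/alerting machinery) if the extract came back empty.
    rows = context["ti"].xcom_pull(task_ids="extract_orders")
    if not rows:
        raise ValueError("Extract returned 0 rows; upstream data may be missing")


with DAG(
    dag_id="orders_ingest",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    extract = PythonOperator(task_id="extract_orders", python_callable=extract_orders)
    validate = PythonOperator(task_id="check_row_count", python_callable=check_row_count)
    extract >> validate
```

In practice a load step into Redshift (for example, a COPY from S3) would sit between extraction and validation, and alerts would route to email or a chat channel rather than relying on task failure alone.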
Requirements
- Bachelor’s degree in Computer Science or a related quantitative field; relevant professional experience may be considered in lieu of formal education.
- Minimum 2 years of experience working as a Data Engineer.
- Strong proficiency in Python, SQL, Spark, and EC2, with hands-on experience in the AWS ecosystem.
- Practical experience with AWS Glue jobs written in PySpark, Lambda functions, NoSQL databases, job orchestration using Airflow, and/or managing Redshift databases is a strong plus (see the PySpark sketch after this list).
- Detail-oriented with a keen interest in examining data transformations and their impact on business outcomes.
- Excellent problem-solving and time management skills.
- Flexible and able to work effectively in a fast-paced, dynamic environment.
- Prior experience in project or team management is preferred; aptitude for mentoring and assisting others is a plus.
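As a rough illustration of the PySpark work mentioned above, here is a minimal batch-cleanup sketch: deduplicate raw events, normalize a timestamp, and write partitioned Parquet. The S3 paths and column names are hypothetical; inside AWS Glue, the same logic would typically run against the SparkSession provided by a GlueContext.

```python
# Minimal PySpark sketch (hypothetical paths and columns): read raw JSON,
# apply light preprocessing, and write a partitioned, analytics-ready table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_cleanup").getOrCreate()

raw = spark.read.json("s3://example-bucket/raw/orders/")  # hypothetical path

cleaned = (
    raw.dropDuplicates(["order_id"])                        # drop replayed events
       .withColumn("order_ts", F.to_timestamp("order_ts"))  # normalize timestamps
       .withColumn("order_date", F.to_date("order_ts"))     # derive partition key
       .filter(F.col("amount") > 0)                         # discard invalid rows
)

cleaned.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/clean/orders/"  # hypothetical output path
)
```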