Tech Stack
AWS, EC2, Hadoop, Linux, Python, Spark, SQL
About the role
- Lead data engineering projects in domains such as urban living, media, and healthcare.
- Design and orchestrate complex data workflows and pipelines (Snowflake, Luigi, Hadoop, Spark/Hive); a brief orchestration sketch follows this list.
- Develop automation and scripting for data workflows using Python and Linux Bash.
- Deploy and manage data solutions on AWS (EC2, S3, RDS, EMR) and optimize data storage and queries.
- Implement monitoring and alerting for real-time data pipelines, and troubleshoot issues as they arise.
- Design or optimize data lake and data architecture solutions and ensure data governance/security compliance.
- Collaborate with stakeholders, lead teams, mentor junior engineers, and communicate technical direction.
- Support CI/CD and automation for deployment and testing of data engineering solutions.
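
As a rough illustration of the Luigi-based orchestration mentioned above, here is a minimal two-task pipeline sketch. The task names, file paths, and placeholder extract/transform logic are hypothetical examples, not part of any actual codebase for this role; they only show how Luigi chains tasks via `requires()` and target files.

```python
import datetime

import luigi


class ExtractRawData(luigi.Task):
    """Pull one day's raw events to local storage (placeholder extract)."""
    date = luigi.DateParameter()

    def output(self):
        return luigi.LocalTarget(f"data/raw/{self.date}.csv")

    def run(self):
        with self.output().open("w") as f:
            f.write("user_id,event,ts\n")  # stand-in for a real extract


class TransformEvents(luigi.Task):
    """Clean the raw extract; Luigi runs this only after ExtractRawData succeeds."""
    date = luigi.DateParameter()

    def requires(self):
        return ExtractRawData(date=self.date)

    def output(self):
        return luigi.LocalTarget(f"data/clean/{self.date}.csv")

    def run(self):
        with self.input().open() as src, self.output().open("w") as dst:
            dst.write(src.read().lower())  # stand-in for a real transform


if __name__ == "__main__":
    # Local scheduler is fine for a demo; production runs use luigid.
    luigi.build(
        [TransformEvents(date=datetime.date(2024, 1, 1))],
        local_scheduler=True,
    )
```

Because each task declares an output target, Luigi skips work that already completed and re-runs only missing pieces, which is the property that makes it useful for the complex, restartable pipelines this role covers.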
Requirements
- 5-10 years of experience in data engineering.
- Strong experience with Snowflake.
- Strong programming skills in Python and Linux Bash for automation and data workflows.
- Hands-on experience with Luigi for orchestrating complex data workflows.
- Expertise in Hadoop ecosystem tools and managing SQL databases for data storage and query optimization.
- In-depth knowledge of AWS EC2, S3, RDS, and EMR to deploy and manage data solutions.
- Familiarity with monitoring solutions for real-time tracking and troubleshooting of data pipelines.
- Proven ability to lead projects, communicate with stakeholders, and guide junior team members.
- Experience designing or optimizing data lake solutions.
- Understanding of data security practices, data governance, and compliance for secure data processing.
- Familiarity with CI/CD tools to support automation of deployment and testing.
- Knowledge of big data processing tools such as Spark, Hive, or related AWS services (a minimal Spark example follows this list).
- Background in analytics or data science to support data-driven decision-making.
- Experience collaborating with non-technical teams on business goals and technical solutions.
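
For a sense of the Spark/Hive work mentioned in the requirements, below is a minimal PySpark sketch that reads a Hive table, aggregates it, and lands the result in S3 as partitioned Parquet. The database, table, column, and bucket names (`events.raw_clicks`, `s3://example-data-lake/...`) are hypothetical placeholders for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hive support lets Spark resolve tables registered in the Hive metastore,
# the typical setup on an EMR cluster.
spark = (
    SparkSession.builder
    .appName("daily-events-rollup")
    .enableHiveSupport()
    .getOrCreate()
)

# Aggregate raw click events by day and channel.
daily = (
    spark.table("events.raw_clicks")          # hypothetical Hive table
    .groupBy("event_date", "channel")
    .agg(F.count("*").alias("clicks"))
)

# Write partitioned Parquet to the data lake; overwrite keeps reruns idempotent.
daily.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-data-lake/rollups/daily_clicks/"  # hypothetical bucket
)
```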