Tech Stack
Apache, AWS, Cloud, EC2, ETL, MongoDB, NoSQL, PySpark, Python, SQL, Tableau
About the role
- Lead design, development, and optimization of high-performance Big Data ETL/ELT pipelines using PySpark
- Own data lake strategy from ingestion through processing and storage for large-scale datasets
- Take ownership of end-to-end lifecycle for core data storage systems and execute complex data migrations
- Manage and optimize AWS Big Data ecosystem (EMR, Glue, Lambda, S3, Athena) with security best practices
- Drive cloud migration initiatives and implement cost optimization and governance for cloud data platform
- Ensure data quality through testing, validation, monitoring, and governance frameworks
- Support BI initiatives by delivering analytics-ready data for tools like Power BI
- Lead strategic projects including public API development and AI/ML data foundations
- Collaborate with Data Science, Analytics, and Engineering teams to translate business requirements into data solutions
- Provide documentation and take ownership of complex data challenges and platform budget
Requirements
- 4+ years of hands-on Data Engineering experience with large-scale Big Data systems
- Bachelor's degree in Computer Science, Systems Engineering, Data Engineering, or related technical field
- Advanced proficiency in PySpark including optimization and performance tuning
- Strong Python programming skills with clean, efficient code practices
- Deep experience with AWS (EMR, Glue, Lambda, S3, Athena, Step Functions, IAM, API Gateway, EC2, VPC)
- Advanced SQL skills including complex queries, optimization, and data modeling
- Experience with Big Data fundamentals, distributed computing, and data warehousing
- Proven hands-on experience with at least one data lakehouse framework (Apache Hudi, Apache Iceberg, or Delta Lake)
- Experience with data pipeline design, lifecycle management, testing, validation, and data governance
- Experience supporting BI tools (Power BI) and analytics-ready data
- Advanced English and Spanish proficiency (B2 or higher)
- Nice to have: Power BI/Tableau/QuickSight, MongoDB, AWS CloudFormation, Agile/Scrum