Tech Stack
AWS, Azure, Cloud, ETL, Google Cloud Platform, PySpark, Python, SQL
About the role
- Report to the Lead Data Engineer and work closely with developers and engineers across the data team
- Focus on Python/PySpark development for ETL processes and data transformation (see the sketch after this list)
- Leverage SQL to query the lakehouse and support analytics
- Use GitHub for version control and collaborative coding
- Design, build, optimize, and manage scalable data pipelines, orchestrations, and other internal services on cloud platforms (GCP, Azure, AWS)
- Support analytics teams by delivering reliable and well-structured data
- Collaborate with team members to ensure smooth integration and delivery of data services
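
To give a concrete flavor of the Python/PySpark and SQL work described above, here is a minimal sketch of an ETL-style transformation against a lakehouse table. The table names (`raw.orders`, `analytics.daily_order_totals`) and columns are hypothetical placeholders, not this team's actual schema.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical example: aggregate raw order events into a daily summary.
# All table and column names are placeholders for illustration.
spark = SparkSession.builder.appName("daily-order-totals").getOrCreate()

# Extract: read a raw lakehouse table.
orders = spark.table("raw.orders")

# Transform: filter and aggregate with the PySpark DataFrame API.
daily_totals = (
    orders
    .filter(F.col("status") == "completed")
    .withColumn("order_date", F.to_date("created_at"))
    .groupBy("order_date")
    .agg(
        F.count("*").alias("order_count"),
        F.sum("amount").alias("total_amount"),
    )
)

# Load: write the result back for analytics consumers.
daily_totals.write.mode("overwrite").saveAsTable("analytics.daily_order_totals")

# The same transformation expressed as SQL against the lakehouse:
spark.sql("""
    SELECT to_date(created_at) AS order_date,
           COUNT(*)            AS order_count,
           SUM(amount)         AS total_amount
    FROM raw.orders
    WHERE status = 'completed'
    GROUP BY to_date(created_at)
""")
```

Either style is common day to day; the DataFrame API suits multi-step pipeline code, while SQL is often quicker for ad hoc analytics support.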
Requirements
- Proficient in Python for data engineering and transformation
- Strong hands-on experience with PySpark and Databricks for big data processing
- Skilled in SQL for querying and managing data in lakehouse and relational systems
- Familiarity with YAML for cluster and workflow configuration (see the config sketch after this list)
- Experience with GitHub for version control and collaborative coding
- Proficient with cloud platforms (AWS, Azure, GCP) for pipeline orchestration, data storage, and service management
- Exposure to internally built orchestration tools for ETL workflows
- Strong analytical and problem-solving skills with a focus on data quality
- Effective team player with clear communication across technical and non-technical teams
- Quick learner, adaptable to evolving tools and practices in data engineering
- Detail-oriented with the ability to work under tight deadlines
- Proactive and collaborative mindset in delivering high-quality solutions
- Comfortable working with an Agile mindset and within Agile frameworks
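
As an illustration of the YAML-driven workflow configuration mentioned above, here is a minimal sketch of loading a pipeline definition with PyYAML and walking its tasks. The config keys and task names are invented for illustration; real cluster and workflow specs (for example, Databricks job definitions) will differ.

```python
import yaml  # PyYAML

# Hypothetical workflow config; keys and values are illustrative only.
CONFIG = """
workflow: daily_order_totals
cluster:
  spark_version: "3.5"
  num_workers: 4
tasks:
  - name: extract_orders
    module: jobs.extract
  - name: build_daily_totals
    module: jobs.transform
"""

config = yaml.safe_load(CONFIG)

print(f"Running workflow: {config['workflow']}")
print(f"Cluster: {config['cluster']['num_workers']} workers, "
      f"Spark {config['cluster']['spark_version']}")

# An orchestrator would dispatch each task in order; here we just list them.
for task in config["tasks"]:
    print(f"  task {task['name']} -> {task['module']}")
```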