Tech Stack
Amazon Redshift, Cloud, ETL, MongoDB, MySQL, NoSQL, PySpark, Python
About the role
- Consume and extract data from relational databases (like MySQL) and NoSQL databases (like MongoDB).
- Build and maintain data ingestion pipelines using integration platforms (like Hevo) to move data from source systems.
- Transform and process data using Python and distributed processing frameworks (like PySpark) for warehouse optimization.
- Design and implement efficient data loading strategies into cloud data warehouses (like Amazon Redshift).
- Develop data transformation models and workflows using modern ETL frameworks (like dbt).
- Optimize warehouse schemas and queries for Business Intelligence consumption.
- Collaborate with database administrators and analysts to ensure seamless data flow.
- Monitor and troubleshoot data pipelines from source databases through to the data warehouse.
- Ensure data quality, consistency, and governance throughout the ingestion process.
- Support BI teams with data warehouse preparation and access patterns.
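As a small illustration of the kind of work described above (transforming NoSQL documents into warehouse-ready rows before loading), here is a minimal sketch in plain Python; the document shape, field names, and schema are hypothetical, and in practice this logic would typically live in a PySpark job or dbt model:

```python
from datetime import datetime, timezone

def flatten_order(doc: dict) -> dict:
    """Flatten a nested MongoDB-style order document into a single
    flat row suitable for loading into a warehouse table.
    (Hypothetical schema, for illustration only.)"""
    return {
        "order_id": doc["_id"],
        "customer_id": doc["customer"]["id"],
        "customer_country": doc["customer"].get("country"),  # nullable column
        # Derive a total rather than loading the nested line items as-is
        "total_cents": sum(i["qty"] * i["unit_price_cents"] for i in doc["items"]),
        "loaded_at": datetime.now(timezone.utc).isoformat(),  # load timestamp
    }

# Example input document, as it might arrive from a source system
doc = {
    "_id": "o-1001",
    "customer": {"id": "c-42", "country": "DE"},
    "items": [
        {"qty": 2, "unit_price_cents": 500},
        {"qty": 1, "unit_price_cents": 1250},
    ],
}
row = flatten_order(doc)
print(row["total_cents"])  # 2250
```

Flattening nested structures like this is what makes the data easy to query from BI tools once it lands in the warehouse.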
Requirements
- Strong proficiency in Python programming.
- Experience with distributed data processing frameworks for large-scale data transformation.
- Hands-on experience with relational database querying and administration.
- Experience with cloud data warehouse design and optimization.
- Experience with data integration platforms.
- Experience with Business Intelligence tools (like Power BI).
Preferred Qualifications
- Experience with advanced BI platforms (like ThoughtSpot) is a significant advantage.
- NoSQL database experience for diverse data extraction and processing.
- Experience with modern ETL modeling tools.
- Advanced cloud data warehouse performance tuning and optimization.
- Experience with enterprise data integration pipeline configuration and management.