Tech Stack
BigQuery, Cloud, ETL, Google Cloud Platform, Python, SQL
About the role
- Design, develop, and maintain scalable data pipelines for ingesting, transforming, and loading data from various sources into BigQuery.
- Work with diverse data formats including JSON, CSV, Avro, and Parquet, ensuring efficient data handling and storage.
- Implement and manage data storage solutions within Google Cloud Storage (GCS) for raw data and backups; optimize BigQuery for performance and cost-efficiency.
- Develop and orchestrate ETL/ELT processes, including managed daily batch ingestion jobs, to support data analytics and reporting needs; a minimal load-job sketch follows this list.
- Collaborate with data analysts and product teams to understand data requirements and translate them into technical solutions.
- Ensure data quality, governance, and security by implementing IAM roles, adhering to logging/auditing standards, and managing schemas.
- Utilize SQL and Python for data manipulation and pipeline development; contribute to CI/CD processes and maintain documentation for data solutions.
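To make the ingestion bullets concrete, here is a minimal sketch of a GCS-to-BigQuery batch load using the `google-cloud-bigquery` Python client. The project, bucket, dataset, table, and column names are placeholders, not details from this posting:

```python
from google.cloud import bigquery

# Placeholder identifiers -- substitute your own project, bucket, and dataset.
PROJECT = "my-project"
SOURCE_URI = "gs://my-raw-bucket/events/2024-01-01/*.parquet"
TABLE_ID = f"{PROJECT}.analytics.events"

client = bigquery.Client(project=PROJECT)

# Configure a batch load job: Parquet input, appended to a
# day-partitioned destination table.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    time_partitioning=bigquery.TimePartitioning(
        type_=bigquery.TimePartitioningType.DAY,
        field="event_ts",  # assumed timestamp column in the source files
    ),
)

# Start the load from GCS and block until it finishes.
load_job = client.load_table_from_uri(SOURCE_URI, TABLE_ID, job_config=job_config)
load_job.result()

table = client.get_table(TABLE_ID)
print(f"Loaded {table.num_rows} rows into {TABLE_ID}")
```

In practice a job like this would be scheduled (for example via Cloud Composer or Cloud Scheduler) as part of the daily batch ingestion the role describes, rather than run ad hoc.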
Requirements
- Proven experience as a Data Engineer, with a strong focus on Google Cloud Platform (GCP).
- Expertise in BigQuery for data warehousing and optimization.
- Experience with Google Cloud Storage (GCS) for data lake and backup solutions.
- Proficiency in SQL and Python for data engineering tasks; understanding of Bash scripting.
- Familiarity with data ingestion techniques, including connectors and custom pipelines.
- Understanding of data partitioning, schema management, and data retention policies; a partitioned-table DDL sketch follows this list.
- Experience with data visualization tools like Looker Studio (or similar) and their integration with BigQuery.
- Knowledge of data governance, security (IAM), and data quality best practices; a dataset access-grant sketch follows this list.
- Ability to work collaboratively within a global team and agile environment, utilizing tools like Jira and Confluence.
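For the partitioning and retention requirement, this is roughly the kind of DDL involved, run through the same Python client. The table, columns, and 90-day window are illustrative assumptions:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

# Day-partitioned table with retention enforced via partition expiration;
# names and the retention window are illustrative.
ddl = """
CREATE TABLE IF NOT EXISTS `my-project.analytics.events` (
  event_id STRING,
  event_ts TIMESTAMP,
  payload  JSON
)
PARTITION BY DATE(event_ts)
OPTIONS (
  partition_expiration_days = 90,
  description = 'Daily-partitioned events table with 90-day retention'
)
"""
client.query(ddl).result()  # wait for the DDL job to complete
```

Partitioning by the event timestamp keeps queries scanning only the relevant days, which is also what drives the cost-efficiency point in the role description.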
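And for the governance/IAM requirement, a minimal sketch of granting dataset-level read access with the same client; the dataset and the principal's email are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

# Grant read-only access on a dataset to a single analyst account.
dataset = client.get_dataset("my-project.analytics")  # hypothetical dataset
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",
        entity_type="userByEmail",
        entity_id="analyst@example.com",  # hypothetical principal
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])  # persist only the ACL change
```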