
Mid-level Data Engineer – GCP
Leega
Full-time
Location Type: Remote
Location: Brazil
About the role
- **Load/Pipeline Analysis and Planning:**
  - Assess the data warehouse architecture and requirements.
  - Map data, transformations and processes across GCP services (Cloud Storage, BigQuery, Dataproc).
  - Define the data migration strategy (full load, incremental, CDC).
  - Develop a data architecture plan on GCP.
- **Design and Data Modeling on GCP:**
  - Design table schemas in BigQuery with consideration for performance, cost and scalability.
  - Define partitioning and clustering strategies for BigQuery (see the sketch after this list).
  - Model data zones in Cloud Storage (Bronze, Silver and Gold).
- **ELT/ETL Pipeline Development:**
  - Create data transformation routines using Dataproc (Spark) or Dataflow to load data into BigQuery.
  - Translate existing business logic and transformations to GCP services.
  - Implement data validation and quality mechanisms.
- **Performance and Cost Optimization:**
  - Optimize BigQuery queries to reduce costs and improve performance.
  - Tune and optimize Spark jobs on Dataproc.
  - Monitor and optimize GCP resource usage to control costs.
- **Data Security and Governance:**
  - Implement and ensure data security in transit and at rest.
  - Define and enforce IAM policies to control access to data and resources.
  - Ensure compliance with data governance policies.
- **Monitoring and Support:**
  - Troubleshoot performance and functional issues in data pipelines and GCP resources.
- **Documentation:**
  - Document architecture, data pipelines, data models and operational procedures.
- **Communication:**
  - Communicate effectively with team members, stakeholders and other business areas.
  - Clearly communicate how architectural decisions translate into software components, supporting the evolution and quality of the team's deliverables.
- **Jira / Agile Methodologies:**
  - Familiarity with agile methodologies and ceremonies, and proficiency in Jira.
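
As a rough illustration of the partitioning and clustering work listed above, here is a minimal sketch using the google-cloud-bigquery Python client. The project, dataset, table and column names are hypothetical placeholders, not part of the role description.

```python
# Minimal sketch: create a date-partitioned, clustered BigQuery table.
# Assumes the google-cloud-bigquery client library; project, dataset,
# table and column names below are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project ID

schema = [
    bigquery.SchemaField("event_date", "DATE", mode="REQUIRED"),
    bigquery.SchemaField("customer_id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("amount", "NUMERIC"),
]

table = bigquery.Table("my-project.silver.sales_events", schema=schema)

# Partition by the date column so date-filtered queries scan less data.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="event_date",
)

# Cluster by the columns most commonly used in filters and joins.
table.clustering_fields = ["customer_id"]

client.create_table(table, exists_ok=True)
```

Partitioning on the main filter date and clustering on frequently joined or filtered columns is a common starting point for balancing query cost and performance in BigQuery.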
Requirements
- **Google Cloud Platform (GCP):**
  - **BigQuery:** Deep knowledge of data modeling, query optimization, partitioning, clustering, data loading (streaming and batch), security and data governance.
  - **Cloud Storage:** Experience managing buckets, storage classes, lifecycle policies, access control (IAM) and data security.
  - **Dataproc:** Skill in provisioning, configuring and managing Spark/Hadoop clusters, optimizing jobs, and integrating with other GCP services.
  - **Dataflow / Composer / DBT:** Knowledge of orchestration and data processing tools for ELT/ETL pipelines.
  - **Cloud IAM (Identity and Access Management):** Experience implementing security policies and granular access control.
  - **VPC, Networking and Security:** Understanding of networks, subnets, firewall rules and cloud security best practices.
- **Programming Languages:**
  - **Python and PySpark:** Essential for automation scripts, data pipeline development and integration with GCP APIs (see the sketch after this list).
  - **SQL (advanced):** For BigQuery, DBT and data transformations.
  - **Shell Scripting:** For task automation.
- **Version Control:**
  - Git / GitHub / Bitbucket.
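
To make the Python/PySpark expectation concrete, here is a minimal, hypothetical sketch of a Bronze-to-Silver routine of the kind described in the role: it reads raw files from Cloud Storage, applies basic validation, and loads the result into BigQuery through the Spark BigQuery connector. Bucket, dataset and column names are placeholders.

```python
# Minimal sketch: PySpark job promoting raw Bronze data in Cloud Storage to a
# curated Silver table in BigQuery. Assumes the Spark BigQuery connector is
# available on the cluster; all names below are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("bronze-to-silver").getOrCreate()

# Read raw CSV files from the Bronze zone in Cloud Storage.
bronze = (
    spark.read
    .option("header", "true")
    .csv("gs://my-datalake/bronze/sales/")
)

# Basic validation and cleaning: drop rows missing required keys,
# cast types, and deduplicate.
silver = (
    bronze
    .filter(F.col("customer_id").isNotNull() & F.col("event_date").isNotNull())
    .withColumn("event_date", F.to_date("event_date"))
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .dropDuplicates(["customer_id", "event_date", "order_id"])
)

# Load the curated data into BigQuery, staging through a temporary
# Cloud Storage bucket as the connector requires for batch writes.
(
    silver.write
    .format("bigquery")
    .option("table", "my-project.silver.sales_events")
    .option("temporaryGcsBucket", "my-datalake-tmp")
    .mode("overwrite")
    .save()
)
```

In practice, a job like this would typically be submitted to Dataproc and orchestrated with Composer, alongside SQL/DBT transformations running inside BigQuery.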
Benefits
- 🏥 Porto Seguro health insurance
- 🦷 Porto Seguro dental plan
- 💰 Profit Sharing (PLR)
- 👶 Childcare assistance
- 🍽️ Alelo food and meal vouchers
- 💻 Home office allowance
- 📚 Partnerships with educational institutions
- 🚀 Support for certifications, including cloud certifications
- 🎁 Livelo points
- 🏋️‍♂️ TotalPass
- 🧘‍♂️ Mindself
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
data modeling, query optimization, partitioning, clustering, data loading, data transformation, data validation, performance optimization, cost optimization, data security
Soft Skills
communication, troubleshooting, documentation