Leega

Mid-level Data Engineer – GCP

Full-time

Location Type: Remote

Location: Brazil

About the role

  • **Load/Pipeline Analysis and Planning:**
      • Assess the data warehouse architecture and requirements.
      • Map data, transformations and processes to GCP services (Cloud Storage, BigQuery, Dataproc).
      • Define the data migration strategy (full load, incremental, CDC).
      • Develop a data architecture plan on GCP.
  • **Design and Data Modeling on GCP:**
      • Design table schemas in BigQuery with performance, cost and scalability in mind.
      • Define partitioning and clustering strategies for BigQuery.
      • Model data zones in Cloud Storage (Bronze, Silver and Gold).
  • **ELT/ETL Pipeline Development:**
      • Create data transformation routines using Dataproc (Spark) or Dataflow to load data into BigQuery (see the sketch after this list).
      • Translate existing business logic and transformations to GCP.
      • Implement data validation and quality mechanisms.
  • **Performance and Cost Optimization:**
      • Optimize BigQuery queries to reduce costs and improve performance.
      • Tune and optimize Spark jobs on Dataproc.
      • Monitor and optimize GCP resource usage to control costs.
  • **Data Security and Governance:**
      • Implement and ensure data security in transit and at rest.
      • Define and enforce IAM policies to control access to data and resources.
      • Ensure compliance with data governance policies.
  • **Monitoring and Support:**
      • Troubleshoot performance and functional issues in data pipelines and GCP resources.
  • **Documentation:**
      • Document architecture, data pipelines, data models and operational procedures.
  • **Communication:**
      • Communicate effectively with team members, stakeholders and other business areas.
      • Ensure architectural definitions are clearly reflected in the software components, supporting the evolution and quality of the team's deliverables.
  • **Jira / Agile Methodologies:**
      • Familiarity with agile methodologies and rituals, with proficiency in Jira.
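
To illustrate the kind of pipeline described above, here is a minimal PySpark sketch for Dataproc: it reads a Bronze zone from Cloud Storage, applies a simple cleansing step, and loads a Silver table into BigQuery, partitioned and clustered at write time. All bucket, dataset and column names are hypothetical, and the write options assume the spark-bigquery connector available on Dataproc.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical paths, table and columns -- adjust to the real project.
BRONZE_PATH = "gs://example-datalake/bronze/orders/"
SILVER_TABLE = "example-project.silver.orders"
TEMP_BUCKET = "example-dataproc-temp"

spark = SparkSession.builder.appName("bronze-to-silver-orders").getOrCreate()

# Read raw Bronze-zone files from Cloud Storage.
bronze = spark.read.parquet(BRONZE_PATH)

# Basic validation/cleansing: deduplicate and drop rows without a date.
silver = (
    bronze.dropDuplicates(["order_id"])
    .filter(F.col("order_date").isNotNull())
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
)

# Load into BigQuery, partitioned by date and clustered by customer to
# keep queries fast and bytes scanned (cost) low.
(
    silver.write.format("bigquery")
    .option("temporaryGcsBucket", TEMP_BUCKET)
    .option("partitionField", "order_date")
    .option("partitionType", "DAY")
    .option("clusteredFields", "customer_id")
    .mode("overwrite")
    .save(SILVER_TABLE)
)
```

The same flow could equally be expressed in Dataflow or as DBT models; the point is the Bronze → Silver movement and the partition/cluster decisions made at load time.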

Requirements

  • **Google Cloud Platform (GCP):**
      • **BigQuery:** Deep knowledge of data modeling, query optimization, partitioning, clustering, data loading (streaming and batch), security and data governance (a table-definition sketch follows this list).
      • **Cloud Storage:** Experience managing buckets, storage classes, lifecycle policies, access control (IAM) and data security.
      • **Dataproc:** Skill in provisioning, configuring and managing Spark/Hadoop clusters, job optimization and integration with other GCP services.
      • **Dataflow/Composer/DBT:** Knowledge of orchestration and data processing tools for ELT/ETL pipelines.
      • **Cloud IAM (Identity and Access Management):** Experience implementing security policies and granular access control.
      • **VPC, Networking and Security:** Understanding of networks, subnets, firewall rules and cloud security best practices.
  • **Programming Languages:**
      • **Python and PySpark:** Essential for automation scripts, data pipeline development and integration with GCP APIs.
      • **SQL (advanced):** For BigQuery, DBT and data transformations.
      • **Shell Scripting:** For task automation.
  • **Version Control:**
      • Git / GitHub / Bitbucket.
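
As a concrete illustration of the BigQuery partitioning and clustering knowledge listed above, here is a short sketch using the google-cloud-bigquery Python client to define a day-partitioned, clustered table; the project, dataset and field names are hypothetical.

```python
from google.cloud import bigquery

# Hypothetical project, dataset, table and schema -- for illustration only.
client = bigquery.Client(project="example-project")

schema = [
    bigquery.SchemaField("order_id", "STRING", mode="REQUIRED"),
    bigquery.SchemaField("customer_id", "STRING"),
    bigquery.SchemaField("order_date", "DATE"),
    bigquery.SchemaField("amount", "NUMERIC"),
]

table = bigquery.Table("example-project.silver.orders", schema=schema)

# Partition by the date column and cluster by customer so typical filters
# prune partitions and reduce the bytes scanned (and billed) per query.
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY,
    field="order_date",
)
table.clustering_fields = ["customer_id"]

table = client.create_table(table, exists_ok=True)
print(f"Created {table.full_table_id}")
```
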
Benefits
  • 🏥 Porto Seguro health insurance
  • 🦷 Porto Seguro dental plan
  • 💰 Profit Sharing (PLR)
  • 👶 Childcare assistance
  • 🍽️ Alelo food and meal vouchers
  • 💻 Home office allowance
  • 📚 Partnerships with educational institutions
  • 🚀 Support for certifications, including cloud certifications
  • 🎁 Livelo points
  • 🏋️‍♂️ TotalPass
  • 🧘‍♂️ Mindself
Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
data modeling, query optimization, partitioning, clustering, data loading, data transformation, data validation, performance optimization, cost optimization, data security
Soft Skills
communication, troubleshooting, documentation