Leega

Senior Data Engineer – GCP

Full-time

Location Type: Remote

Location: Brazil

About the role

  • **Analysis and Pipeline Load Planning:**
    • Assess data warehouse architecture and requirements.
    • Map data, transformations and processes across GCP services (Cloud Storage, BigQuery, Dataproc).
    • Define the data migration strategy (full load, incremental, CDC).
    • Develop a data architecture plan on GCP.
  • **Design and Data Modeling on GCP:**
    • Design table schemas in BigQuery, balancing performance, cost and scalability.
    • Define partitioning and clustering strategies for BigQuery.
    • Model data zones in Cloud Storage (Bronze, Silver and Gold).
  • **ELT/ETL Pipeline Development:**
    • Create data transformation routines using Dataproc (Spark) or Dataflow to load data into BigQuery.
    • Translate existing business logic and transformations to GCP services.
    • Implement data validation and quality mechanisms.
  • **Infrastructure Provisioning and Management:**
    • Use IaC tools (Terraform) to provision and manage GCP resources (BigQuery datasets/tables, Cloud Storage buckets, Dataproc clusters).
    • Configure and optimize Dataproc clusters for different workloads.
    • Manage networking, security (IAM) and access on GCP.
  • **Performance and Cost Optimization:**
    • Optimize BigQuery queries to reduce cost and improve performance.
    • Tune and optimize Spark jobs on Dataproc.
    • Monitor and optimize GCP resource usage to control costs.
  • **Data Security and Governance:**
    • Implement and ensure data security in transit and at rest.
    • Define and enforce IAM policies to control access to data and resources.
    • Ensure compliance with data governance policies.
  • **Monitoring and Support:**
    • Troubleshoot performance and functionality issues in data pipelines and GCP resources.
  • **Documentation:**
    • Document architecture, data pipelines, data models and operational procedures.
  • **Communication:**
    • Communicate effectively with team members, stakeholders and other areas of the company.
    • Clearly communicate architecture decisions to those implementing software components, and safeguard the evolution and quality of the team’s deliverables.
  • **Jira / Agile Methodologies:**
    • Be familiar with agile methodologies and their ceremonies, and be proficient with Jira.
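As a hypothetical illustration of the partitioning and clustering work described above (not part of the job description itself), a date-partitioned, clustered BigQuery table can be declared with standard `PARTITION BY` / `CLUSTER BY` DDL. The sketch below assembles such a statement in Python; the table and column names are invented examples:

```python
# Hypothetical sketch: assembling DDL for a date-partitioned, clustered
# BigQuery table. All names here are invented for illustration.
def build_table_ddl(table, columns, partition_col, cluster_cols):
    """Return a CREATE TABLE statement using BigQuery's
    PARTITION BY / CLUSTER BY DDL syntax."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE TABLE `{table}` (\n  {cols}\n)\n"
        f"PARTITION BY DATE({partition_col})\n"       # one partition per day
        f"CLUSTER BY {', '.join(cluster_cols)}"        # co-locate rows by key
    )

ddl = build_table_ddl(
    table="analytics.sales",
    columns=[("order_id", "STRING"), ("customer_id", "STRING"),
             ("order_ts", "TIMESTAMP"), ("amount", "NUMERIC")],
    partition_col="order_ts",
    cluster_cols=["customer_id"],
)
print(ddl)
```

Partitioning on the timestamp lets BigQuery prune partitions (and therefore scanned bytes and cost) when queries filter on the date, while clustering on a frequently filtered key further narrows the blocks read.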

Requirements

  • **Google Cloud Platform (GCP):**
    • **BigQuery:** Deep knowledge of data modeling, query optimization, partitioning, clustering, data loading (streaming and batch), security and data governance.
    • **Cloud Storage:** Experience managing buckets, storage classes, lifecycle policies, access control (IAM) and data security.
    • **Dataproc:** Skill in provisioning, configuring and managing Spark/Hadoop clusters, job optimization, and integration with other GCP services.
    • **Dataflow/Composer/DBT:** Familiarity with orchestration and data processing tools for ELT/ETL pipelines.
    • **Cloud IAM (Identity and Access Management):** Implementing security policies and granular access control.
    • **VPC, Networking and Security:** Understanding of networks, subnets, firewall rules and cloud security best practices.
  • **Programming Languages:**
    • **Python and PySpark:** Essential for automation scripts, data pipeline development and integration with GCP APIs.
    • **Advanced SQL:** For BigQuery, DBT and data transformations.
    • **Shell Scripting:** For task automation.
  • **Version Control:**
    • Git/GitHub/Bitbucket.
  • **100% remote work.**
  • **Knowledge of DBT is a plus.**
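The Python requirement above pairs with the "data validation and quality mechanisms" responsibility. As a hypothetical sketch (field names and rules are invented, and in a real pipeline the same gate would typically run in PySpark or Dataflow before loading into BigQuery), a minimal row-level quality check might look like:

```python
# Hypothetical sketch of a row-level data-quality gate.
# Field names and the "required, non-null" rule are invented examples.
def validate_rows(rows, required_fields):
    """Split rows into (valid, rejected); rejected rows carry the
    list of required fields that were missing or empty."""
    valid, rejected = [], []
    for row in rows:
        missing = [f for f in required_fields if row.get(f) in (None, "")]
        if missing:
            rejected.append({"row": row, "missing": missing})
        else:
            valid.append(row)
    return valid, rejected

rows = [
    {"order_id": "A1", "amount": 10.0},
    {"order_id": None, "amount": 5.0},   # fails the non-null check
]
valid, rejected = validate_rows(rows, required_fields=["order_id", "amount"])
```

Keeping rejected rows (with the reason for rejection) rather than silently dropping them is what makes the downstream quality reporting mentioned in the responsibilities possible.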

Benefits

  • 🏥 Health plan (Porto Seguro)
  • 🦷 Dental plan (Porto Seguro)
  • 💰 Profit Sharing (PLR)
  • 👶 Childcare allowance
  • 🍽️ Food and Meal Vouchers (Alelo)
  • 💻 Home Office allowance
  • 📚 Partnerships with educational institutions
  • 🚀 Support for certifications, including cloud
  • 🎁 Livelo points
  • 🏋️‍♂️ TotalPass
  • 🧘‍♂️ Mindself
Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

data modeling, query optimization, partitioning, clustering, data loading, security, data governance, Python, PySpark, Advanced SQL

Soft Skills

communication, troubleshooting, documentation, team collaboration, stakeholder engagement