
Senior Data Engineer – GCP
Leega
full-time
Location Type: Remote
Location: Brazil
About the role
- **Analysis and Pipeline Load Planning:**
- Assess data warehouse architecture and requirements.
- Map data, transformations and processes across GCP services (Cloud Storage, BigQuery, Dataproc).
- Define data migration strategy (full load, incremental, CDC).
- Develop a data architecture plan on GCP.
- **Design and Data Modeling on GCP:**
- Design table schemas in BigQuery, considering performance, cost and scalability.
- Define partitioning and clustering strategies for BigQuery.
- Model data zones in Cloud Storage (Bronze, Silver and Gold).
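Partitioning and clustering decisions like those above ultimately become plain DDL. A hedged sketch that renders a BigQuery `CREATE TABLE` statement (table and column names are placeholders):

```python
def create_table_ddl(table: str, columns: dict[str, str],
                     partition_col: str, cluster_cols: list[str]) -> str:
    """Render BigQuery DDL with daily time partitioning and clustering."""
    cols = ",\n  ".join(f"{name} {typ}" for name, typ in columns.items())
    return (
        f"CREATE TABLE `{table}` (\n  {cols}\n)\n"
        f"PARTITION BY DATE({partition_col})\n"
        f"CLUSTER BY {', '.join(cluster_cols)}"
    )

# Hypothetical Silver-zone table: partition by event time, cluster by
# the columns most often used in filters.
ddl = create_table_ddl(
    "my-project.silver.orders",
    {"order_id": "STRING", "event_ts": "TIMESTAMP", "country": "STRING"},
    partition_col="event_ts",
    cluster_cols=["country", "order_id"],
)
```

Partitioning prunes whole partitions at scan time; clustering sorts data within each partition, so filters on the clustered columns read fewer blocks.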
- **ELT/ETL Pipeline Development:**
- Create data transformation routines using Dataproc (Spark) or Dataflow to load data into BigQuery.
- Translate business logic and existing transformations into GCP.
- Implement data validation and quality mechanisms.
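The validation step above can be as simple as row-level checks run before loading. A minimal sketch (column names are illustrative); real pipelines would typically use a framework or SQL-based assertions instead:

```python
def validate_rows(rows: list[dict], required: list[str], unique_key: str) -> list[str]:
    """Return human-readable data-quality violations: missing required
    values and duplicate keys."""
    errors: list[str] = []
    seen: set = set()
    for i, row in enumerate(rows):
        for col in required:
            if row.get(col) in (None, ""):
                errors.append(f"row {i}: missing required column '{col}'")
        key = row.get(unique_key)
        if key in seen:
            errors.append(f"row {i}: duplicate {unique_key}={key!r}")
        seen.add(key)
    return errors

sample = [{"id": 1, "name": "a"}, {"id": 1, "name": ""}]
violations = validate_rows(sample, required=["name"], unique_key="id")
```

Rejected rows are often routed to a quarantine table rather than failing the whole load.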
- **Infrastructure Provisioning and Management:**
- Use IaC tools (Terraform) to provision and manage GCP resources (BigQuery datasets/tables, Cloud Storage buckets, Dataproc clusters).
- Configure and optimize Dataproc clusters for different workloads.
- Manage networking, security (IAM) and access on GCP.
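Terraform also accepts JSON-syntax configuration (`*.tf.json`), so resource definitions like the ones above can be generated programmatically. A minimal sketch for a `google_bigquery_dataset` resource (project and dataset names are placeholders):

```python
import json

def bigquery_dataset_tf(dataset_id: str, project: str, location: str = "US") -> str:
    """Emit Terraform JSON for a google_bigquery_dataset resource."""
    config = {
        "resource": {
            "google_bigquery_dataset": {
                dataset_id: {
                    "dataset_id": dataset_id,
                    "project": project,
                    "location": location,
                }
            }
        }
    }
    return json.dumps(config, indent=2)

tf_json = bigquery_dataset_tf("silver", "my-project", location="southamerica-east1")
```

Writing this string to `datasets.tf.json` and running `terraform plan` would pick it up alongside ordinary HCL files.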
- **Performance and Cost Optimization:**
- Optimize BigQuery queries to reduce cost and improve performance.
- Tune and optimize Spark jobs on Dataproc.
- Monitor and optimize GCP resource usage to control costs.
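Because BigQuery's on-demand model bills by bytes scanned, cost control starts with estimating scans before running them (e.g. from a dry run's `total_bytes_processed`). A sketch with the rate as a parameter, since pricing varies by region and changes over time (the default below is an assumption, not a quoted price):

```python
def estimate_query_cost(bytes_scanned: int, usd_per_tib: float = 6.25) -> float:
    """Estimate on-demand BigQuery cost in USD from bytes scanned.

    The default rate is an assumption; check current regional pricing.
    """
    tib = bytes_scanned / 2**40  # bytes -> tebibytes
    return round(tib * usd_per_tib, 4)
```

Partition pruning and clustering (above) reduce `bytes_scanned` directly, which is why modeling and cost optimization are linked.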
- **Data Security and Governance:**
- Implement and ensure data security in transit and at rest.
- Define and enforce IAM policies to control access to data and resources.
- Ensure compliance with data governance policies.
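GCP IAM policies are lists of role-to-members bindings. A minimal sketch that assembles a policy in that shape (`roles/bigquery.dataViewer` is a real predefined role; the member identities are placeholders):

```python
def make_policy(bindings: dict[str, list[str]]) -> dict:
    """Build a GCP-style IAM policy dict from {role: [members]} pairs."""
    return {
        "bindings": [
            {"role": role, "members": sorted(members)}
            for role, members in sorted(bindings.items())
        ]
    }

policy = make_policy({
    "roles/bigquery.dataViewer": ["group:analysts@example.com"],
    "roles/bigquery.jobUser": ["serviceAccount:etl@my-project.iam.gserviceaccount.com"],
})
```

Granting roles to groups and service accounts, rather than individual users, keeps access control auditable as teams change.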
- **Monitoring and Support:**
- Troubleshoot performance and functionality issues for data pipelines and GCP resources.
- **Documentation:**
- Document architecture, data pipelines, data models and operational procedures.
- **Communication:**
- Communicate effectively with team members, stakeholders and other company areas.
- Ensure that architecture decisions are clearly reflected in the software components that implement them, and help drive the evolution and quality of the team's deliverables.
- **Jira / Agile Methodologies:**
- Be familiar with agile methodologies and their ceremonies, and be proficient with Jira.
Requirements
- **Google Cloud Platform (GCP):**
- **BigQuery:** Deep knowledge of data modeling, query optimization, partitioning, clustering, data loading (streaming and batch), security and data governance.
- **Cloud Storage:** Experience managing buckets, storage classes, lifecycle policies, access control (IAM) and data security.
- **Dataproc:** Skill in provisioning, configuring and managing Spark/Hadoop clusters, job optimization, and integration with other GCP services.
- **Dataflow/Composer/DBT:** Familiarity with orchestration and data processing tools for ELT/ETL pipelines.
- **Cloud IAM (Identity and Access Management):** Implementing security policies and granular access control.
- **VPC, Networking and Security:** Understanding of networks, subnets, firewall rules and cloud security best practices.
- **Programming Languages:**
- **Python and PySpark:** Essential for automation scripts, data pipeline development and integration with GCP APIs.
- **Advanced SQL:** For BigQuery, DBT and data transformations.
- **Shell Scripting:** For task automation.
- **Version Control:**
- Git/GitHub/Bitbucket.
- 100% remote work.
- Knowledge of DBT is a plus.
Benefits
- 🏥 Health plan (Porto Seguro)
- 🦷 Dental plan (Porto Seguro)
- 💰 Profit Sharing (PLR)
- 👶 Childcare allowance
- 🍽️ Food and Meal Vouchers (Alelo)
- 💻 Home Office allowance
- 📚 Partnerships with educational institutions
- 🚀 Support for certifications, including cloud
- 🎁 Livelo points
- 🏋️‍♂️ TotalPass
- 🧘‍♂️ Mindself
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
data modeling, query optimization, partitioning, clustering, data loading, security, data governance, Python, PySpark, Advanced SQL
Soft Skills
communication, troubleshooting, documentation, team collaboration, stakeholder engagement