Senior Data Engineer

Leega

Architect and evolve the data lake that powers dynamic pricing and machine learning at Leega. Ensure data governance, quality, and responsiveness in a multi-tenant Lakehouse architecture.

Posted 6/10/2026 • Full-time • Remote • 🇧🇷 Brazil • Senior

Tech Stack

Tools & technologies
Airflow, Apache, JavaScript, Kafka, PySpark, Python, SQL

About the role

Key responsibilities & impact
  • You will architect and evolve the data lake that serves as the company's data nervous system: the foundation that feeds, in real time, the dynamic pricing engine, ML models, and the group's business intelligence.
  • This is an ownership role: you define the multi-tenant Lakehouse architecture, from streaming to the semantic layer, and are responsible for its reliability, governance, and cost.
  • Design and evolve the data lake on Apache Iceberg over S3, with well-defined layers, partitioning and compaction, time travel, and DELETE/UPDATE support for compliance with LGPD (Brazilian data protection law).
  • Build real-time ingestion (Kafka, Flink, CDC with Debezium) with controlled schema evolution (Schema Registry) and delivery guarantees.
  • Model the transformation layer in dbt and orchestrate batch and quality flows in Airflow, from crawler to backfill.
  • Maintain metric definitions in Cube.js — the single source that feeds BI and AI agents and ensures consistency across the company.
  • Operate federated, low-latency OLAP queries over the lake, keeping them performant while isolating cost and access by tenant.
  • Ensure data testing, lineage and cost efficiency, keeping the platform reliable as it scales.

Requirements

What you’ll need
  • Strong command of SQL and query optimization in distributed environments (Minimum 5 years).
  • Python with solid experience in PySpark or distributed processing.
  • Orchestration (Airflow), ELT and dbt applied at scale (Minimum 4 years).
  • Streaming (Kafka, Flink) and Lakehouse architectures with Apache Iceberg (Minimum 3 years).
  • Strong understanding of data governance, quality, and modeling.
  • Comfortable with AI-assisted development (e.g., Claude Code).
  • CDC (Debezium) and low-latency OLAP (ClickHouse, Pinot, Trino/Athena).
  • Semantic layers (Cube.js, dbt) and Data Mesh architectures.
  • Governance and catalog tools (OpenMetadata, Lake Formation).
  • Vector databases (Qdrant) and data pipelines for ML.

Benefits

Comp & perks
  • Remote work
  • Project duration: 6 months, with possibility of extension or conversion to permanent employment.
