Tech Stack
Tools & technologies: Airflow, Apache, JavaScript, Kafka, PySpark, Python, SQL
About the role
Key responsibilities & impact
- You will architect and evolve the data lake that serves as the company's data nervous system: the foundation that feeds, in real time, the dynamic pricing engine, the ML models, and the group's business intelligence.
- This is an ownership role: you define the multi-tenant Lakehouse architecture, from streaming to the semantic layer, and are responsible for its reliability, governance, and cost.
- Design and evolve the data lake on Apache Iceberg over S3: well-defined layers, partitioning and compaction, time travel, and row-level DELETE/UPDATE support for LGPD (Brazil's General Data Protection Law) compliance.
- Build real-time ingestion (Kafka, Flink, CDC with Debezium) with controlled schema evolution (Schema Registry) and delivery guarantees.
- Model the transformation layer in dbt and orchestrate batch and data-quality flows in Airflow, from crawlers to backfills.
- Maintain metric definitions in Cube.js, the single source of truth that feeds BI and AI agents and keeps metrics consistent across the company.
- Operate federated, low-latency, performant OLAP queries over the lake, with per-tenant cost and access isolation.
- Ensure data testing, lineage tracking, and cost efficiency, keeping the platform reliable as it scales.
Requirements
What you’ll need
- Strong command of SQL and query optimization in distributed environments (minimum 5 years).
- Python, with solid experience in PySpark or other distributed processing frameworks.
- Orchestration (Airflow), ELT, and dbt applied at scale (minimum 4 years).
- Streaming (Kafka, Flink) and Lakehouse architectures with Apache Iceberg (minimum 3 years).
- Strong understanding of data governance, quality, and modeling.
- Comfortable with AI-assisted development (e.g., Claude Code).
- CDC (Debezium) and low-latency OLAP (ClickHouse, Pinot, Trino/Athena).
- Semantic layers (Cube.js, dbt) and Data Mesh architectures.
- Governance and catalog tools (OpenMetadata, Lake Formation).
- Vector databases (Qdrant) and data pipelines for ML.
Benefits
Comp & perks
- Remote work
- Project duration: 6 months, with the possibility of extension or conversion to permanent employment.
ATS Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
SQL, Python, PySpark, Apache Iceberg, Kafka, Flink, dbt, Airflow, CDC, OLAP
Soft Skills
data governance, data quality, data modeling, ownership, reliability, cost efficiency
