
Senior Data Engineer – Crawlers & Orchestration Specialist
Cortex
full-time
Location Type: Remote
Location: Brazil
About the role
- Develop and maintain high-performance, resilient crawlers/bots for large-scale data extraction.
- Design and implement complex data pipelines using Databricks (Spark) for batch and streaming processing.
- Ensure the health and reliability of data flows using advanced orchestration tools.
- Manage and optimize resources within the AWS ecosystem to ensure scalability and cost efficiency.
- Implement error handling, workarounds for blocking (proxies, CAPTCHAs), and data quality validation for collected data.
Requirements
- Deep proficiency in Python, particularly scraping libraries such as Scrapy, Playwright, Selenium, or Beautiful Soup.
- Solid experience with Databricks and Apache Spark (PySpark).
- Experience with AWS services such as S3, Lambda, Glue, Athena, EC2, and EKS.
- Advanced knowledge of orchestration tools such as Airflow, Dagster, or Prefect.
- Experience with SQL and NoSQL databases, and an understanding of Data Lakehouses (Delta Lake).
- Familiarity with Docker, Kubernetes, and CI/CD pipelines.
Benefits
- Meal and food vouchers (Vale Refeição and Vale Alimentação).
- Gympass/TotalPass.
- Home-office allowance.
- Health and dental plans (dental is optional).
- Childcare assistance (up to the child’s 6th birthday).
- Extended maternity, paternity, and adoptive leave (#todasasfamíliasimportam / #allfamiliesmatter).
- Life insurance.
- Birthday Day Off (one day off to take on your birthday or during your birthday month).
- Family Day (one day off for parents to take between May and August and enjoy as they wish).
- Mental Break (one full week off in December to rest and recharge).
- *Benefits according to current policy*
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Python, Scrapy, Playwright, Selenium, Beautiful Soup, Databricks, Apache Spark, SQL, NoSQL, Data Lakehouses