Cortex

Senior Data Engineer – Crawlers & Orchestration Specialist

full-time

Location Type: Remote

Location: Brazil

About the role

  • Develop and maintain high-performance, resilient crawlers/bots for large-scale data extraction.
  • Design and implement complex data pipelines using Databricks (Spark) for batch and streaming processing.
  • Ensure the health and reliability of data flows using advanced orchestration tools.
  • Manage and optimize resources within the AWS ecosystem to ensure scalability and cost efficiency.
  • Implement error handling, anti-blocking measures (proxies, CAPTCHA handling), and data quality validation for collected data.

Requirements

  • Deep proficiency in Python, with a focus on scraping libraries such as Scrapy, Playwright, Selenium, or Beautiful Soup.
  • Solid experience with Databricks and Apache Spark (PySpark).
  • Experience with AWS services such as S3, Lambda, Glue, Athena, EC2, and EKS.
  • Advanced knowledge of orchestration tools such as Airflow, Dagster, or Prefect.
  • Experience with SQL and NoSQL databases, and an understanding of Data Lakehouses (Delta Lake).
  • Familiarity with Docker, Kubernetes, and CI/CD pipelines.

Benefits

  • Meal and food vouchers (Vale Refeição and Vale Alimentação).
  • Gympass/TotalPass.
  • Home-office allowance.
  • Health and dental plans (dental optional).
  • Childcare assistance (up to the child’s 6th birthday).
  • Extended maternity, paternity, and adoptive leave (#todasasfamíliasimportam / #allfamiliesmatter).
  • Life insurance.
  • Birthday Day Off (one day off to take on your birthday or during your birthday month).
  • Family Day (one day off for parents to take between May and August and enjoy as they wish).
  • Mental Break (one full week off in December to rest and recharge).
  • *Benefits according to current policy*