Tech Stack
Airflow, Apache, AWS, Azure, Cloud, ETL, Google Cloud Platform, IoT, Kafka, MongoDB, PySpark, Python, Scala, Spark, SQL
About the role
- Develop ETL/ELT pipelines in Databricks (PySpark notebooks) or Snowflake (SQL/Snowpark), ingesting from sources such as Confluent Kafka (see the sketch after this list)
- Optimize data storage using Delta Lake/Iceberg table formats, ensuring reliability (e.g., time travel for auditing in fintech pipelines)
- Integrate with Azure ecosystems (e.g., Fabric for warehousing, Event Hubs for streaming) and support BI/ML teams (e.g., preparing features for demand forecasting models)
- Contribute to real-world use cases such as dashboards for healthcare outcomes or optimizing logistics routes with aggregated IoT data
- Write clean, maintainable code in Python or Scala
- Collaborate with analysts, engineers, and product teams to translate data needs into scalable solutions
- Ensure data quality, reliability, and observability across the pipelines
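For illustration, here is a minimal sketch of the kind of pipeline described above: reading a Kafka topic with PySpark Structured Streaming and appending to a Delta Lake table. The broker address, topic name, event schema, and storage paths are placeholders, not the team's actual configuration.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("orders-ingest").getOrCreate()

# Hypothetical event schema for an "orders" topic
order_schema = StructType([
    StructField("order_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read the raw Kafka stream (placeholder broker and topic)
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
)

# Parse the JSON payload into typed columns
orders = (
    raw.select(F.from_json(F.col("value").cast("string"), order_schema).alias("o"))
    .select("o.*")
)

# Append to a Delta table; the checkpoint enables recovery on restart, and
# Delta's transaction log is what makes time travel/auditing possible.
query = (
    orders.writeStream
    .format("delta")
    .option("checkpointLocation", "/mnt/checkpoints/orders")  # placeholder path
    .outputMode("append")
    .start("/mnt/delta/orders")  # placeholder path
)
```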
Requirements
- 3–6 years of hands-on experience in data engineering
- Experience with Databricks / Apache Spark for large-scale data processing
- Familiarity with Kafka, Kafka Connect, and streaming data use cases
- Proficiency in Snowflake — including ELT design, performance tuning, and query optimization
- Exposure to MongoDB and working with flexible document-based schemas
- Strong programming skills in Python or Scala
- Comfort with CI/CD pipelines, data testing, and monitoring tools
- Good to have: Experience with Airflow, dbt, or similar orchestration tools (a minimal DAG sketch follows this list)
- Good to have: Worked on cloud-native stacks (AWS, GCP, or Azure)
- Good to have: Contributed to data governance and access control practices
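For reference, a minimal Airflow DAG sketch illustrating the orchestration and data-testing practices listed above; the task names, schedule, and row-count check are hypothetical placeholders, not an existing production DAG.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def run_ingest(**_):
    # Placeholder for triggering the Databricks/Snowflake ingest step
    print("ingest step")


def check_row_count(**_):
    # Placeholder data-quality gate: fail the run if the load produced no rows
    row_count = 1  # replace with a real query against the target table
    if row_count == 0:
        raise ValueError("Data quality check failed: no rows loaded")


with DAG(
    dag_id="orders_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest", python_callable=run_ingest)
    quality = PythonOperator(task_id="quality_check", python_callable=check_row_count)

    # Run the quality gate only after ingestion succeeds
    ingest >> quality
```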