Salary
💰 $120,000 - $145,000 per year
Tech Stack
Apache, AWS, Azure, Cloud, ETL, PySpark, Python, Scala, Spark, SQL
About the role
- Lead design, development, and optimization of scalable data pipelines using Databricks on Azure
- Build and optimize distributed data processing jobs using Apache Spark on Databricks; implement Delta Lake, DLT pipelines, and Medallion architecture (see the first sketch after this list)
- Design and automate ETL pipelines using Azure Data Factory, Databricks, and Synapse Analytics; integrate data from Salesforce, Workday, Duck Creek, and external APIs
- Develop dimensional models (Star/Snowflake schemas), stored procedures, and views for data warehouses; ensure efficient querying and transformation using SQL, T-SQL, and PySpark (see the star-schema sketch below)
- Leverage Azure DevOps, CI/CD pipelines, and GitHub for version control and deployment; utilize Azure Logic Apps and MLflow for workflow automation and model training (see the MLflow sketch below)
- Implement RBAC, data encryption, and auditing; ensure compliance with enterprise data governance policies
- Collaborate with data scientists, analysts, and business stakeholders; mentor junior engineers; contribute to code reviews and architectural decisions
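To give a flavor of the day-to-day work, here is a minimal PySpark sketch of a bronze-to-silver step in a medallion architecture on Delta Lake. The paths, business key, and column names are hypothetical placeholders, not the team's actual pipeline:

```python
# Minimal bronze -> silver medallion sketch for Databricks.
# All paths and column names below are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # pre-created as `spark` in Databricks notebooks

# Bronze: land raw source files as-is, preserving source fidelity
raw = spark.read.format("json").load("/mnt/raw/policies/")  # hypothetical landing path
raw.write.format("delta").mode("append").save("/mnt/bronze/policies")

# Silver: deduplicate, standardize types, and drop unusable rows
bronze = spark.read.format("delta").load("/mnt/bronze/policies")
silver = (bronze
          .dropDuplicates(["policy_id"])                           # hypothetical business key
          .withColumn("effective_date", F.to_date("effective_date"))
          .filter(F.col("policy_id").isNotNull()))
silver.write.format("delta").mode("overwrite").save("/mnt/silver/policies")
```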
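The dimensional-modeling responsibility typically looks like the following star-schema load: resolving surrogate keys from a dimension and appending to a fact table. Table and column names here are illustrative assumptions:

```python
# Hypothetical star-schema fact load: look up surrogate keys from a
# dimension table, then append to a gold-layer fact table.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

claims = spark.read.format("delta").load("/mnt/silver/claims")        # hypothetical silver table
dim_policy = spark.read.format("delta").load("/mnt/gold/dim_policy")  # hypothetical dimension

fact_claims = (claims
    .join(dim_policy, on="policy_id", how="left")                     # surrogate-key lookup
    .select("policy_sk", "claim_id", "claim_amount", "claim_date"))   # assumed columns

# Assumes a `gold` schema already exists in the metastore
fact_claims.write.format("delta").mode("append").saveAsTable("gold.fact_claims")
```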
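For the model-training side, MLflow experiment tracking is used roughly as sketched below; the experiment path, parameter, and metric value are placeholders:

```python
# Minimal MLflow tracking sketch; experiment path, parameter, and
# metric below are hypothetical placeholders.
import mlflow

mlflow.set_experiment("/Shared/claims-forecast")    # hypothetical experiment path
with mlflow.start_run():
    mlflow.log_param("model_type", "gradient_boosting")
    mlflow.log_metric("rmse", 0.42)                 # placeholder metric value
```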
Requirements
- Bachelor's or Master's degree in Computer Science, Engineering, or related field
- 5+ years (60 months) of experience in data engineering, with at least 2 years on Databricks
- Proficiency in Python, Scala, SQL, and Spark
- Hands-on experience with Azure Data Services (ADF, ADLS, Synapse)
- Strong understanding of ETL, data warehousing, and data modeling concepts
- Experience with Power BI, including DAX and advanced visualizations
- Familiarity with MLflow, LangChain, and LLM integration is a plus