
Lead Data Engineer
Brillio
full-time
Location: California • 🇺🇸 United States
Salary
💰 $120,000 - $130,000 per year
Job Level
Senior
Tech Stack
AWS, Cloud, Distributed Systems, EC2, ETL, PySpark, Python, Spark, SQL, Tableau
About the role
- Design, build, and maintain scalable data pipelines to collect, process, and store data from multiple datasets.
- Optimize data storage solutions for better performance, scalability, and cost-efficiency.
- Develop and manage ETL/ELT processes that transform data per schema definitions and make it available to downstream jobs and other teams (a brief PySpark sketch of such a step follows this list).
- Collaborate with cross-functional teams to understand product functionality and capture evolving data requirements.
- Engage stakeholders to gather requirements and create curated datasets for downstream consumption and end-user reporting.
- Automate deployment and CI/CD processes using GitHub workflows to reduce manual work.
- Ensure compliance with data governance, privacy regulations, and security protocols.
- Use AWS and Databricks for data processing, with S3 for storage.
- Work with distributed systems and big data technologies (Python, PySpark, Spark, Advanced SQL, Delta Lake).
- Integrate with SFTP to securely transfer data from Databricks to remote locations.
- Analyze Spark query execution plans and fine-tune queries for performance.
- Troubleshoot and solve problems in large-scale distributed systems.
- Contribute to analytics and insights projects based on big data.
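To give a flavor of the ETL work described above, here is a minimal PySpark sketch of one such step: reading raw files from S3 with an explicit schema, applying light cleansing, and writing a curated Delta table for downstream consumers. All bucket paths, table names, and columns are hypothetical placeholders, and the Delta write assumes a Databricks or other Delta-Lake-enabled Spark environment.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Hypothetical schema for an incoming "orders" feed (illustrative only).
schema = StructType([
    StructField("order_id", StringType(), nullable=False),
    StructField("customer_id", StringType(), nullable=True),
    StructField("amount", DoubleType(), nullable=True),
    StructField("created_at", TimestampType(), nullable=True),
])

# Collect: read raw JSON landed in S3 (placeholder bucket/prefix).
raw = spark.read.schema(schema).json("s3://example-raw-bucket/orders/")

# Process: basic cleansing plus a derived partition column.
curated = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("amount").isNotNull())
       .withColumn("order_date", F.to_date("created_at"))
)

# Store: write a partitioned Delta table for downstream jobs and reporting.
(curated.write
        .format("delta")
        .mode("overwrite")
        .partitionBy("order_date")
        .save("s3://example-curated-bucket/orders_delta/"))
```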
Requirements
- Athena, Step Functions, Spark/PySpark, ETL Fundamentals, SQL (Basic + Advanced), Glue, Python, Lambda, Data Warehousing, EBS/EFS, AWS EC2, Lake Formation, Aurora, S3, Modern Data Platform Fundamentals, PL/SQL, Data Modelling Fundamentals, CloudFront
- Remote (must work PST hours).
- Design, build, and maintain scalable data pipelines to collect, process, and store data from multiple datasets.
- Optimize data storage solutions for better performance, scalability, and cost-efficiency.
- Develop and manage ETL/ELT processes to transform data per schema definitions, support slicing and dicing, and make it available to downstream jobs and other teams.
- Collaborate closely with cross-functional teams to understand system and product functionality, keep pace with feature development, and capture evolving data requirements.
- Engage with stakeholders to gather requirements and create curated datasets for downstream consumption and end-user reporting.
- Automate deployment and CI/CD processes using GitHub workflows, identifying areas to reduce manual, repetitive work.
- Ensure compliance with data governance policies, privacy regulations, and security protocols.
- Use cloud platforms such as AWS and work with Databricks for data processing, with S3 for storage.
- Work with distributed systems and big data technologies such as Python, PySpark, Spark, Advanced SQL, and Delta Lake.
- Integrate with SFTP to push data securely from Databricks to remote locations.
- Analyze and interpret Spark query execution plans to fine-tune queries for faster, more efficient processing (see the sketch at the end of this posting).
- Strong problem-solving and troubleshooting skills in large-scale distributed systems.
- Experience on a couple of projects delivering analytics and insights on big data.
- Experience building datasets on complex big data is an advantage.
- BE/B.Tech in Engineering.
- Skill set: SQL, Python, AWS, Databricks; Tableau exposure is optional but good to have.
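As a rough illustration of the query-plan tuning mentioned above (not a description of Brillio's actual workload), the sketch below inspects a Spark join plan and then applies a broadcast hint to avoid shuffling the large side. The table names sales_facts, store_dim, and the key store_id are hypothetical, and explain(mode="formatted") assumes Spark 3.0 or later.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("plan-tuning-demo").getOrCreate()

# Hypothetical fact and dimension tables registered in the metastore.
facts = spark.table("sales_facts")
dims = spark.table("store_dim")

# Inspect the physical plan first; a SortMergeJoin against a small
# dimension table is a common sign the join can be broadcast instead.
joined = facts.join(dims, "store_id")
joined.explain(mode="formatted")

# Hint Spark to broadcast the small side, avoiding a shuffle of the fact table.
tuned = facts.join(broadcast(dims), "store_id")
tuned.explain(mode="formatted")
```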