
Lead Data Engineer
Brillio
full-time
Location: California • 🇺🇸 United States
Salary
💰 $120,000 - $130,000 per year
Job Level
Senior
Tech Stack
AWS, Cloud, Distributed Systems, EC2, ETL, PySpark, Python, Spark, SQL, Tableau
About the role
- Design, build, and maintain scalable data pipelines to collect, process, and store data from multiple datasets.
- Optimize data storage solutions for better performance, scalability, and cost-efficiency.
- Develop and manage ETL/ELT processes that transform data per schema definitions and make it available to downstream jobs and other teams (a brief PySpark sketch of such a step follows this list).
- Collaborate with cross-functional teams to understand product functionality and capture evolving data requirements.
- Engage stakeholders to gather requirements and create curated datasets for downstream consumption and end-user reporting.
- Automate deployment and CI/CD processes using GitHub workflows to reduce manual work.
- Ensure compliance with data governance, privacy regulations, and security protocols.
- Use AWS and Databricks for data processing, with S3 for storage.
- Work with distributed systems and big data technologies (Python, PySpark, Spark, Advanced SQL, Delta Lake).
- Integrate with SFTP to securely transfer data from Databricks to remote locations.
- Analyze Spark query execution plans and fine-tune queries for performance.
- Troubleshoot and solve problems in large-scale distributed systems.
- Contribute to analytics and insights projects based on big data.
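To give a flavor of the ETL work described above, here is a minimal PySpark sketch of one such step: reading raw files from S3 with an explicit schema, applying light cleansing, and writing a curated Delta table for downstream consumers. All bucket paths, table names, and columns are hypothetical placeholders, and the Delta write assumes a Databricks or other Delta-Lake-enabled Spark environment.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Hypothetical schema for an incoming "orders" feed (illustrative only).
schema = StructType([
    StructField("order_id", StringType(), nullable=False),
    StructField("customer_id", StringType(), nullable=True),
    StructField("amount", DoubleType(), nullable=True),
    StructField("created_at", TimestampType(), nullable=True),
])

# Collect: read raw JSON landed in S3 (placeholder bucket/prefix).
raw = spark.read.schema(schema).json("s3://example-raw-bucket/orders/")

# Process: basic cleansing plus a derived partition column.
curated = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("amount").isNotNull())
       .withColumn("order_date", F.to_date("created_at"))
)

# Store: write a partitioned Delta table for downstream jobs and reporting.
(curated.write
        .format("delta")
        .mode("overwrite")
        .partitionBy("order_date")
        .save("s3://example-curated-bucket/orders_delta/"))
```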
Requirements
- Athena, Step Functions, Spark/PySpark, ETL Fundamentals, SQL (Basic + Advanced), Glue, Python, Lambda, Data Warehousing, EBS/EFS, AWS EC2, Lake Formation, Aurora, S3, Modern Data Platform Fundamentals, PL/SQL, Data Modelling Fundamentals, CloudFront
- Remote (must work PST hours).
- Design, build, and maintain scalable data pipelines to collect, process, and store data from multiple datasets.
- Optimize data storage solutions for better performance, scalability, and cost-efficiency.
- Develop and manage ETL/ELT processes to transform data per schema definitions, support slicing and dicing, and make it available to downstream jobs and other teams.
- Collaborate closely with cross-functional teams to understand system and product functionality, keep pace with feature development, and capture evolving data requirements.
- Engage with stakeholders to gather requirements and create curated datasets for downstream consumption and end-user reporting.
- Automate deployment and CI/CD processes using GitHub workflows, identifying areas to reduce manual, repetitive work.
- Ensure compliance with data governance policies, privacy regulations, and security protocols.
- Use cloud platforms such as AWS and work with Databricks for data processing, with S3 for storage.
- Work with distributed systems and big data technologies such as Python, PySpark, Spark, Advanced SQL, and Delta Lake.
- Integrate with SFTP to push data securely from Databricks to remote locations.
- Analyze and interpret Spark query execution plans to fine-tune queries for faster, more efficient processing (see the sketch at the end of this posting).
- Strong problem-solving and troubleshooting skills in large-scale distributed systems.
- Experience on a couple of projects delivering analytics and insights on big data.
- Experience building datasets on complex big data is an advantage.
- BE/B.Tech in Engineering.
- Skill set: SQL, Python, AWS, Databricks; Tableau exposure is optional but good to have.
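As a rough illustration of the query-plan tuning mentioned above (not a description of Brillio's actual workload), the sketch below inspects a Spark join plan and then applies a broadcast hint to avoid shuffling the large side. The table names sales_facts, store_dim, and the key store_id are hypothetical, and explain(mode="formatted") assumes Spark 3.0 or later.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("plan-tuning-demo").getOrCreate()

# Hypothetical fact and dimension tables registered in the metastore.
facts = spark.table("sales_facts")
dims = spark.table("store_dim")

# Inspect the physical plan first; a SortMergeJoin against a small
# dimension table is a common sign the join can be broadcast instead.
joined = facts.join(dims, "store_id")
joined.explain(mode="formatted")

# Hint Spark to broadcast the small side, avoiding a shuffle of the fact table.
tuned = facts.join(broadcast(dims), "store_id")
tuned.explain(mode="formatted")
```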