Lead Data Engineer

Brillio

full-time

Location: California • 🇺🇸 United States

Salary

💰 $120,000 - $130,000 per year

Job Level

Senior

Tech Stack

AWS • Cloud • Distributed Systems • EC2 • ETL • PySpark • Python • Spark • SQL • Tableau

About the role

  • Design, build, and maintain scalable data pipelines to collect, process, and store data from multiple sources.
  • Optimize data storage solutions for better performance, scalability, and cost-efficiency.
  • Develop and manage ETL/ELT processes to transform data as per schema definitions and make it available for downstream jobs and other teams.
  • Collaborate with cross-functional teams to understand product functionality and capture evolving data requirements.
  • Engage stakeholders to gather requirements and create curated datasets for downstream consumption and end-user reporting.
  • Automate deployment and CI/CD processes using GitHub workflows to reduce manual work.
  • Ensure compliance with data governance, privacy regulations, and security protocols.
  • Use AWS and Databricks for data processing, with S3 for storage.
  • Work with distributed systems and big data technologies (Python, PySpark, Spark, Advanced SQL, Delta Lake).
  • Integrate with SFTP to securely transfer data from Databricks to remote locations.
  • Analyze Spark query execution plans and fine-tune queries for performance (see the PySpark sketch after this list).
  • Troubleshoot and solve problems in large-scale distributed systems.
  • Contribute to analytics and insights projects based on big data.
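
For illustration only, here is a minimal PySpark sketch of the kind of work the bullets above describe: reading raw data from S3 against a schema definition, curating it into a Delta table, and inspecting the query execution plan before tuning. All bucket paths, schema fields, and column names are hypothetical and stand in for whatever the actual pipelines use.

```python
# Minimal sketch of a schema-driven ETL step plus query-plan inspection.
# Assumes a Databricks (or Delta-enabled) Spark cluster; names are placeholders.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Example schema definition for the incoming dataset (hypothetical fields).
schema = StructType([
    StructField("order_id", StringType(), nullable=False),
    StructField("customer_id", StringType(), nullable=True),
    StructField("amount", DoubleType(), nullable=True),
    StructField("created_at", TimestampType(), nullable=True),
])

# Extract: read raw JSON landed in S3 (bucket/prefix are placeholders).
raw = spark.read.schema(schema).json("s3://example-bucket/raw/orders/")

# Transform: basic cleansing and a derived partition column.
curated = (
    raw.dropDuplicates(["order_id"])
       .filter(F.col("amount") > 0)
       .withColumn("order_date", F.to_date("created_at"))
)

# Inspect the physical plan before tuning (e.g., scan and shuffle strategy).
curated.explain(mode="formatted")

# Load: write a curated Delta table for downstream jobs and reporting.
(curated.write.format("delta")
        .mode("overwrite")
        .partitionBy("order_date")
        .save("s3://example-bucket/curated/orders/"))
```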

Requirements

  • Athena, Step Functions, Spark/PySpark, ETL Fundamentals, SQL (Basic + Advanced), Glue, Python, Lambda, Data Warehousing, EBS/EFS, AWS EC2, Lake Formation, Aurora, S3, Modern Data Platform Fundamentals, PL/SQL, Data Modelling Fundamentals, CloudFront
  • Remote (must work PST hours).
  • Design, build, and maintain scalable data pipelines to collect, process, and store data from multiple sources.
  • Optimize data storage solutions for better performance, scalability, and cost-efficiency.
  • Develop and manage ETL/ELT processes to transform data as per schema definitions, apply slicing and dicing, and make it available for downstream jobs and other teams.
  • Collaborate closely with cross-functional teams to understand system and product functionality, keep pace with feature development, and capture evolving data requirements.
  • Engage with stakeholders to gather requirements and create curated datasets for downstream consumption and end-user reporting.
  • Automate deployment and CI/CD processes using GitHub workflows, identifying areas to reduce manual, repetitive work.
  • Ensure compliance with data governance policies, privacy regulations, and security protocols.
  • Utilize cloud platforms such as AWS and work on Databricks for data processing, with S3 for storage.
  • Work with distributed systems and big data technologies such as Python, PySpark, Spark, Advanced SQL, and Delta Lake.
  • Integrate with SFTP to push data securely from Databricks to remote locations (see the sketch after this list).
  • Analyze and interpret Spark query execution plans to fine-tune queries for faster, more efficient processing.
  • Strong problem-solving and troubleshooting skills in large-scale distributed systems.
  • Experience on a couple of projects delivering analytics and insights on big data.
  • Exposure to building datasets on complex big data would be an advantage.
  • BE / B Tech in Engineering.
  • Skill set: SQL, Python, AWS, Databricks. Tableau exposure is optional and good to have.
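
For the SFTP integration noted above, a hedged sketch of one common approach is below: pushing a file staged on DBFS to a remote SFTP endpoint with the paramiko library. Host, credentials, and paths are placeholders; the posting does not name the actual tooling.

```python
# Sketch only: upload a file staged on DBFS to a remote SFTP server.
# Connection details are placeholders and would normally come from a
# Databricks secret scope or environment variables, never hard-coded.
import paramiko

HOST, PORT = "sftp.partner.example.com", 22
USERNAME, KEY_PATH = "transfer-user", "/dbfs/keys/id_rsa"

LOCAL_FILE = "/dbfs/exports/orders_2024-01-01.csv"   # file staged by the pipeline
REMOTE_FILE = "/inbound/orders_2024-01-01.csv"

client = paramiko.SSHClient()
# Auto-accepting host keys is for illustration; verify against known_hosts in production.
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(HOST, port=PORT, username=USERNAME, key_filename=KEY_PATH)

try:
    sftp = client.open_sftp()
    sftp.put(LOCAL_FILE, REMOTE_FILE)   # upload the exported dataset
    sftp.close()
finally:
    client.close()
```

In practice the private key and host details would be read from a Databricks secret scope, and the transfer would typically run as a scheduled job after the curated dataset is written.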