Tide

Lead Data Engineer, PySpark

full-time

Location Type: Hybrid

Location: Hyderabad • 🇮🇳 India

Job Level

Senior

Tech Stack

Apache, ETL, PySpark, Python, Spark, SQL

About the role

  • Design, develop, and optimize next-generation data pipelines and data platforms using PySpark
  • Work with large-scale datasets and solve complex data challenges
  • Develop features, implement data quality checks, and deploy and integrate ML models with backend services and the Tide platform
  • Identify, diagnose, and resolve complex performance bottlenecks in PySpark jobs and Spark clusters using the Spark UI and query plans (see the tuning sketch after this list)
  • Lead the design and implementation of scalable, fault-tolerant ETL/ELT pipelines for batch and, potentially, real-time processing
  • Collaborate with data scientists, analysts, and product teams to design efficient data models (star/snowflake schemas, SCDs)
  • Implement data quality checks, monitoring, and alerting to ensure the accuracy and reliability of data assets (a quality-check sketch also follows this list)
  • Contribute to data architecture strategy and evaluate new technologies and best practices
  • Promote engineering best practices: code quality, testing, documentation, version control; participate in code reviews
  • Mentor junior data engineers and foster continuous learning
  • Work closely with cross-functional teams including software engineers, data scientists, product managers, and business stakeholders
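
To make the performance-tuning responsibility concrete, here is a minimal PySpark sketch of that diagnostic loop: inspect the physical plan, confirm the bottleneck in the Spark UI, then apply a mitigation such as a broadcast join. The paths, table names, and the `customer_id` join key are hypothetical stand-ins, not part of the posting.

```python
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("skew-tuning-sketch")
    # Adaptive Query Execution can split skewed shuffle partitions (Spark 3.x).
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.skewJoin.enabled", "true")
    .getOrCreate()
)

# Hypothetical paths and schemas, stand-ins for real datasets.
orders = spark.read.parquet("/data/orders")        # large fact table, skewed on customer_id
customers = spark.read.parquet("/data/customers")  # small dimension table

# Inspect the physical plan; pair this with the Spark UI SQL tab to spot
# oversized shuffle exchanges and straggler tasks.
orders.join(customers, "customer_id").explain(mode="formatted")

# If the dimension side fits in executor memory, broadcasting it removes
# the shuffle of the large table entirely.
joined = orders.join(F.broadcast(customers), "customer_id")
joined.write.mode("overwrite").parquet("/data/orders_enriched")
```

The broadcast hint is only safe while the dimension side stays small; for skewed joins between two large tables, AQE's skew-join handling (enabled above) is the usual first resort.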
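Likewise, a minimal sketch of the kind of data quality check the role calls for, assuming a hypothetical payments DataFrame with `payment_id` and `amount` columns; in practice a team might reach for a framework such as Great Expectations or Deequ instead.

```python
from pyspark.sql import DataFrame, functions as F

def run_quality_checks(df: DataFrame) -> DataFrame:
    """Fail fast when core invariants are violated; in production, wire the
    exception into the pipeline's monitoring and alerting."""
    total = df.count()
    null_ids = df.filter(F.col("payment_id").isNull()).count()
    negative = df.filter(F.col("amount") < 0).count()
    duplicates = total - df.dropDuplicates(["payment_id"]).count()

    errors = []
    if null_ids:
        errors.append(f"{null_ids} rows with NULL payment_id")
    if negative:
        errors.append(f"{negative} rows with negative amount")
    if duplicates:
        errors.append(f"{duplicates} duplicate payment_id values")
    if errors:
        raise ValueError("Data quality checks failed: " + "; ".join(errors))
    return df
```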

Requirements

  • 8+ years of professional experience in data engineering
  • At least 4 years focused specifically on PySpark development and optimization in a production environment
  • Expert-level proficiency in PySpark, including Spark SQL, DataFrames, and RDDs, plus a solid understanding of Spark architecture (Driver, Executors, Cluster Manager, DAG)
  • Strong hands-on experience optimizing PySpark performance on large datasets: debugging slow jobs with the Spark UI, mitigating data skew, reducing shuffles, and managing memory
  • Excellent programming skills in Python
  • Proficiency in SQL for complex data manipulation, aggregation, and querying
  • Basic understanding of data warehousing concepts (dimensional modeling, ETL/ELT processes, data lakes, data marts)
  • Experience with distributed storage formats such as Delta Lake and Apache Parquet (see the Delta Lake sketch after this list)
  • Familiarity with version control systems (Git)
  • Strong problem-solving abilities, analytical skills, and attention to detail
  • Excellent communication and interpersonal skills
  • Bachelor's or Master's degree in Computer Science, Engineering, or a related quantitative field
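
Since the requirements pair Delta Lake with dimensional-modeling concepts such as SCDs, here is a minimal sketch of a Type 2 slowly changing dimension upsert using Delta Lake's MERGE. It assumes the delta-spark package is installed, that the paths and columns (`customer_id`, `is_current`, `start_date`, `end_date`) are hypothetical, and that the incoming batch contains only new or changed records; production merges need schema and conflict handling beyond this.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

dim = DeltaTable.forPath(spark, "/lake/dim_customer")  # hypothetical Delta path
# Assumed to contain only new or changed customer records.
updates = spark.read.parquet("/staging/customer_updates")

# Step 1: close out the current row for every customer in the batch.
(
    dim.alias("d")
    .merge(updates.alias("u"),
           "d.customer_id = u.customer_id AND d.is_current = true")
    .whenMatchedUpdate(set={"is_current": "false",
                            "end_date": "current_date()"})
    .execute()
)

# Step 2: append each record's new version as the current row.
(
    updates
    .withColumn("is_current", F.lit(True))
    .withColumn("start_date", F.current_date())
    .withColumn("end_date", F.lit(None).cast("date"))
    .write.format("delta").mode("append").save("/lake/dim_customer")
)
```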

Benefits

  • Make work, work for you! Flexible Working Out of Office (WOO) policy: work remotely from home or anywhere in your assigned Indian state
  • Ability to work from a different country or Indian state for 90 days of the year
  • Competitive salary
  • Self & Family Health Insurance
  • Term & Life Insurance
  • OPD Benefits
  • Mental wellbeing through Plumm
  • Learning & Development Budget
  • WFH Setup allowance
  • 15 days of privilege leave
  • 12 days of casual leave
  • 12 days of sick leave
  • 3 paid days off for volunteering or L&D activities
  • Stock Options

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
PySpark, Spark SQL, DataFrames, RDDs, Python, SQL, ETL, ELT, data warehousing, data modeling
Soft skills
problem-solving, analytical skills, attention to detail, communication, interpersonal skills, mentoring, collaboration, leadership, continuous learning, code review
Certifications
Bachelor's degree in Computer Science, Master's degree in Computer Science, Bachelor's degree in Engineering, Master's degree in Engineering