Tide

Lead Data Engineer, PySpark

full-time

Location Type: Hybrid

Location: Hyderabad • 🇮🇳 India

Job Level

Senior

Tech Stack

Apache, ETL, PySpark, Python, Spark, SQL

About the role

  • Design, develop, and optimize next-generation data pipelines and data platforms using PySpark
  • Work with large-scale datasets and solve complex data challenges
  • Develop features, implement data quality checks, and deploy and integrate ML models with backend services and the Tide platform
  • Identify, diagnose, and resolve complex performance bottlenecks in PySpark jobs and Spark clusters using the Spark UI and query plans (see the tuning sketch after this list)
  • Lead the design and implementation of scalable, fault-tolerant ETL/ELT pipelines for batch and, potentially, real-time processing
  • Collaborate with data scientists, analysts, and product teams to design efficient data models (star/snowflake schemas, SCDs)
  • Implement data quality checks, monitoring, and alerting to ensure the accuracy and reliability of data assets (a quality-check sketch also follows this list)
  • Contribute to data architecture strategy and evaluate new technologies and best practices
  • Promote engineering best practices: code quality, testing, documentation, version control; participate in code reviews
  • Mentor junior data engineers and foster continuous learning
  • Work closely with cross-functional teams including software engineers, data scientists, product managers, and business stakeholders
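
To make the performance-tuning responsibility concrete, here is a minimal PySpark sketch of that diagnostic loop: inspect the physical plan, confirm the bottleneck in the Spark UI, then apply a mitigation such as a broadcast join. The paths, table names, and the `customer_id` join key are hypothetical stand-ins, not part of the posting.

```python
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("skew-tuning-sketch")
    # Adaptive Query Execution can split skewed shuffle partitions (Spark 3.x).
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.adaptive.skewJoin.enabled", "true")
    .getOrCreate()
)

# Hypothetical paths and schemas, stand-ins for real datasets.
orders = spark.read.parquet("/data/orders")        # large fact table, skewed on customer_id
customers = spark.read.parquet("/data/customers")  # small dimension table

# Inspect the physical plan; pair this with the Spark UI SQL tab to spot
# oversized shuffle exchanges and straggler tasks.
orders.join(customers, "customer_id").explain(mode="formatted")

# If the dimension side fits in executor memory, broadcasting it removes
# the shuffle of the large table entirely.
joined = orders.join(F.broadcast(customers), "customer_id")
joined.write.mode("overwrite").parquet("/data/orders_enriched")
```

The broadcast hint is only safe while the dimension side stays small; for skewed joins between two large tables, AQE's skew-join handling (enabled above) is the usual first resort.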
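Likewise, a minimal sketch of the kind of data quality check the role calls for, assuming a hypothetical payments DataFrame with `payment_id` and `amount` columns; in practice a team might reach for a framework such as Great Expectations or Deequ instead.

```python
from pyspark.sql import DataFrame, functions as F

def run_quality_checks(df: DataFrame) -> DataFrame:
    """Fail fast when core invariants are violated; in production, wire the
    exception into the pipeline's monitoring and alerting."""
    total = df.count()
    null_ids = df.filter(F.col("payment_id").isNull()).count()
    negative = df.filter(F.col("amount") < 0).count()
    duplicates = total - df.dropDuplicates(["payment_id"]).count()

    errors = []
    if null_ids:
        errors.append(f"{null_ids} rows with NULL payment_id")
    if negative:
        errors.append(f"{negative} rows with negative amount")
    if duplicates:
        errors.append(f"{duplicates} duplicate payment_id values")
    if errors:
        raise ValueError("Data quality checks failed: " + "; ".join(errors))
    return df
```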

Requirements

  • 8+ years of professional experience in data engineering
  • At least 4 years focused specifically on PySpark development and optimization in a production environment
  • Expert-level proficiency in PySpark, including Spark SQL, DataFrames, and RDDs, plus a solid understanding of Spark architecture (Driver, Executors, Cluster Manager, DAG)
  • Strong hands-on experience optimizing PySpark performance on large datasets: debugging slow jobs with the Spark UI, mitigating data skew, reducing shuffles, and managing memory
  • Excellent programming skills in Python
  • Proficiency in SQL for complex data manipulation, aggregation, and querying
  • Basic understanding of data warehousing concepts (dimensional modeling, ETL/ELT processes, data lakes, data marts)
  • Experience with distributed storage formats such as Delta Lake and Apache Parquet (see the Delta Lake sketch after this list)
  • Familiarity with version control systems (Git)
  • Strong problem-solving abilities, analytical skills, and attention to detail
  • Excellent communication and interpersonal skills
  • Bachelor's or Master's degree in Computer Science, Engineering, or a related quantitative field
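
Since the requirements pair Delta Lake with dimensional-modeling concepts such as SCDs, here is a minimal sketch of a Type 2 slowly changing dimension upsert using Delta Lake's MERGE. It assumes the delta-spark package is installed, that the paths and columns (`customer_id`, `is_current`, `start_date`, `end_date`) are hypothetical, and that the incoming batch contains only new or changed records; production merges need schema and conflict handling beyond this.

```python
from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

dim = DeltaTable.forPath(spark, "/lake/dim_customer")  # hypothetical Delta path
# Assumed to contain only new or changed customer records.
updates = spark.read.parquet("/staging/customer_updates")

# Step 1: close out the current row for every customer in the batch.
(
    dim.alias("d")
    .merge(updates.alias("u"),
           "d.customer_id = u.customer_id AND d.is_current = true")
    .whenMatchedUpdate(set={"is_current": "false",
                            "end_date": "current_date()"})
    .execute()
)

# Step 2: append each record's new version as the current row.
(
    updates
    .withColumn("is_current", F.lit(True))
    .withColumn("start_date", F.current_date())
    .withColumn("end_date", F.lit(None).cast("date"))
    .write.format("delta").mode("append").save("/lake/dim_customer")
)
```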

Benefits

  • Make work, work for you! Flexible Working Out of Office (WOO) policy: work remotely from home or anywhere in your assigned Indian state
  • Ability to work from a different country or Indian state for 90 days of the year
  • Competitive salary
  • Self & Family Health Insurance
  • Term & Life Insurance
  • OPD Benefits
  • Mental wellbeing through Plumm
  • Learning & Development Budget
  • WFH Setup allowance
  • 15 days of privilege leave
  • 12 days of casual leave
  • 12 days of sick leave
  • 3 paid days off for volunteering or L&D activities
  • Stock Options

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
PySpark, Spark SQL, DataFrames, RDDs, Python, SQL, ETL, ELT, data warehousing, data modeling
Soft skills
problem-solving, analytical skills, attention to detail, communication, interpersonal skills, mentoring, collaboration, leadership, continuous learning, code review
Certifications
Bachelor's degree in Computer Science, Master's degree in Computer Science, Bachelor's degree in Engineering, Master's degree in Engineering