Tech Stack
Cloud, Google Cloud Platform, Hadoop, PySpark, Python, Shell Scripting, SQL, Unix
About the role
- Develop and maintain PySpark data pipelines and transformations
- Perform data cleaning, feature engineering, and build statistical/ML models
- Write SQL for data analytics and data transformations
- Work on Unix-based platforms, including shell scripting and cron job scheduling
- Work with big data ecosystem tools such as Hadoop and Hive, and use GitHub for version control
- Use GCP for cloud-based data solutions, data modelling, and data quality assessment and control
- Collaborate on projects involving banking and financial services data and adapt to changing technical environments
Requirements
- 5+ years of experience in a Data Analytics role, with at least 2 years of development experience in PySpark
- Very strong SQL skills
- Expertise in Python programming, with experience in data cleaning, feature engineering, transformation, and building statistical/ML models
- Experience working on Unix-based platforms, with basic knowledge of shell scripting, writing cron jobs, etc.
- Knowledge of the big data ecosystem (Hadoop, Hive) and of version management with GitHub
- Knowledge of cloud computing (GCP) and data modelling, with exposure to data quality assessment and control
- Exposure to working with data from the banking and financial services domain
- Highly adaptable to quickly changing technical environments, with strong organizational and analytical skills