
About the role
- Design, develop, and maintain ETL pipelines using PySpark, Apache Airflow, and Azure Data Factory (ADF)
- Build and optimize distributed data processing jobs using PySpark
- Orchestrate and schedule workflows using Apache Airflow
- Develop and manage data ingestion and transformation pipelines in Azure Data Factory
- Write clean, efficient, and reusable code using Python
- Develop and optimize complex SQL queries for MySQL and PostgreSQL databases
- Work with MongoDB for handling semi-structured and unstructured data
- Perform data analysis using Pandas and NumPy to support business insights
- Create basic to intermediate data visualizations using Matplotlib, Power BI, and Streamlit
- Monitor data pipelines, troubleshoot issues, and ensure data quality and performance
- Collaborate with cross-functional teams including analysts, data scientists, and product teams
Requirements
- 2+ years of experience in building and maintaining scalable data solutions
- Proficiency in Python
- Strong experience with MySQL and PostgreSQL
- Hands-on exposure to MongoDB
- Experience building ETL/ELT pipelines
- Experience using Apache Airflow for workflow orchestration
- Experience with Azure Data Factory (ADF)
- Hands-on experience with Pandas and NumPy
- Ability to create visualizations using Matplotlib
- Familiarity with Git and version control best practices
- Basic understanding of data warehousing concepts
- Exposure to cloud-based deployments and CI/CD pipelines
Benefits
- Remote work
- Health insurance
- Professional development opportunities
- Paid time off
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
PySpark, Apache Airflow, Azure Data Factory, Python, SQL, MySQL, PostgreSQL, MongoDB, Pandas, NumPy
Soft Skills
collaboration, troubleshooting, data quality assurance