Tech Stack
Amazon Redshift, AWS, Cloud, ETL, Grafana, MySQL, Postgres, PySpark, Python, SQL
About the role
- Participate in project architecture and design, code review, and CI/CD
- Develop, optimize, and maintain ETL/ELT processes to ensure data quality and consistency (a minimal PySpark sketch follows this list)
- Collaborate with data analysis and product development teams to ensure data pipelines are efficient, reliable, and scalable
- Handle data integration and delivery
- Test new approaches by building PoCs and running benchmarks
- Monitor and troubleshoot data processing systems to ensure they are functioning properly
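By way of illustration for the ETL/ELT responsibilities above, here is a minimal batch-ETL sketch in PySpark. The bucket paths, column names, and app name are hypothetical, not taken from any actual project.

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

# Hypothetical paths; real buckets and prefixes depend on the project.
RAW_PATH = "s3://example-raw/events/"
CURATED_PATH = "s3://example-curated/daily_event_counts/"

spark = SparkSession.builder.appName("daily-events-etl").getOrCreate()

# Extract: read raw JSON events landed by an upstream producer.
events = spark.read.json(RAW_PATH)

# Transform: drop malformed rows, then aggregate per user and day.
daily_counts = (
    events
    .where(F.col("user_id").isNotNull())
    .withColumn("event_date", F.to_date("event_ts"))
    .groupBy("user_id", "event_date")
    .agg(F.count("*").alias("event_count"))
)

# Load: write partitioned Parquet for downstream consumers
# (e.g. queryable via Redshift Spectrum).
daily_counts.write.mode("overwrite").partitionBy("event_date").parquet(CURATED_PATH)

spark.stop()
```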
Requirements
- Proven experience as a Data Engineer or Python Developer
- Experience in/with: AWS cloud infrastructure (Redshift + Redshift Spectrum, S3, MWAA, RDS, Kinesis, SQS, Lambda, Glue)
- AWS Redshift administration and optimization
- automation using Python + SQL (sketched after this list)
- streaming and micro-batching ETL (sketched after this list)
- DBT
- GitLab administration
- designing and building APIs
- designing solutions and large-scale architecture, balancing various types of constraints
- third-party integrations (marketing networks, payment systems, various other platforms)
- database administration (PostgreSQL, MySQL)
- data transformation (aggregation, enrichment, filtering)
- developing structured data marts from raw data
- ETL problem-solving
- Understanding of queueing mechanisms
- CI/CD knowledge (GitLab CI)
- Monitoring/logging knowledge (Grafana, Graylog, Datadog)
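As a rough illustration of the "automation using Python + SQL" and data-mart items above, the sketch below rebuilds a mart table in PostgreSQL via psycopg2. The host, schema, and table names are invented for the example.

```python
import os
import psycopg2

# Hypothetical connection details; real credentials would come from a secrets store.
conn = psycopg2.connect(
    host="warehouse.example.internal",
    dbname="analytics",
    user="etl_user",
    password=os.environ["WAREHOUSE_PASSWORD"],
)

# Rebuild a structured data mart from raw events; schema and table
# names are made up for illustration.
BUILD_MART_SQL = """
    TRUNCATE mart.daily_user_activity;
    INSERT INTO mart.daily_user_activity (user_id, activity_date, event_count)
    SELECT user_id, event_ts::date, COUNT(*)
    FROM raw.events
    WHERE user_id IS NOT NULL
    GROUP BY user_id, event_ts::date;
"""

# The connection context manager commits on success and rolls back on error,
# so the TRUNCATE and INSERT apply atomically.
with conn:
    with conn.cursor() as cur:
        cur.execute(BUILD_MART_SQL)

conn.close()
```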
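And a minimal micro-batching sketch using PySpark Structured Streaming, which processes the stream in triggered micro-batches. The built-in "rate" source stands in for a real stream such as Kinesis; the rates, window sizes, and sink are arbitrary choices for the example.

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("micro-batch-etl-sketch").getOrCreate()

# The "rate" source emits (timestamp, value) rows at a fixed rate and
# stands in here for a production stream (e.g. Kinesis).
stream = spark.readStream.format("rate").option("rowsPerSecond", 100).load()

# Transform each micro-batch: bucket events into 1-minute windows and count them,
# tolerating up to 2 minutes of late data via the watermark.
counts = (
    stream
    .withWatermark("timestamp", "2 minutes")
    .groupBy(F.window("timestamp", "1 minute"))
    .count()
)

# Each trigger processes one micro-batch; results go to the console here,
# whereas a real pipeline would land them in S3 or Redshift.
query = (
    counts.writeStream
    .outputMode("update")
    .format("console")
    .trigger(processingTime="30 seconds")
    .start()
)
query.awaitTermination()
```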