Tech Stack
Airflow, Amazon Redshift, AWS, Cloud, ETL, Go, Grafana, PySpark, Python, SQL, Terraform
About the role
- Build and maintain ETL/ELT workflows using modern data tools and cloud services to ensure accurate, timely, and accessible data
- Implement and maintain data models for analytics use cases, working closely with senior engineers and analysts
- Collaborate with data analysts, product teams, and fellow engineers to gather requirements and support data delivery
- Apply basic data validation checks and contribute to maintaining high data quality across pipelines
- Monitor pipeline health and troubleshoot operational issues in collaboration with more senior engineers
- Contribute to scripting and automation of data workflows using Python and SQL, improving reliability and reducing manual work
- Work with GitLab CI pipelines for data deployments and gradually build expertise in automated testing and deployment practices
- Learn and apply best practices in data engineering under guidance from senior team members and peer reviews
Requirements
- 2–3+ years of experience in data engineering, backend development, or similar technical roles
- Practical experience with AWS services, especially Redshift, S3, and Glue
- Familiarity with AWS Lambda, RDS, and Kinesis (a plus)
- Solid skills in Python and SQL for data extraction, transformation, and scripting
- Familiarity with dbt for data transformation and documentation, or a strong interest in learning it
- Exposure to streaming or micro-batching concepts and tools
- Understanding of CI/CD workflows, ideally using GitLab CI
- Willingness to work with event schema validation tools (e.g., Iglu, Schema Registry)
- Strong communication and collaboration skills
- Experience working with APIs and integrations (REST, OAuth, webhook ingestion), or eagerness to build those skills
- Detail-oriented with a focus on producing clean, reliable, and maintainable data pipelines
- A proactive, curious mindset with a desire to continuously learn and grow
- Bonus: experience with Snowflake, event collection tools (Snowplow, Rudderstack), PySpark, and infrastructure-as-code (Terraform)