Salary
💰 $153,000 - $191,000 per year
Tech Stack
Airflow, AWS, Cloud, ETL, Google Cloud Platform, Java, Kubernetes, Python, Scala, Spark, SQL, Terraform
About the role
- Design, build, and maintain scalable ETL/ELT pipelines for structured and unstructured data (a minimal orchestration sketch follows this list).
- Contribute to and maintain the enterprise data model – the source of truth in our Snowflake warehouse.
- Write and optimize complex SQL queries (including window functions, temp tables, and query performance tuning) to support analytics and reporting needs.
- Take part in designing and maintaining the centralized model layer.
- Support data warehousing solutions via Snowflake + dbt.
- Develop automation scripts in Bash, Python, or other programming languages.
- Manage cloud environments (AWS, OCI) in collaboration with infrastructure teams.
- Maintain and optimize our Kubernetes (EKS) cluster for scalable workloads.
- Implement and maintain infrastructure-as-code using tools such as Terraform, YAML manifests, and Argo for reproducible, reliable deployments.
- Debug and troubleshoot data pipelines and data quality issues across systems.
- Collaborate with stakeholders of varying technical backgrounds to translate business requirements into scalable technical solutions.
- Be an active contributor to our ETL/ELT framework, proposing improvements and optimizations.
- Contribute to best practices for data modeling, governance, and quality control.
- Explore and recommend AI tools and modern data solutions for efficiency and automation.
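
For illustration only, a pipeline of the kind described above might be orchestrated with an Airflow DAG along the lines of the sketch below; the DAG, task, and table names are hypothetical and assume a recent Airflow deployment, not details of this role.

```python
# Minimal illustrative sketch only: dag_id, task names, and placeholder logic are
# hypothetical. Assumes Apache Airflow 2.4+ (for the `schedule` argument).
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_orders():
    """Pull a raw batch from the source system (placeholder logic)."""
    ...


def load_orders():
    """Load the extracted batch into the warehouse (placeholder logic)."""
    ...


with DAG(
    dag_id="orders_elt",            # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_orders", python_callable=extract_orders)
    load = PythonOperator(task_id="load_orders", python_callable=load_orders)
    extract >> load                 # run extract before load
```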
Requirements
- Strong understanding of data engineering concepts and data warehousing fundamentals.
- Advanced SQL skills, including debugging and performance tuning (an illustrative query appears after this list).
- Proficiency in at least one general-purpose programming language (e.g., Python, Java, Scala); the team uses Python.
- Familiarity with Kimball (Dimensional) Modeling.
- Basic scripting knowledge (Bash) for automation and operational workflows.
- Familiarity with cloud platforms (AWS, GCP, or OCI).
- Solid communication and collaboration skills to work effectively with technical and non-technical stakeholders.
- Familiarity with Git.
- Preferred: Experience with distributed computing frameworks such as Dask or Spark.
- Preferred: Hands-on experience managing and deploying workloads in Kubernetes.
- Preferred: Exposure to infrastructure-as-code (Terraform, Helm, Argo, etc.).
- Preferred: Experience with workflow orchestration systems (Airflow, Dagster, Argo Workflows, etc.).
- Preferred: Experience implementing Change Data Capture (CDC) pipelines.
- Preferred: Strong debugging and problem-solving skills for troubleshooting complex data issues.
- Preferred: Knowledge of AI tools and when to apply them in a data engineering context.
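
As a rough illustration of the advanced SQL mentioned above, the sketch below deduplicates a hypothetical raw.orders table with a window function, executed through the Snowflake Python connector; the account, credentials, and table names are placeholders, not details of this role.

```python
# Illustrative only: connection parameters and the raw.orders table are
# placeholders. Requires the snowflake-connector-python package.
import snowflake.connector

# Keep only the most recent version of each order using ROW_NUMBER().
DEDUP_LATEST_ORDERS = """
    SELECT *
    FROM (
        SELECT
            o.*,
            ROW_NUMBER() OVER (
                PARTITION BY order_id
                ORDER BY updated_at DESC
            ) AS rn
        FROM raw.orders AS o
    ) AS ranked
    WHERE rn = 1
"""

conn = snowflake.connector.connect(
    account="<account_identifier>",   # placeholder
    user="<user>",                    # placeholder
    password="<password>",            # placeholder
    warehouse="<warehouse>",          # placeholder
    database="<database>",            # placeholder
)
try:
    cur = conn.cursor()
    cur.execute(DEDUP_LATEST_ORDERS)
    for row in cur.fetchmany(10):     # peek at a few deduplicated rows
        print(row)
finally:
    conn.close()
```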