Allata

Data Engineer, Snowflake

full-time

Location Type: Office

Location: Vadodara • 🇮🇳 India

Job Level

Mid-Level · Senior

Tech Stack

Airflow · Apache · AWS · Azure · Cloud · ETL · Google Cloud Platform · Hadoop · Kafka · PySpark · Python · Spark · SQL · Tableau

About the role

  • Architect, develop, and maintain scalable, efficient, and fault-tolerant data pipelines using Python and PySpark.
  • Design and implement modern Data Warehouse and Data Lake solutions on cloud platforms (Azure or AWS).
  • Build and automate ETL/ELT workflows using Snowflake, Azure Data Factory, or similar platforms.
  • Leverage DBT to define, document, and execute data transformation and modeling workflows.
  • Write optimized SQL queries for data retrieval, aggregation, and transformation to support analytics.
  • Design pipeline workflows for batch and real-time data processing using orchestration tools like Apache Airflow or Azure Data Factory (a minimal Airflow sketch follows this list).
  • Implement automated data ingestion frameworks for structured, semi-structured, and unstructured sources (APIs, FTP, data streams).
  • Architect and optimize scalable Data Warehouse and Data Lake solutions using Snowflake, Azure Data Lake, or AWS S3.
  • Implement partitioning, bucketing, and indexing strategies for efficient querying and storage management.
  • Develop ETL/ELT pipelines to handle complex transformations and business logic; integrate DBT for modularity and testability.
  • Ensure pipelines are cost-efficient and high-performance, leveraging pushdown optimization and parallel processing.
  • Implement data quality frameworks to validate, clean, and enrich datasets and build self-healing mechanisms for reliability.
  • Optimize distributed processing workflows for Spark by tuning executor memory and partitioning; profile and debug workflows.
  • Deploy and manage data workflows on cloud services (AWS Glue, Azure Synapse, Databricks) and monitor resource usage and costs.
  • Collaborate with data analysts, scientists, and stakeholders; maintain documentation and conduct code reviews.
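
For illustration only, not part of the posting: a minimal sketch of the batch orchestration pattern the responsibilities above describe, assuming Airflow 2.4+ with the apache-airflow-providers-apache-spark package installed. The DAG id, file paths, Spark connection id, and dbt project directory are hypothetical.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

    default_args = {
        "owner": "data-engineering",
        "retries": 2,                          # retry transient failures for fault tolerance
        "retry_delay": timedelta(minutes=10),
    }

    with DAG(
        dag_id="daily_sales_ingest",           # hypothetical pipeline name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",                     # batch cadence; streaming would sit on Kafka instead
        catchup=False,
        default_args=default_args,
    ) as dag:
        # Submit a PySpark job that lands raw files into the data lake.
        ingest_raw = SparkSubmitOperator(
            task_id="ingest_raw_sales",
            application="/opt/jobs/ingest_sales.py",      # hypothetical PySpark script
            conn_id="spark_default",
            conf={"spark.sql.shuffle.partitions": "200"},
            application_args=["--run-date", "{{ ds }}"],  # logical date from Airflow
        )

        # Run DBT models that transform the landed data inside the warehouse.
        transform = BashOperator(
            task_id="transform_with_dbt",
            bash_command="dbt run --project-dir /opt/dbt/sales --select staging+",
        )

        ingest_raw >> transform                # ingest first, then transform

The same ingest-then-transform dependency could equally be modelled as an Azure Data Factory pipeline; Airflow is used here only because it appears in the listed tech stack.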

Requirements

  • Advanced skills in Python and PySpark for high-performance distributed data processing (see the PySpark sketch after this list).
  • Proficient in creating data pipelines with orchestration frameworks like Apache Airflow or Azure Data Factory.
  • Strong experience with Snowflake, SQL Data Warehouse, and Data Lake architectures.
  • Ability to write, optimize, and troubleshoot complex SQL queries and stored procedures.
  • Deep understanding of building and managing ETL/ELT workflows using tools such as DBT, Snowflake, or Azure Data Factory.
  • Hands-on experience with cloud platforms such as Azure or AWS, including services like S3, Lambda, Glue, or Azure Blob Storage.
  • Proficient in designing and implementing data models, including star and snowflake schemas.
  • Familiarity with distributed processing systems and concepts such as Spark, Hadoop, or Databricks.
  • Experience with real-time data processing frameworks such as Kafka or Kinesis (good-to-have).
  • Snowflake, Azure, AWS, or GCP certifications (good-to-have).
  • Knowledge of data visualization platforms such as Power BI, Tableau, or Looker (good-to-have).
  • Strong teamwork and communication skills, intellectual curiosity, and resourcefulness.
  • Commitment to delivering high-quality, accurate, and reliable data products.
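
For illustration only, not part of the posting: a minimal PySpark sketch of the partitioned batch write the requirements above imply. Bucket paths and column names are hypothetical, and the shuffle-partition setting is simply one example of the partition and executor tuning mentioned in the role.

    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder
        .appName("orders_ingest")
        .config("spark.sql.shuffle.partitions", "200")   # tune for cluster size
        .getOrCreate()
    )

    # Read semi-structured JSON from a hypothetical landing zone.
    raw = spark.read.json("s3a://landing/orders/2024-06-01/")

    clean = (
        raw
        .filter(F.col("order_id").isNotNull())           # basic data-quality gate
        .withColumn("order_ts", F.to_timestamp("order_ts"))
        .withColumn("order_date", F.to_date("order_ts"))
        .dropDuplicates(["order_id"])
    )

    (
        clean.repartition("order_date")                   # co-locate rows per partition
        .write.mode("overwrite")
        .partitionBy("order_date")                        # partitioned storage for pruning
        .parquet("s3a://lake/curated/orders/")            # hypothetical data-lake target
    )

Partitioning the curated table by order_date lets downstream SQL prune scans to only the dates a query touches, which is the kind of storage-management strategy the role calls for.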

ATS Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
Python · PySpark · SQL · ETL · ELT · DBT · Snowflake · Apache Airflow · Azure Data Factory · Data Warehouse
Soft skills
teamwork · communication · intellectual curiosity · resourcefulness · commitment to quality
Certifications
Snowflake certification · Azure certification · AWS certification · GCP certification