Tech Stack
Airflow, Amazon Redshift, Apache, AWS, Cloud, ETL, GraphQL, Pandas, Python, Ray, Spark
About the role
- Design, build, and maintain ETL/ELT pipelines to extract, transform, and load data from various sources into cloud-based data platforms
- Develop and manage data architectures, data lakes, and data warehouses on AWS (S3, Redshift, Glue, Athena)
- Collaborate with data scientists, analysts, and business stakeholders to ensure data accessibility, quality, and security
- Optimize performance of large-scale data systems and implement monitoring, logging, and alerting for pipelines
- Work with both structured and unstructured data, ensuring reliability and scalability
- Implement data governance, security, and compliance standards
- Continuously improve data workflows by leveraging automation, CI/CD, and Infrastructure-as-Code (IaC)
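The extract/transform/load work described above can be sketched minimally in Python with pandas. This is an illustrative toy, not part of the posting: the column names and rules are hypothetical, and an inline CSV string stands in for an S3 object (in production the extract and load steps would read from and write to S3/Redshift).

```python
import io

import pandas as pd


def extract(raw_csv: str) -> pd.DataFrame:
    """Extract: parse raw source data (an inline CSV standing in for an S3 object)."""
    return pd.read_csv(io.StringIO(raw_csv))


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Transform: normalize column names, drop incomplete rows, enforce types."""
    df = df.rename(columns=lambda c: c.strip().lower())
    df = df.dropna(subset=["user_id"])
    df["user_id"] = df["user_id"].astype(int)
    df["amount"] = df["amount"].astype(float)
    return df


def load(df: pd.DataFrame) -> str:
    """Load: serialize back to CSV; in production this would target S3 or Redshift."""
    return df.to_csv(index=False)


# Hypothetical sample input: one row is missing its user_id and gets dropped.
raw = "User_ID,Amount\n1,10\n,5\n2,7.5\n"
out = load(transform(extract(raw)))
```

Keeping each stage a pure function of a DataFrame is what makes pipelines like this easy to unit-test and to port between runners (Glue, EMR/Spark, or a plain Lambda).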
Requirements
- Hands-on expertise in AWS native data services: S3, Glue (Schema Registry, Data Catalog), Step Functions, Lambda, Lake Formation, Athena, MSK/Kinesis, EMR (Spark), SageMaker (including Feature Store)
- Experience designing and optimizing batch (Step Functions) and streaming (Kinesis/MSK) ingestion pipelines
- Deep understanding of data mesh principles, domain-oriented ownership, data-as-a-product, and federated governance
- Experience enabling self-service platforms, decentralized ingestion, and transformation workflows
- Advanced knowledge of schema enforcement, evolution, and validation (preferably AWS Glue Schema Registry/JSON/Avro)
- Proficiency with ELT/ETL stack: Spark (EMR), dbt, AWS Glue, and Python (pandas)
- Experience designing and supporting vector stores (OpenSearch), feature stores (SageMaker Feature Store), and integrating with MLOps/data pipelines for AI/semantic search and RAG workloads
- Familiarity with metadata, catalog, and lineage solutions (Glue Data Catalog, Collibra, Atlan, Amundsen, etc.)
- Knowledge of data security and compliance: row/column-level security (Lake Formation), KMS encryption, role-based access, AuthN/AuthZ standards (JWT/OIDC), GDPR/SOC2/ISO 27001-aligned policies
- Experience with pipeline orchestration (AWS Step Functions, Apache Airflow/MWAA) and monitoring (CloudWatch, X-Ray)
- API design experience for batch and real-time data delivery (REST, GraphQL)
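Of the requirements above, schema enforcement is the most concrete to illustrate. The sketch below is a deliberately simplified stand-in for what Glue Schema Registry does with registered Avro/JSON schemas: the field names and type rules are invented for the example.

```python
# Toy record schema standing in for a schema registered in Glue Schema Registry.
SCHEMA = {
    "user_id": int,
    "event": str,
    "amount": float,
}


def validate(record: dict, schema: dict = SCHEMA) -> list[str]:
    """Return a list of schema violations; an empty list means the record conforms."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    return errors


good = {"user_id": 1, "event": "click", "amount": 2.5}
bad = {"user_id": "1", "event": "click"}
```

A real registry adds what this sketch omits: versioning, compatibility checks between schema versions (the "evolution" in the requirement), and producer/consumer integration.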