DATAMAXIS, Inc

Data Engineer

full-time

Location: 🇮🇳 India

Job Level

Mid-Level, Senior

Tech Stack

Airflow, Amazon Redshift, Apache, AWS, Cloud, ETL, GraphQL, Pandas, Python, Ray, Spark

About the role

  • Design, build, and maintain ETL/ELT pipelines to extract, transform, and load data from various sources into cloud-based data platforms (a minimal pipeline sketch follows this list)
  • Develop and manage data architectures, data lakes, and data warehouses on AWS (S3, Redshift, Glue, Athena)
  • Collaborate with data scientists, analysts, and business stakeholders to ensure data accessibility, quality, and security
  • Optimize performance of large-scale data systems and implement monitoring, logging, and alerting for pipelines
  • Work with both structured and unstructured data, ensuring reliability and scalability
  • Implement data governance, security, and compliance standards
  • Continuously improve data workflows by leveraging automation, CI/CD, and Infrastructure-as-Code (IaC)
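
As a rough illustration of the pipeline work described above, here is a minimal batch ETL sketch in Python using Airflow and pandas, both part of this role's stack. The bucket names and the orders dataset are hypothetical placeholders, and it assumes Airflow 2.4+ with s3fs and pyarrow installed; a production pipeline would add retries, alerting, and schema checks.

```python
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_transform_load():
    # Extract: read raw CSV landed in S3 (pandas uses s3fs for s3:// paths).
    # Bucket and key are hypothetical placeholders.
    df = pd.read_csv("s3://example-raw-bucket/orders/orders.csv")

    # Transform: drop rows missing the primary key and stamp the load date.
    df = df.dropna(subset=["order_id"])
    df["load_date"] = datetime.utcnow().date().isoformat()

    # Load: write Parquet to the curated zone, where Glue can catalog it
    # and Athena or Redshift Spectrum can query it.
    df.to_parquet("s3://example-curated-bucket/orders/orders.parquet", index=False)


with DAG(
    dag_id="orders_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="extract_transform_load", python_callable=extract_transform_load)
```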

Requirements

  • Hands-on expertise in AWS native data services: S3, Glue (Schema Registry, Data Catalog), Step Functions, Lambda, Lake Formation, Athena, MSK/Kinesis, EMR (Spark), SageMaker (including Feature Store)
  • Experience designing and optimizing batch (Step Functions) and streaming (Kinesis/MSK) ingestion pipelines
  • Deep understanding of data mesh principles, domain-oriented ownership, data-as-a-product, and federated governance
  • Experience enabling self-service platforms, decentralized ingestion, and transformation workflows
  • Advanced knowledge of schema enforcement, evolution, and validation (preferably AWS Glue Schema Registry/JSON/Avro)
  • Proficiency with the ELT/ETL stack: Spark (EMR), dbt, AWS Glue, and Python (pandas); a Spark sketch follows this list
  • Experience designing and supporting vector stores (OpenSearch) and feature stores (SageMaker Feature Store), and integrating them with MLOps/data pipelines for AI/semantic search and RAG workloads
  • Familiarity with metadata, catalog, and lineage solutions (Glue Data Catalog, Collibra, Atlan, Amundsen, etc.)
  • Knowledge of data security and compliance: row/column-level security (Lake Formation), KMS encryption, role-based access, AuthN/AuthZ standards (JWT/OIDC), and policies aligned with GDPR, SOC 2, and ISO 27001
  • Experience with pipeline orchestration (AWS Step Functions, Apache Airflow/MWAA) and monitoring (CloudWatch, X-Ray)
  • API design experience for batch and real-time data delivery (REST, GraphQL)
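
To make the batch-processing expectations concrete, below is a minimal PySpark job of the kind that might run on EMR: it reads raw JSON events from S3, applies a simple quality gate, and writes date-partitioned Parquet for Athena. The bucket names and the event fields (event_id, event_ts) are hypothetical; a real pipeline would enforce schemas via the Glue Schema Registry and add the monitoring described above.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("events_batch_etl").getOrCreate()

# Extract: raw JSON events from the landing zone (hypothetical bucket/path).
raw = spark.read.json("s3://example-raw-bucket/events/")

# Transform: a basic quality gate that drops records missing required fields
# (a stand-in for full schema validation against a registry).
clean = raw.filter(F.col("event_id").isNotNull() & F.col("event_ts").isNotNull())

# Load: partition curated output by event date so Athena scans stay cheap.
(
    clean.withColumn("event_date", F.to_date("event_ts"))
    .write.mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-curated-bucket/events/")
)
```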