Tech Stack
Airflow, Amazon Redshift, Apache, AWS, Cloud, Docker, ETL, Kafka, Kubernetes, RDBMS, Spark, SQL
About the role
- Design and implement scalable ETL/ELT workflows supporting batch and streaming data using AWS primitives (S3, Kinesis, Glue, Redshift, Athena); a minimal orchestration sketch follows this list
- Architect and maintain cloud-native data platforms with automated ingestion, transformation, and governance using DBT, Apache Spark, Delta Lake, Airflow, Databricks
- Work with Product, BI, and Support teams to resolve data-related technical challenges and meet infrastructure needs
- Collaborate with cross-functional engineering teams to anticipate data needs and proactively deliver solutions
- Help optimize data lake/lakehouse infrastructure to support AI workloads and large-scale analytics
- Ensure data quality, lineage, and observability; develop and enforce data governance, compliance monitoring, and privacy protection
- Partner with Data Scientists to optimize pipelines for model training, inference, and continuous learning workflows
- Build self-healing data pipelines with AI-driven error detection, root cause analysis, and automated remediation
- Implement intelligent data lineage tracking to discover relationships between datasets and predict downstream impact
- Create AI-assisted data discovery systems with natural language interfaces for finding datasets and understanding their semantics
- Participate in on-call rotation as needed
- Leverage AI coding assistants to accelerate development, generate complex SQL, and optimize pipeline code
- Develop data quality monitoring using anomaly detection and data profiling (see the volume-check sketch after this list)
- Optimize pipeline orchestration with ML for scheduling, resource allocation, and failure recovery
- Generate and maintain living documentation that evolves with code changes
- Mentor junior engineers and lead tool evaluation and adoption for human-AI collaboration
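To make the first responsibility concrete, here is a minimal sketch of a daily batch ELT DAG in Airflow that lands data in S3 and then loads it into Redshift. The DAG, bucket, and table names are hypothetical, both task bodies are stubbed, and the `schedule` argument assumes Airflow 2.x (2.4 or later).

```python
# Minimal sketch of a daily batch ELT DAG: land raw files in S3, then load to Redshift.
# DAG, bucket, and table names are illustrative placeholders, not a real pipeline.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_to_s3(**context):
    """Pull the day's records from the source system and write them to S3 (stubbed)."""
    # In a real pipeline this would call the source system and write
    # partitioned Parquet to something like s3://example-raw-bucket/orders/...
    print(f"extracting partition {context['ds']}")


def load_to_redshift(**context):
    """COPY the landed partition into a Redshift staging table (stubbed)."""
    print(f"loading partition {context['ds']} into analytics.stg_orders")


with DAG(
    dag_id="orders_daily_elt",          # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_to_s3", python_callable=extract_to_s3)
    load = PythonOperator(task_id="load_to_redshift", python_callable=load_to_redshift)

    extract >> load   # load only runs after the extract succeeds
```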
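For the data quality monitoring responsibility, a simple volume-based anomaly check is one possible starting point before heavier profiling tooling. The sketch below flags a day's row count that drifts more than roughly three standard deviations from the trailing history; the threshold, window length, and sample counts are illustrative assumptions.

```python
# Minimal sketch of a volume-based data quality check: flag a day's row count
# as anomalous when it deviates more than ~3 standard deviations from the
# trailing history. Threshold and history window are illustrative.
from statistics import mean, stdev


def is_row_count_anomalous(history: list[int], todays_count: int, z_threshold: float = 3.0) -> bool:
    """Return True if today's row count looks anomalous versus the trailing history."""
    if len(history) < 7:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return todays_count != mu
    z = abs(todays_count - mu) / sigma
    return z > z_threshold


if __name__ == "__main__":
    trailing = [10_120, 9_870, 10_340, 10_005, 9_990, 10_210, 10_150]
    print(is_row_count_anomalous(trailing, 4_300))   # True: likely partial load
    print(is_row_count_anomalous(trailing, 10_080))  # False: within normal range
```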
Requirements
- 3+ years of experience in data engineering, including building and maintaining large-scale data pipelines
- Extensive experience with SQL and relational databases (SQL Server or similar), including dimensional modeling with star schemas and foundational data warehousing concepts
- Hands-on experience with AWS services such as Redshift, Athena, S3, Kinesis, Lambda, Glue
- Experience with DBT, Databricks, or similar data platform tooling
- Experience working with structured and unstructured data and implementing data quality frameworks
- Excellent communication and collaboration skills
- Demonstrated experience using AI coding tools (GitHub Copilot, Cursor, or similar) with understanding of prompt engineering
- Understanding of AI/ML concepts and data requirements, including feature stores, model versioning, and real-time inference pipelines
- Preferred: Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
- Preferred: Experience in a SaaS or e-commerce environment with AI/ML products
- Preferred: Knowledge of stream processing frameworks such as Kafka, Flink, or Spark Structured Streaming (a minimal streaming-read sketch follows this list)
- Preferred: Familiarity with LLMOps and AI model deployment patterns in data infrastructure
- Preferred: Experience with AI-powered data tools such as automated data catalogs, intelligent monitoring systems, or AI-assisted query optimization
- Preferred: Experience with containerization and orchestration tools like Docker and Kubernetes
- Willingness to travel up to 10%
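As a rough illustration of the preferred stream processing skills, the sketch below reads a Kafka topic with Spark Structured Streaming and appends micro-batches to a Delta path on S3. The broker address, topic name, and S3 paths are placeholders, and it assumes the Kafka and Delta Lake connectors are available on the Spark classpath.

```python
# Minimal sketch of a Spark Structured Streaming read from Kafka, writing
# micro-batches to a Delta-style path on S3. Broker, topic, and path names
# are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("orders_stream_sketch").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "orders")                       # placeholder topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka rows expose binary key/value columns; cast the payload to a string
# before any downstream parsing or schema enforcement.
parsed = events.select(
    col("key").cast("string"),
    col("value").cast("string").alias("payload"),
    col("timestamp"),
)

query = (
    parsed.writeStream.format("delta")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/orders/")  # placeholder
    .outputMode("append")
    .start("s3://example-bucket/bronze/orders/")          # placeholder sink path
)

query.awaitTermination()
```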