Tech Stack
Airflow, Amazon Redshift, AWS, Cloud, Distributed Systems, Docker, ETL, Kubernetes, PySpark, Python, SQL
About the role
- As an AWS Data Engineer, you will build cloud data pipelines that power analytics.
- The role involves designing, building, and maintaining large-scale data pipelines and cloud-based data platforms using AWS services.
- Design, develop, and optimize data pipelines using Python, PySpark, and SQL (see the sketch after this list).
- Build and manage ETL/ELT workflows for structured and unstructured data.
- Leverage AWS services (S3, Glue, EMR, Redshift, Lambda, Athena, Kinesis, Step Functions, RDS) for data engineering solutions.
- Implement data lake/data warehouse architectures and ensure data quality, consistency, and security.
- Work with large-scale distributed systems for real-time and batch data processing.
- Collaborate with data scientists, analysts, and business stakeholders to deliver high-quality, reliable data solutions.
- Develop and enforce data governance, monitoring, and best practices for performance optimization.
- Deploy and manage CI/CD pipelines for data workflows using AWS tools (CodePipeline, CodeBuild) or GitHub Actions.
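To give a concrete sense of the day-to-day work described above, here is a minimal, illustrative PySpark batch ETL sketch: read raw JSON events from S3, apply basic cleaning, and write partitioned Parquet suitable for querying with Athena or loading into Redshift. The bucket names, paths, and column names are hypothetical placeholders, not a prescribed implementation.

```python
# Minimal PySpark batch ETL sketch (illustrative only).
# Bucket names, paths, and columns are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("events-etl").getOrCreate()

# Read raw JSON events from a hypothetical S3 landing zone.
raw = spark.read.json("s3://example-raw-bucket/events/")

cleaned = (
    raw
    .filter(F.col("event_id").isNotNull())                # drop malformed records
    .withColumn("event_ts", F.to_timestamp("event_ts"))   # normalize timestamps
    .withColumn("event_date", F.to_date("event_ts"))      # derive partition column
    .dropDuplicates(["event_id"])                         # de-duplicate on key
)

# Write curated, date-partitioned Parquet to a hypothetical S3 target.
(
    cleaned.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-curated-bucket/events/")
)
```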
Requirements
- Strong programming skills in Python and hands-on experience with PySpark.
- Proficiency in SQL for complex queries, transformations, and performance tuning.
- Solid experience with AWS cloud ecosystem (S3, Glue, EMR, Redshift, Athena, Lambda, etc.).
- Experience working with data lakes, data warehouses, and distributed systems.
- Knowledge of ETL frameworks, workflow orchestration (Airflow, Step Functions, or similar), and automation (see the DAG sketch after this list).
- Familiarity with Docker, Kubernetes, or containerized deployments.
- Strong understanding of data modeling, partitioning, and optimization techniques.
- Excellent problem-solving, debugging, and communication skills.
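As an example of the orchestration experience listed above, the following is a minimal Airflow DAG sketch (assuming Airflow 2.4+): a daily job with an extract step followed by a load step. The DAG id, task names, and callables are hypothetical placeholders standing in for real pipeline logic.

```python
# Minimal Airflow DAG sketch (illustrative only, assumes Airflow 2.4+).
# DAG id, task names, and callables are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder: pull the day's raw data (e.g., from an API or S3).
    print("extracting data for", context["ds"])


def load(**context):
    # Placeholder: load transformed data into the warehouse (e.g., Redshift).
    print("loading data for", context["ds"])


with DAG(
    dag_id="example_daily_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    # Run the load step only after the extract step succeeds.
    extract_task >> load_task
```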