Tech Stack
Airflow, Apache, AWS, Cloud, Docker, Kafka, Kubernetes, Pandas, Postgres, Python, RabbitMQ, SQL, Terraform
About the role
- Design, build, and fine-tune data pipelines that load data into the data warehouse from sources such as databases, cloud storage, and streaming platforms.
- Analyze and refactor existing Python-based file ingestion processes.
- Troubleshoot pipeline failures and implement quality checks, retries, and safeguards to prevent recurrence (a minimal sketch of these patterns follows this list).
- Reduce manual intervention by automating data pipeline operations.
- Create detailed runbooks documenting existing and new data platform processes.
- Add unit tests to improve reliability and maintainability.
- Design modular, reusable Python-based infrastructure as code (IaC) that deploys AWS resources and builds data platform components.
- Assist with building and validating data architecture.
- Contribute to the design, development, testing, deployment, and support of data pipelines and warehouses in a cloud environment.
- Build and maintain the data platform with scalability and data validation as core goals.
- Advocate engineering best practices, including the use of design patterns, code review, and automated unit/functional testing.
- Collaborate effectively with product management, technical program management, operations, and other engineering teams.
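
As a rough illustration of the retry and quality-check patterns mentioned above, here is a minimal Airflow sketch (the DAG, task, and table names are hypothetical, not this team's actual pipeline):

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract(**context):
    # Placeholder: pull a batch from a source system and stage it,
    # returning the row count (PythonOperator pushes the return value to XCom).
    return 42


def quality_check(**context):
    # Raising an exception fails the task, which triggers Airflow's
    # retry policy and, once retries are exhausted, marks the run failed.
    row_count = context["ti"].xcom_pull(task_ids="extract") or 0
    if row_count == 0:
        raise ValueError("no rows staged; failing so retries and alerts kick in")


with DAG(
    dag_id="orders_to_warehouse",  # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={
        "retries": 3,  # automatic retries before a task is marked failed
        "retry_delay": timedelta(minutes=5),
    },
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    check_task = PythonOperator(task_id="quality_check", python_callable=quality_check)
    extract_task >> check_task
```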
Requirements
- 3+ years of related experience in cloud development, designing and implementing solutions using AWS services, with an understanding of IAM, S3, ECS, Lambda, and SNS
- Strong programming skills in Python (including libraries such as pandas, pydantic, and polars) and other object-oriented languages; a short validation sketch appears after this list
- Experience using source control systems (GitLab, GitHub) and CI/CD pipelines
- Experience working with IaC tools such as Terraform, CloudFormation, or AWS CDK
- Experience in SQL (Postgres, Snowflake, etc.) and understanding of trade-offs between different data storage systems and architectures
- Experience with shell scripting (e.g., Bash)
- Experience creating orchestration patterns and infrastructure using Apache Airflow or other orchestration tools (preferred)
- Experience using queuing technologies such as SQS, RabbitMQ, Kafka, or Kinesis (preferred)
- Experience with data pipeline creation (preferred)
- Experience with security protocols or related data handling (preferred)
- Experience with Agile/Scrum practices (preferred)
- Experience building, maintaining, and using container technologies such as Docker, Kubernetes, or AWS Fargate (preferred)
- Bachelor’s degree in Computer Science, Information Systems, Software, Electrical or Electronics Engineering, or comparable field of study, and/or equivalent work experience
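
For context on the Python/pydantic requirement above, a minimal sketch of schema validation during ingestion (field names and rows are hypothetical; behavior shown holds under pydantic's default settings in both v1 and v2):

```python
from pydantic import BaseModel, ValidationError


class Order(BaseModel):
    order_id: int
    amount: float
    currency: str = "USD"


raw_rows = [
    {"order_id": 1, "amount": "19.99"},  # string is coerced to float
    {"order_id": "abc", "amount": 5.0},  # invalid id: rejected, not loaded
]

valid, rejected = [], []
for row in raw_rows:
    try:
        valid.append(Order(**row))
    except ValidationError as exc:
        rejected.append((row, str(exc)))

print(f"{len(valid)} valid, {len(rejected)} rejected")  # 1 valid, 1 rejected
```

Routing rejected rows to a dead-letter location rather than dropping them silently is one way the quality-check and safeguard responsibilities above tend to be realized in practice.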