Tech Stack
AWS, Azure, Django, Docker, Postgres, Python, SQL
About the role
- Design, implement, and maintain scalable data pipelines to support growing data processing needs.
- Perform data validation and debugging to ensure high data quality and reliability.
- Design effective data models and architectures to optimize data processing and support downstream Data Science and Machine Learning workflows.
- Implement and automate data quality checks and validation processes (a minimal sketch of such a check appears after this list).
- Build integrations with diverse source systems and RESTful APIs.
- Monitor and optimize performance of data pipelines and databases.
- Debug and resolve data-related issues in a timely manner.
- Document data pipeline architectures, processes, and procedures.
- Collaborate with Product Managers, Quality Assurance, and DevOps teams to deliver high-performance, reliable software solutions.
- Use AWS/Azure services, CloudFormation, and CI/CD for deployment and infrastructure automation.
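For illustration, a minimal sketch of the kind of automated data quality check this role owns, assuming a pandas DataFrame loaded from a source system; the column names ("order_id", "amount", "created_at") are hypothetical placeholders:

```python
# Minimal data quality check sketch. Assumes pandas; the column names
# and rules below are hypothetical examples, not a specific schema.
import pandas as pd

def validate_orders(df: pd.DataFrame) -> list[str]:
    """Return a list of human-readable data quality failures."""
    failures = []
    # Completeness: key columns must not contain nulls.
    for col in ("order_id", "amount", "created_at"):
        nulls = int(df[col].isna().sum())
        if nulls:
            failures.append(f"{col}: {nulls} null value(s)")
    # Uniqueness: order_id is expected to act as a primary key.
    dupes = int(df["order_id"].duplicated().sum())
    if dupes:
        failures.append(f"order_id: {dupes} duplicate value(s)")
    # Validity: amounts should be non-negative.
    negative = int((df["amount"] < 0).sum())
    if negative:
        failures.append(f"amount: {negative} negative value(s)")
    return failures

if __name__ == "__main__":
    sample = pd.DataFrame(
        {"order_id": [1, 2, 2],
         "amount": [9.5, -1.0, 3.0],
         "created_at": ["2024-01-01", None, "2024-01-02"]}
    )
    for failure in validate_orders(sample):
        print(failure)
```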
Requirements
- Designing and implementing scalable, robust, and maintainable data pipelines using AWS/Azure services.
- Implementing data quality checks and validation processes to ensure accuracy, completeness, and integrity of data.
- Designing effective data models and architectures to optimize data processing and facilitate downstream Data Science and Machine Learning workflows.
- Utilizing data quality validation tools to automate and streamline validation processes.
- Working closely with Product Managers, Quality Assurance, and DevOps teams to deliver high-performance and reliable software solutions.
- Monitoring and optimizing the performance of data pipelines.
- Debugging and resolving data-related issues promptly.
- Documenting data pipeline architectures, processes, and procedures.
- Staying up to date with new technologies, tools, and best practices in the data engineering field.
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent work experience).
- 6+ years of experience in Data Engineering.
- Proficiency in data modeling and architecture, leveraging AWS/Azure services to architect scalable and efficient data solutions.
- Experience designing and building data pipelines from diverse source systems.
- Solid understanding of data lake and warehousing concepts and best practices.
- Proficiency in programming languages used for data manipulation and transformation (e.g., Python, SQL).
- Extensive knowledge and hands-on experience with the Django framework.
- Experience with relational databases such as PostgreSQL.
- Knowledge of RESTful Web Services and API development (see the ingestion sketch after this list).
- Understanding of database systems, including schema design, SQL querying, and performance optimization.
- Experience implementing data quality checks and validation processes.
- Experience using CloudFormation and CI/CD for deployment (see the deployment sketch after this list).
- Proficiency in version control systems, particularly Git.
- Ability to work independently and as part of a team in a fast-paced environment.
- Excellent problem-solving skills and attention to detail.
- AWS certification (e.g., AWS Certified Solutions Architect, AWS Certified Data Analytics - Specialty) preferred.
- Familiarity with serverless computing (e.g., AWS Lambda) and containerization (e.g., Docker).
- Experience with large language models (LLMs) and machine learning is highly preferred.
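For illustration, a hedged sketch of integrating with a RESTful source system using the requests library; the endpoint, pagination scheme, and field names are hypothetical, not a specific vendor API:

```python
# Sketch of pulling records from a cursor-paginated JSON API.
# The URL, "cursor"/"next_cursor" scheme, and payload layout are
# assumptions for illustration only.
import requests

def fetch_all_records(base_url: str, token: str) -> list[dict]:
    """Page through a cursor-paginated API and collect all records."""
    session = requests.Session()
    session.headers["Authorization"] = f"Bearer {token}"
    records, cursor = [], None
    while True:
        params = {"limit": 100}
        if cursor:
            params["cursor"] = cursor
        resp = session.get(f"{base_url}/records", params=params, timeout=30)
        resp.raise_for_status()  # surface HTTP errors to the pipeline
        payload = resp.json()
        records.extend(payload["data"])
        cursor = payload.get("next_cursor")
        if not cursor:
            return records

if __name__ == "__main__":
    rows = fetch_all_records("https://api.example.com/v1", token="...")
    print(f"fetched {len(rows)} records")
```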
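And a minimal sketch of driving a CloudFormation deployment from a CI/CD job with boto3; the stack name, template path, and capability setting are assumptions, and credentials are expected to come from the CI environment (e.g., an IAM role):

```python
# Sketch of a CI/CD deployment step using boto3 and CloudFormation.
# Stack name and template path are hypothetical placeholders.
import boto3

def deploy_stack(stack_name: str, template_path: str) -> None:
    cfn = boto3.client("cloudformation")
    with open(template_path) as f:
        template_body = f.read()
    # create_stack fails if the stack already exists; a production
    # script would fall back to update_stack or use change sets.
    cfn.create_stack(
        StackName=stack_name,
        TemplateBody=template_body,
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    # Block until creation finishes (raises if the stack fails).
    waiter = cfn.get_waiter("stack_create_complete")
    waiter.wait(StackName=stack_name)

if __name__ == "__main__":
    deploy_stack("data-pipeline-stack", "pipeline.yaml")
```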