Tech Stack
Apache, AWS, ETL, Grafana, Kafka, Linux, MySQL, NumPy, Pandas, Python, SQL
About the role
- Design, build, and maintain scalable real-time and batch data pipelines using Kafka, AWS, and Python.
- Develop event-driven data workflows leveraging Kafka topics, Lambda functions, and downstream sinks (Snowflake, S3).
- Build and maintain robust ETL processes across Snowflake and MySQL.
- Ensure high-quality data delivery by writing modular, well-tested Python code.
- Monitor pipelines using CloudWatch, Grafana, or custom metrics.
- Contribute to technical design, architectural reviews, and infrastructure improvements.
- Maintain data integrity and scalability in high-volume environments.
- Collaborate with cross-functional teams to create reliable data products that support advanced analytics and personalization.
- Continuously explore and implement best practices in data engineering, streaming, and analytics infrastructure.
Requirements
- 4+ years of experience as a Data Engineer in a production environment.
- Fluency in Python with strong experience in a Linux environment.
- Strong SQL skills, with hands-on experience in Snowflake and MySQL.
- Solid understanding of data modeling, schema design, and performance optimization.
- Proficiency with Python data libraries: NumPy, Pandas, SciPy.
- Experience building and maintaining real-time pipelines with Kafka (self-managed Apache Kafka or Amazon MSK).
- Familiarity with AWS services: Lambda, Kinesis, S3, CloudWatch.
- Experience with event-driven architectures and streaming data patterns.
- Knowledge of RESTful APIs, data integration, and Git workflows.
- Comfortable in Agile environments with strong testing and code review practices.
- Preferred: Experience with large-scale streaming data systems; exposure to performance monitoring and debugging tools; familiarity with infrastructure-as-code and CI/CD pipelines.