Tech Stack
Airflow, BigQuery, Cassandra, Cloud, ETL, Google Cloud Platform, Hadoop, HBase, Java, Jenkins, Kafka, Microservices, MongoDB, NoSQL, Postgres, Python, Spark, Spring Boot, SQL, Terraform
About the role
- Design & Develop: Build and maintain scalable data platform frameworks leveraging Big Data technologies (Spark, Hadoop, Kafka, Hive, etc.) and GCP services (BigQuery, Dataflow, Pub/Sub, etc.).
- Data Pipeline Development: Develop, optimize, and manage batch and real-time data pipelines to support business intelligence, analytics, and AI/ML workloads.
- Java Development: Utilize Java to build efficient, high-performance data processing applications and frameworks.
- Cloud Architecture: Design and implement cloud-native data solutions on GCP, ensuring reliability, security, and cost efficiency.
- ETL & Data Integration: Work with structured and unstructured data sources, integrating data from multiple systems into a unified platform.
- Performance Tuning: Optimize data processing performance by fine-tuning Spark jobs, SQL queries, and distributed computing environments.
- Collaboration: Work closely with data scientists, analysts, and software engineers to deliver high-quality data solutions.
- Automation & Monitoring: Implement CI/CD pipelines for data workflows and set up monitoring solutions to track system health and performance.
Requirements
- Experience Level: 5+ years
- Strong proficiency in Java for data engineering and backend development (Spring Boot, microservices).
- Hands-on experience with Big Data technologies (Hadoop, Spark, Kafka, Hive, HBase, etc.).
- Expertise in GCP services: BigQuery, Dataflow, Pub/Sub, Cloud Storage, Composer (Airflow), Python Dataproc, etc.
- Experience in developing data platform frameworks to support scalable and reusable data solutions.
- SQL & NoSQL database experience (e.g., BigQuery, PostgreSQL, Cassandra, MongoDB).
- Knowledge of ETL/ELT processes and data modeling concepts.
- Experience with CI/CD tools (Git, Jenkins, Terraform) and infrastructure as code (IaC).
- Understanding of distributed computing principles and high-performance data processing.
- Strong problem-solving skills and ability to work in a fast-paced, agile environment.