NucleusTeq

Data Engineer, Java

full-time

Origin: 🇮🇳 India

Job Level

Mid-Level, Senior

Tech Stack

Airflow, BigQuery, Cassandra, Cloud, ETL, Google Cloud Platform, Hadoop, HBase, Java, Jenkins, Kafka, Microservices, MongoDB, NoSQL, Postgres, Python, Spark, Spring Boot, SQL, Terraform

About the role

  • Design & Develop: Build and maintain scalable data platform frameworks leveraging Big Data technologies (Spark, Hadoop, Kafka, Hive, etc.) and GCP services (BigQuery, Dataflow, Pub/Sub, etc.).
  • Data Pipeline Development: Develop, optimize, and manage batch and real-time data pipelines to support business intelligence, analytics, and AI/ML workloads.
  • Java Development: Utilize Java to build efficient, high-performance data processing applications and frameworks.
  • Cloud Architecture: Design and implement cloud-native data solutions on GCP, ensuring reliability, security, and cost efficiency.
  • ETL & Data Integration: Work with structured and unstructured data sources, integrating data from multiple systems into a unified platform.
  • Performance Tuning: Optimize data processing performance by fine-tuning Spark jobs, SQL queries, and distributed computing environments.
  • Collaboration: Work closely with data scientists, analysts, and software engineers to deliver high-quality data solutions.
  • Automation & Monitoring: Implement CI/CD pipelines for data workflows and set up monitoring solutions to track system health and performance.

Requirements

  • Experience Level: 5+ years
  • Strong proficiency in Java for data engineering and backend development (Spring Boot, microservices).
  • Hands-on experience with Big Data technologies (Hadoop, Spark, Kafka, Hive, HBase, etc.).
  • Expertise in GCP services and related tooling: BigQuery, Dataflow, Pub/Sub, Cloud Storage, Composer (Airflow), Dataproc, Python, etc.
  • Experience in developing data platform frameworks to support scalable and reusable data solutions.
  • SQL & NoSQL database experience (e.g., BigQuery, PostgreSQL, Cassandra, MongoDB).
  • Knowledge of ETL/ELT processes and data modeling concepts.
  • Experience with CI/CD tools (Git, Jenkins, Terraform) and infrastructure as code (IaC).
  • Understanding of distributed computing principles and high-performance data processing.
  • Strong problem-solving skills and the ability to work in a fast-paced, agile environment.