Principal Engineer

Dun & Bradstreet

Full-time

Location: 🇮🇳 India


Job Level

Lead

Tech Stack

Airflow, Apache, AWS, Cloud, Google Cloud Platform, Hadoop, NoSQL, PySpark, Python, Spark

About the role

  • Design and develop scalable and efficient data pipelines within the Big Data ecosystem using Apache Spark and Apache Airflow; document new and existing pipelines and datasets to ensure clarity and maintainability.
  • Demonstrate and implement data architecture and management practices across data pipelines, data lakes, and modern data warehousing, including virtual data warehouses and push-down analytics.
  • Write clean, efficient, and maintainable code in Python to support data processing and platform functionality.
  • Utilize cloud-based infrastructures (AWS/GCP) and their services, including compute resources, databases, and data warehouses; manage and optimize cloud-based data infrastructure.
  • Develop and manage workflows using Apache Airflow for scheduling and orchestrating data processing jobs, and create and maintain Airflow DAGs (a minimal sketch follows this list).
  • Implement and maintain Big Data architecture including cluster installation, configuration, monitoring, security, resource management, maintenance, and performance tuning.
  • Create detailed designs and proof-of-concepts (POCs) to enable new workloads and technical capabilities on the platform; collaborate with platform and infrastructure engineers to implement capabilities in production.
  • Manage workloads and optimize resource allocation and scheduling across multiple tenants to fulfill SLAs.
  • Participate in planning activities and collaborate with data science teams to enhance platform skills and capabilities.
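
For illustration only, the sketch below shows the kind of Airflow-orchestrated PySpark pipeline described above. The DAG id, schedule, connection id, and job path are hypothetical placeholders, not part of the actual role or any existing codebase.

```python
# Hypothetical sketch: an Airflow DAG that submits a PySpark batch job.
# DAG id, schedule, connection id, and application path are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="example_daily_ingest",          # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",                      # Airflow 2.4+ style; older versions use schedule_interval
    catchup=False,
) as dag:
    # Submit a PySpark application to the cluster via spark-submit.
    ingest = SparkSubmitOperator(
        task_id="spark_ingest",
        application="/opt/jobs/ingest.py",  # hypothetical PySpark script
        conn_id="spark_default",
    )
```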

Requirements

  • A minimum of 10 years of hands-on experience in Big Data technologies, including at least 3 years' experience working with Spark/PySpark.
  • At least 6 years of experience in cloud environments is required; experience with Google Cloud Platform (GCP), particularly Dataproc, is preferred.
  • Must have hands-on experience managing cloud-deployed solutions, preferably on AWS, along with NoSQL and graph databases.
  • Prior experience working in a global organization and within a DevOps model is considered a strong plus.
  • Exhibit expert-level programming skills in Python, with the ability to write clean, efficient, and maintainable code.
  • Demonstrate familiarity with data pipelines, data lakes, and modern data warehousing practices, including virtual data warehouses and push-down analytics.
  • Design and implement distributed data processing solutions using technologies such as Apache Spark and Hadoop (a minimal PySpark sketch follows this list).
  • Develop and manage workflows using Apache Airflow for scheduling and orchestrating data processing jobs; create and maintain Apache Airflow DAGs.
  • Possess strong knowledge of Big Data architecture, including cluster installation, configuration, monitoring, security, resource management, maintenance, and performance tuning.
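
As an illustration of the distributed processing and push-down-oriented patterns referenced above, here is a minimal PySpark sketch. The bucket paths, column names, and application name are hypothetical placeholders.

```python
# Minimal PySpark sketch of a batch transformation: read from a data-lake
# path, filter and aggregate, then write partitioned Parquet.
# Paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("example_aggregation").getOrCreate()

# Column filters on Parquet sources are pushed down to the file scan
# (predicate push-down), so only matching row groups are read.
orders = (
    spark.read.parquet("s3://example-bucket/raw/orders/")  # hypothetical path
    .filter(F.col("order_date") >= "2024-01-01")
)

daily_totals = (
    orders.groupBy("order_date", "country")
    .agg(F.sum("amount").alias("total_amount"))
)

# Write results partitioned by date for efficient downstream reads.
daily_totals.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/curated/daily_totals/"
)

spark.stop()
```

In practice, a job of this shape would be the application submitted by the Airflow DAG sketched earlier in this posting.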