Tech Stack
Airflow, Apache, AWS, Cloud, Google Cloud Platform, Hadoop, NoSQL, PySpark, Python, Spark
About the role
- Design and develop scalable and efficient data pipelines within the Big Data ecosystem using Apache Spark and Apache Airflow; document new and existing pipelines and datasets to ensure clarity and maintainability.
- Demonstrate and implement data architecture and management practices across data pipelines, data lakes, and modern data warehousing, including virtual data warehouses and push-down analytics.
- Write clean, efficient, and maintainable code in Python to support data processing and platform functionality.
- Utilize cloud-based infrastructures (AWS/GCP) and their services, including compute resources, databases, and data warehouses; manage and optimize cloud-based data infrastructure.
- Develop and manage workflows using Apache Airflow for scheduling and orchestrating data processing jobs; create and maintain Airflow DAGs (a minimal sketch follows this list).
- Implement and maintain Big Data architecture including cluster installation, configuration, monitoring, security, resource management, maintenance, and performance tuning.
- Create detailed designs and proof-of-concepts (POCs) to enable new workloads and technical capabilities on the platform; collaborate with platform and infrastructure engineers to implement capabilities in production.
- Manage workloads and optimize resource allocation and scheduling across multiple tenants to fulfill SLAs.
- Participate in planning activities and collaborate with data science teams to enhance platform skills and capabilities.
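To give a rough sense of the Spark-plus-Airflow orchestration work described above, here is a minimal sketch of an Airflow DAG that submits a nightly PySpark job. It assumes Airflow 2.4+ with the apache-airflow-providers-apache-spark package installed; the DAG id, schedule, application path, and connection id are illustrative assumptions, not details taken from this role.

```python
# Minimal sketch: an Airflow DAG that schedules a nightly PySpark job.
# DAG id, schedule, paths, and connection id below are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

default_args = {
    "owner": "data-platform",          # assumed owning team
    "retries": 2,
    "retry_delay": timedelta(minutes=10),
}

with DAG(
    dag_id="nightly_events_pipeline",  # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
    default_args=default_args,
) as dag:
    # Submit a PySpark application to the cluster via the configured Spark connection.
    transform_events = SparkSubmitOperator(
        task_id="transform_events",
        application="/opt/jobs/transform_events.py",  # assumed job location
        conn_id="spark_default",
        conf={"spark.executor.memory": "4g"},
    )
```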
Requirements
- 10+ years of hands-on experience in Big Data technologies, including at least 3 years of experience working with Spark and PySpark.
- At least 6 years of experience in cloud environments is required; experience with Google Cloud Platform (GCP), particularly Dataproc, is preferred.
- Must have hands-on experience managing cloud-deployed solutions, preferably on AWS, along with NoSQL and graph databases.
- Prior experience working in a global organization and within a DevOps model is considered a strong plus.
- Exhibit expert-level programming skills in Python, with the ability to write clean, efficient, and maintainable code.
- Demonstrate familiarity with data pipelines, data lakes, and modern data warehousing practices, including virtual data warehouses and push-down analytics.
- Design and implement distributed data processing solutions using technologies such as Apache Spark and Hadoop (see the PySpark sketch after this list).
- Develop and manage workflows using Apache Airflow for scheduling and orchestrating data processing jobs; create and maintain Apache Airflow DAGs.
- Possess strong knowledge of Big Data architecture, including cluster installation, configuration, monitoring, security, resource management, maintenance, and performance tuning.
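As an illustration of the distributed data processing expected here, the following is a minimal PySpark sketch: read raw events from a data lake, aggregate them per user per day, and write the result back as partitioned Parquet. The bucket paths and column names are assumptions for the example, and reading from s3a:// additionally assumes the Hadoop AWS connector is on the classpath.

```python
# Minimal PySpark sketch: read raw events from object storage, aggregate,
# and write partitioned Parquet back to the lake. Paths and columns are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("daily_event_rollup").getOrCreate()

# Read raw JSON events from an assumed data-lake location.
events = spark.read.json("s3a://example-data-lake/raw/events/")

# Aggregate events per user per day; Spark distributes this work across executors.
daily_rollup = (
    events
    .withColumn("event_date", F.to_date("event_timestamp"))
    .groupBy("user_id", "event_date")
    .agg(F.count("*").alias("event_count"))
)

# Partition by date so downstream queries can prune partitions (push-down-friendly layout).
daily_rollup.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3a://example-data-lake/curated/daily_event_rollup/"
)

spark.stop()
```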