Tech Stack
Apache, AWS, Distributed Systems, ETL, PySpark, Python, Spark, SQL
About the role
- Develop Spark applications in AWS Databricks, utilizing Python, PySpark, and SQL to meet project requirements and data processing needs.
- Design and implement robust ETL pipelines using Apache Spark in Databricks, ensuring data integrity, efficiency, and scalability.
- Collaborate with cross-functional teams to understand business requirements and design solutions that leverage structured, semi-structured, and unstructured data effectively.
- Write high-quality code in a timely manner, adhering to coding standards, best practices, and established development processes.
- Utilize version control systems like Git to manage codebase and ensure seamless collaboration within the team.
- Merge and consolidate various data sets using PySpark code, enabling streamlined data processing and analysis (see the consolidation sketch after this list).
- Work with APIs to facilitate data ingestion from diverse sources and integrate the data into the ecosystem (see the ingestion sketch after this list).
- Apply expertise in Databricks Delta Lake to optimize data storage, query performance, and overall data processing efficiency (see the maintenance sketch after this list).
- Demonstrate knowledge of application development life cycles and promote continuous integration/deployment practices for efficient project delivery.
- Perform query tuning, performance tuning, troubleshooting, and debugging for Spark and other big data solutions to enhance system efficiency and reliability (see the tuning sketch after this list).
- Exhibit expertise in database concepts and SQL to efficiently manipulate, process, and extract insights from complex datasets.
- Apply database engineering and design principles to ensure data infrastructure meets high standards of scalability, reliability, and performance.
- Leverage previous experience in handling large-scale distributed systems to deliver and operate data solutions efficiently.
- Demonstrate a successful track record of extracting value from extensive, disconnected datasets to drive data-driven decision-making.
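For illustration only, a minimal sketch of the kind of PySpark consolidation work described above; the paths, table names, and columns are hypothetical, not taken from any actual project.

```python
# Hypothetical consolidation job: join two source datasets, deduplicate,
# and stamp an ingestion time before writing out for downstream use.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("consolidate-example").getOrCreate()

orders = spark.read.parquet("s3://example-bucket/orders/")        # hypothetical path
customers = spark.read.parquet("s3://example-bucket/customers/")  # hypothetical path

consolidated = (
    orders.join(customers, on="customer_id", how="left")
          .dropDuplicates(["order_id"])
          .withColumn("ingested_at", F.current_timestamp())
)

consolidated.write.mode("overwrite").format("delta").save("s3://example-bucket/consolidated/")
```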
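Likewise, a hedged sketch of API-based ingestion into Databricks; the endpoint, response shape, and target table are assumptions made for the example.

```python
# Hypothetical API ingestion: pull a JSON array of flat records and
# land it in a Delta table for downstream processing.
import requests
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("api-ingest-example").getOrCreate()

resp = requests.get("https://api.example.com/v1/records", timeout=30)  # hypothetical endpoint
resp.raise_for_status()
records = resp.json()  # assumes the API returns a JSON array of flat objects

df = spark.createDataFrame(records)
df.write.mode("append").format("delta").saveAsTable("raw.api_records")  # hypothetical table
```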
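A short sketch of routine Delta Lake maintenance on Databricks; the table and column names are illustrative.

```python
# Hypothetical Delta Lake maintenance using Databricks SQL commands.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Compact small files and co-locate rows commonly filtered by customer_id.
spark.sql("OPTIMIZE sales_db.orders ZORDER BY (customer_id)")

# Remove data files no longer referenced by the table (7-day retention shown).
spark.sql("VACUUM sales_db.orders RETAIN 168 HOURS")
```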
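Finally, a sketch of one common Spark tuning step of the sort this role calls for: hinting a broadcast join for a small dimension table and inspecting the physical plan. Paths and column names are hypothetical.

```python
# Hypothetical tuning pass: broadcast the small side of a join to avoid a
# shuffle, then check the physical plan before running at scale.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("tuning-example").getOrCreate()

large_fact = spark.read.parquet("s3://example-bucket/fact/")  # hypothetical path
small_dim = spark.read.parquet("s3://example-bucket/dim/")    # hypothetical path

spark.conf.set("spark.sql.shuffle.partitions", "200")  # size shuffle parallelism to the workload

joined = large_fact.join(broadcast(small_dim), "dim_id")
joined.explain(mode="formatted")  # the join should appear as a BroadcastHashJoin
```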
Requirements
- 8+ years of hands-on experience in Spark, with proficiency in Python or PySpark.
- Databricks Certified Data Engineer Associate or Professional Certification preferred.
- Strong knowledge of the Databricks platform and previous experience working with it.
- Extensive experience with Apache Spark and a proven history of successful development in this environment.
- Proficiency in at least one programming language (Python, PySpark).
- Previous experience in ETL and data application development, coupled with expertise in version control systems like Git.
- Ability to write PySpark code for data merging and transformation.
- Experience working with APIs for data ingestion and integration.
- Familiarity with Databricks Delta Lake and expertise in query optimization techniques.
- Sound understanding of application development lifecycles and continuous integration/deployment practices.
- Proven experience in query tuning, performance tuning, troubleshooting, and debugging Spark and other big data solutions.
- Solid knowledge of database concepts and SQL.
- Strong background in handling large and complex datasets from various sources and databases.
- Proficient understanding of database engineering and design principles.
- Required Security Clearance: US Citizenship and the ability to obtain and maintain an active Public Trust or higher clearance, per contract requirements.