
Data Engineer
Daikin Applied Americas
Full-time
Location Type: Hybrid
Location: Plymouth, Minnesota, United States
Salary
💰 $90,000 - $148,000 per year
About the role
- Design, build, and maintain ETL/ELT pipelines to ingest, transform, curate and store data from multiple sources
- Optimize data processing workflows for performance, reliability, and scalability
- Implement real-time and batch data processing using technologies like Apache Spark, Kafka, and Databricks (a minimal sketch follows this list)
- Work with structured and unstructured data
- Implement data validation, cleansing, and monitoring to ensure high-quality datasets
- Implement data governance, security, and compliance policies (e.g., GDPR, CCPA)
- Maintain metadata management, data lineage, and documentation for data assets
- Deploy and manage data solutions on cloud platforms (Azure, Databricks)
- Develop and maintain documentation, data models, and technical standards
- Optimize query performance, cost efficiency, and storage utilization
- Monitor, troubleshoot, and resolve issues in production data pipelines and environments
- Stay current with the latest advancements in data engineering, cloud computing, and analytics technologies within the Databricks ecosystem
- Partner with data analysts and software engineers to support analytics initiatives
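
To make the streaming bullet above concrete, here is a minimal PySpark Structured Streaming sketch of the kind of Kafka-to-Delta ingestion this role describes. The broker address, topic name, schema fields, and table name are illustrative assumptions, not details from the posting.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import (DoubleType, StringType, StructField,
                               StructType, TimestampType)

spark = SparkSession.builder.appName("hvac-ingest").getOrCreate()

# Hypothetical schema for HVAC telemetry events; field names are invented.
schema = StructType([
    StructField("unit_id", StringType()),
    StructField("temperature", DoubleType()),
    StructField("recorded_at", TimestampType()),
])

# Subscribe to a Kafka topic; broker and topic are placeholders. On
# Databricks, the Kafka and Delta connectors are available out of the box.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "hvac-telemetry")
    .load()
)

# Kafka delivers the payload as bytes; cast to string and parse the JSON.
events = (
    raw.selectExpr("CAST(value AS STRING) AS json")
    .select(from_json(col("json"), schema).alias("e"))
    .select("e.*")
)

# Append to a Delta table; the checkpoint makes the stream restartable.
# In production the checkpoint would live on cloud storage, not /tmp.
query = (
    events.writeStream.format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/hvac")
    .outputMode("append")
    .toTable("bronze.hvac_telemetry")
)
```

The same DataFrame logic runs unchanged in batch mode (spark.read / write) against the same Delta table, which is one reason Spark is commonly used to cover both the real-time and batch halves of that responsibility.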
Requirements
- Bachelor’s degree in Computer Science, Engineering, or a related field
- 8+ years of data engineering experience, with a strong understanding of cloud-based data solutions
- At least 3 years of hands-on experience building and delivering data products on Databricks
- Proven experience in data engineering and pipeline development on Databricks
- Hands-on expertise across the data lifecycle: ingestion, transformation, modeling, governance, and consumption
- Deep expertise with the Databricks platform (SQL, Python and PySpark, Delta Lake, Unity Catalog, MLflow)
- Strong SQL and Python skills for data processing and data manipulation
- Strong problem-solving skills and an analytical mindset
- Excellent verbal and written communication skills, with the ability to explain technical concepts to non-technical audiences
- Extensive experience with data ingestion methodologies, including Azure Data Factory (ADF)
- Proficiency in Python, SQL, or Scala for data processing
- Experience with cloud data services (Azure Data Factory, Databricks)
- Hands-on experience with big data frameworks (Databricks, Apache Spark)
- Strong knowledge of data modeling, database optimization, and API-based data integration
- Proficiency in designing and implementing the Medallion Architecture on Databricks (see the sketch after this list)
- Experience with code repositories, CI/CD processes and release management
- Work visa sponsorship is not available for this position
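
As a rough illustration of the Medallion Architecture named in the requirements, the sketch below promotes records through bronze, silver, and gold Delta tables, with a simple validation gate at the silver step. Table names, the deduplication keys, and the plausible-range check are hypothetical, and the bronze/silver/gold schemas are assumed to already exist in the catalog.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col, to_date

spark = SparkSession.builder.appName("medallion-demo").getOrCreate()

# Bronze: raw telemetry exactly as landed (table name is illustrative).
bronze = spark.read.table("bronze.hvac_telemetry")

# Silver: deduplicated, validated records -- a basic data-quality gate.
silver = (
    bronze.dropDuplicates(["unit_id", "recorded_at"])
    .filter(col("temperature").isNotNull())
    .filter(col("temperature").between(-40.0, 60.0))  # plausible-range check
)
silver.write.format("delta").mode("overwrite").saveAsTable("silver.hvac_telemetry")

# Gold: daily aggregates shaped for analyst consumption.
gold = (
    silver.groupBy("unit_id", to_date("recorded_at").alias("day"))
    .agg(avg("temperature").alias("avg_temperature"))
)
gold.write.format("delta").mode("overwrite").saveAsTable("gold.hvac_daily_temps")
```

The design point is that each layer is a separately queryable Delta table, so Unity Catalog permissions and data lineage can be applied per layer rather than to one monolithic pipeline.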
Benefits
- Multiple medical insurance plan options + dental and vision insurance
- 401K retirement plan with employer contributions matching 100% of the first 3% of employee contributions and 50% on the next 2% of employee contributions
- Company provided life insurance + optional employee paid voluntary life insurance, dependent life coverage and voluntary accident coverage
- Short term and long-term disability
- 3 weeks of paid time off for new employees + 11 company paid holidays
- Vacation accrues on a monthly basis, unless applicable federal, state and local law requires a faster accrual
- Paid sick time in accordance with federal, state, and local law
- Paid parental leave and tuition reimbursement after 6 months of continuous service
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
ETL, ELT, Apache Spark, Kafka, Databricks, SQL, Python, PySpark, Delta Lake, data modeling
Soft Skills
problem-solving, analytical mindset, verbal communication, written communication