Design and maintain high-performance data pipelines, automate processes, and implement scalable data storage solutions using Databricks on Azure
Collaborate with cross-functional teams to enhance data processing, integrate new data sources, and apply best practices in data lake architecture
Drive continuous improvement initiatives to enhance data processing capabilities across the organization
Responsibilities
Design and implement efficient, high-performance data ingestion pipelines from diverse sources utilizing Databricks on Azure
Develop and maintain reliable data pipelines, automating manual processes to improve efficiency and scalability
Collaborate with the Data Engineering team to design and support robust processes for loading data into relational, dimensional, and NoSQL database systems
Implement best practices in data lake architecture to ensure scalable, standardized, and optimized data storage and processing solutions
Partner with cross-functional teams to identify, evaluate, and acquire new data sources that align with organizational objectives
Actively participate in agile development processes, contributing to sprint planning, refinement, and delivery of high-quality solutions
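The ingestion-pipeline work described above follows the classic extract-transform-load pattern. As a minimal sketch only: in this role the pipeline would typically be a PySpark job on Databricks writing to a data lake, but the stand-in below uses only the Python standard library so it is self-contained, and all field names and sample data are hypothetical.

```python
import csv
import io
import json

# Hypothetical raw input; a real pipeline would read from cloud storage,
# a database, or a streaming source rather than an inline string.
RAW_CSV = """member_id,visit_date,amount
A100,2024-01-05,120.50
A101,2024-01-06,75.00
A100,2024-01-09,nan
"""

def extract(raw: str) -> list:
    """Extract: read raw CSV rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))

def transform(rows: list) -> list:
    """Transform: cast types and drop malformed or NaN-amount rows."""
    clean = []
    for row in rows:
        try:
            rec = {
                "member_id": row["member_id"],
                "visit_date": row["visit_date"],
                "amount": float(row["amount"]),
            }
        except ValueError:
            continue  # skip rows with unparseable amounts
        if rec["amount"] == rec["amount"]:  # NaN != NaN, so this filters NaN
            clean.append(rec)
    return clean

def load(rows: list) -> str:
    """Load: serialize the cleaned rows (stand-in for a Delta/Parquet write)."""
    return json.dumps(rows)

if __name__ == "__main__":
    print(load(transform(extract(RAW_CSV))))
```

In a Databricks job the same three stages map onto `spark.read`, DataFrame transformations, and `DataFrame.write`, with the orchestration handled by the platform rather than a `__main__` block.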
Requirements
4+ years of experience in a highly technical, hands-on data engineering or data science role
Proficiency with Databricks and/or PySpark is a strong advantage
Hands-on experience in developing ETL processes
Familiarity with Data Lakehouse systems and/or migrating RDBMS to Lakehouse architectures
Strong organizational and time management skills to manage multiple projects with minimal supervision
Ability to build and refine processes from the ground up
Excellent problem-solving skills, with a keen ability to investigate beyond initial findings
Exceptional attention to detail, with a commitment to testing and validating results
Comfortable working with unstructured data and navigating ambiguous outcomes
A plus if you have:
Bachelor's degree in a quantitative field such as statistics, mathematics, engineering, or computer science
Knowledge of common healthcare data types (e.g., medical claims, eligibility, provider networks, Rx claims) from various sources
Python or Databricks certifications
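One concrete detail behind the Lakehouse and data-lake requirements is the Hive-style partitioned directory layout that lake tables conventionally use. The sketch below shows that convention only; table and column names are illustrative, not taken from any specific system.

```python
from datetime import date

def partition_path(table: str, record_date: date) -> str:
    """Build a Hive-style lake path for a record, partitioned by year and month.

    Partition pruning in engines like Spark relies on this key=value
    directory convention to skip irrelevant data at read time.
    """
    return f"{table}/year={record_date.year}/month={record_date.month:02d}/"

if __name__ == "__main__":
    # e.g. a pharmacy-claims record dated 2024-03-15 lands under
    # rx_claims/year=2024/month=03/
    print(partition_path("rx_claims", date(2024, 3, 15)))
```

Choosing partition columns with moderate cardinality (year/month rather than, say, member ID) keeps file counts manageable, which is the usual design trade-off in lake storage.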
Benefits
Competitive benefits package with generous employer subsidies
Flexible and remote working options
401k with generous employer match and immediate vesting
Personal and professional development opportunities
Supportive family benefits, including paid leave for new family members
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
data engineering, data science, Databricks, PySpark, ETL processes, data lake architecture, data ingestion pipelines, NoSQL databases, RDBMS, Lakehouse architecture
Soft skills
organizational skills, time management, problem-solving, attention to detail, process refinement, collaboration, continuous improvement, agile development, investigative skills, adaptability
Certifications
Python certification, Databricks certification, Bachelor's degree in statistics, Bachelor's degree in mathematics, Bachelor's degree in engineering, Bachelor's degree in computer science