Salary
💰 $157,250 - $212,750 per year
Tech Stack
Apache, AWS, Docker, ETL, NoSQL, PySpark, Python, SaltStack, Scala, Spark, SQL
About the role
- Ingest, transform, and warehouse healthcare-related data from various sources to produce strategic data engineering solutions
- Collaborate with Data Scientists, Product Managers, Machine Learning Engineers, and Business Intelligence Analysts to produce quality data flows and transformations that support advanced analytics and AI/ML model development
- Develop tools and solutions to facilitate data integration, data warehousing, and data modeling
- Enable Data Engineers and Data Scientists to experiment with and train machine learning models that produce useful insights for customers
- Optimize data infrastructure and processes for performance and scalability
- Develop and maintain data documentation and data lineage
- Stay current with emerging technologies and industry trends related to data engineering
Requirements
- 7+ years of experience in data engineering
- Extensive experience in the design, build, and maintenance of data ETL pipelines
- Extensive knowledge of coding in Python or Scala with a focus on data processing
- Experience using Apache Spark (PySpark or Scala)
- Experience with AWS technology stack (S3, Glue, Athena, EMR, etc.)
- Experience with data and entity relationship modeling to support data warehouses and analytics solutions
- Deep understanding of relational and non-relational databases (SQL/NoSQL)
- Comfortable working with unstructured and semi-structured data (e.g., from web scraping)
- Experience working in a professional software environment using source control (Git), issue tracking and documentation tools (Jira, Confluence, etc.), continuous integration, code reviews, and an agile development process (Scrum/Lean)
- Understanding of basic data privacy and security principles
- Interest and/or experience in AI/ML applications, including support for model development or deployment workflows
- Proactive mindset around exploring emerging technologies in AI and data science to drive innovation
- Knowledge of, or experience with, healthcare data standards such as HL7, FHIR, ICD, SNOMED, LOINC (nice-to-have)
- Experience with Delta Lake and/or Databricks (nice-to-have)
- Hands-on experience with machine learning workflows, including preparing data for AI model training and evaluation (nice-to-have)
- Experience validating data quality, preferably with test automation (nice-to-have)
- Experience with containerization using Docker (nice-to-have)