Salary
💰 $127,500 - $172,500 per year
Tech Stack
Apache, AWS, Docker, ETL, NoSQL, PySpark, Python, SaltStack, Scala, Spark, SQL
About the role
- Ingest, transform, and warehouse healthcare-related data from various sources to support analytics and ML.
- Collaborate with Data Scientists, Product Managers, and Machine Learning Engineers to produce quality data flows and transformations that support advanced analytics and AI/ML model development.
- Develop tools and solutions to facilitate data integration, data warehousing, and data modeling.
- Enable Data Engineers and Data Scientists to experiment with and train machine learning models and produce insights for customers.
- Collaborate with Data Scientists and Business Intelligence Analysts to ensure efficient and effective data processing and analysis.
- Optimize data infrastructure and processes for performance and scalability.
- Develop and maintain data documentation and data lineage.
- Stay current with emerging technologies and industry trends related to data engineering.
Requirements
- 5+ years of experience in data engineering.
- Extensive experience in the design, build, and maintenance of data ETL pipelines.
- Extensive knowledge of coding in Python or Scala with a focus on data processing.
- Experience using Apache Spark (PySpark or Scala).
- Experience with AWS technology stack (S3, Glue, Athena, EMR, etc.).
- Experience with data and entity relationship modeling to support data warehouses and analytics solutions.
- Deep understanding of relational and non-relational databases (SQL/NoSQL).
- Comfortable working with unstructured and semi-structured data (e.g., from web scraping).
- Experience working in a professional software environment using source control (Git), issue tracking and documentation tools (Jira, Confluence, etc.), continuous integration, code reviews, and an agile development process (Scrum/Lean).
- Understanding of basic data privacy and security principles.
- Interest and/or experience in AI/ML applications, including support for model development or deployment workflows.
- Proactive mindset around exploring emerging technologies in AI and data science to drive innovation.
- Knowledge of, or experience with, healthcare data standards such as HL7, FHIR, ICD, SNOMED, LOINC.
- Experience with Delta Lake and/or Databricks.
- Hands-on experience with machine learning workflows, including preparing data for AI model training and evaluation.
- Experience with machine learning workflows and data requirements for use with ML frameworks.
- Experience validating data quality, preferably with test automation.
- Experience with containerization using Docker.