
Senior Platform Data Engineer
Geisinger
full-time
Location Type: Remote
Location: Pennsylvania • United States
About the role
- The Senior Platform Data Engineer owns roadmap, priorities, platform standards, and architecture reviews; provides formal input on performance reviews.
- This position makes clinical data ready for AI at scale: owning the shared data products, retrieval infrastructure, and platform administration that the entire AI portfolio depends on.
- Owns real-time data feeds, reusable clinical data models and feature pipelines, and RAG retrieval infrastructure (ingestion, chunking, embeddings, vector DB, retrieval pipelines).
- Streams data from Epic SDE, ADT feeds, lab results, and other clinical sources into Databricks for downstream model consumption.
- Curates shared clinical feature tables (patient demographics, labs, vitals, diagnoses, utilization history, imaging metadata) in Databricks/Unity Catalog that multiple AI programs consume for model training, validation, and monitoring.
- Designs and operates document ingestion pipelines: normalizing clinical documents, policies, guidelines, and unstructured data sources into formats ready for embedding and retrieval.
- Implements and optimizes chunking strategies tailored to healthcare content (e.g., preserving clinical note structure, section-aware chunking for guidelines and protocols).
- Establishes data quality gates for RAG: automated profiling, completeness checks, and accuracy scoring before content enters the vector store.
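To make the section-aware chunking and quality-gate responsibilities above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Geisinger's actual pipeline: the section headings, the `Chunk` type, the function names, and the 20-character threshold are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical section headings; real clinical note templates vary widely.
SECTION_PATTERN = re.compile(
    r"^(HISTORY|ASSESSMENT|PLAN|MEDICATIONS):[ \t]*$", re.MULTILINE
)


@dataclass
class Chunk:
    section: str  # heading the text was found under
    text: str     # section body with surrounding whitespace removed


def chunk_by_section(note: str) -> list[Chunk]:
    """Split a note at section headings so no chunk spans two sections."""
    matches = list(SECTION_PATTERN.finditer(note))
    chunks = []
    for i, match in enumerate(matches):
        # A section runs until the next heading, or to the end of the note.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(note)
        body = note[match.end():end].strip()
        chunks.append(Chunk(section=match.group(1), text=body))
    return chunks


def passes_quality_gate(chunk: Chunk, min_chars: int = 20) -> bool:
    """Toy completeness check: reject empty or very short sections."""
    return len(chunk.text) >= min_chars


if __name__ == "__main__":
    note = (
        "HISTORY:\n"
        "Patient reports three days of cough and low-grade fever.\n"
        "ASSESSMENT:\n"
        "Likely viral upper respiratory infection.\n"
        "PLAN:\n"
        "Rest.\n"
    )
    for chunk in chunk_by_section(note):
        status = "accept" if passes_quality_gate(chunk) else "reject"
        print(f"{status}: {chunk.section}")
```

In practice the gate would sit between chunking and embedding, so that only accepted chunks reach the vector store. A production version would likely add profiling and accuracy scoring beyond this simple length check.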
Requirements
- 5+ years in data engineering, with strong experience building both batch and streaming data pipelines
- Expert-level Databricks skills: Delta Live Tables, PySpark, Unity Catalog, Feature Store
- Hands-on experience with real-time data ingestion (Kafka, Spark Structured Streaming, or comparable frameworks)
- Strong SQL and Python (pandas, PySpark) skills for data transformation and feature engineering
- Experience administering Databricks workspaces: cluster policies, compute management, access controls, cost monitoring
- Familiarity with clinical data models and healthcare data sources (EHR extracts, ADT feeds, lab results, claims data) strongly preferred
- Experience with Epic data extraction methods (SDE, FHIR, epic-ws) a significant plus
- Understanding of data governance principles: lineage, quality monitoring, access controls
Benefits
- We offer healthcare benefits for full-time and part-time positions from day one, including vision and dental coverage and coverage for domestic partners.
- We encourage an atmosphere of collaboration, cooperation and collegiality.
- We know that a diverse workforce with unique experiences and backgrounds makes our team stronger.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
data engineering, batch data pipelines, streaming data pipelines, Databricks, Delta Live Tables, PySpark, SQL, Python, data transformation, feature engineering
Soft Skills
leadership, organizational skills, communication