Senior Platform Data Engineer

Geisinger

The Senior Platform Data Engineer manages clinical data pipelines for AI readiness at Geisinger, owning shared data products and leading data ingestion and transformation efforts.

Posted 4/16/2026 • full-time • Remote • Pennsylvania • 🇺🇸 United States • Senior

Tech Stack

Tools & technologies

Kafka • Pandas • PySpark • Python • Spark • SQL • Unity

About the role

Key responsibilities & impact

The Senior Platform Data Engineer owns roadmap, priorities, platform standards, and architecture reviews; provides formal input on performance reviews.
This position makes clinical data ready for AI at scale: owning the shared data products, retrieval infrastructure, and platform administration that the entire AI portfolio depends on.
Owns real-time data feeds, reusable clinical data models and feature pipelines, and RAG retrieval infrastructure (ingestion, chunking, embeddings, vector DB, retrieval pipelines).
Streams data from Epic SDE, ADT feeds, lab results, and other clinical sources into Databricks for downstream model consumption.
Curates shared clinical feature tables (patient demographics, labs, vitals, diagnoses, utilization history, imaging metadata) in Databricks/Unity Catalog that multiple AI programs consume for model training, validation, and monitoring.
Designs and operates document ingestion pipelines: normalizing clinical documents, policies, guidelines, and unstructured data sources into formats ready for embedding and retrieval.
Implements and optimizes chunking strategies tailored to healthcare content (e.g., preserving clinical note structure, section-aware chunking for guidelines and protocols).
Establishes data quality gates for RAG: automated profiling, completeness checks, and accuracy scoring before content enters the vector store.

Requirements

What you’ll need

5+ years in data engineering, with strong experience building both batch and streaming data pipelines
Expert-level Databricks skills: Delta Live Tables, PySpark, Unity Catalog, Feature Store
Hands-on experience with real-time data ingestion (Kafka, Spark Structured Streaming, or comparable frameworks)
Strong SQL and Python (pandas, PySpark) skills for data transformation and feature engineering
Experience administering Databricks workspaces: cluster policies, compute management, access controls, cost monitoring
Familiarity with clinical data models and healthcare data sources (EHR extracts, ADT feeds, lab results, claims data) strongly preferred
Experience with Epic data extraction methods (SDE, FHIR, epic-ws) a significant plus
Understanding of data governance principles: lineage, quality monitoring, access controls

Benefits

Comp & perks

We offer healthcare benefits for full-time and part-time positions from day one, including vision, dental, and domestic partner coverage.
We encourage an atmosphere of collaboration, cooperation and collegiality.
We know that a diverse workforce with unique experiences and backgrounds makes our team stronger.

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

data engineering • batch data pipelines • streaming data pipelines • Databricks • Delta Live Tables • PySpark • SQL • Python • data transformation • feature engineering

Soft Skills

leadership • organizational skills • communication