Geisinger

Senior Platform Data Engineer

full-time

Location Type: Remote

Location: Pennsylvania, United States

About the role

  • The Senior Platform Data Engineer owns roadmap, priorities, platform standards, and architecture reviews; provides formal input on performance reviews.
  • This position makes clinical data ready for AI at scale by owning the shared data products, retrieval infrastructure, and platform administration that the entire AI portfolio depends on.
  • Owns real-time data feeds, reusable clinical data models and feature pipelines, and RAG retrieval infrastructure (ingestion, chunking, embeddings, vector DB, retrieval pipelines).
  • Streams data from Epic SDE, ADT feeds, lab results, and other clinical sources into Databricks for downstream model consumption.
  • Curates shared clinical feature tables (patient demographics, labs, vitals, diagnoses, utilization history, imaging metadata) in Databricks/Unity Catalog that multiple AI programs consume for model training, validation, and monitoring.
  • Designs and operates document ingestion pipelines: normalizing clinical documents, policies, guidelines, and unstructured data sources into formats ready for embedding and retrieval.
  • Implements and optimizes chunking strategies tailored to healthcare content (e.g., preserving clinical note structure, section-aware chunking for guidelines and protocols).
  • Establishes data quality gates for RAG: automated profiling, completeness checks, and accuracy scoring before content enters the vector store.
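
For readers less familiar with the terms in the last two bullets, section-aware chunking followed by a simple quality gate could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not Geisinger's implementation: the header pattern, the `Chunk` type, the function names, and the 20-character threshold are all hypothetical.

```python
import re
from dataclasses import dataclass

# Assumed header convention: an uppercase label ending in a colon on its own
# line, e.g. "ASSESSMENT:". Real clinical notes vary widely.
HEADER = re.compile(r"^([A-Z][A-Z /]+):\s*$")


@dataclass
class Chunk:
    section: str
    text: str


def chunk_by_section(note: str) -> list[Chunk]:
    """Split a note into one chunk per section so each chunk keeps its header."""
    chunks: list[Chunk] = []
    section, lines = "PREAMBLE", []
    for line in note.splitlines():
        match = HEADER.match(line.strip())
        if match:
            # A new header closes the previous section, if it had any text.
            if lines:
                chunks.append(Chunk(section, "\n".join(lines).strip()))
            section, lines = match.group(1), []
        elif line.strip():
            lines.append(line)
    if lines:
        chunks.append(Chunk(section, "\n".join(lines).strip()))
    return chunks


def passes_completeness_gate(chunk: Chunk, min_chars: int = 20) -> bool:
    """Illustrative gate: reject chunks too short to carry useful content."""
    return len(chunk.text) >= min_chars


note = """HISTORY:
Patient reports intermittent chest pain over two weeks.

ASSESSMENT:
Stable.

PLAN:
Order ECG and follow up in one week.
"""

chunks = chunk_by_section(note)
accepted = [c for c in chunks if passes_completeness_gate(c)]
```

Here only the chunks in `accepted` would go on to embedding; a production gate would likely add profiling and accuracy scoring rather than a single length check.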

Requirements

  • 5+ years in data engineering, with strong experience building both batch and streaming data pipelines
  • Expert-level Databricks skills: Delta Live Tables, PySpark, Unity Catalog, Feature Store
  • Hands-on experience with real-time data ingestion (Kafka, Spark Structured Streaming, or comparable frameworks)
  • Strong SQL and Python (pandas, PySpark) skills for data transformation and feature engineering
  • Experience administering Databricks workspaces: cluster policies, compute management, access controls, cost monitoring
  • Familiarity with clinical data models and healthcare data sources (EHR extracts, ADT feeds, lab results, claims data) strongly preferred
  • Experience with Epic data extraction methods (SDE, FHIR, epic-ws) a significant plus
  • Understanding of data governance principles: lineage, quality monitoring, access controls

Benefits

  • We offer healthcare benefits for full-time and part-time positions from day one, including vision, dental, and domestic partner coverage.
  • We encourage an atmosphere of collaboration, cooperation and collegiality.
  • We know that a diverse workforce with unique experiences and backgrounds makes our team stronger.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
data engineering, batch data pipelines, streaming data pipelines, Databricks, Delta Live Tables, PySpark, SQL, Python, data transformation, feature engineering
Soft Skills
leadership, organizational skills, communication