Senior Data Engineer

Ceresti Health

Senior Data Engineer at Ceresti designing end-to-end data architecture to improve dementia care outcomes. Collaborating with cross-functional teams and ensuring data quality for healthcare solutions.

Posted 6/3/2026 • Full-time • Remote • 🇺🇸 United States • Senior


Tech Stack

Tools & technologies

Airflow, AWS, Cloud, Postgres, Python, SQL, Vault

About the role

Key responsibilities & impact

Design and own Ceresti’s end-to-end data architecture: a landing zone with secure cloud object storage for raw partner files and API payloads, validated ingestion pipelines into our transactional Postgres, and a curated analytics layer that decouples reporting and AI workloads from the production database
Build ingestion pipelines for the data we receive today, including partner data files (CSV/JSON/XML/HL7/X12 as applicable) and REST/SFTP API integrations with schema validation, quarantine of bad records, and full lineage from raw bytes to curated row
Stand up and operate the curated layer (data warehouse / lakehouse-lite) so analytics and ML models can consume data without slowing down the transactional system
Choose, integrate, and operate the smallest set of tools needed, including object storage, an orchestrator (Dagster, Prefect, Airflow, etc.), dbt or similar for transformations, and a single validation library (Great Expectations / Pandera / Soda)
Design and enforce data governance for a HIPAA-regulated environment: PHI/PII classification, encryption in transit and at rest, role-based access, audit logging, retention and minimum-necessary policies, and de-identification where appropriate
Partner with backend, ML, product, and clinical stakeholders to define data contracts with our health plan and ACO partners and hold the line on data quality
Build and maintain reliable feature data for ML models, including embeddings (e.g., pgvector) and curated feature tables for risk stratification, engagement, and outcomes work
Instrument the data platform for observability (pipeline SLAs, data freshness, schema drift, and quality metrics) and act on what the data tells you
Participate fully in our Agile process: backlog grooming, sprint planning, demos, and retrospectives
Mentor engineers across the team on SQL, schema design, and the craft of building data systems that are boring in the best possible way

Requirements

What you’ll need

BS/BA degree or higher in Computer Science, Engineering, or a related technical field
8+ years of professional data engineering experience, with a track record of shipping production data systems end-to-end
Mastery of PostgreSQL: schema design, indexing, query tuning, partitioning, logical replication, JSONB, extensions (pg_partman, pg_cron, pgvector, etc.), and operating Postgres at scale
Strong experience designing and operating data pipelines, including file-based ingestion (SFTP / object storage drops) and API-based ingestion (REST, webhooks)
Hands-on experience with one or more cloud platforms (AWS preferred) and their data primitives, such as object storage (S3) and managed Postgres
Experience designing data warehouses and/or data lakes and the judgment to know which one a given problem actually needs
Strong experience with dbt (or an equivalent SQL-based transformation framework) and modern data modeling patterns (Kimball dimensional, Data Vault, One Big Table), along with an opinion about when each is right
Experience with at least one orchestration framework (Dagster, Prefect, or Airflow) and a clear point of view on which to use when
Strong Python skills for ingestion, validation, and tooling
Experience with data validation and data-quality frameworks (Great Expectations, Pandera, Soda, or equivalent)
Experience with change-data-capture from Postgres (logical replication, or equivalent)
Data governance experience in a HIPAA-regulated environment or, at minimum, demonstrated instincts for protecting PHI and PII (encryption, least privilege, audit, de-identification, BAA-aware vendor selection); HITRUST or SOC 2 experience is a strong plus
Comfortable with infrastructure-as-code and CI/CD for data systems
Experience supporting ML workloads: building feature tables, managing training data, serving features at inference time; familiarity with embeddings, vector search (pgvector or equivalent), and LLM integration patterns (RAG, prompt-grounded analytics) is a plus
Excellent written and verbal communication skills: you can explain a tricky schema decision to a business stakeholder and a data contract to a partner with equal clarity
Demonstrated experience working in Agile/Scrum teams

Benefits

Comp & perks

Competitive salary and benefits package
Opportunities for professional growth and development
Collaborative and dynamic work environment
Flexible work arrangements and remote work options
Access to cutting-edge technologies and tools