Phamily

Data Engineer

full-time

Location: 🇺🇸 United States

Job Level

Junior / Mid-Level

Tech Stack

Airflow, BigQuery, Cloud, Python, SQL

About the role

  • Design, build, and automate batch data pipelines that ingest from files, APIs, and SFTP into BigQuery and product environments.
  • Implement scheduling, monitoring, retries, and backfills to ensure reliable and repeatable workflows.
  • Establish guardrails such as schema management, versioning, and basic SLAs for data freshness and reliability.
  • Productionize ML/AI batch jobs and publish outputs into analytics-ready tables.
  • Maintain and refresh healthcare reference datasets (e.g., NPI, codesets, CMS lists) on schedule.
  • Document pipelines clearly and make outputs consumable for analytics and BI teams.
  • Handle PHI with care and follow HIPAA-aligned data governance practices.
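The retry guardrails mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration of retrying a flaky batch step with exponential backoff; `run_with_retries` and `flaky_load` are hypothetical names for this sketch, not part of Phamily's actual stack (in practice an orchestrator like Airflow provides retries and backfills natively):

```python
import time

def run_with_retries(task, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run a batch task, retrying with exponential backoff on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # exhausted retries: surface the error to monitoring
            sleep(base_delay * 2 ** (attempt - 1))  # back off before retrying

# Illustrative flaky ingestion step: fails twice, then succeeds.
calls = []
def flaky_load():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transient SFTP error")
    return "loaded"

result = run_with_retries(flaky_load, sleep=lambda s: None)
```

A scheduler would wrap each pipeline step this way and alert only after retries are exhausted.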

Requirements

  • 2+ years of experience building batch data workflows using Python and SQL, publishing to cloud data warehouses (BigQuery preferred).
  • Proficiency with modern schedulers/orchestrators (e.g., Airflow, Prefect, Dagster) and containerized environments.
  • Experience ingesting data from APIs, files, and SFTP sources, including schema evolution management.
  • Strong debugging, monitoring, and CI/CD fundamentals.
  • Excellent documentation and communication skills with an ownership mindset.
  • Bonus: Experience with dbt, data quality testing, or healthcare data formats (claims, EHR, CMS datasets).
  • Bonus: Familiarity with running ML jobs in production.