Tech Stack
Airflow, BigQuery, Cloud, Python, SQL
About the role
- Design, build, and automate batch data pipelines that ingest from files, APIs, and SFTP into BigQuery and product environments.
- Implement scheduling, monitoring, retries, and backfills to ensure reliable and repeatable workflows (a minimal orchestration sketch follows this list).
- Establish guardrails such as schema management, versioning, and basic SLAs for data freshness and reliability.
- Productionize ML/AI batch jobs and publish outputs into analytics-ready tables.
- Maintain and refresh healthcare reference datasets (e.g., NPI, codesets, CMS lists) on schedule.
- Document pipelines clearly and make outputs consumable for analytics and BI teams.
- Handle PHI with care and follow HIPAA-aligned data governance practices.
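The scheduling, retry, and backfill responsibilities above map naturally onto an orchestrator such as Airflow. The sketch below is illustrative only, assuming Airflow 2.x with the Google Cloud provider installed; the DAG name, bucket, project, dataset, and table are placeholder values, not details of this role.

```python
# Minimal Airflow 2.x sketch: a daily batch load from GCS into BigQuery
# with retries, alerting, and catchup enabled for historical backfills.
# All resource names below are illustrative placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

default_args = {
    "retries": 3,                                 # automatic retries on failure
    "retry_delay": timedelta(minutes=10),
    "email_on_failure": True,                     # basic failure alerting
}

with DAG(
    dag_id="example_daily_file_to_bq",            # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule="0 6 * * *",                         # daily run at 06:00 UTC
    catchup=True,                                 # allows backfilling past dates
    default_args=default_args,
    tags=["batch", "bigquery"],
) as dag:
    load_to_bq = GCSToBigQueryOperator(
        task_id="load_csv_to_bigquery",
        bucket="example-landing-bucket",          # placeholder bucket
        source_objects=["exports/{{ ds }}/*.csv"],  # files partitioned by run date
        destination_project_dataset_table="example_project.analytics.daily_facts",
        source_format="CSV",
        write_disposition="WRITE_TRUNCATE",       # idempotent reload per run date
        autodetect=True,                          # or supply an explicit schema
    )
```

Writing each run into its own date partition with an idempotent write disposition is one common way to make retries and backfills safe to repeat.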
Requirements
- 2+ years of experience building batch data workflows using Python and SQL, publishing to cloud data warehouses (BigQuery preferred).
- Proficiency with modern schedulers/orchestrators (e.g., Airflow, Prefect, Dagster) and containerized environments.
- Experience ingesting data from APIs, files, and SFTP sources, including schema evolution management.
- Strong debugging, monitoring, and CI/CD fundamentals.
- Excellent documentation and communication skills with an ownership mindset.
- Bonus: Experience with dbt, data quality testing, or healthcare data formats (claims, EHR, CMS datasets).
- Bonus: Familiarity with running ML jobs in production.