
Senior Data Engineer, Data and Applied AI
Plume
full-time
Location Type: Remote
Location: United States
Salary
💰 $158,000 - $168,000 per year
About the role
- Building and maintaining production-grade data pipelines in cloud data warehouses such as Google BigQuery or equivalent, following architectural standards set by the Director of Data and AI.
- Designing and developing dbt models across bronze, silver, and gold layers, including a focus on quality and governance via automated tests, documentation, and incremental load strategies.
- Creating and optimizing Airflow DAGs for data workflow orchestration, including scheduling, dependency management, error handling, and alerting.
- Implementing dimensional data models and data mart structures, guided by the team's modeling standards, that support clinical BI and ML feature consumption.
- Crafting easy-to-understand visualizations and dashboards in Looker or equivalent BI tools that align with commonly used business analytics standards, in close collaboration with product analytics, finance, operations, growth, and clinical stakeholders.
- Integrating healthcare data from sources such as EHRs, Stripe, 3rd-party APIs, and application database feeds, normalizing incoming data into the unified data platform.
- Applying HIPAA-compliant data handling practices, including PHI/PII masking, tokenization, audit logging, and role-based access controls across all pipeline and AI system work.
- Architecting and implementing RAG pipelines, including document ingestion, chunking, embedding generation, and retrieval, using frameworks such as LangChain or LangGraph.
- Supporting MLOps workflows, including model training pipeline maintenance, deployment support, performance monitoring, and retraining triggers.
- Reviewing teammates' pull requests, providing constructive technical feedback, and upholding the team's engineering standards.
- Collaborating closely with product managers to understand requirements and deliver reliable data and AI products.
- Monitoring and triaging assigned pipeline and data quality failures, escalating architectural issues as appropriate.
- Documenting pipeline designs, data models, and technical decisions in alignment with the team's governance and lineage tracking standards.
- Evaluating new tools and frameworks, providing hands-on prototyping and technical assessments.
Requirements
- 5+ years of hands-on experience in data engineering, analytics engineering, or a closely related role.
- 2+ years of experience working within the healthcare industry, including working knowledge of healthcare data standards, clinical workflows, regulated data environments, and domain-specific data visualizations.
- Working knowledge of HIPAA — including PHI/PII classification, data masking, audit logging, and access control requirements.
- Proven production experience with at least one major cloud data warehouse: BigQuery, Snowflake, or Redshift — including advanced SQL and query optimization.
- Strong hands-on experience with dbt (Core or Cloud), including incremental models, tests, documentation, and multi-environment workflows.
- Deep experience with Apache Airflow for workflow orchestration, including DAG design, scheduling, monitoring, and failure handling.
- Demonstrated knowledge of dimensional data modeling — star/snowflake schemas, SCD Types 1/2, fact and dimension table design.
- Hands-on experience delivering dashboards and reports in at least one enterprise BI tool: Looker, Power BI, Tableau, Qlik, etc.
- Proficiency in Python for data pipeline development, API integrations, and automation (Pandas, PySpark, or similar).
- Practical exposure to RAG pipeline development and LLM integration using LangChain, LangGraph, or LlamaIndex.
- Hands-on exposure to MLOps concepts: model deployment, monitoring, and retraining workflows.
- Knowledge of CI/CD tooling for data and AI workloads (GitHub Actions, dbt Cloud CI).
- Strong understanding of data quality and governance principles (lineage, access controls, data contracts, and automated testing), plus experience with data governance tools such as OpenMetadata.
- Excellent written and verbal communication skills, with the ability to collaborate effectively across engineering, analytics, and clinical teams.
- Ability to work independently on assigned workstreams while keeping the Director and team informed of progress, blockers, and risks.
Benefits
- Ground-Floor Equity (Series B)
- Free medical, dental, and vision coverage starting on the first of the month after you begin full-time work
- Unlimited PTO
- 11 paid holidays and company shut-down for a week in December
- 401(k)
- Free Plume and BetterHelp Subscriptions