Machinify, Inc.

Senior Data Engineer – Analytics

Full-time

Location Type: Remote

Location: California, United States

About the role

  • Design and implement robust, production-grade pipelines using Python, Spark SQL, and Airflow to process high-volume file-based datasets (CSV, Parquet, JSON).
  • Lead efforts to canonicalize raw healthcare data (837 claims, EHR, partner data, flat files) into internal models.
  • Own the full lifecycle of core pipelines — from file ingestion to validated, queryable datasets — ensuring high reliability and performance.
  • Onboard new customers by integrating their raw data into internal pipelines and canonical models; collaborate with subject-matter experts (SMEs), Account Managers, and Product to ensure successful implementation and troubleshooting.
  • Build resilient, idempotent transformation logic with data quality checks, validation layers, and observability.
  • Refactor and scale existing pipelines to meet growing data and business needs.
  • Tune Spark jobs and optimize distributed processing performance.
  • Implement schema enforcement and versioning aligned with internal data standards.
  • Collaborate deeply with Data Analysts, Data Scientists, Product Managers, Engineering, Platform, SMEs, and AMs to ensure pipelines meet evolving business needs.
  • Monitor pipeline health, participate in on-call rotations, and proactively debug and resolve production data flow issues.
  • Contribute to the evolution of our data platform — driving toward mature patterns in observability, testing, and automation.
  • Build and enhance streaming pipelines (Kafka, SQS, or similar) where needed to support near-real-time data needs.
  • Help develop and champion internal best practices around pipeline development and data modeling.

Requirements

  • 4+ years of experience as a Data Engineer (or equivalent), building production-grade pipelines.
  • Strong expertise in Python, Spark SQL, and Airflow.
  • Experience processing large-scale file-based datasets (CSV, Parquet, JSON, etc.) in production environments.
  • Experience mapping and standardizing raw external data into canonical models.
  • Familiarity with AWS (or a comparable cloud platform), including file storage and distributed compute concepts.
  • Experience onboarding new customers and integrating external customer data in non-standard formats.
  • Ability to work across teams, manage priorities, and own complex data workflows with minimal supervision.
  • Strong written and verbal communication skills — able to explain technical concepts to non-engineering partners.
  • Comfortable designing pipelines from scratch and improving existing pipelines.
  • Experience working with large-scale or messy datasets (healthcare, financial, logs, etc.).
  • Experience building streaming pipelines with tools such as Kafka or SQS, or a willingness to learn them.
  • Bonus: Familiarity with healthcare data (837, 835, EHR, UB-04, claims normalization).

Benefits

  • Real impact — your pipelines will directly support decision-making and claims payment outcomes from day one.
  • High visibility — partner with ML, Product, Analytics, Platform, Operations, and Customer teams on critical data initiatives.
  • Total ownership — you’ll drive the lifecycle of core datasets powering our platform.
  • Customer-facing impact — you will directly contribute to successful customer onboarding and data integration.
