
Senior Data Engineer – Analytics
Machinify, Inc.
Full-time
Location Type: Remote
Location: California • United States
About the role
- Design and implement robust, production-grade pipelines using Python, Spark SQL, and Airflow to process high-volume file-based datasets (CSV, Parquet, JSON).
- Lead efforts to canonicalize raw healthcare data (837 claims, EHR, partner data, flat files) into internal models.
- Own the full lifecycle of core pipelines — from file ingestion to validated, queryable datasets — ensuring high reliability and performance.
- Onboard new customers by integrating their raw data into internal pipelines and canonical models; collaborate with SMEs, Account Managers, and Product to ensure successful implementation and troubleshooting.
- Build resilient, idempotent transformation logic with data quality checks, validation layers, and observability (see the sketch after this list).
- Refactor and scale existing pipelines to meet growing data and business needs.
- Tune Spark jobs and optimize distributed processing performance.
- Implement schema enforcement and versioning aligned with internal data standards.
- Collaborate deeply with Data Analysts, Data Scientists, Product Managers, Engineering, Platform, SMEs, and AMs to ensure pipelines meet evolving business needs.
- Monitor pipeline health, participate in on-call rotations, and proactively debug and resolve production data flow issues.
- Contribute to the evolution of our data platform — driving toward mature patterns in observability, testing, and automation.
- Build and enhance streaming pipelines (Kafka, SQS, or similar) where needed to support near-real-time data needs.
- Help develop and champion internal best practices around pipeline development and data modeling.
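To make the pipeline responsibilities above concrete, here is a minimal sketch of an idempotent, quality-checked Spark SQL transform of the kind this role describes. All table names, column mappings, and S3 paths are hypothetical placeholders, not Machinify's actual models; idempotency comes from overwriting exactly one date partition, so retries and backfills replace data rather than duplicate it.

```python
# Minimal sketch: idempotent canonicalization step with a data quality gate.
# All paths, tables, and columns below are hypothetical illustrations.
from pyspark.sql import SparkSession

def canonicalize_claims(run_date: str) -> None:
    spark = SparkSession.builder.appName("canonicalize_claims").getOrCreate()

    # Read only the raw partition for this logical date.
    raw = spark.read.parquet(f"s3://raw-bucket/claims/ingest_date={run_date}")
    raw.createOrReplaceTempView("raw_claims")

    # Canonicalize with Spark SQL (hypothetical column mapping).
    canonical = spark.sql("""
        SELECT
            claim_id,
            CAST(service_date AS DATE)            AS service_date,
            UPPER(TRIM(payer_code))               AS payer_code,
            CAST(billed_amount AS DECIMAL(12, 2)) AS billed_amount
        FROM raw_claims
        WHERE claim_id IS NOT NULL
    """)

    # Data quality gate: fail loudly instead of publishing bad data.
    dupes = canonical.groupBy("claim_id").count().filter("count > 1").count()
    if dupes > 0:
        raise ValueError(f"{dupes} duplicate claim_ids on {run_date}")

    # Overwrite exactly this partition path, so reruns are idempotent.
    canonical.write.mode("overwrite").parquet(
        f"s3://canonical-bucket/claims/service_date={run_date}"
    )
```

In an Airflow DAG, a function like this would typically be invoked with the task's logical date, so a retry or backfill for a given day rewrites only that day's output.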
Requirements
- 4+ years of experience as a Data Engineer (or equivalent), building production-grade pipelines.
- Strong expertise in Python, Spark SQL, and Airflow.
- Experience processing large-scale file-based datasets (CSV, Parquet, JSON, etc.) in production environments (see the schema-enforcement sketch after this list).
- Experience mapping and standardizing raw external data into canonical models.
- Familiarity with AWS (or any cloud), including file storage and distributed compute concepts.
- Experience onboarding new customers and integrating external customer data with non-standard formats.
- Ability to work across teams, manage priorities, and own complex data workflows with minimal supervision.
- Strong written and verbal communication skills — able to explain technical concepts to non-engineering partners.
- Comfortable designing pipelines from scratch and improving existing pipelines.
- Experience working with large-scale or messy datasets (healthcare, financial, logs, etc.).
- Experience building or willingness to learn streaming pipelines using tools such as Kafka or SQS.
- Bonus: Familiarity with healthcare data (837, 835, EHR, UB04, claims normalization).
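As an illustration of the schema-enforcement requirement above, here is a minimal sketch of reading a file-based feed against an explicit, versioned schema; the feed name, columns, and paths are hypothetical. An explicit schema plus FAILFAST mode rejects files that drift from the contract instead of silently inferring new types.

```python
# Minimal sketch: schema enforcement on CSV ingestion. The feed, columns,
# and paths are hypothetical placeholders, not an actual internal contract.
from pyspark.sql import SparkSession
from pyspark.sql.types import (
    StructType, StructField, StringType, DateType, DecimalType,
)

# Version pinned in code; bump to V3 alongside any contract change.
ELIGIBILITY_SCHEMA_V2 = StructType([
    StructField("member_id", StringType(), nullable=False),
    StructField("plan_code", StringType(), nullable=True),
    StructField("effective_date", DateType(), nullable=True),
    StructField("monthly_premium", DecimalType(10, 2), nullable=True),
])

spark = SparkSession.builder.appName("enforce_schema").getOrCreate()

df = (
    spark.read
    .option("header", "true")
    .option("mode", "FAILFAST")      # abort on rows that don't match the types
    .schema(ELIGIBILITY_SCHEMA_V2)   # never infer schemas in production
    .csv("s3://raw-bucket/eligibility/2024-06-01/")
)

# Spark does not enforce nullability on read, so required fields need an
# explicit check before the data is published downstream.
missing = df.filter("member_id IS NULL").count()
if missing:
    raise ValueError(f"{missing} rows missing member_id")
```

Versioning here is only a naming convention; a production setup might track schema versions in a registry or alongside the canonical models, but the principle of explicit contracts over inference is the same.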
Benefits
- Real impact — your pipelines will directly support decision-making and claims payment outcomes from day one.
- High visibility — partner with ML, Product, Analytics, Platform, Operations, and Customer teams on critical data initiatives.
- Total ownership — you’ll drive the lifecycle of core datasets powering our platform.
- Customer-facing impact — you will directly contribute to successful customer onboarding and data integration.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
Python, Spark SQL, Airflow, data pipeline development, data transformation, data quality checks, schema enforcement, versioning, streaming pipelines, data modeling
Soft skills
communication skills, collaboration, problem-solving, prioritization, independence, leadership, customer onboarding, troubleshooting, adaptability, teamwork