
Data Engineer
ARETUM
Full-time
Location Type: Remote
Location: Virginia • United States
About the role
- Ingest data from FHIR APIs, the VA Corporate Data Warehouse (CDW), and other VA sources
- Normalize and reconcile medication and patient data
- Build transformation pipelines for risk scoring inputs
- Support batch and near-real-time processing
- Ensure data quality, consistency, and traceability
Requirements
- Programming: Python (primary), SQL (advanced), Scala (optional)
- Data Processing Frameworks: Apache Spark, AWS EMR, Databricks (preferred)
- ETL/ELT Design: Pipeline orchestration, incremental vs full loads, data validation
- API Integration: REST APIs, JSON parsing, pagination, authentication (OAuth2)
- FHIR Data Handling: Patient, MedicationRequest, Observation, and other resource types
- Data Modeling: Relational and semi-structured schema design
- Data Quality & Validation: Deduplication, reconciliation logic, anomaly detection
- Streaming vs Batch Processing: Understanding tradeoffs and implementation patterns
- Storage Technologies: S3, relational DBs, NoSQL basics
- Performance Optimization: Partitioning, parallelization, query tuning
- Versioning & Lineage: Data version control, reproducibility of datasets
Benefits
- Health Care Plan (Medical, Dental & Vision)
- Retirement Plan (401k)
- Life Insurance (Basic, Voluntary & AD&D)
- Paid Time Off
- Family Leave (Maternity, Paternity)
- Short-Term & Long-Term Disability
- Training & Development
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
Python, SQL, Scala, Apache Spark, AWS EMR, Databricks, ETL, API Integration, Data Modeling, Performance Optimization
Soft Skills
data quality, data consistency, data traceability, anomaly detection, reconciliation logic