
Data Engineer
INVID
Employment Type: Contract
Location Type: Hybrid
Location: San Juan • United States
About the role
- Build labeling pipelines that join behavioral events to outcome data (sanctions designations, flag changes, detentions)
- Implement proxy labeling strategies that create training signal from observable outcomes
- Build weak supervision infrastructure to combine multiple noisy labeling rules
- Create and maintain ML training datasets at scale
- Build data validation and quality monitoring systems
- Implement versioning for reproducible model training
- Integrate LRIT (Long-Range Identification and Tracking) position data for prediction validation
- Build pipelines that compare predicted locations against actual LRIT reports
- Create feedback loops that improve model accuracy over time
- Scale data infrastructure as models and data sources grow
Requirements
- 4+ years of data engineering experience
- Strong SQL skills, including complex joins across large datasets
- Experience with Spark, Airflow, or equivalent distributed processing and workflow orchestration tools
- Python for data processing and pipeline orchestration
- AWS experience
- Understanding of ML training data requirements
- Bachelor's degree in Computer Science, Engineering, or a related field
Benefits
- Collaborative and flexible work environment
- Professional development opportunities
- Access to industry events
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
SQL, Spark, Airflow, Python, ML training data requirements, data validation, quality monitoring, versioning, data processing, pipeline orchestration