INVID

Data Engineer

Employment Type: Contract

Location Type: Hybrid

Location: San Juan, United States

About the role

  • Build labeling pipelines that join behavioral events to outcome data (sanctions designations, flag changes, detentions)
  • Implement proxy labeling strategies that create training signal from observable outcomes
  • Build weak supervision infrastructure to combine multiple noisy labeling rules
  • Create and maintain ML training datasets at scale
  • Build data validation and quality monitoring systems
  • Implement versioning for reproducible model training
  • Integrate LRIT (Long-Range Identification and Tracking) position data for prediction validation
  • Build pipelines that compare predicted locations against actual LRIT reports
  • Create feedback loops that improve model accuracy over time
  • Scale data infrastructure as models and data sources grow

Requirements

  • 4+ years of data engineering experience
  • Strong SQL skills, including complex joins across large datasets
  • Experience with Spark, Airflow, or equivalent distributed processing and workflow orchestration frameworks
  • Proficiency in Python for data processing and pipeline orchestration
  • AWS experience
  • Understanding of ML training data requirements
  • Bachelor's degree in Computer Science, Engineering, or a related field

Benefits

  • Collaborative and flexible work environment
  • Professional development opportunities
  • Access to industry events