SMASH

Data Engineer

Full-time

Location Type: Remote

Location: United States

About the role

  • You will design and operate scalable ETL and streaming pipelines that process contracts and invoices at high volume with strong data quality guarantees.
  • This role focuses on building reliable data platforms that power analytics, ROI reporting, and compliance through robust governance, validation, and observability.
  • Design and maintain ETL pipelines to ingest contracts and invoices from PDF, DOCX, CSV, Excel, and webhook sources.
  • Build scalable workflows for historical data migrations (10K+ invoices per customer).
  • Implement real-time streaming pipelines for event-driven integrations.
  • Develop and manage an analytics data warehouse to support reporting, metrics, and trend analysis.
  • Model customer-specific datasets for ROI, savings, and exception reporting.
  • Implement data validation checks for completeness, accuracy, and consistency.
  • Build data quality monitoring, alerting, and dead-letter queue handling.
  • Implement PII/PHI detection, masking, and data retention policies (5-year audit trail).
  • Track data lineage from source through transformation to consumption.
  • Optimize SQL queries and data models for performance and scalability.
  • Collaborate with product, engineering, and analytics teams to evolve data requirements.

Requirements

  • Strong experience building and operating ETL pipelines in production environments.
  • Proficiency in Python for data processing (pandas, numpy, pyspark).
  • Advanced SQL skills with PostgreSQL, including data modeling and query optimization.
  • Hands-on experience with workflow orchestration tools (Airflow, Prefect, or similar).
  • Experience designing and operating data warehouses (Redshift, BigQuery, or Snowflake).
  • Familiarity with streaming platforms such as Kafka or Kinesis.
  • Experience implementing data quality frameworks (Great Expectations or similar).
  • Strong understanding of data validation, error handling, and monitoring best practices.
  • Ability to design scalable systems handling large datasets and schema complexity.

Benefits

  • We believe in long-lasting relationships with our talent.
  • We invest time in getting to know our candidates and understanding what they seek in their next professional step.
  • We aim to find the perfect match.
  • As agents, we pair our talent with our US clients based not only on technical skills but also on cultural fit.
  • Our core competency is finding the right talent fast.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
ETL pipelines, data processing, Python, pandas, numpy, pyspark, SQL, PostgreSQL, data modeling, query optimization
Soft skills
collaboration, problem-solving, attention to detail, analytical thinking