
Data Engineer
SMASH
full-time
Location Type: Remote
Location: United States
About the role
- You will design and operate scalable ETL and streaming pipelines that process contracts and invoices at high volume with strong data quality guarantees.
- This role focuses on building reliable data platforms that power analytics, ROI reporting, and compliance through robust governance, validation, and observability.
- Design and maintain ETL pipelines to ingest contracts and invoices from PDF, DOCX, CSV, Excel, and webhook sources.
- Build scalable workflows for historical data migrations (10K+ invoices per customer).
- Implement real-time streaming pipelines for event-driven integrations.
- Develop and manage an analytics data warehouse to support reporting, metrics, and trend analysis.
- Model customer-specific datasets for ROI, savings, and exception reporting.
- Implement data validation checks for completeness, accuracy, and consistency (an illustrative sketch follows this list).
- Build data quality monitoring, alerting, and dead-letter queue handling.
- Implement PII/PHI detection, masking, and data retention policies (5-year audit trail).
- Track data lineage from source through transformation to consumption.
- Optimize SQL queries and data models for performance and scalability.
- Collaborate with product, engineering, and analytics teams to evolve data requirements.
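
For context on the validation work described above, here is a minimal, hypothetical pandas sketch. The column names (invoice_id, vendor, amount, issue_date) and the specific rules are illustrative assumptions, not taken from SMASH's actual pipelines.

```python
import pandas as pd

# Hypothetical schema for illustration only; not from the posting.
REQUIRED_COLUMNS = ["invoice_id", "vendor", "amount", "issue_date"]

def validate_invoices(df: pd.DataFrame) -> pd.DataFrame:
    """Flag rows that fail basic completeness and consistency checks."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")

    checks = pd.DataFrame(index=df.index)
    checks["complete"] = df[REQUIRED_COLUMNS].notna().all(axis=1)
    checks["positive_amount"] = pd.to_numeric(df["amount"], errors="coerce") > 0
    checks["valid_date"] = pd.to_datetime(df["issue_date"], errors="coerce").notna()

    # Rows failing any check could be routed to a dead-letter store for review.
    out = df.copy()
    out["is_valid"] = checks.all(axis=1)
    return out

if __name__ == "__main__":
    sample = pd.DataFrame(
        {
            "invoice_id": ["INV-001", "INV-002"],
            "vendor": ["Acme", None],
            "amount": [1250.00, -40.0],
            "issue_date": ["2024-01-15", "not-a-date"],
        }
    )
    print(validate_invoices(sample)[["invoice_id", "is_valid"]])
```

In practice these checks would feed the monitoring, alerting, and dead-letter handling mentioned above rather than printing to stdout.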
Requirements
- Strong experience building and operating ETL pipelines in production environments.
- Proficiency in Python for data processing (pandas, numpy, pyspark).
- Advanced SQL skills with PostgreSQL, including data modeling and query optimization.
- Hands-on experience with workflow orchestration tools (Airflow, Prefect, or similar); see the orchestration sketch after this list.
- Experience designing and operating data warehouses (Redshift, BigQuery, or Snowflake).
- Familiarity with streaming platforms such as Kafka or Kinesis.
- Experience implementing data quality frameworks (Great Expectations or similar).
- Strong understanding of data validation, error handling, and monitoring best practices.
- Ability to design scalable systems handling large datasets and schema complexity.
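
As a rough illustration of the orchestration experience listed above, the sketch below wires two placeholder steps into an Airflow DAG. The DAG id, schedule, and task callables are hypothetical stand-ins for whatever extraction, validation, and load logic a real pipeline would use.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Hypothetical task callables; the real pipeline steps are not specified in the posting.
def extract_invoices():
    print("extract invoices from source systems")

def validate_and_load():
    print("run validation checks, then load valid rows into the warehouse")

with DAG(
    dag_id="invoice_etl_sketch",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_invoices)
    load = PythonOperator(task_id="validate_and_load", python_callable=validate_and_load)

    # Run extraction before validation/load.
    extract >> load
```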
Benefits
- We believe in long-lasting relationships with our talent.
- We invest time in getting to know each person and understanding what they are looking for in their next professional step.
- We aim to find the perfect match.
- As agents, we pair our talent with our US clients based not only on technical skills but also on cultural fit.
- Our core competency is finding the right talent fast.
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard skills
ETL pipelines, data processing, Python, pandas, numpy, pyspark, SQL, PostgreSQL, data modeling, query optimization
Soft skills
collaboration, problem-solving, attention to detail, analytical thinking