Data Platform Engineer

Trulioo

Trulioo is hiring a Data Platform Engineer to design and maintain data systems for global digital identity verification. The role involves collaborating with teams to optimize data pipelines and ensure system performance.

Posted 4/25/2026
Full-time
San Diego • California • 🇺🇸 United States
Mid-Level, Senior
💰 $120,000 - $150,000 per year

Tech Stack

Tools & technologies

Airflow, AWS, Azure, Cloud, Docker, ElasticSearch, ETL, Google Cloud Platform, Kafka, Kubernetes, Neo4j, NoSQL, PySpark, Python, Spark, SQL

About the role

Key responsibilities & impact

Build, optimize, and maintain data ingestion and transformation pipelines from multiple sources (internal systems, vendor data, web data, APIs)
Design and implement data models using the most suitable tool for the task — SQL, NoSQL, GraphDBs, or VectorDBs
Integrate machine learning models into pipelines for entity resolution, de-duplication, semantic enrichment, and embedding generation
Work with Vector Databases (e.g., AWS S3 Vector, PostgresVectorDb, OpenSearch) to support similarity and semantic search applications
Collaborate with data scientists, software engineers, and analysts to deliver reliable, high-performance data infrastructure
Ensure data quality, consistency, and performance monitoring across all pipelines and systems

Requirements

What you’ll need

5+ years of professional software development or data engineering experience
Strong programming skills in Python
Experience with data modeling and schema design in SQL and NoSQL systems
Experience designing and maintaining data pipelines (Airflow, Dagster, Prefect, or similar)
Proficiency with cloud-based data services (AWS, GCP, Azure)
Proficiency in multiple programming languages
Experience with entity resolution or record linkage algorithms
Experience incorporating ML workflows into ETL pipelines
Hands-on experience with Vector Databases and embedding-based search pipelines
Familiarity with graph databases (Neo4j, Neptune, or Gremlin) for ETL, modeling, and querying
Experience with OpenSearch / Elasticsearch, including index creation, tuning, and advanced queries
Familiarity with streaming data systems (Kafka, Kinesis) or distributed processing frameworks (Spark, Flink)
Knowledge of semantic search, RAG pipelines, or LLM-enhanced retrieval
Experience with containerization and orchestration (Docker, Kubernetes) and CI/CD pipelines
Background in information retrieval, knowledge graphs, or data platform architecture
Experience using data catalog/lineage tools (OpenMetadata, DataHub, etc.)
Strong experience with modern ETL tools for both large and small data processing (PySpark, Dask, DuckDB, etc.)

Benefits

Comp & perks

We provide a robust benefits package for full-time, permanent employees, including:
health, dental, and vision coverage
retirement plans with company match
paid time off
parental leave
an annual education & training stipend (equivalent to $1,000 in local currency)
wellness workshops and events
a complimentary Headspace subscription
offices designed to support both collaboration and flexibility
Enjoy weekly lunches, quality coffee, and regular social events.

ATS Keywords

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Python, SQL, NoSQL, GraphDBs, VectorDBs, data modeling, data pipelines, entity resolution, machine learning workflows, ETL

Soft Skills

collaboration, communication, problem-solving, attention to detail, organizational skills