Data Scientist, AI Data Foundations

MeridianLink

Data Scientist building and managing AI data foundations for model training and AI applications at MeridianLink. Engaging in data discovery and optimizing data structures for enhanced AI utility.

Posted 5/6/2026full-timeRemote • 🇺🇸 United StatesMid-LevelSenior💰 $114,593 - $195,400 per yearWebsite

Tech Stack

Tools & technologies

AzureNeo4jNumpyPandasPySparkPythonScikit-LearnSQL

About the role

Key responsibilities & impact

Build and maintain vector stores for RAG: Design embedding pipelines, chunking strategies, indexing approaches, and refresh patterns for the vector stores powering retrieval-augmented generation across MeridianLink products.
Own the feature store: Design, build, and operate feature store assets used for model training and online/offline inference, including feature definitions, freshness SLAs, lineage, point-in-time correctness, and reuse across teams.
Design graph data structures: Build graph databases that model relationships between applicants, applications, products, lenders, decisions, and outcomes — and make them queryable for both AI use cases and analytical investigations.
Lead data discovery: Profile our lending, deposit, and behavioral datasets to identify hidden trends, segments, anomalies, and potential model drivers; turn findings into actionable hypotheses for product, risk, and growth teams.
Engineer for AI consumption: Build the curated, AI-ready datasets that downstream model builders, application engineers, and analysts rely on — with appropriate quality, documentation, and governance baked in.
Evaluate retrieval and feature quality: Define and run evaluation frameworks for RAG retrieval quality, feature drift, embedding quality, and graph completeness; iterate based on what the metrics tell you.
Partner with model builders: Work closely with ML engineers and applied scientists to make sure the data structures you build accelerate their work rather than slow it down.
Champion responsible data use: Partner with governance, security, and compliance to ensure that AI-facing data assets respect data classification, customer consent, and regulatory boundaries from day one.
Communicate findings: Translate discovery work into clear narratives — write-ups, notebooks, dashboards, and short presentations — that help non-technical stakeholders act on what the data is showing.

Requirements

What you’ll need

4–7 years of experience in a data science, ML engineering, or applied data role, with a meaningful portion of that time spent building data assets that other people's models or applications consumed.
Hands-on experience designing and operating vector stores for RAG or semantic search, including embedding generation, chunking, indexing, and retrieval evaluation.
Experience building or operating a feature store (e.g., Databricks Feature Store, Feast, or a custom internal platform), including offline training and online serving patterns and point-in-time correctness.
Experience modeling and building graph data structures using Neo4j, TigerGraph, Azure Cosmos DB Gremlin, or similar graph databases — and writing graph queries to answer real questions.
Strong proficiency in Python (pandas, NumPy, scikit-learn, PySpark) and SQL; comfortable working day-to-day in Databricks notebooks and jobs.
Practical experience with embedding models and LLM tooling (e.g., Hugging Face transformers, OpenAI / Azure OpenAI APIs, LangChain or similar) in a production or near-production context.
Demonstrated data discovery skills: profiling messy real-world datasets, surfacing non-obvious patterns, validating findings statistically, and explaining them clearly.
Solid grounding in classical ML concepts — supervised vs. unsupervised learning, train/test discipline, leakage, evaluation metrics — even though you will not own model training day-to-day.
Strong written and verbal communication skills; able to write up findings for both technical and business audiences.

Benefits

Comp & perks

Insurance coverage (medical, dental, vision, life, and disability)
Flexible paid time off
Paid holidays
401(k) plan with company match
Remote work

ATS Keywords

✓ Tailor your resume

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

vector storesembedding generationchunkingindexingretrieval evaluationfeature storegraph data structuresgraph queriesPythonSQL

Soft Skills

data discoverycommunicationproblem-solvingcollaborationanalytical thinking