Clarivate

Senior Data Scientist, NLP

full-time

Location Type: Remote

Location: United States


Salary

💰 $117,000 - $147,000 per year

About the role

  • Develop scalable pipelines for text ingestion, cleaning, normalization, and tokenization to support downstream applications.
  • Architect and maintain robust indexing systems and vector databases for semantic search and retrieval.
  • Create reusable prompting strategies and lead fine-tuning initiatives for LLMs tailored to business-specific tasks.
  • Construct dynamic knowledge systems and agentic workflows using LangChain and LangGraph.
  • Apply VRAG and GraphRAG design patterns to enrich information retrieval and contextual understanding.
  • Perform benchmark testing and model evaluations to improve accuracy, efficiency, and scalability of NLP systems.
  • Work closely with engineering, product, and research stakeholders to deliver integrated AI-driven features.
  • Mentor junior data scientists, guide best practices, and drive innovation across AI projects.

Requirements

  • Bachelor’s degree in Computer Science, Data Science, Computational Linguistics, or a related field
  • At least 5 years of hands-on experience in data science, focused on natural language processing (NLP)
  • At least 5 years of experience using Python, with expertise in NLP libraries such as LangChain, LangGraph, or other “Lang”-based toolkits
  • Proven experience in model development and applying machine learning techniques to real-world problems
  • Expertise in retrieval-based LLM workflows (RAG, VRAG, GraphRAG) (preferred)
  • Deep understanding of embedding models, semantic search, and vector stores (e.g., FAISS, Pinecone) (preferred)
  • Experience with document loaders, text splitters, and document-splitting strategies (preferred)
  • Familiarity with MLOps practices and production-level deployment of AI pipelines (preferred)
  • Experience with cloud platforms (e.g., AWS, Azure, or GCP) (preferred)
  • Experience applying Graph Neural Networks (GNNs) to retrieval-enhanced generation (preferred)
  • Knowledge of LangSmith and vector orchestration platforms (preferred)
  • Familiarity with multilingual NLP and cross-lingual embeddings (preferred)
  • Exposure to real-time knowledge graphs and stream-based RAG systems (preferred)
  • A Master’s or PhD in a technical field (Computer Science, Data Science, etc.) (preferred)

Benefits

  • medical
  • dental
  • prescription drug
  • life insurance
  • 401(k) with match
  • long-term disability coverage
  • vacation
  • sick time
  • volunteer time
  • discount programs

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
Python, natural language processing (NLP), LangChain, LangGraph, machine learning, embedding models, semantic search, vector stores, Graph Neural Networks (GNNs), MLOps
Soft Skills
mentoring, guiding best practices, innovation
Certifications
Bachelor’s degree in Computer Science, Bachelor’s degree in Data Science, Bachelor’s degree in Computational Linguistics, Master’s degree in a technical field, PhD in a technical field