Principal ML Engineer

Red Hat · full-time

Location: 🇺🇸 United States • Massachusetts, North Carolina

Salary

💰 $189,600 - $312,730 per year

Job Level

Lead

Tech Stack

Python · PyTorch

About the role

  • Architect and lead development of large-scale evaluation platforms for LLMs and agents, enabling automated, reproducible, and extensible assessment
  • Define organizational standards and metrics for LLM/agent evaluation covering hallucination detection, factuality, bias, robustness, interpretability, and alignment drift
  • Build platform components and APIs to allow product teams to integrate evaluation into training, fine-tuning, deployment, and continuous monitoring workflows
  • Design automated pipelines and benchmarks for adversarial testing, red-teaming, and stress testing of LLMs and RAG systems
  • Lead initiatives in multi-dimensional evaluation including safety, grounding, and agent behaviors
  • Collaborate with cross-functional stakeholders to translate evaluation goals into measurable system-level frameworks
  • Advance interpretability and observability tooling to understand, debug, and explain LLM behaviors in production
  • Mentor engineers and establish best practices driving adoption of evaluation-driven development
  • Influence technical roadmaps and represent the team’s evaluation-first approach in external forums and publications
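To make the "automated, reproducible, and extensible" evaluation platform described above concrete, here is a minimal hypothetical sketch of such a harness: a benchmark of prompt/reference pairs, a registry of pluggable metrics, and an `evaluate` loop. All names and metrics here are illustrative assumptions, not Red Hat code.

```python
# Hypothetical minimal evaluation harness (illustrative only): run a model
# callable over a benchmark and average a set of pluggable metrics.
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Example:
    prompt: str
    reference: str

# A metric maps (model output, reference) to a score in [0, 1].
Metric = Callable[[str, str], float]

def exact_match(output: str, reference: str) -> float:
    return float(output.strip().lower() == reference.strip().lower())

def token_recall(output: str, reference: str) -> float:
    # Fraction of reference tokens present in the output: a crude grounding proxy.
    ref = set(reference.lower().split())
    out = set(output.lower().split())
    return len(ref & out) / len(ref) if ref else 0.0

def evaluate(model: Callable[[str], str],
             benchmark: List[Example],
             metrics: Dict[str, Metric]) -> Dict[str, float]:
    totals = {name: 0.0 for name in metrics}
    for ex in benchmark:
        output = model(ex.prompt)
        for name, metric in metrics.items():
            totals[name] += metric(output, ex.reference)
    n = len(benchmark)
    return {name: total / n for name, total in totals.items()}

# Usage with a stub "model" standing in for an LLM call:
bench = [Example("capital of France?", "Paris"),
         Example("2 + 2 =", "4")]
scores = evaluate(lambda p: "Paris" if "France" in p else "5", bench,
                  {"exact_match": exact_match, "token_recall": token_recall})
```

A real platform would swap the stub for model/agent endpoints and the metric registry for the hallucination, toxicity, and robustness scorers the role calls for; the registry pattern is what makes the harness extensible.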

Requirements

  • 10+ years of ML engineering experience
  • 3+ years focused on large-scale evaluation of transformer-based LLMs and/or agentic systems
  • Proven experience building evaluation platforms or frameworks that operate across training, deployment, and post-deployment contexts
  • Deep expertise in designing and implementing LLM evaluation metrics (factuality, hallucination detection, grounding, toxicity, robustness)
  • Strong background in scalable platform engineering, including APIs, pipelines, and integrations used by multiple product teams
  • Demonstrated ability to bridge research and engineering, operationalizing safety and alignment techniques into production evaluation systems
  • Proficiency in Python, PyTorch, Hugging Face, and modern ML ops/deployment environments
  • Track record of technical leadership, including mentoring, architecture design, and defining org-wide practices
  • Experience with multi-agent evaluation frameworks and graph-based metrics for agent interactions (preferred)
  • Background in retrieval-augmented generation (RAG) evaluation (retrieval precision/recall, grounding, attribution) (preferred)
  • Contributions to AI safety or evaluation research in industry or academia (preferred)
  • Familiarity with adversarial testing methodologies and automated red-teaming (preferred)
  • Knowledge of interpretability and transparency methods for LLMs (preferred)
  • Advanced degree in ML/CS or related field with focus on evaluation, safety, or interpretability (preferred)
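The RAG-evaluation requirement above names retrieval precision and recall; as a hedged illustration (hypothetical function, not from the posting), these reduce to comparing retrieved document IDs against a labeled gold set:

```python
# Hypothetical sketch of retrieval precision/recall for RAG evaluation:
# score a retriever's returned document IDs against labeled relevant IDs.
from typing import Sequence, Set, Tuple

def retrieval_precision_recall(retrieved: Sequence[str],
                               relevant: Set[str]) -> Tuple[float, float]:
    hits = sum(1 for doc_id in retrieved if doc_id in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0  # hits / returned
    recall = hits / len(relevant) if relevant else 0.0       # hits / gold set
    return precision, recall

# 2 of the 4 retrieved docs are relevant; 2 of the 3 relevant docs were found.
p, r = retrieval_precision_recall(["d1", "d2", "d3", "d4"], {"d1", "d3", "d9"})
```

Grounding and attribution metrics then check whether the generated answer actually cites and stays faithful to those retrieved documents.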
Loopio

Staff Applied Scientist
Lead · full-time · 🇨🇦 Canada
Posted: 7 days ago · Source: jobs.ashbyhq.com
Tech stack: Distributed Systems · Microservices · Python · PyTorch · Ray · Spark · Tensorflow
Hotel Engine

Staff Data Scientist, Search and Personalization
Lead · full-time · $210k–$245k / year · 🇺🇸 United States
Posted: 10 days ago · Source: boards.greenhouse.io
Tech stack: Python · PyTorch · Spark · Tensorflow
Pierce Professional Resources

AI Engineer
Mid · Senior · full-time · 🇺🇸 United States
Posted: 10 days ago · Source: apply.workable.com
Tech stack: Python · PyTorch · Ray · Tensorflow
Handshake

AI Research Scientist, Evaluation
Mid · Senior · full-time · $200k–$375k / year · California, New York · 🇺🇸 United States
Posted: 20 days ago · Source: jobs.ashbyhq.com
Tech stack: Python · PyTorch
Snowflake

AI Research Scientist – Reinforcement Learning
Mid · Senior · full-time · $195k–$250k / year · California · 🇺🇸 United States
Posted: 13 days ago · Source: jobs.ashbyhq.com
Tech stack: Python · PyTorch · Tensorflow