Mistral AI

AI Engineer – Product

Full-time

Location Type: Hybrid

Location: Paris • 🇫🇷 France

Job Level

Mid-Level, Senior

Tech Stack

Python, TypeScript

About the role

  • Build and maintain an LLM evaluation framework (reference tests, heuristics, model-graded checks).
  • Define and track metrics: task success, helpfulness, hallucination proxies, safety flags, latency/cost.
  • Run A/B tests on prompts, models, and system prompts; analyze results and recommend rollout or rollback.
  • Set up observability for LLM calls: structured logging, tracing, dashboards, alerts.
  • Operate model releases: canary and shadow traffic, sign-offs, SLO-based rollback criteria, regression detection.
  • Improve core behaviors: memory write/retrieve policies and evals, intent classification, follow-ups, routing, tool-call reliability.
  • Create templates and docs so other teams can author evals and ship safely.
  • Partner with Science, diagnose regressions, and lead post-mortems.

Requirements

  • Strong TypeScript or Python skills.
  • Production LLM experience: prompts, tool/function calling, and system prompts.
  • Hands-on experience with evals and A/B testing: you can design metrics and make rollout decisions from data.
  • Observability: logging, tracing, dashboards, alerting.
  • Product mindset: form hypotheses, run experiments, interpret results, iterate.
  • Clear written and spoken communication; autonomous and product-oriented.

Ideally, you also have experience with:

  • Safety systems: moderation, PII handling/redaction, guardrails.
  • Release operations: canary/shadowing, automated rollbacks, experiment platforms.

Benefits

  • 💰 Competitive salary and equity
  • 🧑‍⚕️ Health insurance
  • 🚴 Transportation allowance
  • 🥎 Sport allowance
  • 🥕 Meal vouchers
  • 💰 Private pension plan
  • 🍼 Generous parental leave policy
  • 🌎 Visa sponsorship

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard skills
TypeScript, Python, LLM evaluation framework, A/B testing, metrics design, observability, logging, tracing, dashboards, safety systems
Soft skills
clear communication, autonomous, product-oriented, hypothesis formation, experiment iteration