AI Engineer – Product

Mistral AI

Full-time

Posted on: 12/30/2025

Location Type: Hybrid

Location: Paris, France


About the role

Build and maintain an LLM evaluation framework (reference tests, heuristics, model-graded checks).
Define and track metrics: task success, helpfulness, hallucination proxies, safety flags, latency/cost.
Run A/B tests on prompts, models, and system prompts; analyze the results and recommend rollout or rollback.
Set up observability for LLM calls: structured logging, tracing, dashboards, alerts.
Operate model releases: canary and shadow traffic, sign-offs, SLO-based rollback criteria, and regression detection.
Improve core behaviors: memory write/retrieve policies and evals, intent classification, follow-ups, routing, tool-call reliability.
Create templates and docs so other teams can author evals and ship safely.
Partner with Science to diagnose regressions, and lead post-mortems.

Strong TypeScript or Python skills.
Production LLM experience: prompts, tool/function calling, and system prompts.
Hands-on experience with evals and A/B testing: you can design metrics and make rollout decisions from data.
Observability: logging, tracing, dashboards, and alerting.
Product mindset: form hypotheses, run experiments, interpret results, iterate.
Clear written and spoken communication; autonomous and product-oriented.
Ideally, you also have experience with:
Safety systems: moderation, PII handling/redaction, guardrails.
Release operations: canary/shadowing, automated rollbacks, experiment platforms.
