AI Engineer – LLM Ops & Evaluation
Auxilius.ai
AI Engineer responsible for the LLMOps pipeline and evaluation strategy for AI solutions in Governance, Risk and Compliance (GRC). You will collaborate with a startup team and help shape AI operations.
Tech Stack
Tools & technologies
- Python
About the role
Key responsibilities & impact
- Own the LLMOps pipeline: Evaluation infrastructure, prompt optimization loop, and the production integration that turns experiments into reliable customer-facing features
- Design evaluation strategy per output type: Decide when to use deterministic evals (exact match, schema validation, embeddings) vs. LLM-as-judge, and build the rubrics, test datasets, and human-review loops that make the system trustworthy
- Drive prompt engineering and optimization across all LLM operations in the product: Moving from hand-tuned prompts to a measurable, iterative process
- Pick the right tool for each problem: Some things are LLM problems, some are embedding + classical NLP problems, some are deterministic logic
- Run the production side of AI features: Observability (Langfuse / LangSmith / similar), cost and latency engineering, incident response when an LLM feature degrades
- Build human-in-the-loop workflows: Review queues, feedback ingestion, and labeling, so that production signal feeds back into evals and prompt iteration
- Mentor our AI & Analytics Intern and contribute to how we build the AI team over time
Requirements
What you’ll need
- 3+ years of hands-on experience building and shipping ML/AI systems in production (we care more about what you've shipped than years on a CV)
- You have shipped an LLM evaluation or prompt optimization pipeline: not just used LLMs in a project, but owned the loop
- Strong hands-on experience with LLM-as-judge, including its variance problems and concrete techniques for controlling them
- Solid foundation in classical NLP and ML ops: Embeddings, semantic similarity, entity matching, classification, fuzzy matching
- Informed opinions on deterministic vs. LLM-based evals, from experience
- Production judgment: You've owned cost and latency tradeoffs, observability, and incident response for an LLM-powered feature. You're familiar with prompt regression and have strategies for managing it
- Strong Python
- Excellent English communication, written and verbal: We discuss nuanced technical tradeoffs daily with the founding team and customers
- Comfort with ambiguity: You can run experiments on real data, build intuition for this domain, and know when to stop iterating
Benefits
Comp & perks
- Hands-on ownership of a real AI product used by enterprise customers
- Work directly alongside the founding team from day one
- Hybrid work model: Munich North, minimum one day per week in the office, otherwise flexible (open to strong candidates elsewhere in the EU for the right fit); onboarding will take place in the office
- A steep learning curve at the intersection of LLM engineering, enterprise GRC, and startup operations
- The chance to shape the AI team as we grow
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
LLMOps, prompt engineering, ML/AI systems, classical NLP, embeddings, semantic similarity, entity matching, classification, fuzzy matching, Python
Soft Skills
communication, mentoring, comfort with ambiguity