Airtree

Founding Applied AI Engineer – Eval-Driven

Airtree

full-time

Posted on:

Location Type: Hybrid

Location: SydneyAustralia

Visit company website

Explore more

AI Apply
Apply

Tech Stack

About the role

  • Define evaluation problems: success criteria, failure modes, datasets, labelling guidelines, and score functions.
  • Build and maintain an evaluation harness: regression tests, edge-case suites, and quality dashboards to prevent backsliding.
  • Implement workflow systems end-to-end (data → model/LLM components → post-processing → acceptance testing) until they pass eval thresholds.
  • Partner with product and domain stakeholders to translate messy real-world requirements into testable specs.

Requirements

  • Strong Python skills and practical experience shipping ML/AI systems (not just experimentation).
  • Demonstrated experience designing evals for ML/LLM systems (offline metrics, gold sets, error analysis, regression testing, monitoring).
  • Comfort working across data science + engineering tasks: data wrangling, feature/label design, model/LLM iteration, and productionization.
  • High ownership and intensity: persistence in closing the loop from “fails eval” to “passes consistently.”
  • Nice to have
  • Experience with document understanding (OCR, parsing, classification/extraction) and structured outputs (schemas, validators).
  • Familiarity with AEC/construction workflows (design coordination, QA/compliance, BIM concepts like IFC/Revit).
  • Experience building human-in-the-loop review systems and adjudication processes to improve training/eval data.
Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
PythonML systemsAI systemsevaluation designoffline metricserror analysisregression testingdata wranglingfeature designproductionization
Soft Skills
high ownershipintensitypersistencecollaboration