Founding Applied AI Engineer – Eval-Driven

Airtree

full-time

Posted on: 1/19/2026

Location Type: Hybrid

Location: Sydney • Australia


About the role

Define evaluation problems: success criteria, failure modes, datasets, labelling guidelines, and score functions.
Build and maintain an evaluation harness: regression tests, edge-case suites, and quality dashboards to prevent backsliding.
Implement workflow systems end-to-end (data → model/LLM components → post-processing → acceptance testing) until they pass eval thresholds.
Partner with product and domain stakeholders to translate messy real-world requirements into testable specs.

Requirements

Strong Python skills and practical experience shipping ML/AI systems (not just experimentation).
Demonstrated experience designing evals for ML/LLM systems (offline metrics, gold sets, error analysis, regression testing, monitoring).
Comfort working across data science + engineering tasks: data wrangling, feature/label design, model/LLM iteration, and productionization.
High ownership and intensity: persistence in closing the loop from “fails eval” to “passes consistently.”

Nice to have

Experience with document understanding (OCR, parsing, classification/extraction) and structured outputs (schemas, validators).
Familiarity with AEC/construction workflows (design coordination, QA/compliance, BIM concepts like IFC/Revit).
Experience building human-in-the-loop review systems and adjudication processes to improve training/eval data.

Applicant Tracking System Keywords


Hard Skills & Tools

Python, ML systems, AI systems, evaluation design, offline metrics, error analysis, regression testing, data wrangling, feature design, productionization

Soft Skills

high ownership, intensity, persistence, collaboration