
Founding Applied AI Engineer – Eval-Driven
Airtree
Full-time
Location Type: Hybrid
Location: Sydney • Australia
About the role
- Define evaluation problems: success criteria, failure modes, datasets, labelling guidelines, and score functions.
- Build and maintain an evaluation harness: regression tests, edge-case suites, and quality dashboards to prevent backsliding.
- Implement workflow systems end-to-end (data → model/LLM components → post-processing → acceptance testing) until they pass eval thresholds.
- Partner with product and domain stakeholders to translate messy real-world requirements into testable specs.
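As a rough illustration of the loop these responsibilities describe (gold set, score function, eval threshold, regression check), here is a minimal Python sketch. Every name in it (`GoldExample`, `exact_match`, `run_eval`, `passes_gate`) is hypothetical and chosen for this example; it is not a description of the company's actual harness.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class GoldExample:
    """One labelled case from a hypothetical gold set."""
    input_text: str
    expected: str


def exact_match(predicted: str, expected: str) -> float:
    """Score function: 1.0 if the normalized strings match, else 0.0."""
    return float(predicted.strip().lower() == expected.strip().lower())


def run_eval(
    system: Callable[[str], str],
    gold: Sequence[GoldExample],
    score: Callable[[str, str], float] = exact_match,
) -> float:
    """Return the mean score of `system` over the gold set."""
    if not gold:
        raise ValueError("gold set is empty")
    total = sum(score(system(ex.input_text), ex.expected) for ex in gold)
    return total / len(gold)


def passes_gate(current: float, baseline: float, threshold: float) -> bool:
    """Pass only if the score meets the absolute threshold
    and has not dropped below the previously recorded baseline."""
    return current >= threshold and current >= baseline
```

A candidate system would be run through `run_eval` on each change, and `passes_gate` would compare the result with the last accepted score, so that a drop below the baseline fails the check even when the absolute threshold is still met.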
Requirements
- Strong Python skills and practical experience shipping ML/AI systems (not just experimentation).
- Demonstrated experience designing evals for ML/LLM systems (offline metrics, gold sets, error analysis, regression testing, monitoring).
- Comfort working across data science + engineering tasks: data wrangling, feature/label design, model/LLM iteration, and productionization.
- High ownership and intensity: persistence in closing the loop from “fails eval” to “passes consistently.”
Nice to have
- Experience with document understanding (OCR, parsing, classification/extraction) and structured outputs (schemas, validators).
- Familiarity with AEC/construction workflows (design coordination, QA/compliance, BIM concepts like IFC/Revit).
- Experience building human-in-the-loop review systems and adjudication processes to improve training/eval data.
Applicant Tracking System Keywords
Hard Skills & Tools
Python, ML systems, AI systems, evaluation design, offline metrics, error analysis, regression testing, data wrangling, feature design, productionization
Soft Skills
high ownership, intensity, persistence, collaboration