
Staff Software Engineer, AI
Lattice
Full-time
Location Type: Remote
Location: United States
Salary
💰 $187,500 - $199,500 per year
About the role
- Architect and scale the infrastructure that powers AI quality, reliability, and reuse across Lattice.
- Design and scale an end-to-end AI evaluation framework spanning offline evals, production tracing, and human feedback loops.
- Define meaningful performance metrics (task completion, hallucination, response quality, engagement, business impact) and build the datasets and automated scoring systems that prevent regressions.
- Identify and quantify the drivers of agent quality improvement and set methodological standards for evaluation across the organization.
- Architect reusable agent infrastructure (multi-turn workflows, LLM DAGs, recommendation systems, standardized topologies) using LangGraph or comparable frameworks.
- Build and scale RAG pipelines, vector retrieval systems, and production-grade AI infrastructure with strong reliability, observability, and performance.
- Make principled build-vs-buy decisions across LLM providers, agent frameworks, and evaluation tooling, balancing capability, cost, latency, and risk.
- Engineer AI systems as reusable internal platforms that multiply product engineering velocity at Lattice.
- Own projects end-to-end: scope, design, execution, and delivery.
- Set technical direction for agent quality and evaluation strategy across Lattice engineering teams.
- Lead rigorous discussions on AI system design and evaluation methodology.
- Raise the AI engineering bar through mentorship, code review, and clear technical communication across engineering and leadership.
Requirements
- 8+ years of professional experience writing and maintaining production code, including 5+ years designing, delivering, and operating AI/ML systems in production.
- Deep production experience with LLM systems (prompting, RAG, agent orchestration, evaluation frameworks, fine-tuning).
- Experience building and operating agentic systems (multi-step workflows, multi-agent topologies) and managing their failure modes.
- Strong command of AI evaluation methodology and statistical experimentation.
- Strong system design judgment across scalability, latency, accuracy, reliability, and cost.
- Production-grade Python (clean, maintainable, testable systems).
- Experience with LangGraph (or comparable agent orchestration frameworks) and LLM observability/evaluation tooling (e.g., LangSmith).
- Vector databases and retrieval system design (Pinecone or similar).
- Experience operating AI systems in AWS or comparable cloud environments, including CI/CD, monitoring, and deployment workflows.
- Familiarity with TypeScript is a plus.
- Actively engaged in AI research and industry trends.
Nice to Have
- Experience with RLHF, LoRA, or other model adaptation techniques.
- Background in traditional ML and judgment in selecting ML vs. LLM approaches.
- Experience with MLOps tooling (MLflow, DataDog).
- Published research, talks, or open-source contributions in AI/ML.
- Experience in HR tech or other trust-sensitive domains.
Benefits
- Medical insurance
- Dental insurance
- Vision insurance
- Life, AD&D, and Disability Insurance
- Emergency Weather Support
- Wellness Apps
- Paid Parental Leave
- Paid Time Off, inclusive of holidays and sick time
- Commuter & Parking Accounts
- Lunches in the Office
- Internet and Phone Stipend
- 401(k) retirement plan
- Financial Planning
- Learning & Development Budget
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
AI/ML systems, LLM systems, Python, LangGraph, RAG pipelines, vector retrieval systems, agent orchestration, AI evaluation methodology, statistical experimentation, MLOps
Soft Skills
technical communication, mentorship, project ownership, system design judgment, leadership, discussion facilitation, collaboration