Tech Stack
Tools & technologies: AWS, Cloud, Google Cloud Platform, Python, TypeScript
About the role
Key responsibilities & impact
- Design and ship a robust, end-to-end AI evaluation framework, covering offline evals, production tracing, and human-in-the-loop feedback loops, connected across all of Lattice’s AI use cases.
- Define and instrument the metrics that actually matter: agent task completion, hallucination rates, response quality, user engagement, and downstream business outcomes.
- Build and maintain evaluation datasets, test harnesses, and automated scoring pipelines to catch regressions before they ship.
- Identify and surface the drivers of agent quality improvement, giving the team clear signals on where to invest.
- Architect and implement reusable agent infrastructure: multi-turn conversation workflows, recommendation services, LLM DAGs, and standardized agent topology patterns using LangGraph.
- Build and scale RAG pipelines and retrieval infrastructure, including vector store management and retrieval quality optimization.
- Make principled build vs. buy decisions across LLM providers, agent frameworks, and evaluation tooling, balancing capability, cost, latency, and vendor risk.
- Contribute to production AI systems with a strong focus on reliability, observability, and performance, not just prototypes.
- Own projects end-to-end: scope them, drive them to completion, and bring in the right people at the right time.
- Partner with engineering leads and managers to inform technical direction on agent quality and evaluation strategy. You’ll be expected to hold intelligent, substantive conversations about methodology, not just implementation.
- Raise the AI engineering bar across the broader team through code review, documentation, and thoughtful technical debate.
Requirements
What you’ll need
- 5+ years of professional software engineering experience with significant time spent on production AI/ML systems.
- Deep hands-on experience with LLM-based systems: prompt engineering, RAG pipelines, agent orchestration, evaluation metrics, and model fine-tuning.
- Proven ability to work with data and understand statistics, especially in experiments.
- Proven ability to build and operate agentic AI systems in production: multi-step workflows, multi-agent topologies, and the failure modes that come with them.
- Strong command of AI evaluation: you’ve built eval frameworks before, you know the difference between a good eval and a vanity metric, and you have opinions about it.
- Production-grade Python engineering: clean, maintainable, testable code.
- LangGraph or comparable agent orchestration frameworks.
- LangSmith or comparable LLM observability tooling for tracing, evaluation, and debugging.
- Reads AI papers and blogs regularly and is a trusted source on AI trends.
- Vector databases (Pinecone or similar) and retrieval system design.
- AWS ecosystem or other cloud infrastructure (e.g., GCP). Comfortable with lambdas, queues, and cloud-native architecture.
- Familiarity with TypeScript is a plus.
Benefits
Comp & perks
- Medical insurance
- Dental insurance
- Vision insurance
- Life, AD&D, and Disability Insurance
- Emergency Weather Support
- Wellness Apps
- Paid Parental Leave
- Paid Time off inclusive of holidays and sick time
- Commuter & Parking Accounts
- Lunches in the Office
- Internet and Phone Stipend
- 401(k) retirement plan
- Financial Planning
- Learning & Development Budget
ATS Keywords
Applicant Tracking System Keywords
Tip: use these terms in your resume and cover letter to boost ATS matches.
Hard Skills & Tools
AI evaluation framework, RAG pipelines, LLM-based systems, prompt engineering, agent orchestration, evaluation metrics, model fine-tuning, production-grade Python, multi-step workflows, vector databases
Soft Skills
project ownership, technical direction, collaboration, communication, critical thinking, code review, documentation, technical debate, problem-solving, data-driven decision making
