Lattice

Senior Software Engineer, AI

Full-time

Location Type: Remote

Location: Canada

Salary

💰 CA$123,750 - CA$165,000 per year

About the role

  • Design and ship a robust, end-to-end AI evaluation framework, covering offline evals, production tracing, and human-in-the-loop feedback loops, connected across all of Lattice’s AI use cases.
  • Define and instrument the metrics that actually matter: agent task completion, hallucination rates, response quality, user engagement, and downstream business outcomes.
  • Build and maintain evaluation datasets, test harnesses, and automated scoring pipelines to catch regressions before they ship.
  • Identify and surface the drivers of agent quality improvement, giving the team clear signals on where to invest.
  • Architect and implement reusable agent infrastructure: multi-turn conversation workflows, recommendation services, LLM DAGs, and standardized agent topology patterns using LangGraph.
  • Build and scale RAG pipelines and retrieval infrastructure, including vector store management and retrieval quality optimization.
  • Make principled build vs. buy decisions across LLM providers, agent frameworks, and evaluation tooling, balancing capability, cost, latency, and vendor risk.
  • Contribute to production AI systems with a strong focus on reliability, observability, and performance, not just prototypes.
  • Own projects end-to-end: scope them, drive them to completion, and bring in the right people at the right time.
  • Partner with engineering leads and managers to inform technical direction on agent quality and evaluation strategy; you’ll be expected to hold substantive conversations about methodology, not just implementation.
  • Raise the AI engineering bar across the broader team through code review, documentation, and thoughtful technical debate.
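To make the "evaluation datasets, test harnesses, and automated scoring pipelines" bullet concrete, here is a minimal, hypothetical sketch of a regression-gating eval harness in Python. The dataset shape, keyword-overlap scoring rule, and pass threshold are illustrative assumptions, not Lattice's actual framework:

```python
# Hypothetical sketch: score model outputs against an eval dataset and
# gate releases on a regression threshold. All names and the scoring
# rule are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class EvalCase:
    prompt: str
    expected_keywords: list[str]  # facts the answer must mention


def keyword_score(answer: str, case: EvalCase) -> float:
    """Fraction of expected keywords present in the answer."""
    hits = sum(kw.lower() in answer.lower() for kw in case.expected_keywords)
    return hits / len(case.expected_keywords)


def run_eval(model, dataset: list[EvalCase], threshold: float = 0.8) -> bool:
    """Return True if the mean score clears the regression threshold."""
    scores = [keyword_score(model(case.prompt), case) for case in dataset]
    return sum(scores) / len(scores) >= threshold


# Usage with a stubbed "model" standing in for an LLM call:
stub = lambda prompt: "Paris is the capital of France."
dataset = [EvalCase("Capital of France?", ["Paris", "France"])]
print(run_eval(stub, dataset))  # → True
```

A production version would swap keyword overlap for LLM-as-judge or task-completion metrics and wire the pass/fail result into CI, so regressions are caught before they ship.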

Requirements

  • 5+ years of professional software engineering experience with significant time spent on production AI/ML systems.
  • Deep hands-on experience with LLM-based systems: prompt engineering, RAG pipelines, agent orchestration, evaluation metrics, and model fine-tuning.
  • Proven ability to work with data and apply statistics, especially in experiment design and analysis.
  • Proven ability to build and operate agentic AI systems in production: multi-step workflows, multi-agent topologies, and the failure modes that come with them.
  • Strong command of AI evaluation: you’ve built eval frameworks before, you know the difference between a good eval and a vanity metric, and you have opinions about it.
  • Production-grade Python engineering: clean, maintainable, testable code.
  • LangGraph or comparable agent orchestration frameworks. You’ve built real agent workflows with them, not just followed tutorials.
  • LangSmith or comparable LLM observability tooling for tracing, evaluation, and debugging.
  • You read AI papers and blogs regularly and are a trusted source on AI trends.
  • Vector databases (Pinecone or similar) and retrieval system design.
  • AWS ecosystem or other cloud infrastructure (e.g. GCP). Comfortable with Lambda functions, queues, and cloud-native architecture.
  • Familiarity with TypeScript is a plus.
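The "vector databases and retrieval system design" requirement boils down to nearest-neighbor search over embeddings. Here is a hypothetical, pure-Python sketch of that core idea; a real system would use a vector database (e.g. Pinecone) and learned embeddings rather than the toy two-dimensional vectors assumed below:

```python
# Hypothetical sketch: rank documents by cosine similarity to a query
# vector, the core retrieval step behind a RAG pipeline. The in-memory
# "store" and toy vectors are illustrative assumptions.
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec: list[float], store, k: int = 2) -> list[str]:
    """store: list of (doc_id, vector) pairs. Return top-k doc ids."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


store = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [0.9, 0.1])]
print(retrieve([1.0, 0.0], store, k=2))  # → ['a', 'c']
```

Retrieval-quality optimization, as mentioned in the role, then becomes tuning what goes into the store (chunking, embedding model) and how results are ranked and filtered.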

Benefits

  • Medical insurance
  • Dental insurance
  • Life, AD&D, and Disability Insurance
  • Natural Disaster Support Program
  • Wellness Apps
  • Paid Parental Leave
  • Paid Time off inclusive of holidays and sick time
  • Working Remotely Stipend
  • One time WFH Office Set-Up Stipend
  • Retirement Plan
  • Financial Planning
  • Learning & Development Budget

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools
AI evaluation framework, production AI/ML systems, LLM-based systems, prompt engineering, RAG pipelines, agent orchestration, evaluation metrics, model fine-tuning, production-grade Python engineering, multi-step workflows
Soft Skills
project ownership, technical direction, collaboration, communication, critical thinking, problem-solving, code review, documentation, technical debate, data-driven decision making