Drive technical breakthroughs in agentic systems, applied ML infrastructure, and LLM-based applications.
Define and evolve the ML/LLM strategy and technology roadmap in alignment with product development.
Act as a principal technical authority, making high-impact architectural and modeling decisions across teams.
Develop prototypes for key technologies to validate new approaches and de-risk system design.
Own the full lifecycle from research and experimentation through production deployment, monitoring, and iteration.
Translate advances in ML into scalable, production-grade systems with measurable impact.
Design how LLMs operate within agent workflows, including tool use, multi-step reasoning, and long-lived execution.
Implement and refine prompting strategies, multi-agent orchestration, memory management, and human-in-the-loop controls for safety and reliability.
Establish patterns for planning, decision-making, and tool orchestration within complex systems.
Own end-to-end quality evaluation of ML-powered systems, including defining metrics, benchmarks, and testing frameworks.
Establish evaluation systems that connect model performance to task success and system-level outcomes.
Ensure systems behave predictably, safely, and reliably in production through monitoring, regression testing, and robust failure handling.
Contribute to the design of ML systems supporting the full lifecycle, including training, fine-tuning, evaluation, deployment, and monitoring.
Drive architecture decisions across model serving, routing, orchestration, and latency and cost optimization.
Work across infrastructure layers, including cloud and containerized systems, to ensure scalable and efficient deployment.
Build and deploy enterprise-grade AI systems used by global customers in production environments.
Design systems that operate reliably in regulated and constrained settings, including on-premise, air-gapped, and secure cloud environments.
Ensure systems are auditable, explainable, and compliant with regulatory and organizational requirements.
Write technical reports and design documents summarizing R&D progress, system behavior, and key decisions.
Communicate complex ML concepts and tradeoffs clearly to both technical and non-technical stakeholders.
Drive alignment across research, engineering, and product through strong technical leadership.
Mentor junior and senior engineers and researchers, raising the bar for ML rigor and system-level thinking.
Establish and propagate best practices for ML system design, evaluation, and reliability across the organization.
Influence technical direction beyond immediate teams through high-impact, cross-functional work.

Requirements

12–15+ years of experience in machine learning, including building and deploying applied ML systems in production environments.
Strong programming skills in Python, with experience in Java, C++, or related languages in systems contexts.
Deep expertise in at least one major ML domain, such as LLMs and generative AI, NLP or multimodal systems, deep learning, or graph learning.
Hands-on experience with prompt engineering, multi-agent orchestration, tool integration via APIs, memory management, and human-in-the-loop system design.
Proven experience building and shipping enterprise-grade AI systems, including GenAI, LLM, or agent-based applications at scale.
Experience designing and implementing evaluation frameworks, including metrics, benchmarks, and testing systems.
Strong understanding of ML system behavior in production, including reliability, latency, cost tradeoffs, and failure modes.
Experience deploying ML systems in regulated or constrained environments and familiarity with modern ML infrastructure such as cloud platforms and containerized systems.
Demonstrated ability to lead technical direction across teams and drive systems from concept to production impact.

Benefits

Career-track opportunity with potential for rapid advancement, based on strong performance, as the firm grows.
100% employer-paid comprehensive health care, including medical, dental, and vision, for you and your family.
Paid maternity and paternity leave for 14 weeks at the employee's normal pay.
Unlimited PTO, with management approval.
Opportunities for professional development and continued learning.
Optional 401(k), FSA, and equity incentives available.
Mental health benefits are available through Tara Mind.
