Tech Stack
AWS, Azure Cloud, Google Cloud Platform, gRPC, Python, React
About the role
- Design and develop robust, stateful, and scalable voice-first AI agents in Python, optimized for real-time voice interaction: managing turn-taking, interruptions, and low-latency responses (a minimal sketch of such a turn loop follows this list).
- Integrate best-in-class real-time Speech-to-Text (STT), Text-to-Speech (TTS), and Voice Activity Detection (VAD) services to create a seamless conversational flow.
- Connect voice agents with existing enterprise systems, databases, and third-party APIs to create powerful, end-to-end automated workflows initiated and managed through voice.
- Establish and own evals of voice agent performance and behavior, and iterate over time to systematically improve reliability and the overall user experience.
- Build end-to-end conversational flows with reasoning, planning, and dynamic tool use — beyond pre-scripted voice experiences.
- Work cross-functionally with product managers, ML scientists, and engineers to deeply understand user needs and voice interaction goals.
- Implement fallback, recovery, and error-handling strategies to deal with noisy audio input or speech recognition inaccuracies.
- Define and track voice-specific evaluation metrics (e.g., word error rate, latency, conversational naturalness).
- Develop observability tools and guardrails to monitor performance, ensure safety, and handle edge cases in spoken interactions.
- Document development, architecture decisions, and research findings to share knowledge across the team.
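To make the turn-taking and barge-in responsibilities above concrete, here is a minimal, self-contained Python sketch of the control flow. The transcribe, synthesize, and energy_vad helpers are hypothetical stand-ins for real streaming STT/TTS/VAD providers (no specific vendor API is implied); the point is only the loop structure: buffer audio until trailing silence closes the user's turn, then stream a reply while watching for interruptions.

```python
import time
from dataclasses import dataclass, field

# Hypothetical placeholders for real streaming STT/TTS providers.
# They are stubs so the control flow below is runnable on its own.
def transcribe(frames: list[bytes]) -> str:
    return "caller said something"            # stand-in for a streaming STT result

def synthesize(text: str) -> list[bytes]:
    return [text.encode()]                    # stand-in for streamed TTS audio chunks

def energy_vad(frame: bytes, threshold: int = 10) -> bool:
    """Toy energy-based VAD: treat longer ('louder') frames as speech."""
    return len(frame) > threshold

@dataclass
class TurnLoop:
    """Minimal barge-in-aware turn-taking loop for a voice agent."""
    silence_frames_to_end_turn: int = 3       # trailing silence that closes a user turn
    buffer: list[bytes] = field(default_factory=list)
    silent: int = 0

    def on_user_frame(self, frame: bytes) -> str | None:
        """Accumulate audio until VAD sees enough silence, then return a transcript."""
        if energy_vad(frame):
            self.buffer.append(frame)
            self.silent = 0
        else:
            self.silent += 1
        if self.buffer and self.silent >= self.silence_frames_to_end_turn:
            text = transcribe(self.buffer)
            self.buffer.clear()
            return text
        return None

    def speak(self, reply: str, incoming_frames) -> None:
        """Stream TTS chunks, but stop immediately if the caller barges in."""
        for chunk in synthesize(reply):
            frame = next(incoming_frames, b"")
            if energy_vad(frame):             # caller interrupted: abandon playback
                print("barge-in detected, yielding the floor")
                return
            print(f"playing {len(chunk)} bytes")
            time.sleep(0.01)                  # simulate real-time playback pacing

if __name__ == "__main__":
    loop = TurnLoop()
    # "Loud" frames exceed the toy threshold; empty frames count as silence.
    frames = [b"x" * 20] * 4 + [b""] * 3
    for f in frames:
        if (text := loop.on_user_frame(f)):
            loop.speak(f"You said: {text}", incoming_frames=iter([b""] * 10))
```

In a production agent the same structure would sit on top of streaming transports (e.g., WebSockets or gRPC) and real provider SDKs, with the VAD and endpointing thresholds tuned against latency and naturalness metrics.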
Requirements
- Strong experience building multi-step, tool-using agents (e.g., LangChain, AutoGen).
- Familiarity with prompt engineering, context management, and reasoning strategies such as Chain-of-Thought and ReAct.
- Experience building low-latency, streaming voice applications.
- Expertise in integrating and managing real-time STT/TTS models and APIs.
- Proficiency with Voice Activity Detection (VAD), noise suppression, and robust barge-in/interruption logic.
- Experience integrating third-party voice AI APIs, including STT and TTS services from providers such as OpenAI, Deepgram, and ElevenLabs.
- Understanding of latency, timing, and streaming audio constraints.
- Comfortable connecting agents to external APIs, tools, and databases in secure environments.
- Experience building RAG pipelines with vector stores, chunking strategies, and hybrid retrieval (a toy hybrid-retrieval sketch follows this list).
- Experience implementing and using monitoring tools and evaluation frameworks (e.g., Braintrust) to score AI agents.
- Familiarity with techniques for prompt injection defense, guardrails (e.g., Rebuff, Guardrails AI), and failover logic.
- Experience managing token budgets and latency through caching, model routing, and similar techniques.
- Expert in Python, FastAPI, and LLM SDKs.
- Experience deploying AI apps to cloud platforms (AWS, GCP, Azure) using CI/CD best practices.
- Nice-to-have: M.S. / Ph.D. in Computer Science, NLP, Machine Learning, or related field.
- Nice-to-have: Background in spoken dialogue systems or conversational UX design.
- Nice-to-have: Familiarity with real-time streaming architecture (e.g., WebRTC, gRPC, socket.io).
- Nice-to-have: Multilingual ASR/TTS pipeline experience.
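As a companion to the hybrid-retrieval requirement above, here is a toy, self-contained Python sketch that blends a dense (vector) score with a sparse (keyword-overlap) score over a tiny in-memory corpus. The embed bag-of-words function, the example documents, and the alpha weighting are placeholder assumptions; a real pipeline would use an embedding model, a vector store, a chunking step, and a sparse retriever such as BM25.

```python
import math
from collections import Counter

# Toy in-memory corpus; in practice these would be chunks stored in a vector DB.
DOCS = {
    "doc1": "reset your voicemail PIN from the account settings page",
    "doc2": "escalate billing disputes to a human agent after two failed attempts",
    "doc3": "the agent should confirm the caller's identity before sharing balances",
}

def embed(text: str) -> Counter:
    """Hypothetical embedding: a bag-of-words vector standing in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query: str, doc: str) -> float:
    """Crude lexical overlap, standing in for BM25 or another sparse retriever."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def hybrid_retrieve(query: str, alpha: float = 0.5, k: int = 2) -> list[tuple[str, float]]:
    """Blend dense and sparse scores; alpha weights the dense (vector) side."""
    qv = embed(query)
    scored = [
        (doc_id, alpha * cosine(qv, embed(text)) + (1 - alpha) * keyword_score(query, text))
        for doc_id, text in DOCS.items()
    ]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:k]

if __name__ == "__main__":
    for doc_id, score in hybrid_retrieve("how do I reset my voicemail PIN"):
        print(f"{doc_id}: {score:.2f}")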