Architect and build the conversation orchestration service: ASR → LLM inference → TTS streaming in real time
Write robust, asynchronous Python code designed to handle high concurrency without deadlocks, race conditions, or memory leaks
Design and maintain clean, well-structured APIs for future scalability and ease of debugging
Manage interaction data using SQLAlchemy (or equivalent) with efficient schema design and safe migrations
Implement observability: structured logging, metrics, and tracing across the system for fast issue diagnosis
Partner with ML and Product teams to rapidly iterate on conversation flow and user experience
Enforce a strong testing culture: automated unit tests, E2E flows, and load testing
Build resilient systems capable of handling real-world edge cases like noisy audio, unreliable APIs, and flaky networks
Continuously profile, optimize, and reduce latency and response times
Requirements
Deep Python expertise: 5+ years of Python with production systems experience, including a strong command of context managers, generators, event loops, the GIL, and effective use of asyncio
Database fundamentals: data modeling, efficient queries, ORM best practices
Networking & I/O: streaming, backpressure, and resilient design for unreliable networks