Lead research on Text-to-Speech (TTS) models, focusing on naturalness, expressiveness, latency, and robustness
Design and train TTS systems for real-world voices across accents, languages, and speaking styles
Improve streaming and low-latency speech synthesis pipelines
Experiment with architectures, loss functions, and data strategies (multi-speaker training, style modeling, distillation, data augmentation)
Translate research ideas into production-ready TTS systems
Collaborate closely with infra, product, and voice engineering teams

Requirements

3–6 years of specialized experience in speech technology, gained in academia or industry
Strong background in Text-to-Speech / speech generation research
Hands-on experience with deep learning frameworks (PyTorch preferred)
Experience with real-time or low-latency TTS systems
Familiarity with modern TTS architectures (Tacotron-style, FastSpeech, VITS, diffusion-based, neural vocoders)
Ability to think end-to-end: data → model → inference → deployment
Prior work in multilingual, expressive, or accented speech synthesis is a strong plus
Publications in top speech / ML conferences
Experience deploying TTS models in real-time production environments
Exposure to conversational AI or voice agents
