
Senior Researcher – Text to Speech
smallest.ai
Full-time
Location Type: Hybrid
Location: San Francisco, California, United States
Salary
$200,000–$300,000 per year
About the role
- Lead research on Text-to-Speech models focused on naturalness, expressiveness, latency, and robustness
- Design and train TTS systems for real-world voices across accents, languages, and speaking styles
- Improve streaming and low-latency speech synthesis pipelines
- Experiment with architectures, loss functions, and data strategies (multi-speaker training, style modeling, distillation, data augmentation)
- Translate research ideas into production-ready TTS systems
- Collaborate closely with infra, product, and voice engineering teams
Requirements
- 3–6 years of specialized experience in speech through academia or industry
- Strong background in Text-to-Speech / speech generation research
- Hands-on experience with deep learning frameworks (PyTorch preferred)
- Experience with real-time or low-latency TTS systems
- Familiarity with modern TTS architectures (Tacotron-style, FastSpeech, VITS, diffusion-based, neural vocoders)
- Ability to think end-to-end: data → model → inference → deployment
- Prior work in multilingual, expressive, or accented speech synthesis is a strong plus
- Publications in top speech / ML conferences
- Experience deploying TTS models in real-time production
- Exposure to conversational AI or voice agents
Benefits
- We pay top dollar for the best candidates.
Applicant Tracking System Keywords
Hard Skills & Tools
Text-to-Speech, speech generation, deep learning, PyTorch, real-time TTS systems, low-latency TTS systems, Tacotron, FastSpeech, VITS, neural vocoders
Soft Skills
collaboration, problem-solving, end-to-end thinking