smallest.ai

Senior Researcher – Text to Speech

Full-time

Location Type: Hybrid

Location: San Francisco, California, United States

Salary

💰 $200,000 - $300,000 per year

About the role

  • Lead research on Text-to-Speech models focused on naturalness, expressiveness, latency, and robustness
  • Design and train TTS systems for real-world voices across accents, languages, and speaking styles
  • Improve streaming and low-latency speech synthesis pipelines
  • Experiment with architectures, loss functions, and data strategies (multi-speaker training, style modeling, distillation, data augmentation)
  • Translate research ideas into production-ready TTS systems
  • Collaborate closely with infra, product, and voice engineering teams

Requirements

  • 3–6 years of specialized experience in speech, in academia or industry
  • Strong background in Text-to-Speech / speech generation research
  • Hands-on experience with deep learning frameworks (PyTorch preferred)
  • Experience with real-time or low-latency TTS systems
  • Familiarity with modern TTS architectures (Tacotron-style, FastSpeech, VITS, diffusion-based models) and neural vocoders
  • Ability to think end-to-end: data → model → inference → deployment
  • Prior work in multilingual, expressive, or accented speech synthesis is a strong plus
  • Publications at top speech or ML conferences
  • Experience deploying TTS models in real-time production
  • Exposure to conversational AI or voice agents

Benefits

  • We pay top dollar for the best candidates.

Applicant Tracking System Keywords

Tip: use these terms in your resume and cover letter to boost ATS matches.

Hard Skills & Tools

Text-to-Speech, speech generation, deep learning, PyTorch, real-time TTS systems, low-latency TTS systems, Tacotron, FastSpeech, VITS, neural vocoders

Soft Skills

collaboration, problem-solving, end-to-end thinking