Tacotron 2 is a neural network architecture for speech synthesis that chains multiple networks together. It combines ideas from two earlier text-to-speech (TTS) systems, Tacotron and WaveNet. The system was developed for Google Assistant.
It is an end-to-end TTS system in which a sequence-to-sequence recurrent network predicts mel spectrograms from text, and a modified WaveNet vocoder then converts those spectrograms into audio waveforms. It can be trained directly from data and can achieve state-of-the-art sound quality close to that of natural human speech.
Researchers at Alphabet Inc. developed Tacotron 2 to power Google Assistant, building on DeepMind's WaveNet. It is Google's second-generation AI-powered speech synthesis system, and it uses multiple neural networks to produce speech that is almost indistinguishable from a human voice.
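The two-stage design described above (text to mel spectrogram, then mel spectrogram to waveform) can be pictured as a simple data flow. The following is only a minimal sketch with placeholder stages, not the actual model: the function names are hypothetical, the constants (80 mel channels, 12.5 ms frame hop, 24 kHz audio) are illustrative assumptions, and the fixed number of frames per character stands in for the alignment a real sequence-to-sequence network would learn.

```python
from typing import List

# Illustrative assumptions, not values taken from this article.
N_MELS = 80            # mel channels per spectrogram frame
SAMPLE_RATE = 24000    # output audio sample rate in Hz
HOP_MS = 12.5          # time step between consecutive frames, in milliseconds
FRAMES_PER_CHAR = 5    # hypothetical fixed rate; a real model learns this alignment


def predict_mel_frames(text: str) -> List[List[float]]:
    """Placeholder for the sequence-to-sequence network: text -> mel frames."""
    n_frames = len(text) * FRAMES_PER_CHAR
    # Each frame is a vector of N_MELS values; zeros stand in for predictions.
    return [[0.0] * N_MELS for _ in range(n_frames)]


def vocode(mel_frames: List[List[float]]) -> List[float]:
    """Placeholder for the WaveNet-style vocoder: mel frames -> audio samples."""
    samples_per_frame = int(SAMPLE_RATE * HOP_MS / 1000)
    # Silence stands in for the generated waveform.
    return [0.0] * (len(mel_frames) * samples_per_frame)


def synthesize(text: str) -> List[float]:
    """Chain the two stages: text -> mel spectrogram -> waveform."""
    return vocode(predict_mel_frames(text))


if __name__ == "__main__":
    frames = predict_mel_frames("hello")
    audio = synthesize("hello")
    print(len(frames), len(frames[0]), len(audio))
```

The point of the sketch is the interface between the stages: the first network never produces audio samples, and the vocoder never sees text, so the mel spectrogram is the only thing passed between them.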
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis and Yonghui Wu
Documentaries, videos and podcasts
Tacotron 2 - THE BEST TEXT TO SPEECH AI YET!
20 January 2018
- Speech synthesis: Artificial simulation of human speech using computers or other devices.
- Natural language processing: Natural language processing (NLP) is a field of computer science concerned with the interaction between computers and human languages, including programming computers to process large amounts of natural language data.
- Text-to-Speech (TTS): A system for converting text into spoken voice.
- WaveNet: A deep neural network for generating realistic voices for Google Assistant.
- Tacotron: An end-to-end generative TTS system by Google that synthesizes speech directly from text.
- Google Assistant: Intelligent personal assistant.
- DeepMind: DeepMind is an artificial intelligence company creating programs that use deep neural networks to teach themselves how to play a variety of games, such as Go and chess.