Deep Voice 1 is a text-to-speech (TTS) system, that is, a system for synthesizing artificial speech. It follows the structure of traditional text-to-speech systems, but its components are built from deep neural networks rather than the complex, hand-engineered processing stages those systems rely on.
Deep Voice 1 runs in real time: it synthesizes audio at least as quickly as the audio needs to be played back. This makes it suitable for interactive applications such as media and conversational interfaces. Its deep neural networks are trained on large amounts of data using simple features.
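The real-time requirement can be made concrete with a real-time factor: the time spent synthesizing divided by the duration of the audio produced. The following is a minimal sketch, assuming hypothetical helper names (`real_time_factor`, `is_real_time`) and illustrative numbers that are not taken from the Deep Voice paper.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by audio duration (hypothetical helper)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def is_real_time(synthesis_seconds: float, audio_seconds: float) -> bool:
    """A system keeps up with playback when its real-time factor is at most 1."""
    return real_time_factor(synthesis_seconds, audio_seconds) <= 1.0


# Illustrative numbers only, not measurements from the paper:
# 0.5 s of compute to produce 2.0 s of audio.
rtf = real_time_factor(synthesis_seconds=0.5, audio_seconds=2.0)
keeps_up = is_real_time(synthesis_seconds=0.5, audio_seconds=2.0)
```

A factor at or below 1 means the audio is ready no later than it has to be played, which is the property interactive applications depend on.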
Deep Voice 1 was published in February 2017 by the Silicon Valley AI Lab at Baidu Research.
Deep Voice: Real-time Neural Text-to-Speech
Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, Yongguo Kang, Xian Li, Jonathan Raiman, Shubho Sengupta, and Mohammad Shoeybi