Deep Voice 2

A multi-speaker neural speech synthesis system based on Deep Voice 1.

Deep Voice 2 is a neural speech synthesis system, commonly called a text-to-speech (TTS) system. It is based on a pipeline similar to Deep Voice 1 but is constructed with higher-performance building blocks, and it demonstrates a significant audio quality improvement over Deep Voice 1. The accompanying paper also improves Tacotron by introducing a post-processing neural vocoder.

A single Deep Voice 2 model can generate several hundred voices and accents. It can learn hundreds of unique voices from less than half an hour of data per speaker, while achieving high audio quality synthesis and preserving the speaker identities almost perfectly.
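
As a rough illustration of how one model can hold many voices, the sketch below conditions a set of shared features on a small per-speaker vector. This follows the general idea the paper describes as low-dimensional trainable speaker embeddings. Everything named here (`SpeakerTable`, `make_projection`, `condition`, the dimensions, the random initialisation) is hypothetical and not taken from the Deep Voice 2 implementation; in a real system the vectors would be learned during training rather than left at random values.

```python
import random

EMBED_DIM = 4    # assumed embedding size, chosen only for illustration
HIDDEN_DIM = 3   # assumed size of the shared hidden features


class SpeakerTable:
    """Holds one small vector per speaker (randomly initialised here)."""

    def __init__(self, num_speakers, dim, seed=0):
        rng = random.Random(seed)
        self.vectors = [
            [rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(num_speakers)
        ]

    def lookup(self, speaker_id):
        return self.vectors[speaker_id]


def make_projection(in_dim, out_dim, seed=1):
    """Build a random out_dim x in_dim weight matrix as nested lists."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
            for _ in range(out_dim)]


def condition(hidden, speaker_vec, projection):
    """Add a speaker-dependent bias to the shared hidden features."""
    bias = [sum(w * s for w, s in zip(row, speaker_vec)) for row in projection]
    return [h + b for h, b in zip(hidden, bias)]


table = SpeakerTable(num_speakers=100, dim=EMBED_DIM)
projection = make_projection(EMBED_DIM, HIDDEN_DIM)
hidden = [0.5, -0.2, 0.1]  # stand-in for features derived from the input text

# Same text features, two different speakers: the outputs differ only
# because of the speaker vector that was looked up.
out_a = condition(hidden, table.lookup(3), projection)
out_b = condition(hidden, table.lookup(42), projection)
```

The shared weights (`projection`) are the same for every speaker, so adding a new voice in this sketch costs only one extra row in the table. That is one plausible reason a small amount of data per speaker can be enough.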

It was released in May 2017 by Baidu Research.


Deep Voice 2: Multi-Speaker Neural Text-to-Speech

Sercan Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman and Yanqi Zhou

Academic paper


