Deep Voice 3

Other attributes

Blog

research.baidu.com/Blog/...x-view

Industry

Deep Voice 3 introduces a comprehensive neural network architecture for speech synthesis. It is a fully-convolutional sequence-to-sequence model that converts text to spectrograms or other acoustic parameters, which are then used for audio waveform synthesis. The fully-convolutional design enables fully parallel computation and trains faster than analogous architectures using recurrent cells.
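The page does not describe the model's internal layers. Below is a minimal Python sketch, under stated assumptions, of why a convolutional layer parallelizes where a recurrent cell does not: each output step depends only on a fixed window of inputs, not on the previous output. The gated-linear-unit (GLU) gate, the residual connection, the sqrt(0.5) scaling, the "same" (non-causal) padding, and the names `conv_block` and `sigmoid` are all assumptions for illustration. It is a single-channel toy, not Baidu's implementation.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def conv_block(x, w_a, w_b, bias_a=0.0, bias_b=0.0):
    """Hypothetical gated 1-D convolution block over a single-channel sequence.

    x:   input sequence (list of floats)
    w_a: kernel for the "value" half of the GLU (odd length assumed)
    w_b: kernel for the "gate" half of the GLU (same length as w_a)
    Returns a list with the same length as x.
    """
    k = len(w_a)
    pad = k // 2  # "same" padding so the output length matches the input
    padded = [0.0] * pad + list(x) + [0.0] * pad
    out = []
    # Each output position reads only a fixed input window and never a previous
    # output, so every iteration of this loop is independent and could run in
    # parallel. A recurrent cell would instead need output t-1 before output t.
    for t in range(len(x)):
        window = padded[t:t + k]
        a = bias_a + sum(wi * xi for wi, xi in zip(w_a, window))
        b = bias_b + sum(wi * xi for wi, xi in zip(w_b, window))
        gated = a * sigmoid(b)  # GLU: value scaled by a learned gate in (0, 1)
        # Residual connection, scaled by sqrt(0.5) (assumed) to limit variance growth.
        out.append((gated + x[t]) * math.sqrt(0.5))
    return out


print(conv_block([1.0, 2.0, 3.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]))
```

With an identity value kernel `[0, 1, 0]` and an all-zero gate kernel, the gate is sigmoid(0) = 0.5, so each output is (0.5 * x + x) * sqrt(0.5), roughly 1.06 * x.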

It produces monotonic attention behavior, which avoids error modes that commonly affect attention-based sequence-to-sequence models. It uses low-dimensional speaker embeddings to model the variability among the thousands of different speakers in its training data. It can serve up to ten million queries per day on a single-GPU server.
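The page does not say how monotonic attention is obtained. One way to enforce it at inference time, assumed here rather than taken from this page, is to compute the attention softmax only over a small window that starts at the previously attended encoder position and extends forward, so the alignment can stay put or advance but never jump backwards. The window size of 3 and the names `windowed_attention` and `decode_alignment` are hypothetical.

```python
import math


def softmax(scores):
    """Standard softmax with max-subtraction for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def windowed_attention(scores, last_pos, window=3):
    """Restrict attention to encoder positions [last_pos, last_pos + window).

    scores:   raw attention scores over all encoder time steps
    last_pos: encoder position attended most strongly at the previous decoder step
    Positions outside the window get zero weight.
    """
    end = min(last_pos + window, len(scores))
    weights = [0.0] * len(scores)
    weights[last_pos:end] = softmax(scores[last_pos:end])
    return weights


def decode_alignment(score_rows, window=3):
    """Apply windowed attention at each decoder step; return attended positions."""
    pos = 0
    path = []
    for scores in score_rows:
        w = windowed_attention(scores, pos, window)
        # The argmax always falls inside the window, so pos never decreases.
        pos = max(range(len(w)), key=w.__getitem__)
        path.append(pos)
    return path


rows = [[5, 1, 0, 0, 0], [0, 5, 0, 0, 9], [9, 0, 5, 0, 0]]
print(decode_alignment(rows))
```

For these score rows, an unconstrained argmax would give the alignment 0, 4, 0 (skipping ahead, then jumping back), while the windowed version yields 0, 1, 2.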

Deep Voice 3 is the third artificial speech synthesis system in Baidu Research's Deep Voice series, released in October 2017. It builds on Deep Voice 1 and Deep Voice 2.


Further Resources

Deep Voice 3: 2000-Speaker Neural Text-to-Speech
Authors: Wei Ping, Kainan Peng, Andrew Gibiansky, Sercan O. Arik, Ajay Kannan, Sharan Narang, Jonathan Raiman and John Miller
Link: https://arxiv.org/pdf/1710.07654.pdf
Type: Academic paper

Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning
Authors: Wei Ping, Kainan Peng, Andrew Gibiansky, Sercan O. Arik, Ajay Kannan, Sharan Narang, Jonathan Raiman and John Miller
Link: http://arxiv.org/abs/1710.07654v3
Type: Academic paper
