Deep Voice 3 introduces a comprehensive neural network architecture for speech synthesis. It is a fully-convolutional sequence-to-sequence model that converts text to spectrograms or other acoustic parameters to be used for audio waveform synthesis; because it uses no recurrent cells, computation is fully parallel and training is faster than in comparable recurrent architectures.
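As a minimal illustration of the fully-convolutional idea, the sketch below implements a gated 1-D convolution block, the kind of building unit such encoders and decoders stack in place of recurrent cells. This is a hypothetical, simplified version (names and shapes are assumptions; real models add residual connections, dropout, and learned initialization):

```python
import numpy as np

def glu_conv1d(x, w, b):
    """Gated linear unit over a 1-D convolution (illustrative sketch).

    x: (T, C_in) input sequence of feature vectors
    w: (k, C_in, 2*C_out) convolution kernel
    b: (2*C_out,) bias

    Half the channels act as values, the other half as a sigmoid
    gate; every time step is computed independently, so the whole
    sequence can be processed in parallel.
    """
    k, c_in, c2 = w.shape
    c_out = c2 // 2
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # "same" padding in time
    T = x.shape[0]
    out = np.empty((T, c2))
    for t in range(T):
        # Contract the (k, C_in) window against the kernel.
        out[t] = np.tensordot(xp[t:t + k], w, axes=([0, 1], [0, 1])) + b
    a, g = out[:, :c_out], out[:, c_out:]
    return a * (1.0 / (1.0 + np.exp(-g)))  # value * sigmoid(gate)
```

Because no step depends on the previous step's output, all `T` positions can be computed at once, which is the source of the training-speed advantage over recurrent layers.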
Its attention mechanism can be constrained to behave monotonically, avoiding common error modes of attention-based sequence-to-sequence models. It uses low-dimensional speaker embeddings to model the variability among the thousands of different speakers in the dataset, and it can serve up to ten million queries per day on a single-GPU server.
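The monotonic-attention idea can be sketched as a simple inference-time constraint: at each decoder step, attention is only allowed over a small window starting at the previously attended text position, so the alignment can only move forward. This is a minimal sketch under assumed names and shapes, not the paper's exact inference rule:

```python
import numpy as np

def monotonic_attention(scores, window=3):
    """Force a soft-attention alignment to advance monotonically.

    scores: (T_dec, T_enc) raw attention logits, one row per
            decoder step over the encoder (text) positions.
    window: how far ahead of the previous peak attention may look.

    At each decoder step, positions outside [prev, prev + window]
    are masked to -inf before the softmax, so the attention peak
    can never move backwards through the text.
    """
    T_dec, T_enc = scores.shape
    out = np.zeros_like(scores, dtype=float)
    prev = 0
    for t in range(T_dec):
        mask = np.full(T_enc, -np.inf)
        hi = min(T_enc, prev + window + 1)
        mask[prev:hi] = 0.0          # only a forward window is visible
        w = scores[t] + mask
        w = np.exp(w - w.max())      # numerically stable softmax
        w /= w.sum()
        out[t] = w
        prev = int(w.argmax())       # alignment can only advance
    return out
```

Constraints of this kind suppress the characteristic failure cases of unconstrained attention, such as attending to the same text span repeatedly or jumping past parts of the input.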
Wei Ping, Kainan Peng, Andrew Gibiansky, Sercan O. Arik, Ajay Kannan, Sharan Narang, Jonathan Raiman, John Miller
Deep Voice 3: 2000-Speaker Neural Text-to-Speech
Wei Ping, Kainan Peng, Andrew Gibiansky, Sercan O. Arik, Ajay Kannan, Sharan Narang, Jonathan Raiman, John Miller
Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning