WaveNet

WaveNet is a deep neural network designed to generate raw audio waveforms. It generates realistic-sounding voices for Google Assistant globally.

It mimics the human voice and sounds more natural than the best existing text-to-speech systems, reducing the gap with human performance by over 50% and producing higher-quality audio.

DeepMind's WaveNet is a convolutional neural network (CNN), a type of feedforward neural network composed of layers of interconnected nodes. The network takes a raw audio signal as input and synthesizes an output waveform. Once trained, it creates new speech-like waveforms at 16,000 samples per second. The output waveforms include realistic breaths and lip smacks.
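To make the convolutional structure more concrete, here is a minimal NumPy sketch of a causal dilated 1-D convolution, the building block described in the WaveNet paper listed under Further Resources. It is an illustration only, not DeepMind's implementation: the function names, the kernel size of 2, and the dilation schedule 1, 2, 4, ..., 512 are assumptions chosen for the example, and the real model adds gated activations, residual connections, and a learned output distribution on top of this.

```python
import numpy as np

SAMPLE_RATE = 16_000  # samples per second, the rate quoted in the text


def causal_dilated_conv(x, weights, dilation):
    """Causal 1-D convolution: out[t] uses only x[t], x[t-d], x[t-2d], ...

    Hypothetical helper for illustration; weights[-1] multiplies x[t].
    """
    k = len(weights)
    pad = (k - 1) * dilation
    # Left-pad with zeros so no output ever looks at a future sample.
    padded = np.concatenate([np.zeros(pad), x])
    out = np.zeros(len(x))
    for i, w in enumerate(weights):
        start = i * dilation
        out += w * padded[start:start + len(x)]
    return out


def receptive_field(dilations, kernel_size=2):
    """Number of past samples (including the current one) one output can see."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Assumed dilation schedule for the example: doubling from 1 to 512.
dilations = [2 ** i for i in range(10)]
rf_samples = receptive_field(dilations)
rf_seconds = rf_samples / SAMPLE_RATE

x = np.arange(8, dtype=float)
y = causal_dilated_conv(x, np.array([1.0, 1.0]), dilation=2)
```

Under these assumptions, a single stack of ten layers sees 1,024 samples, which is 64 ms of audio at 16,000 samples per second. Doubling the dilation at each layer is what lets the receptive field grow exponentially with depth while the number of weights grows only linearly.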

It was created by researchers at DeepMind in London in 2016. Other products that rely on text-to-speech (TTS) systems include Apple's Siri, Microsoft’s Cortana, and Amazon Alexa, among others.

Further Resources

Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
Author: Jesse Engel, Cinjon Resnick, Adam Roberts, Sander Dieleman, Douglas Eck, Karen Simonyan, Mohammad Norouzi
Link: http://arxiv.org/abs/1704.01279v1
Type: Academic paper

WaveNet: A Generative Model for Raw Audio
Author: Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, Koray Kavukcuoglu
Link: https://arxiv.org/pdf/1609.03499.pdf
Type: Academic paper
