The vanishing gradient problem can occur in artificial neural networks trained using gradient descent with backpropagation. When training such a network, the gradient of the loss function is used to adjust the weights of the network on each iteration. The vanishing gradient problem occurs when the gradient becomes so small that the weights effectively stop updating, which prevents the network from learning.
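As a rough illustration, the weight update used by plain gradient descent can be written as follows. The symbols are assumptions introduced here, not notation from the text above: $w_t$ is a weight at iteration $t$, $\eta$ is the learning rate, and $\mathcal{L}$ is the loss function.

```latex
w_{t+1} = w_t - \eta \, \frac{\partial \mathcal{L}}{\partial w_t},
\qquad
\left| \frac{\partial \mathcal{L}}{\partial w_t} \right| \approx 0
\;\Longrightarrow\;
w_{t+1} \approx w_t
```

When the gradient term is close to zero, the update leaves the weight almost unchanged no matter how many iterations are run.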
Networks using activation functions whose derivatives tend to be very close to zero (such as the sigmoid function, whose derivative never exceeds 0.25) are especially susceptible to the vanishing gradient problem. During backpropagation these small values are multiplied together layer by layer, so layers closer to the input adjust their weights very slowly or not at all. Leaky or parametric ReLU activation functions help mitigate vanishing gradients because their derivative stays away from zero on both sides of the origin: it is 1 for positive inputs and a small positive slope for negative inputs.
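The layer-by-layer multiplication can be sketched in a few lines of Python. This is a simplified illustration, not a full backpropagation implementation: it assumes every layer has a single unit, ignores the weights entirely, and tracks only the product of activation derivatives. The helper names (`backprop_factor`, `leaky_relu_derivative`) and the negative slope of 0.01 are hypothetical choices made for this example.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x: float) -> float:
    # The sigmoid derivative peaks at 0.25 when x == 0.
    s = sigmoid(x)
    return s * (1.0 - s)


def leaky_relu_derivative(x: float, negative_slope: float = 0.01) -> float:
    # Assumed slope of 0.01 for negative inputs; 1 for positive inputs.
    return 1.0 if x > 0 else negative_slope


def backprop_factor(derivative, pre_activations):
    # Product of the activation derivatives across layers
    # (weights are ignored in this simplified sketch).
    factor = 1.0
    for z in pre_activations:
        factor *= derivative(z)
    return factor


depth = 10

# Best case for the sigmoid: every pre-activation sits at 0, where the
# derivative is at its maximum of 0.25.
sigmoid_factor = backprop_factor(sigmoid_derivative, [0.0] * depth)

# Leaky ReLU with positive pre-activations passes the gradient unchanged.
leaky_factor = backprop_factor(leaky_relu_derivative, [1.0] * depth)

print(f"sigmoid, {depth} layers:    {sigmoid_factor:.3e}")
print(f"leaky ReLU, {depth} layers: {leaky_factor:.3e}")
```

Even in the sigmoid's best case the factor shrinks by at least a factor of four per layer, while the leaky ReLU factor stays at 1 for positive inputs. If every pre-activation were negative, the leaky ReLU slope would compound as well, so the choice of slope still matters.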
Networks using activation functions whose derivatives can take on very large values are susceptible to the related exploding gradient problem.
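Both problems come from the same repeated multiplication, which a small sketch can make concrete. It assumes, purely for illustration, that the gradient is scaled by the same factor at every layer; `gradient_magnitude` is a hypothetical helper, and the depth of 30 is arbitrary.

```python
def gradient_magnitude(per_layer_factor: float, depth: int) -> float:
    """Magnitude of a gradient scaled by the same factor at each of `depth` layers."""
    return abs(per_layer_factor) ** depth


# Factors below 1 shrink the gradient toward zero (vanishing),
# factors above 1 blow it up (exploding), and exactly 1 preserves it.
for factor in (0.25, 1.0, 2.0):
    print(f"factor {factor}: {gradient_magnitude(factor, 30):.3e}")
```

Real networks have a different factor at each layer, but the same geometric growth or decay applies whenever the factors sit consistently on one side of 1.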
Why are deep neural networks hard to train?
Razvan Pascanu, Tomas Mikolov and Yoshua Bengio
On the difficulty of training recurrent neural networks
Documentaries, videos and podcasts
An Old Problem - Ep. 5 (Deep Learning SIMPLIFIED)
14 December 2015