Vanishing gradient problem

The vanishing gradient problem can occur in artificial neural networks trained using gradient descent with backpropagation. When training such a network, the gradient of the loss function with respect to each weight is used to adjust that weight on each iteration. The vanishing gradient problem occurs when the gradient becomes so small that the weights effectively stop updating, which prevents the network from learning.
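
As a sketch of the update described above, a plain gradient descent step can be written as follows. The notation is assumed here rather than taken from the source: $w_t$ is a single weight at iteration $t$, $\eta > 0$ is the learning rate, and $L$ is the loss.

```latex
w_{t+1} = w_t - \eta \, \frac{\partial L}{\partial w}\bigg|_{w = w_t}
```

When the partial derivative is close to zero, $w_{t+1} \approx w_t$, so the weight barely moves no matter how many iterations are run.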

Networks using activation functions whose derivatives are small over much of their domain (such as the sigmoid function, whose derivative never exceeds 0.25) are especially susceptible to the vanishing gradient problem. During backpropagation these small values are multiplied together layer by layer, so layers closer to the input adjust their weights very slowly or not at all. Leaky or parametric ReLU activation functions help mitigate vanishing gradients because their derivative is 1 for positive inputs and a small slope, rather than zero, for negative inputs.
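
The layer-by-layer multiplication can be illustrated with a minimal Python sketch. It assumes a toy chain of 20 scalar layers, each computing `a = act(w * a_prev)` with every weight set to 1.0; the function names, the 0.01 leaky slope, and the input value 0.5 are illustrative choices, not part of the source.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid; its maximum value is 0.25, reached at x = 0.
    s = sigmoid(x)
    return s * (1.0 - s)

def leaky_relu(x, slope=0.01):
    return x if x > 0 else slope * x

def leaky_relu_grad(x, slope=0.01):
    # Derivative is 1 for positive inputs and `slope` otherwise.
    return 1.0 if x > 0 else slope

def input_gradient(x, weights, act, act_grad):
    """Return d(output)/d(input) for a chain of scalar layers a = act(w * a_prev)."""
    grad = 1.0
    a = x
    for w in weights:
        z = w * a
        grad *= w * act_grad(z)  # chain rule: one factor per layer
        a = act(z)
    return grad

weights = [1.0] * 20  # hypothetical 20-layer chain, all weights equal to 1
g_sigmoid = input_gradient(0.5, weights, sigmoid, sigmoid_grad)
g_leaky = input_gradient(0.5, weights, leaky_relu, leaky_relu_grad)

print(f"sigmoid chain gradient:    {g_sigmoid:.3e}")
print(f"leaky ReLU chain gradient: {g_leaky:.3e}")
```

In this toy setting each sigmoid layer contributes a factor of at most 0.25, so the gradient reaching the input is bounded by 0.25 to the power 20, while the leaky ReLU chain passes the gradient through unchanged because every pre-activation is positive. Real networks use weight matrices rather than scalars, so this only shows the mechanism, not realistic magnitudes.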

Networks using activation functions whose derivatives can take on very large values are susceptible to the related exploding gradient problem.
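
The two problems are mirror images of each other, which a short Python sketch can make concrete. It assumes, purely for illustration, that every layer contributes the same derivative factor to the backpropagated gradient; the helper name `backprop_factor` and the values 0.25, 4.0, and 30 are hypothetical.

```python
def backprop_factor(per_layer_derivative, num_layers):
    """Product of identical per-layer derivative factors along a chain of layers."""
    factor = 1.0
    for _ in range(num_layers):
        factor *= per_layer_derivative
    return factor

shrinking = backprop_factor(0.25, 30)  # each layer scales the gradient down
growing = backprop_factor(4.0, 30)     # each layer scales the gradient up

print(f"factor < 1 per layer: {shrinking:.3e}")
print(f"factor > 1 per layer: {growing:.3e}")
```

Factors below 1 drive the product toward zero (vanishing), while factors above 1 make it grow exponentially with depth (exploding).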

Further Resources

An Old Problem - Ep. 5 (Deep Learning SIMPLIFIED)
Link: https://www.youtube.com/watch?v=SKMpmAOUa2Q
Date: 14 December 2015

On the difficulty of training recurrent neural networks
Author: Razvan Pascanu, Tomas Mikolov and Yoshua Bengio
Link: http://proceedings.mlr.press/v28/pascanu13.pdf
Type: Academic paper

Why are deep neural networks hard to train?
Author: Michael Nielsen
Link: http://neuralnetworksanddeeplearning.com/chap5.html
