Vanishing gradient problem

The vanishing gradient problem can occur when training neural networks using gradient descent with backpropagation. When the derivative of the activation function is very close to zero, the gradient used to update the weights of the network may be too small for effective learning.

The vanishing gradient problem can occur in artificial neural networks trained using gradient descent with backpropagation. When training such a network, the gradient of the loss function is used to adjust the weights of the network on each iteration. The vanishing gradient problem occurs when the gradient is sufficiently small so as to effectively prevent weights from updating during training, blocking the network from learning.
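To make the mechanism concrete, here is a minimal numerical sketch (an illustration, not from the source; the depth and input value are arbitrary choices) showing how the chain rule shrinks the gradient through a stack of sigmoid activations:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

# Forward pass through a chain of 20 sigmoid "layers" (weight 1, bias 0),
# recording each pre-activation for the backward pass.
depth = 20
x = 0.5
pre_activations = []
for _ in range(depth):
    pre_activations.append(x)
    x = sigmoid(x)

# Backward pass: the gradient at the input is the product of the local
# derivatives, each of which is at most 0.25 for the sigmoid.
grad = 1.0
for z in reversed(pre_activations):
    grad *= sigmoid_prime(z)

print(f"gradient after {depth} layers: {grad:.3e}")
```

Because every factor in the product is below 0.25, the gradient reaching the earliest layers shrinks geometrically with depth, which is exactly the weight-update stall described above.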

Networks using activation functions whose derivatives tend to be very close to zero (such as the sigmoid function) are especially susceptible to the vanishing gradient problem. These small values get multiplied at each layer of the network, causing layers closer to the input to adjust their weights very slowly or not at all. Leaky or parametric ReLU activation functions mitigate vanishing gradients by keeping the derivative bounded away from zero, even for negative inputs.
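A quick comparison of the worst-case derivative products shows why the leaky ReLU helps (an illustrative sketch; the depth and the leaky slope of 0.01 are assumed values, not from the source):

```python
# The sigmoid's derivative peaks at 0.25 (at x = 0), so over `depth`
# layers the chained derivative factor is at most 0.25**depth.
depth = 20
sigmoid_bound = 0.25 ** depth

# The leaky ReLU derivative is 1 for positive inputs and a small
# constant slope (alpha) for negative ones, so it never reaches zero.
def leaky_relu_prime(x, alpha=0.01):
    return 1.0 if x > 0 else alpha

# Along a path of positive pre-activations the chained factor stays 1.
positive_path = leaky_relu_prime(1.0) ** depth

print(f"sigmoid upper bound: {sigmoid_bound:.3e}")
print(f"leaky ReLU, positive inputs: {positive_path}")
```

Even in the sigmoid's best case the factor collapses toward zero with depth, while the leaky ReLU passes the gradient through unattenuated wherever the unit is active.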

Networks using activation functions whose derivatives can take on very large values are susceptible to the related exploding gradient problem.
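Conversely, when each layer multiplies the gradient by a factor greater than one, the product grows geometrically. A minimal sketch (the per-layer factor, depth, and clipping threshold are arbitrary illustrative choices) of the explosion and of the gradient clipping commonly used to contain it:

```python
# With a repeated per-layer factor of 1.5, the backpropagated gradient
# grows geometrically with depth.
depth = 50
grad = 1.5 ** depth

# Gradient clipping rescales the gradient when its magnitude exceeds a
# threshold, bounding the size of any single weight update.
def clip(g, threshold=5.0):
    return g if abs(g) <= threshold else threshold * (g / abs(g))

clipped = clip(grad)
print(f"raw gradient: {grad:.3e}, clipped: {clipped}")
```

Clipping does not fix the underlying conditioning of the network, but it prevents a single exploding gradient from destroying the learned weights.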



Further reading

Michael Nielsen, "Why are deep neural networks hard to train?"
Razvan Pascanu, Tomas Mikolov and Yoshua Bengio, "On the difficulty of training recurrent neural networks" (academic paper)

Documentaries, videos and podcasts

An Old Problem - Ep. 5 (Deep Learning SIMPLIFIED), 14 December 2015
