The unstable gradient problem is not, strictly speaking, the vanishing gradient problem or the exploding gradient problem; rather, it arises because the gradient in an early layer is the product of terms from all the later layers. With more layers, the network becomes intrinsically unstable: the only way every layer can learn at roughly the same speed, avoiding vanishing or exploding gradients, is for all of those products of terms to stay balanced in magnitude, and such a balance occurring by chance becomes less and less likely as depth grows. Neural networks therefore tend to have layers that learn at different speeds unless some mechanism is introduced to balance the learning speeds.
When the magnitudes of these gradients accumulate across layers, the network is more likely to become unstable, which in turn leads to poor prediction results.
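As a rough illustration (not taken from the text above), the sketch below multiplies one scalar "weight times activation derivative" term per layer, the kind of product that determines the early-layer gradient in a deep sigmoid network. The weight scales and random factors are assumptions chosen purely for demonstration: with ordinary weights the product collapses toward zero, with much larger weights it blows up, and keeping every term balanced so the product stays near one becomes increasingly unlikely as depth grows.

```python
import numpy as np

# Minimal sketch (assumed setup, not the article's own example):
# the backpropagated gradient reaching layer 1 of a deep sigmoid chain
# is roughly the product of w_l * sigmoid'(z_l) over all later layers l.

rng = np.random.default_rng(0)

def sigmoid_prime(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)   # never exceeds 0.25

def early_layer_gradient_factor(num_layers, weight_scale):
    # One scalar "term" per layer: weight times activation derivative.
    weights = rng.normal(0.0, weight_scale, size=num_layers)
    pre_activations = rng.normal(0.0, 1.0, size=num_layers)
    terms = weights * sigmoid_prime(pre_activations)
    return np.prod(np.abs(terms))

for depth in (5, 20, 50):
    vanishing = early_layer_gradient_factor(depth, weight_scale=1.0)
    exploding = early_layer_gradient_factor(depth, weight_scale=20.0)
    print(f"depth={depth:3d}  unit-scale weights: {vanishing:.2e}  "
          f"large weights: {exploding:.2e}")
```

With unit-scale weights the sigmoid derivative (at most 0.25) drags the product toward zero, while large weights push it toward overflow; neither regime lets all layers learn at the same speed.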
- Vanishing gradient problem: The vanishing gradient problem can occur when training neural networks using gradient descent with backpropagation. When the derivative of the activation function is very close to zero, the gradient used to update the weights of the network may be too small for effective learning.
- Exploding gradient problem: The exploding gradient problem is a difficulty that can occur when training artificial neural networks using gradient descent with backpropagation. When large error gradients accumulate, the model may become unstable, impairing effective learning.
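To make the difference in per-layer learning speeds concrete, here is a small self-contained sketch (the network size, initialization, and squared-error loss are assumptions, not taken from the text) that runs one forward and backward pass through a fully connected sigmoid network and prints the gradient norm at each layer; with this setup the early layers typically receive far smaller gradients than the later ones.

```python
import numpy as np

# Assumed toy setup: an 8-layer fully connected sigmoid network with a
# squared-error loss, used only to show layer-dependent gradient norms.
rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

depth, width = 8, 30
weights = [rng.normal(0.0, 1.0, size=(width, width)) / np.sqrt(width)
           for _ in range(depth)]

x = rng.normal(size=(width, 1))
target = rng.normal(size=(width, 1))

# Forward pass, keeping activations for backpropagation.
activations = [x]
for W in weights:
    activations.append(sigmoid(W @ activations[-1]))

# Backward pass: output-layer delta, then propagate toward the input.
delta = (activations[-1] - target) * activations[-1] * (1 - activations[-1])
grad_norms = [None] * depth
for l in reversed(range(depth)):
    grad_norms[l] = np.linalg.norm(delta @ activations[l].T)
    if l > 0:
        delta = (weights[l].T @ delta) * activations[l] * (1 - activations[l])

for l, g in enumerate(grad_norms, start=1):
    print(f"layer {l:2d}  gradient norm: {g:.3e}")
```

The printed norms shrink toward the first layer, which is the "layers learning at different speeds" behaviour described above; a different initialization scale would instead make the norms grow toward the first layer.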