RMSprop stands for Root Mean Square Propagation. It is an unpublished yet widely known gradient descent optimization algorithm for mini-batch learning of neural networks.


RMSprop first appeared in the lecture slides of a Coursera online class on neural networks taught by Geoffrey Hinton of the University of Toronto. Hinton didn't publish RMSprop in a formal academic paper, but it still became one of the most popular gradient descent optimization algorithms for deep learning.

Hinton developed RMSprop to address a problem that commonly occurs when rprop is used with mini-batches. Rprop uses only the sign of each gradient and adapts a separate step size for every weight, so it ignores how large the gradient is on any given mini-batch. A weight that receives many small gradients in one direction and one large gradient in the other, which should roughly cancel, is instead pushed repeatedly in the first direction, potentially resulting in very large weight increments or decrements when successive mini-batches don't have similar gradients. This is in contrast to the desired behavior of stochastic gradient descent, which makes small adjustments to weights and biases so that the gradients are effectively averaged over successive mini-batches and the network performs better at a specific task with each iteration of the optimization algorithm. RMSprop also builds on the Adagrad adaptive gradient algorithm by addressing its problem of aggressive, monotonically decreasing learning rates.
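
The difference from Adagrad can be illustrated with a small sketch. It assumes the commonly described forms of the two accumulators: Adagrad keeps a running sum of squared gradients, while RMSprop keeps an exponential moving average. The function name, the constant gradient, and the values of `lr`, `decay` and `eps` are illustrative assumptions, not taken from the lecture.

```python
import math


def effective_steps(num_steps=50, grad=1.0, lr=0.01, decay=0.9, eps=1e-8):
    """Return per-step update magnitudes under Adagrad-style and RMSprop-style scaling.

    Hypothetical helper for illustration; the same constant gradient is fed at every step.
    """
    sum_sq = 0.0   # Adagrad-style: running sum of squared gradients (never shrinks)
    mean_sq = 0.0  # RMSprop-style: exponential moving average of squared gradients
    adagrad, rmsprop = [], []
    for _ in range(num_steps):
        sum_sq += grad ** 2
        mean_sq = decay * mean_sq + (1.0 - decay) * grad ** 2
        adagrad.append(lr * abs(grad) / (math.sqrt(sum_sq) + eps))
        rmsprop.append(lr * abs(grad) / (math.sqrt(mean_sq) + eps))
    return adagrad, rmsprop


adagrad_steps, rmsprop_steps = effective_steps()
print("Adagrad-style step sizes:", adagrad_steps[0], "->", adagrad_steps[-1])
print("RMSprop-style step sizes:", rmsprop_steps[0], "->", rmsprop_steps[-1])
```

Under these assumptions the Adagrad-style step keeps shrinking for as long as training runs, because the sum can only grow, whereas the RMSprop-style step levels off near `lr` once the moving average has caught up with the gradient.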

How RMSprop Works

RMSprop mitigates the problem that occurs with rprop when the gradients of successive mini-batches vary by too large an amount by keeping a moving average of the squared gradient for each weight. The gradient of each mini-batch is divided by the square root of this average, the MeanSquare, which in Hinton's lecture slides is calculated as:

$$\mathrm{MeanSquare}(w, t) = 0.9\,\mathrm{MeanSquare}(w, t-1) + 0.1\left(\frac{\partial E}{\partial w}(t)\right)^{2}$$

Dividing by the square root of the MeanSquare normalizes each weight's update by the recent magnitude of its gradient, so step sizes stay comparable across successive mini-batches and weights can be finely calibrated.
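
A minimal sketch of one such update, written in plain Python, may make the mechanics concrete. The helper names `rmsprop_step` and `toy_gradient`, the toy error function, the learning rate `lr`, and the small constant `eps` that guards against division by zero are all assumptions made for illustration. The decay of 0.9 follows the value used in Hinton's lecture.

```python
import math


def rmsprop_step(weights, grads, mean_square, lr=0.01, decay=0.9, eps=1e-8):
    """Apply one RMSprop-style update to every weight (illustrative sketch).

    mean_square holds the moving average of the squared gradient, one entry per weight.
    """
    new_weights, new_mean_square = [], []
    for w, g, ms in zip(weights, grads, mean_square):
        # Update the moving average of the squared gradient for this weight.
        ms = decay * ms + (1.0 - decay) * g * g
        # Divide the gradient by the root of that average before stepping.
        w = w - lr * g / (math.sqrt(ms) + eps)
        new_weights.append(w)
        new_mean_square.append(ms)
    return new_weights, new_mean_square


def toy_gradient(weights):
    """Gradient of the toy error E(w) = 0.5 * (100 * w0**2 + w1**2)."""
    return [100.0 * weights[0], 1.0 * weights[1]]


weights = [1.0, 1.0]
mean_square = [0.0, 0.0]
for _ in range(500):
    weights, mean_square = rmsprop_step(weights, toy_gradient(weights), mean_square)
print(weights)
```

In this toy problem the first weight's gradient is 100 times larger than the second's, yet both weights take steps of similar size, because each gradient is divided by the root of its own moving average.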


Further reading


Neural Networks for Machine Learning - Lecture 6

Geoffrey Hinton, Nitish Srivastava, Kevin Swersky


Understanding RMSprop -- faster neural network learning

Vitaly Bushaev


Documentaries, videos and podcasts


Lecture 6.5 -- Rmsprop: normalize the gradient [Neural Networks for Machine Learning]

February 4th, 2016

