RMSprop

RMSprop stands for Root Mean Square Propagation. It is an unpublished but widely known gradient descent optimization algorithm for mini-batch training of neural networks.

Background

RMSprop first appeared in the lecture slides of a Coursera online course on neural networks taught by Geoffrey Hinton of the University of Toronto. Although Hinton never published RMSprop in a formal academic paper, it became one of the most popular gradient descent optimization algorithms for deep learning.

Hinton developed RMSprop to address a problem that arises when rprop (resilient backpropagation) is used with mini-batches. Rprop ignores the magnitude of the gradient and uses only its sign, which is equivalent to dividing each gradient by its own magnitude. With mini-batches, each mini-batch's gradient is therefore divided by a different number, so the updates do not average out across batches. For example, if a weight receives a gradient of +0.1 on nine mini-batches and -0.9 on the tenth, the gradients cancel and the weight should stay roughly where it is, yet rprop increments it nine times and decrements it only once by about the same amount, so the weight can drift and grow very large. This runs counter to the goal of stochastic gradient descent, which is to make small adjustments to the weights and biases so that the network performs better at its task with each iteration of the optimization algorithm. RMSprop also builds on the Adagrad adaptive gradient algorithm by addressing Adagrad's aggressive, monotonically decreasing learning rates.
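To make these two problems concrete, the following Python sketch uses made-up numbers. The first part replays a hypothetical scenario in which one weight sees a gradient of +0.1 on nine mini-batches and -0.9 on the tenth: summing the raw gradients gives roughly zero, while a simplified sign-only update (a fixed step of 0.01, ignoring rprop's per-weight step-size adaptation) leaves the weight with a net change of about +0.08. The second part compares, for a constant gradient, the effective step size obtained by dividing by the root of Adagrad's ever-growing sum of squared gradients with the one obtained by dividing by the root of an exponential moving average of squared gradients, assuming a learning rate of 0.01 and a decay factor of 0.9. The function names, step size, and constants are illustrative assumptions, not part of either algorithm's original description.

```python
import math


def sign(x: float) -> int:
    """Return +1, -1, or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


# Hypothetical gradients seen by one weight over ten mini-batches:
# nine small positive gradients and one large negative gradient.
grads = [0.1] * 9 + [-0.9]

# Plain gradient accumulation: the gradients cancel, so the net change is ~0.
# math.fsum is used to keep floating-point rounding error small.
sgd_net = math.fsum(grads)

# Simplified rprop: a fixed step in the direction of the gradient's sign.
# (Real rprop also adapts each weight's step size; that is omitted here.)
step = 0.01
rprop_net = sum(step * sign(g) for g in grads)  # about +0.08, not 0


def adagrad_scale(grads, lr=0.01, eps=1e-8):
    """Effective step size per iteration when dividing by the root of the
    running SUM of squared gradients (Adagrad-style)."""
    acc, scales = 0.0, []
    for g in grads:
        acc += g * g
        scales.append(lr / (math.sqrt(acc) + eps))
    return scales


def moving_average_scale(grads, lr=0.01, decay=0.9, eps=1e-8):
    """Effective step size per iteration when dividing by the root of an
    exponential MOVING AVERAGE of squared gradients (RMSprop-style)."""
    ms, scales = 0.0, []
    for g in grads:
        ms = decay * ms + (1 - decay) * g * g
        scales.append(lr / (math.sqrt(ms) + eps))
    return scales


# Compare both schemes on a constant gradient of 1.0 for 100 steps.
constant = [1.0] * 100
ada = adagrad_scale(constant)
rms = moving_average_scale(constant)
print(f"sgd_net={sgd_net:.3f}  rprop_net={rprop_net:.3f}")
print(f"final Adagrad scale={ada[-1]:.5f}  final moving-average scale={rms[-1]:.5f}")
```

With a constant gradient of 1.0 over 100 steps, the Adagrad-style scale shrinks to 0.01 / sqrt(100) = 0.001 and keeps falling as more steps are taken, whereas the moving-average scale settles near 0.01 because the contribution of old gradients decays away.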

How RMSprop Works

RMSprop mitigates the rprop problem that occurs when the gradients of successive mini-batches differ too much by keeping a moving average of the squared gradient for each weight. The gradient of each mini-batch is then divided by the square root of this average, called the MeanSquare, which is calculated as: