A machine learning strategy that helps accelerate stochastic gradient descent in the relevant direction while dampening oscillations.

Edits on 6 March, 2019


Article

The idea of SGD with momentum can be conceptualized with an analogy from physics in which a ball gains and loses momentum as it rolls around on hilly terrain. If a learning algorithm's loss is interpreted as the height of the terrain, the gradient of the loss function can be related to the force acting on a ball rolling up and down the hills. Specifically, the force is equal to the negative gradient of the loss function, where the loss function plays the role of the ball's potential energy at any point on a hill.


This means that *Force = −∇U*, where *U = mgh* (i.e. *U* is the potential energy of the ball). Initializing the parameters with random numbers is analogous to placing the ball at some location with an initial velocity of zero.
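The analogy can be turned into an update rule. The following is a sketch of one common formulation, assuming unit mass and a unit time step. The symbols *θ* (parameters), *v* (velocity), *L* (loss), *μ* (momentum coefficient, acting like friction) and *η* (learning rate) are introduced here for illustration and are not defined in the text above.

```latex
\begin{aligned}
F &= -\nabla U(\theta) = -\nabla L(\theta) \\
v_{t+1} &= \mu\, v_t - \eta\, \nabla L(\theta_t), \qquad v_0 = 0 \\
\theta_{t+1} &= \theta_t + v_{t+1}
\end{aligned}
```

Under these assumptions, setting *μ = 0* recovers plain gradient descent, while values of *μ* closer to 1 let the velocity accumulate over more steps.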


Minimizing the loss function can then be seen as trying to get the ball to reach the deepest valley in the terrain, where the loss is smallest. When a hill is very steep, the momentum the ball has built up by the bottom can carry it up and over shorter hills. As the slope flattens, the ball's momentum and velocity also decrease, and it eventually comes to rest in a valley. In other words, the momentum strategy simulates the parameter vector (i.e. the ball) rolling on the hilly terrain. The goal is to descend faster than without momentum, while still controlling the velocity of the descent so that the ball does not overshoot the valley altogether.
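The rolling-ball picture can be illustrated with a small numerical experiment. The sketch below is not taken from the article: it assumes a toy two-dimensional quadratic loss with one shallow and one steep direction, uses exact (non-stochastic) gradients for simplicity, and the function names, learning rate and momentum coefficient are all hypothetical choices made for illustration.

```python
# Toy "terrain": L(theta) = 0.5 * sum(c_i * theta_i^2), with one shallow
# direction (curvature 1) and one steep direction (curvature 50).
# All names and hyperparameters here are illustrative assumptions.

def loss(theta, curvatures):
    return 0.5 * sum(c * t * t for c, t in zip(curvatures, theta))

def grad(theta, curvatures):
    # Gradient of the quadratic loss above; the "force" is its negative.
    return [c * t for c, t in zip(curvatures, theta)]

def momentum_step(theta, velocity, g, lr, mu):
    # The velocity decays by mu (like friction) and is pushed by the force -g.
    velocity = [mu * v - lr * gi for v, gi in zip(velocity, g)]
    # The parameters (the ball) then move by the updated velocity.
    theta = [t + v for t, v in zip(theta, velocity)]
    return theta, velocity

def run(mu, steps=200, lr=0.03):
    curvatures = [1.0, 50.0]
    theta = [1.0, 1.0]       # starting position of the ball
    velocity = [0.0, 0.0]    # the ball starts at rest
    for _ in range(steps):
        g = grad(theta, curvatures)
        theta, velocity = momentum_step(theta, velocity, g, lr, mu)
    return loss(theta, curvatures)

plain_loss = run(mu=0.0)     # mu = 0 reduces to plain gradient descent
momentum_loss = run(mu=0.9)  # a commonly used momentum coefficient
print("plain:", plain_loss, "momentum:", momentum_loss)
```

In this particular setup, plain gradient descent is slowed by the shallow direction, while the momentum run builds up velocity along it and ends at a lower loss after the same number of steps. Other loss surfaces or hyperparameters may behave differently.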

Further reading

| Title | Author | Type |
|---|---|---|
| Momentum Acceleration of Least-Squares Support Vector Machines | Jorge López, Álvaro Barbero, José R. Dorronsoro | Academic paper |
| Solving the model - SGD, Momentum and Adaptive Learning Rate | Paras Dahal | Web |


Article

Momentum is a machine learning strategy that helps accelerate stochastic gradient descent (SGD) in the relevant direction while dampening oscillations. This is also referred to as "SGD with momentum" and is useful for training deep neural networks.

