Nesterov momentum is a slightly modified version of the machine learning strategy momentum with stronger theoretical convergence guarantees for convex functions.
Nesterov momentum, or Nesterov accelerated gradient (NAG), is a slightly modified version of momentum with stronger theoretical convergence guarantees for convex functions. In practice, it has produced slightly better results than classical momentum. The gradient descent optimization algorithm follows the negative gradient of an objective function to locate its minimum; however, it can get stuck in flat regions and struggles with noisy gradients. Momentum is an approach that accelerates the search, skimming across flat areas and smoothing noisy gradients. In some cases, the acceleration of momentum causes the search to overshoot and miss the minima. Nesterov momentum is a further extension that calculates the decaying moving average of the gradients at projected positions in the search space rather than at the actual positions themselves. This offers the benefits of momentum while reducing the chance of missing the minima.
Nesterov momentum was first described by the Russian mathematician Yurii Nesterov in his 1983 paper titled “A Method for Solving the Convex Programming Problem with Convergence Rate O(1/k^2).” It was popularized by Ilya Sutskever et al., who applied the method to the training of neural networks with stochastic gradient descent; this work was described in their 2013 paper “On the Importance of Initialization and Momentum in Deep Learning.”
In the standard momentum method, the gradient is computed using the current parameters (θt). Nesterov momentum achieves stronger convergence by first applying the velocity (vt) to the parameters to compute interim parameters (θ̃ = θt + μ*vt), where μ is the decay rate. These interim parameters are then used to compute the gradient; this is called a "lookahead" gradient step, or the Nesterov accelerated gradient.
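Written out in the notation above (with ε denoting the learning rate, a symbol not used elsewhere in this article), the full update in the formulation popularized by Sutskever et al. is vt+1 = μ*vt − ε*∇f(θ̃) followed by θt+1 = θt + vt+1; the only difference from classical momentum is that the gradient is evaluated at θ̃ = θt + μ*vt rather than at θt.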
This is sometimes referred to as a "lookahead" gradient because computing the gradient at the interim parameters allows NAG to change velocity in a faster and more responsive way, resulting in more stable behavior than classical momentum in many situations, particularly for larger values of μ. In this sense, NAG acts as a correction factor for the classical momentum method.
Nesterov momentum can be described in terms of four steps:
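1. Project the current position forward using the velocity: θ̃ = θt + μ*vt.
2. Compute the gradient of the objective at the projected (interim) parameters θ̃.
3. Update the velocity using this lookahead gradient: vt+1 = μ*vt − ε*∇f(θ̃).
4. Update the parameters with the new velocity: θt+1 = θt + vt+1.

The sketch below maps these four steps onto a minimal Python implementation; the function name, the simple quadratic objective, and the hyperparameter values are illustrative assumptions, not part of the original description.

```python
import numpy as np

def nesterov_momentum_descent(grad, theta0, lr=0.05, mu=0.9, n_steps=100):
    """Minimal sketch of gradient descent with Nesterov momentum.

    grad   : function returning the gradient of the objective at a point
    theta0 : initial parameters
    lr     : learning rate (epsilon)
    mu     : momentum decay rate
    """
    theta = np.asarray(theta0, dtype=float)
    v = np.zeros_like(theta)               # velocity
    for _ in range(n_steps):
        theta_interim = theta + mu * v     # 1. project the position using the current velocity
        g = grad(theta_interim)            # 2. "lookahead" gradient at the interim point
        v = mu * v - lr * g                # 3. update the velocity with the lookahead gradient
        theta = theta + v                  # 4. update the parameters with the new velocity
    return theta

# Illustrative use: minimize f(x, y) = x^2 + 10*y^2, whose minimum is at the origin.
grad = lambda p: np.array([2.0 * p[0], 20.0 * p[1]])
print(nesterov_momentum_descent(grad, [3.0, -2.0]))  # converges toward [0, 0]
```

Classical momentum would instead evaluate grad(theta); moving the gradient evaluation to theta + mu * v is the entire modification.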
Nesterov momentum improves the rate of convergence of the optimization algorithm (e.g., it reduces the number of iterations required to find a solution), particularly in the field of convex optimization.
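In deep learning libraries, this variant is usually exposed as an option on the stochastic gradient descent optimizer rather than as a separate algorithm. As a brief illustration (the model and batch below are placeholders), PyTorch's torch.optim.SGD enables it via the nesterov flag:

```python
import torch

model = torch.nn.Linear(10, 1)                  # placeholder model
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.01,
                            momentum=0.9,       # Nesterov requires a nonzero momentum
                            nesterov=True)

x, y = torch.randn(32, 10), torch.randn(32, 1)  # placeholder batch
optimizer.zero_grad()
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
```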
Key dates: 1983, when Yurii Nesterov's paper first describes the method, and May 26, 2013, when Sutskever et al.'s paper on initialization and momentum in deep learning is published; the latter work popularizes the use of Nesterov momentum.