Nesterov momentum

A slightly modified version of the Momentum optimization method with stronger theoretical convergence guarantees for convex functions.

All edits

Jude Gomila edited on 7 Mar 2019 6:19 pm (Description, Topic thumbnail).
Daniel Frumkin edited on 7 Mar 2019 6:00 pm (Created page: Description, Article, Further reading, Categories, Related Topics).
Article



Nesterov momentum, or Nesterov Accelerated Gradient (NAG), is a slightly modified version of Momentum with stronger theoretical convergence guarantees for convex functions. In practice, it has produced slightly better results than classical Momentum.

Difference Between Momentum and Nesterov Momentum

In the standard Momentum method, the gradient is computed using the current parameters (θt). Nesterov momentum instead first applies the velocity (vt) to the parameters to compute interim parameters (θ̃ = θt + μ·vt), where μ is the momentum decay rate, and then evaluates the gradient at these interim parameters. This is called the "lookahead" gradient step, or the Nesterov Accelerated Gradient.
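The two update rules can be written side by side as follows. Here ε denotes the learning rate and f the objective being minimized; these symbols are not named in the text above and are introduced only for the comparison:

```latex
% Classical Momentum: gradient evaluated at the current parameters
v_{t+1} = \mu v_t - \varepsilon \nabla f(\theta_t), \qquad
\theta_{t+1} = \theta_t + v_{t+1}

% Nesterov momentum: gradient evaluated at the interim ("lookahead") parameters
v_{t+1} = \mu v_t - \varepsilon \nabla f(\theta_t + \mu v_t), \qquad
\theta_{t+1} = \theta_t + v_{t+1}
```

The only difference is the point at which the gradient is evaluated: θt for classical Momentum, and the interim point θt + μvt for Nesterov momentum.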



This is sometimes referred to as a "lookahead" gradient because computing the gradient at the interim parameters allows NAG to change velocity in a faster and more responsive way, resulting in more stable behavior than classical Momentum in many situations, particularly for higher values of μ. NAG can thus be viewed as adding a correction factor to the classical Momentum update.
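The difference between the two updates can be sketched in a few lines of Python. This is a minimal illustration on a one-dimensional convex quadratic, not a reference implementation; the function names, the objective f(θ) = ½θ², and the hyperparameter values are all choices made for the example:

```python
def grad(theta):
    """Gradient of the convex quadratic f(theta) = 0.5 * theta**2."""
    return theta

def momentum_step(theta, v, lr=0.1, mu=0.9):
    # Classical Momentum: gradient evaluated at the current parameters theta_t.
    v = mu * v - lr * grad(theta)
    return theta + v, v

def nesterov_step(theta, v, lr=0.1, mu=0.9):
    # Nesterov momentum: gradient evaluated at the interim ("lookahead")
    # parameters theta_t + mu * v_t.
    v = mu * v - lr * grad(theta + mu * v)
    return theta + v, v

# Minimize f starting from theta = 5.0 with both methods.
theta_m, v_m = 5.0, 0.0
theta_n, v_n = 5.0, 0.0
for _ in range(100):
    theta_m, v_m = momentum_step(theta_m, v_m)
    theta_n, v_n = nesterov_step(theta_n, v_n)
print("classical:", abs(theta_m), "nesterov:", abs(theta_n))
```

With a high decay rate such as μ = 0.9, the lookahead evaluation lets the Nesterov iterate damp its oscillations around the minimum more quickly, which is the "faster and more responsive" velocity correction described above.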





Further reading

CS231n Convolutional Neural Networks for Visual Recognition. Stanford Computer Science. (Web)

Momentum Method and Nesterov Accelerated Gradient - Konvergen - Medium. Roan Gylberth. (Web)

On the importance of initialization and momentum in deep learning. Ilya Sutskever, James Martens, George Dahl, Geoffrey Hinton. (PDF)


Edits on 7 Mar 2019

Daniel Frumkin created this topic on 7 Mar 2019 1:50 pm.