Adadelta

An extension of the Adagrad machine learning optimization algorithm that addresses the two main drawbacks of that method.

Contents

  • Background
  • Advances of Adadelta
  • Further Resources

Adadelta is a machine learning optimization algorithm that was created by Matthew D. Zeiler with the goal of addressing two drawbacks of the Adagrad method.

Background

Adagrad improved upon previous gradient descent based algorithms by adaptively scaling the learning rate (η) for each dimension in a system, making it possible to train deep neural networks with millions of dimensions using a process that is neither too volatile and imprecise, nor too slow.
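
For context, below is a minimal NumPy sketch of Adagrad's per-dimension scaling; the function and variable names, learning rate, and epsilon are illustrative choices rather than values taken from the source.

```python
import numpy as np

def adagrad_step(x, grad, sq_grad_sum, lr=0.01, eps=1e-8):
    """One Adagrad step: the global learning rate lr is scaled per
    dimension by the root of the accumulated squared gradients."""
    sq_grad_sum = sq_grad_sum + grad ** 2        # grows without bound over training
    x = x - lr * grad / (np.sqrt(sq_grad_sum) + eps)
    return x, sq_grad_sum
```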

The drawbacks of Adagrad are:

  • It has a continually decaying learning rate throughout training, meaning that the learning rate will become infinitesimally small after many iterations.
  • Like previous gradient descent algorithms, it still requires manual selection of a global learning rate.
Advances of Adadelta

To address Adagrad's drawbacks, Adadelta implements two new ideas.

  1. Accumulate the sum of squared gradients over a restricted time window rather than over all time. This differs from Adagrad, which can accumulate the sum of squared gradients all the way to infinity. Since this sum sits in the denominator and the learning rate in the numerator, the effective learning rate approaches zero as the sum approaches infinity. By restricting the potential size of the sum, Adadelta ensures that learning continues to make progress even after a large number of iterations. In practice, the window is implemented as an exponentially decaying running average of the squared gradients rather than a literal buffer of past values.
  2. Correct the mismatch in units that exists in most gradient descent based algorithms. Each time a parameter is updated with earlier algorithms (e.g. Adagrad, SGD, or Momentum), the units of the update relate to the gradient rather than to the parameter, so the units of the update do not match the units of the parameter being updated. To address this drawback, Adadelta uses an approximation to the Hessian, which is always positive, in place of the learning rate, while still ensuring that the update direction follows the negative gradient at each step as in SGD. This eliminates the learning rate from the update rule completely, meaning that there is no longer a requirement to set it manually. The resulting update rule is sketched below.
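
Putting the two ideas together, the following is a minimal NumPy sketch of the Adadelta update rule described above; the decay rate and epsilon values are the ones commonly used with the method, and the variable names are illustrative.

```python
import numpy as np

def adadelta_step(x, grad, avg_sq_grad, avg_sq_update, rho=0.95, eps=1e-6):
    """One Adadelta step."""
    # Idea 1: exponentially decaying average of squared gradients,
    # instead of Adagrad's ever-growing sum.
    avg_sq_grad = rho * avg_sq_grad + (1 - rho) * grad ** 2
    # Idea 2: the RMS of previous updates stands in for the learning rate,
    # so the update has the same units as the parameter and no global
    # learning rate needs to be chosen by hand.
    update = -(np.sqrt(avg_sq_update + eps) / np.sqrt(avg_sq_grad + eps)) * grad
    avg_sq_update = rho * avg_sq_update + (1 - rho) * update ** 2
    return x + update, avg_sq_grad, avg_sq_update
```

Both running averages start at zero; the epsilon term keeps the very first steps well defined.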


Further Resources

  • ADADELTA: An Adaptive Learning Rate Method, by Matthew D. Zeiler (PDF): https://arxiv.org/pdf/1212.5701.pdf
  • An overview of gradient descent optimization algorithms, by Sebastian Ruder (Web): http://ruder.io/optimizing-gradient-descent/index.html#adadelta
