Stochastic gradient descent (SGD)

Gradient-based optimization algorithm used in machine learning and deep learning for training artificial neural networks.


Stochastic Gradient Descent (SGD) is a simple gradient-based optimization algorithm used in machine learning and deep learning for training artificial neural networks.

The goal of a gradient descent optimization algorithm is to minimize a given cost function, such as the loss function used in training an artificial neural network.

A neural network's performance improves during training by slightly adjusting (i.e. tuning or calibrating) its weights and biases. Weights are values that express how important the corresponding inputs are to the output, and the bias is equivalent to the negative of the threshold that determines whether a neuron activates. Each neuron in the network calculates a weighted sum of its inputs and then applies a pre-set activation function to determine whether or not that weighted sum reaches the threshold. There are two possible outcomes from this process (a minimal code sketch follows the list):

  • The weighted sum is greater than or equal to the threshold value: the neuron fires (i.e. activates).
  • The weighted sum is less than the threshold value: the neuron doesn't activate.
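
As a rough illustration of this firing rule, the following Python sketch implements a single step-activation neuron; the inputs, weights, and bias values are hypothetical and chosen only for demonstration:

```python
import numpy as np

def neuron_fires(inputs, weights, bias):
    """Return True if the weighted sum plus bias reaches the firing threshold.

    With a step activation, adding the bias is equivalent to comparing the
    weighted sum against a threshold of -bias.
    """
    weighted_sum = np.dot(weights, inputs) + bias
    return weighted_sum >= 0.0

# Hypothetical example: two inputs with hand-picked weights and bias.
inputs = np.array([0.5, 0.8])
weights = np.array([0.9, -0.2])   # how important each input is to the output
bias = -0.1                       # equivalent to a threshold of 0.1
print(neuron_fires(inputs, weights, bias))  # True: 0.45 - 0.16 - 0.1 = 0.19 >= 0
```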

The loss function measures how well a neural network performs a given task by producing a value that represents how close the actual output was to the desired output. Gradient descent calculates the gradient (slope) of the loss function with respect to the weights and biases, then shifts them against that gradient in order to lower the loss on the next iteration. Minimizing the loss is done by repeatedly taking steps in the opposite direction of the gradient until the process converges on a local minimum. This process of minimizing loss is how neural networks learn to perform specific tasks better through training.
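The following Python sketch illustrates the basic update rule on a hypothetical quadratic loss: the gradient is computed and the parameters are shifted a small step (scaled by a learning rate) in the opposite direction.

```python
import numpy as np

def gradient_descent_step(weights, grad_fn, learning_rate=0.1):
    """Take one step in the direction opposite the gradient of the loss."""
    gradient = grad_fn(weights)
    return weights - learning_rate * gradient

# Hypothetical example: minimize the quadratic loss L(w) = ||w - target||^2.
target = np.array([3.0, -1.0])
grad_fn = lambda w: 2.0 * (w - target)   # gradient of the loss with respect to w

w = np.zeros(2)
for step in range(100):
    w = gradient_descent_step(w, grad_fn)
print(w)  # converges toward [3.0, -1.0], where the loss is minimal
```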

The term stochastic describes something that has a random probability distribution or pattern which can be analyzed statistically but not predicted precisely. For a gradient descent algorithm to be considered stochastic in the strict sense, it uses a batch size of 1, where the single example comprising each batch is chosen at random. When the batch size is greater than one but smaller than the full dataset, the algorithm is called mini-batch gradient descent.
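
The following Python sketch illustrates the distinction on a hypothetical one-parameter linear-regression problem: with a batch size of 1, each update uses one randomly chosen example (true SGD), while a larger batch size gives mini-batch gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dataset: y = 2*x plus a small amount of noise.
X = rng.uniform(-1, 1, size=(200, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=200)

def sgd(batch_size=1, learning_rate=0.1, steps=1000):
    """Stochastic (batch_size=1) or mini-batch (batch_size>1) gradient descent
    on a one-parameter linear model with squared-error loss."""
    w = 0.0
    for _ in range(steps):
        idx = rng.integers(0, len(X), size=batch_size)      # randomly chosen examples
        pred = w * X[idx, 0]
        grad = np.mean(2.0 * (pred - y[idx]) * X[idx, 0])   # d(loss)/dw on this batch
        w -= learning_rate * grad                            # step opposite the gradient
    return w

print(sgd(batch_size=1))    # "true" stochastic gradient descent
print(sgd(batch_size=32))   # mini-batch gradient descent
```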


Further Resources

  • A Bayesian Perspective on Generalization and Stochastic Gradient Descent. Samuel L. Smith, Quoc V. Le. Academic paper. http://arxiv.org/abs/1710.06451v3
  • Fully Distributed and Asynchronized Stochastic Gradient Descent for Networked Systems. Ying Zhang. Academic paper. http://arxiv.org/abs/1704.03992v1
  • How do we 'train' neural networks? - Towards Data Science. Vitaly Bushaev. Web. https://towardsdatascience.com/how-do-we-train-neural-networks-edd985562b73
  • Stochastic gradient descent algorithms for strongly convex functions at O(1/T) convergence rates. Shenghuo Zhu. Academic paper. http://arxiv.org/abs/1305.2218v1
  • Stochastic Gradient Descent with momentum - Towards Data Science. Vitaly Bushaev. Web. https://towardsdatascience.com/stochastic-gradient-descent-with-momentum-a84097641a5d

