
Convolutional neural network

A convolutional neural network (CNN or ConvNet) is a deep learning algorithm, one of the various types of artificial neural networks used for different applications and data types.


All edits by Ivan Lytvyn

Edits on 23 Feb, 2022

Ivan Lytvyn

edited on 23 Feb, 2022

Edits made to:

Article (+583 characters)

Article

It is commonly assumed that CNNs are invariant to shifts of the input. Convolution or pooling layers with a stride of one are indeed equivariant to translations of the input. However, layers with a stride greater than one ignore the Nyquist-Shannon sampling theorem and may alias the input signal. While CNNs are, in principle, capable of implementing anti-aliasing filters, it has been observed that this does not happen in practice, which yields models that are not equivariant to translations. Furthermore, if a CNN makes use of fully connected layers, translation equivariance does not imply translation invariance, as the fully connected layers are not invariant to shifts of the input. One solution for complete translation invariance is to avoid any down-sampling throughout the network and apply global average pooling at the last layer. Additionally, several other partial solutions have been proposed, such as anti-aliasing before downsampling operations, spatial transformer networks, data augmentation, subsampling combined with pooling, and capsule neural networks.
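The effect of stride on equivariance can be illustrated with a small 1-D experiment. This is only a sketch under stated assumptions: it uses a circular (wrap-around) cross-correlation so that shifts are exact, and the helper name `circular_conv1d`, the kernel values, and the random input are all made up for illustration.

```python
import numpy as np

def circular_conv1d(signal, kernel, stride=1):
    """Circular 1-D cross-correlation, then subsampling by `stride`."""
    n = len(signal)
    full = np.array([
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ])
    return full[::stride]

rng = np.random.default_rng(0)
x = rng.normal(size=8)            # hypothetical input signal
k = np.array([1.0, 2.0, -1.0])    # hypothetical kernel
shifted = np.roll(x, 1)           # input shifted by one sample

# Stride 1: shifting the input shifts the output by the same amount.
stride1_equivariant = np.allclose(
    circular_conv1d(shifted, k, stride=1),
    np.roll(circular_conv1d(x, k, stride=1), 1),
)

# Stride 2: the output for the shifted input is not a shifted copy
# of the original output, for any shift amount.
y2 = circular_conv1d(x, k, stride=2)
y2_shifted = circular_conv1d(shifted, k, stride=2)
stride2_matches_any_shift = any(
    np.allclose(y2_shifted, np.roll(y2, s)) for s in range(len(y2))
)

# No down-sampling plus global average pooling: the result is shift-invariant.
global_pool_invariant = np.isclose(
    circular_conv1d(shifted, k).mean(), circular_conv1d(x, k).mean()
)
```

With stride 2, the shifted input is sampled at the positions that the original run skipped, which is why no shift of the original output reproduces it.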

Ivan Lytvyn

edited on 23 Feb, 2022

Edits made to:

Article (+582 characters)

Article

Translation Equivariance and Aliasing

Ivan Lytvyn

edited on 23 Feb, 2022

Edits made to:

Article (+16/-16 characters)

Article

Hyperparameters are various settings that are used to control the learning process. CNNs use more hyperparameters than a standard multilayer perceptron (MLP).

Edits on 21 Feb, 2022

Ivan Lytvyn

edited on 21 Feb, 2022

Edits made to:

Article (+1 images) (+24 characters)

Article

Typical CNN architecture

Ivan Lytvyn

edited on 21 Feb, 2022

Edits made to:

Article (+298 characters)

Article

Dilation

Dilation involves ignoring pixels within a kernel. This potentially reduces processing and memory requirements without significant signal loss. Under the common convention in which a dilation of d places the kernel taps d pixels apart, a dilation of 2 on a 3x3 kernel expands the kernel's footprint to 5x5, while still processing 9 (evenly spaced) pixels. Accordingly, a dilation of 4 expands the footprint to 9x9.
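The footprint of a dilated kernel can be worked out with a short calculation. The sketch below assumes the convention used by common deep learning frameworks, where a dilation of 1 is an ordinary kernel and a dilation of d places the taps d pixels apart; other conventions count the skipped pixels instead and give different numbers. The function names are illustrative.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel along one dimension.

    Assumes dilation=1 means an ordinary (dense) kernel.
    """
    return dilation * (kernel_size - 1) + 1

def tap_offsets(kernel_size: int, dilation: int) -> list[int]:
    """Positions, relative to the first tap, of the pixels actually read."""
    return [i * dilation for i in range(kernel_size)]

span_d2 = effective_kernel_size(3, 2)   # 5: a 3x3 kernel covers 5x5
span_d4 = effective_kernel_size(3, 4)   # 9: a 3x3 kernel covers 9x9
offsets_d2 = tap_offsets(3, 2)          # [0, 2, 4]: still 3 taps per dimension
```

The number of taps, and so the number of weights, stays at 3 per dimension regardless of the dilation; only the spacing changes.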

Ivan Lytvyn

edited on 21 Feb, 2022

Edits made to:

Article (+380 characters)

Article

Pooling type and size

Max pooling is typically used, often with a 2x2 dimension. This implies that the input is drastically downsampled, reducing processing cost.

Large input volumes may warrant 4×4 pooling in the lower layers. Greater pooling reduces the dimension of the signal, and may result in unacceptable information loss. Often, non-overlapping pooling windows perform best.

Ivan Lytvyn

edited on 21 Feb, 2022

Edits made to:

Article (+273 characters)

Article

Filter size

Common filter sizes found in the literature vary greatly, and are usually chosen based on the data set.

The challenge is to find the right level of granularity so as to create abstractions at the proper scale, given a particular data set, and without overfitting.

Ivan Lytvyn

edited on 21 Feb, 2022

Edits made to:

Article (+607 characters)

Article

Number of filters

Since feature map size decreases with depth, layers near the input layer tend to have fewer filters while higher layers can have more. To equalize computation at each layer, the product of the number of feature maps and the number of pixel positions is kept roughly constant across layers. Preserving more information about the input would require keeping the total number of activations (number of feature maps times number of pixel positions) non-decreasing from one layer to the next.

The number of feature maps directly controls the capacity and depends on the number of available examples and task complexity.

Ivan Lytvyn

edited on 21 Feb, 2022

Edits made to:

Article (+171 characters)

Article

Stride

The stride is the number of pixels that the analysis window moves on each iteration. A stride of 2 means that each kernel is offset by 2 pixels from its predecessor.
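As a rough illustration of how the stride determines where the window lands, the following sketch lists the window start positions along one dimension. It assumes no padding and that only windows fitting entirely inside the input are used; the function name `window_starts` is hypothetical.

```python
def window_starts(input_size: int, kernel_size: int, stride: int) -> list[int]:
    """Start index of each analysis window along one dimension (no padding)."""
    return list(range(0, input_size - kernel_size + 1, stride))

starts_stride1 = window_starts(7, 3, 1)   # [0, 1, 2, 3, 4]
starts_stride2 = window_starts(7, 3, 2)   # [0, 2, 4]
```

Doubling the stride roughly halves the number of output positions along each dimension.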

Ivan Lytvyn

edited on 21 Feb, 2022

Edits made to:

Article (+447 characters)

Article

Padding

Padding is the addition of (typically) 0-valued pixels on the borders of an image. This is done so that the border pixels are not undervalued (lost) from the output because they would ordinarily participate in only a single receptive field instance. The total padding applied along each dimension is typically one less than the corresponding kernel dimension. For example, a convolutional layer using 3x3 kernels would receive a 2-pixel pad in total, that is, 1 pixel on each side of the image.
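A small NumPy sketch of zero padding follows. It assumes a stride of 1, an odd kernel size, and that the total padding of kernel size minus one is split evenly between the two sides (1 pixel per side for a 3x3 kernel); the 4x4 input is made up for illustration.

```python
import numpy as np

image = np.arange(16, dtype=float).reshape(4, 4)   # hypothetical 4x4 input
kernel_size = 3

# Total padding is kernel_size - 1, split evenly between the two sides.
pad_per_side = (kernel_size - 1) // 2
padded = np.pad(image, pad_per_side, mode="constant", constant_values=0)

# With stride 1, the output has as many positions as the unpadded input.
out_size = padded.shape[0] - kernel_size + 1
```

Here `padded` is 6x6 and the output is 4x4, so every border pixel of the original image sits at the center of some window.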

Ivan Lytvyn

edited on 21 Feb, 2022

Edits made to:

Article (+153/-15 characters)

Article

Hyperparameters

...

Kernel size

The kernel size is the number of pixels processed together. It is typically expressed as the kernel's dimensions, e.g., 2x2 or 3x3.

Ivan Lytvyn

edited on 21 Feb, 2022

Edits made to:

Article (+173 characters)

Article

Hyperparameters

Hyperparameters are various settings that are used to control the learning process. CNNs use more hyperparameters than a standard multilayer perceptron (MLP).

Ivan Lytvyn

edited on 21 Feb, 2022

Edits made to:

Article (+1 characters)

Article

The Softmax loss function is used for predicting a single class of K mutually exclusive classes.[nb 3] Sigmoid cross-entropy loss is used for predicting K independent probability values in [0, 1]. Euclidean loss is used for regressing to real-valued labels in (−∞, ∞).
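The three losses can be sketched in NumPy as follows. This is an illustrative sketch for a single example, not a reference implementation: the function names are made up, the inputs to the first two are assumed to be raw scores (logits), and the factor of one half in the Euclidean loss is a common convention rather than something fixed by the text.

```python
import numpy as np

def softmax_cross_entropy(logits, target_class):
    """Loss for one example with exactly one correct class out of K."""
    shifted = logits - np.max(logits)            # for numerical stability
    log_probs = shifted - np.log(np.sum(np.exp(shifted)))
    return -log_probs[target_class]

def sigmoid_cross_entropy(logits, targets):
    """Loss for K independent probabilities; targets lie in [0, 1]."""
    probs = 1.0 / (1.0 + np.exp(-logits))
    return -np.sum(targets * np.log(probs) + (1 - targets) * np.log(1 - probs))

def euclidean_loss(predictions, labels):
    """Squared-error loss for real-valued labels (the 1/2 factor is a convention)."""
    return 0.5 * np.sum((predictions - labels) ** 2)

loss_softmax = softmax_cross_entropy(np.zeros(4), target_class=2)
loss_sigmoid = sigmoid_cross_entropy(np.zeros(3), np.array([1.0, 0.0, 1.0]))
loss_euclid = euclidean_loss(np.array([1.0, 2.0]), np.array([1.0, 4.0]))
```

With all-zero logits the softmax assigns probability 1/K to each class, so the first loss equals log K.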

Ivan Lytvyn

edited on 21 Feb, 2022

Edits made to:

Article (+513 characters)

Article

Loss layer

The "loss layer", or "loss function", specifies how training penalizes the deviation between the predicted output of the network, and the true data labels (during supervised learning). Various loss functions can be used, depending on the specific task.

Ivan Lytvyn

edited on 21 Feb, 2022

Edits made to:

Article (+465 characters)

Article

Fully connected layer

After several convolutional and max pooling layers, the final classification is done via fully connected layers. Neurons in a fully connected layer have connections to all activations in the previous layer, as seen in regular (non-convolutional) artificial neural networks. Their activations can thus be computed as an affine transformation, with matrix multiplication followed by a bias offset (vector addition of a learned or fixed bias term).
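The affine transformation can be sketched in a few lines of NumPy. The shapes, weights, and bias below are arbitrary illustrative values, and `fully_connected` is a hypothetical helper name; a trained network would learn the weights rather than set them to ones.

```python
import numpy as np

def fully_connected(x, weights, bias):
    """Affine transformation: matrix multiplication followed by a bias offset."""
    return weights @ x + bias

# Hypothetical output of the last pooling layer: 2 feature maps of size 2x2.
feature_maps = np.arange(8.0).reshape(2, 2, 2)
x = feature_maps.reshape(-1)        # flatten to a vector of 8 activations

weights = np.ones((3, 8))           # 3 output neurons, each connected to all 8 inputs
bias = np.array([0.0, 1.0, -1.0])

out = fully_connected(x, weights, bias)   # one value per output neuron
```

Each row of `weights` holds the connections of one output neuron to every activation of the previous layer.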

Ivan Lytvyn

edited on 21 Feb, 2022

Edits made to:

Article (+1 images) (+90 characters)

Article

RoI pooling to size 2x2. In this example, the region proposal (an input parameter) has size 7x5.

...

Ivan Lytvyn

edited on 21 Feb, 2022

Edits made to:

Article (+639 characters)

Article

ReLU layer

ReLU is the abbreviation of rectified linear unit, which applies the non-saturating activation function f(x) = max(0, x). It effectively removes negative values from an activation map by setting them to zero. It introduces nonlinearity to the decision function and to the overall network without affecting the receptive fields of the convolution layers.

Other functions can also be used to increase nonlinearity, for example the saturating hyperbolic tangent f(x) = tanh(x) and the sigmoid function σ(x) = 1 / (1 + e^(−x)). ReLU is often preferred to other functions because it trains the neural network several times faster without a significant penalty to generalization accuracy.
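For illustration, the activation functions can be written in NumPy as below. The activation map values are made up, and the function names are only illustrative.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: negative values become zero."""
    return np.maximum(0.0, x)

def sigmoid(x):
    """Saturating sigmoid function, with outputs in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

activation_map = np.array([[-2.0, 0.5],
                           [3.0, -0.1]])   # hypothetical activation map
rectified = relu(activation_map)           # same shape, negatives set to zero

# For a large input, tanh and sigmoid flatten out near 1 while ReLU keeps growing.
large = 10.0
saturating = (np.tanh(large), sigmoid(large))
non_saturating = relu(large)
```

Because ReLU is applied element by element, the shape of the activation map is unchanged.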

Ivan Lytvyn

edited on 21 Feb, 2022

Edits made to:

Article (+943 characters)

Article

In this case, every max operation is over 4 numbers. The depth dimension remains unchanged (this is true for other forms of pooling as well).

In addition to max pooling, pooling units can use other functions, such as average pooling or ℓ2-norm pooling. Average pooling was often used historically but has recently fallen out of favor compared to max pooling, which generally performs better in practice.

Due to the effects of fast spatial reduction of the size of the representation, there is a recent trend towards using smaller filters or discarding pooling layers altogether.

RoI pooling to size 2x2. In this example, the region proposal (an input parameter) has size 7x5.

"Region of Interest" pooling (also known as RoI pooling) is a variant of max pooling, in which output size is fixed and input rectangle is a parameter.

RoI pooling is an important component of convolutional neural networks for object detection based on the Fast R-CNN architecture.
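A minimal sketch of RoI pooling to a fixed 2x2 output is shown below. It makes several assumptions: the region is given as a top-left corner plus a height and width, "7x5" is read as 7 columns by 5 rows, the region is split into cells using integer (floor) boundaries, and the region is at least as large as the output grid. The function name and the 8x8 feature map are hypothetical.

```python
import numpy as np

def roi_max_pool(feature_map, top, left, height, width, out_h=2, out_w=2):
    """Max-pool an arbitrary rectangle of `feature_map` to a fixed out_h x out_w grid."""
    region = feature_map[top:top + height, left:left + width]
    # Integer cell boundaries; cells may differ in size when the
    # region does not divide evenly.
    row_edges = [(i * height) // out_h for i in range(out_h + 1)]
    col_edges = [(j * width) // out_w for j in range(out_w + 1)]
    pooled = np.empty((out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        for j in range(out_w):
            cell = region[row_edges[i]:row_edges[i + 1],
                          col_edges[j]:col_edges[j + 1]]
            pooled[i, j] = cell.max()
    return pooled

feature_map = np.arange(64, dtype=float).reshape(8, 8)   # hypothetical 8x8 map
# Region proposal 7 wide and 5 tall, starting at row 1, column 0.
pooled = roi_max_pool(feature_map, top=1, left=0, height=5, width=7)
```

The output is always 2x2 here, whatever the size of the region proposal, which is what lets later fully connected layers accept proposals of varying size.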

Ivan Lytvyn

edited on 21 Feb, 2022

Edits made to:

Article (+330 characters)

Article

Intuitively, the exact location of a feature is less important than its rough location relative to other features. This is the idea behind the use of pooling in convolutional neural networks. The pooling layer serves to progressively reduce the spatial size of the representation, to reduce the number of parameters, memory footprint and amount of computation in the network, and hence to also control overfitting. This is known as down-sampling. It is common to periodically insert a pooling layer between successive convolutional layers (each one typically followed by an activation function, such as a ReLU layer) in a CNN architecture. While pooling layers contribute to local translation invariance, they do not provide global translation invariance in a CNN, unless a form of global pooling is used. The pooling layer commonly operates independently on every depth, or slice, of the input and resizes it spatially. A very common form of max pooling is a layer with filters of size 2×2, applied with a stride of 2, which subsamples every depth slice in the input by 2 along both width and height, discarding 75% of the activations:
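As a minimal sketch of this operation (NumPy, with a made-up 4×4 input of depth 1; the function name `max_pool_2x2` is illustrative and the height and width are assumed to be even), each output value is the maximum over one non-overlapping 2×2 block:

```python
import numpy as np

def max_pool_2x2(volume):
    """2x2 max pooling with stride 2 on a (depth, height, width) array.

    Assumes height and width are even. Each depth slice is pooled independently.
    """
    d, h, w = volume.shape
    # Split height and width into (block index, position within block).
    blocks = volume.reshape(d, h // 2, 2, w // 2, 2)
    return blocks.max(axis=(2, 4))

volume = np.array([[[1, 1, 2, 4],
                    [5, 6, 7, 8],
                    [3, 2, 1, 0],
                    [1, 2, 3, 4]]], dtype=float)   # depth 1, 4x4
pooled = max_pool_2x2(volume)

kept_fraction = pooled.size / volume.size   # 0.25, i.e. 75% of activations discarded
```

The depth dimension is untouched; only the width and height are halved.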

Ivan Lytvyn

edited on 21 Feb, 2022

Edits made to:

Article (+1 images) (+851/-1137 characters)

Article

...

Max pooling with a 2x2 filter and stride = 2

...
