DeconvND

Description

Info

Parent class: Module

Derived classes: Deconv1D, Deconv2D, Deconv3D

General information

This module performs n-dimensional transposed convolution (also called inverse convolution or fractionally-strided convolution). The name "deconvolution", although established, does not describe the operation accurately.

When a convolution operation (see ConvND) is performed over a data tensor, a certain amount of information is irretrievably lost, so several different operations can be constructed that perform a roughly opposite action. The transposed convolution operation is one of them.

Let us presume there is a map I_{4x4}:

I = \begin{pmatrix} a_1 & a_2 & a_3 & a_4 \\ b_1 & b_2 & b_3 & b_4 \\ c_1 & c_2 & c_3 & c_4 \\ d_1 & d_2 & d_3 & d_4 \\ \end{pmatrix}

As well as the convolution kernel W_{3x3}:

W = \begin{pmatrix} w_{00} & w_{01} & w_{02} \\ w_{10} & w_{11} & w_{12} \\ w_{20} & w_{21} & w_{22} \\ \end{pmatrix}

Then we can represent the convolution kernel as a convolution matrix C_{4x16}:

C = \begin{pmatrix} w_{00} & w_{01} & w_{02} & 0 & w_{10} & w_{11} & w_{12} & 0 & w_{20} & w_{21} & w_{22} & 0 & 0 & 0 & 0 & 0 \\ 0 & w_{00} & w_{01} & w_{02} & 0 &w_{10} & w_{11} & w_{12} & 0 & w_{20} & w_{21} & w_{22} & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & w_{00} & w_{01} & w_{02} & 0 & w_{10} & w_{11} & w_{12} & 0 & w_{20} & w_{21} & w_{22} & 0 \\ 0 & 0 & 0 & 0 & 0 & w_{00} & w_{01} & w_{02} & 0 & w_{10} & w_{11} & w_{12} & 0 & w_{20} & w_{21} & w_{22} \\ \end{pmatrix}

If we multiply this matrix by a flattened map \text{vec}(I):

$$ \text{vec}(I) = \begin{pmatrix} a_1 & a_2 & a_3 & a_4 & b_1 & b_2 & b_3 & b_4 & c_1 & c_2 & c_3 & c_4 & d_1 & d_2 & d_3 & d_4 \\ \end{pmatrix}^T $$

we will get a flattened output map \text{vec}(O):

$$ \text{vec}(O) = \begin{pmatrix} m_1 & m_2 & m_3 & m_4 \\ \end{pmatrix}^T $$

which is then converted to a full output map O_{2x2}:

$$ O = \begin{pmatrix} m_1 & m_2 \\ m_3 & m_4 \\ \end{pmatrix} $$

But, as one can see, another operation is also possible: we can restore the \hat{I}_{4x4} map from the O_{2x2} map by multiplying \text{vec}(O) by the transposed convolution matrix:

\text{vec}(\hat{I}) = C^T\text{vec}(O)

Hence the name of this operation: transposed convolution.
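
The matrix view above can be checked numerically. The following is a minimal sketch in plain Python, assuming stride 1 and no padding as in the example; the helper names conv_matrix, matvec and transpose are illustrative and are not part of the library.

```python
def conv_matrix(W, in_size):
    """Build the convolution matrix C for a square kernel W applied to an
    in_size x in_size map (assumes stride 1 and no padding)."""
    k = len(W)
    out_size = in_size - k + 1
    C = [[0] * (in_size * in_size) for _ in range(out_size * out_size)]
    for oy in range(out_size):
        for ox in range(out_size):
            row = oy * out_size + ox  # index of the output element in vec(O)
            for ky in range(k):
                for kx in range(k):
                    col = (oy + ky) * in_size + (ox + kx)  # index in vec(I)
                    C[row][col] = W[ky][kx]
    return C


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    """Return the transpose of matrix M."""
    return [list(col) for col in zip(*M)]


W = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]    # stands in for w_00 ... w_22
C = conv_matrix(W, in_size=4)            # 4 x 16, as in the text

vec_I = list(range(1, 17))               # flattened 4x4 map
vec_O = matvec(C, vec_I)                 # direct convolution: 4 values
vec_I_hat = matvec(transpose(C), vec_O)  # transposed convolution: 16 values

print(len(C), len(C[0]), len(vec_O), len(vec_I_hat))  # prints: 4 16 4 16
```

Note that vec_I_hat has the layout of \text{vec}(I) but, in general, not its values: the transposed matrix restores the shape, while the values depend on the weights.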

Unlike the Upsample module, this module is trainable, so significant loss of information in the restored elements can be avoided.

Operation Parameters

The following parameters and objects characterize the convolution operation:


Convolution kernel size size


In the example above, the convolution kernel is the tensor W^T. A convolution kernel is characterized by its size, its shape and the values of its elements. In the convolution layers of a neural network, the values of the kernel elements are the weights, which are a trainable parameter.

Important

Within this library, the shape of convolution kernels is always equilateral, i.e. a square for two-dimensional convolution and a cube for three-dimensional one.


Convolution stride stride


Within the transposed convolution operation, the stride parameter is the stride of the direct convolution that would have produced the given O tensor. For more information on this parameter for a direct convolution, please see ConvND.

Let us take a two-dimensional convolution with a kernel of size 3 and a stride of 2 (Figure 1):

Figure 1. Two-dimensional convolution (size = 3, stride = 2)

To perform the deconvolution that restores the original 5x5 map, we need to “divide” the stride by inserting zero elements between the elements of the tensor obtained from the direct convolution. Hence the second name of this operation: fractionally-strided convolution.

Figure 2. Two-dimensional deconvolution (size = 3, stride = 2)
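
The zero-insertion step can be sketched as follows. This is a plain-Python illustration under stated assumptions: a 2x2 map standing in for O, a 3x3 kernel of ones, and no padding in the direct convolution. The helper names are illustrative and are not part of the library.

```python
def insert_zeros(x, stride):
    """Place (stride - 1) zeros between neighbouring elements of a 2D map."""
    h, w = len(x), len(x[0])
    out = [[0] * ((w - 1) * stride + 1) for _ in range((h - 1) * stride + 1)]
    for i in range(h):
        for j in range(w):
            out[i * stride][j * stride] = x[i][j]
    return out


def pad_zeros(x, pad):
    """Surround a 2D map with `pad` rows and columns of zeros."""
    w = len(x[0]) + 2 * pad
    body = [[0] * pad + list(row) + [0] * pad for row in x]
    return [[0] * w for _ in range(pad)] + body + [[0] * w for _ in range(pad)]


def conv2d(x, kernel):
    """Stride-1 convolution without padding (cross-correlation form)."""
    k = len(kernel)
    out_h, out_w = len(x) - k + 1, len(x[0]) - k + 1
    return [[sum(kernel[a][b] * x[i + a][j + b]
                 for a in range(k) for b in range(k))
             for j in range(out_w)]
            for i in range(out_h)]


def deconv2d(x, kernel, stride):
    """Transposed convolution expressed as a stride-1 convolution over the
    zero-stretched, fully padded map (assumes pad = 0 in the direct pass)."""
    k = len(kernel)
    flipped = [row[::-1] for row in kernel[::-1]]  # rotate kernel by 180 degrees
    stretched = insert_zeros(x, stride)
    return conv2d(pad_zeros(stretched, k - 1), flipped)


O = [[1, 2], [3, 4]]
ones = [[1] * 3 for _ in range(3)]

stretched = insert_zeros(O, 2)           # [[1, 0, 2], [0, 0, 0], [3, 0, 4]]
restored = deconv2d(O, ones, 2)
print(len(restored), len(restored[0]))   # prints: 5 5
```

With stride = 2 the 2x2 map is first stretched to 3x3, then padded to 7x7, and the stride-1 pass with a kernel of size 3 yields the 5x5 map mentioned in the text.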


Padding pad


Within the transposed convolution operation, the pad parameter is the padding applied to the initial tensor before the direct convolution that would have produced the given O tensor. For more information on this parameter for a direct convolution, please see ConvND.

To understand the principle, we need to consider the direct and inverse convolutions side by side. For example, if a direct convolution with size = 4, stride = 1 and pad = 2 is applied to a 5x5 map, the resulting map is 6x6 (see Figure 3). Therefore, to perform the inverse convolution (see Figure 4), we need to know which parameters of the direct convolution produced the current tensor.

Figure 3. Two-dimensional direct convolution (size = 4, stride = 1, pad = 2)

Figure 4. Two-dimensional inverse convolution (size = 4, stride = 1, pad = 2)

If we take parameters that preserve the tensor shape in the direct convolution, the inverse convolution with the same parameters preserves the shape as well:

Figure 5. Two-dimensional forward and inverse convolution (size = 3, stride = 1, pad = 1)
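
The sizes quoted in the examples above follow from standard convolution arithmetic. The sketch below uses the commonly cited formulas for square maps. The function names are illustrative, and it is an assumption that the library follows exactly these formulas (for instance, no output-padding term is modeled).

```python
def conv_out_size(in_size, size, stride=1, pad=0, dilation=1):
    """Side length of the map produced by a direct convolution."""
    return (in_size + 2 * pad - dilation * (size - 1) - 1) // stride + 1


def deconv_out_size(in_size, size, stride=1, pad=0, dilation=1):
    """Side length of the map produced by a transposed convolution."""
    return (in_size - 1) * stride - 2 * pad + dilation * (size - 1) + 1


# (size, stride, pad) triples from the figures, applied to a 5x5 input;
# the 5x5 input for the last triple is an assumption made for illustration.
for size, stride, pad in [(3, 2, 0), (4, 1, 2), (3, 1, 1)]:
    direct = conv_out_size(5, size, stride, pad)
    inverse = deconv_out_size(direct, size, stride, pad)
    print(size, stride, pad, "->", direct, "->", inverse)
```

When in_size + 2 * pad - size is not divisible by the stride, several input sizes map to the same output size, so the round trip back to the original size is not unique.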


Dilation dilation


The dilation parameter determines how far apart the elements of the convolution kernel are placed, and therefore how much the window covered by the kernel grows. The kernel elements are moved apart by the specified step, and the resulting gaps are filled with zeros.

Figure 6. Dilated convolution of a map of size 7 (size = 3, stride = 1, dilation = 1)

A useful property of this technique is that it is computationally cheap: the kernel covers a much larger window and the receptive field grows, so more global features can be tracked, while the number of weights and multiplications stays the same.
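
To illustrate why dilation is cheap, the sketch below spreads a kernel apart with zeros and compares the covered window with the number of weights actually used. It assumes the common convention in which dilation = 1 leaves the kernel unchanged (consistent with the default value of the parameter); dilate_kernel is an illustrative helper, not a library function.

```python
def dilate_kernel(W, dilation):
    """Spread the elements of a square kernel apart, filling gaps with zeros."""
    k = len(W)
    eff = dilation * (k - 1) + 1  # side of the window the kernel covers
    out = [[0] * eff for _ in range(eff)]
    for a in range(k):
        for b in range(k):
            out[a * dilation][b * dilation] = W[a][b]
    return out


W = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
dilated = dilate_kernel(W, 2)

window = len(dilated) ** 2                             # elements covered
weights = sum(v != 0 for row in dilated for v in row)  # non-zero weights used
print(len(dilated), window, weights)  # prints: 5 25 9
```

The 3x3 kernel now spans a 5x5 window, yet still carries only nine weights.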


Number of connections between the input and output maps groups


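The groups parameter sets the number of groups into which the maps are split for separate processing.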

Additional sources

  • Differences between backward convolution and upsample operations: link;
  • Theano Convolution Arithmetic Theory: link;
  • Visual demonstration of the transposed convolution: link.

Initializing

def __init__(self, nd, inmaps, outmaps, size, stride=1, pad=0, dilation=1, wscale=1.0, useBias=True, name=None,
                 initscheme=None, empty=False, groups=1):

Parameters

| Parameter | Allowed types | Description | Default |
|---|---|---|---|
| nd | int | Dimension of the operation | - |
| inmaps | int | Number of maps in the input tensor | - |
| outmaps | int | Number of maps in the output tensor | - |
| size | int | Convolution kernel size | - |
| stride | int, tuple | Convolution stride | 1 |
| pad | int, tuple | Map padding | 0 |
| dilation | int | Convolution window dilation | 1 |
| wscale | float | Random layer weights variance | 1.0 |
| useBias | bool | Whether to use the bias vector | True |
| initscheme | Union[tuple, str] | Specifies the layer weights initialization scheme (see createTensorWithScheme) | None -> ("xavier_uniform", "in") |
| name | str | Layer name | None |
| empty | bool | Whether to initialize the matrix of weights and biases | False |
| groups | int | Number of groups the maps are split into for separate processing | 1 |

Explanations

Please see the derived classes.