Conv2D

Description

Info

Parent class: ConvND

Derived classes: -

This module performs a two-dimensional convolution operation. For more detailed theoretical information about the convolution, please see ConvND.

For an input tensor of shape (N, C_{in}, H_{in}, W_{in}), an output tensor of shape (N, C_{out}, H_{out}, W_{out}), and a convolution kernel of size (size_h, size_w), the operation is performed as follows (we consider the i-th element of the batch and the j-th map of the output tensor): $$ out_i(C_{out_j}) = bias(C_{out_j}) + \sum_{k=0}^{C_{in} - 1}weight(C_{out_j}, k) \star input_i(k) $$

where

N - batch size;

C - number of maps in the tensor;

H - tensor map height;

W - tensor map width;

bias - bias tensor of the convolution layer, of shape (1, C_{out}, 1, 1);

weight - weights tensor of the convolution layer, of shape (C_{out}, C_{in}, size_h, size_w);

\star - the cross-correlation operator.
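To make the formula concrete, here is a minimal NumPy sketch of the operation for the default case (stride=1, pad=0, dilation=1, groups=1); conv2d_reference is a hypothetical name, and this is a reference implementation of the formula above, not the library's GPU code:

import numpy as np

def conv2d_reference(inp, weight, bias):
    # inp: (N, C_in, H_in, W_in), weight: (C_out, C_in, size_h, size_w), bias: (1, C_out, 1, 1)
    n, cin, hin, win = inp.shape
    cout, _, kh, kw = weight.shape
    hout, wout = hin - kh + 1, win - kw + 1
    out = np.zeros((n, cout, hout, wout), dtype=inp.dtype)

    for i in range(n):          # i-th element of the batch
        for j in range(cout):   # j-th map of the output tensor
            out[i, j] += bias[0, j, 0, 0]
            for k in range(cin):  # sum of cross-correlations over the input maps
                for y in range(hout):
                    for x in range(wout):
                        out[i, j, y, x] += np.sum(weight[j, k] * inp[i, k, y:y + kh, x:x + kw])

    return out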

Initializing

def __init__(self, inmaps, outmaps, size, stride=1, pad=0, dilation=1, wscale=1.0, useBias=True, name=None, initscheme=None, empty=False, groups=1):

Parameters

| Parameter | Allowed types | Description | Default |
|---|---|---|---|
| inmaps | int | Number of maps in the input tensor | - |
| outmaps | int | Number of maps in the output tensor | - |
| size | int | Convolution kernel size (the kernel is always square) | - |
| stride | Union[int, tuple] | Convolution stride | 1 |
| pad | Union[int, tuple] | Map padding | 0 |
| dilation | Union[int, tuple] | Convolution window dilation | 1 |
| wscale | float | Variance of the random layer weights | 1.0 |
| useBias | bool | Whether to use the bias vector | True |
| initscheme | Union[tuple, str] | Layer weights initialization scheme (see createTensorWithScheme) | None -> ("xavier_uniform", "in") |
| name | str | Layer name | None |
| empty | bool | Whether to skip initializing the weights and biases | False |
| groups | int | Number of groups the maps are split into for separate processing | 1 |

Explanations

Info

For the above input (N, C_{in}, H_{in}, W_{in}) and output (N, C_{out}, H_{out}, W_{out}) tensors, the shapes are related as follows: \begin{equation} H_{out} = \left\lfloor \frac{H_{in} + 2pad_h - dil_h(size_h - 1) - 1}{stride_h} \right\rfloor + 1 \end{equation}

\begin{equation} W_{out} = \left\lfloor \frac{W_{in} + 2pad_w - dil_w(size_w - 1) - 1}{stride_w} \right\rfloor + 1 \end{equation}
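These formulas are easy to evaluate directly; conv2d_out_shape below is a hypothetical helper written for this page, not a library function:

def conv2d_out_shape(h_in, w_in, size, stride=1, pad=0, dilation=1):
    # normalize scalar arguments to (height, width) pairs, as Conv2D does
    pair = lambda v: v if isinstance(v, tuple) else (v, v)
    (kh, kw), (sh, sw), (ph, pw), (dh, dw) = pair(size), pair(stride), pair(pad), pair(dilation)

    h_out = (h_in + 2 * ph - dh * (kh - 1) - 1) // sh + 1
    w_out = (w_in + 2 * pw - dw * (kw - 1) - 1) // sw + 1
    return h_out, w_out

print(conv2d_out_shape(5, 5, size=3, pad=1))            # (5, 5)
print(conv2d_out_shape(5, 5, size=2, stride=2, pad=3))  # (5, 5)

Both calls correspond to examples shown further below.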


size - filters are always square, i.e. (size_h, size_w), where size_h == size_w;


stride - either a single convolution stride for both height and width can be specified, or a tuple (stride_h, stride_w), where stride_h is the convolution stride along the image height and stride_w along the width;


pad - either a single padding value for all sides of the maps can be specified, or a tuple (pad_h, pad_w), where pad_h is the padding on each side along the image height and pad_w along the width. Asymmetric padding (padding on only one side of the tensor) is not supported by this module; please use Pad2D instead;


dilation - either a single dilation value for both kernel axes can be specified, or a tuple (dil_h, dil_w), where dil_h is the filter dilation along the image height and dil_w along the width;


groups - number of groups into which the set of maps is split in order to be convolved separately.

The general rule (bear in mind that the inmaps and outmaps values must be divisible by groups): every \frac{inmaps}{groups} input maps form \frac{outmaps}{groups} output maps, i.e. groups independent convolutions are performed. Special cases:

  • if groups=1, then each output map interacts with all input maps, that is, a regular convolution occurs;
  • if inmaps == outmaps == groups, then a depthwise convolution occurs: one output map is formed from each input map (please see details in ConvND).

Thus, to obtain a full Depthwise Separable Convolution block, two of the library's convolution layers must be placed in sequence:

  • one depthwise convolution with parameters inmaps == outmaps == groups;
  • one pointwise convolution with a kernel of size 1.
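The weight-shape rule behind these cases can be summarized in a short sketch (grouped_weight_shape is a hypothetical helper that only illustrates the shapes the layer allocates; compare with the Groups parameter example below):

def grouped_weight_shape(inmaps, outmaps, size, groups=1):
    # inmaps and outmaps must be divisible by groups
    assert inmaps % groups == 0 and outmaps % groups == 0
    return (outmaps, inmaps // groups, size, size)

print(grouped_weight_shape(16, 32, 2, groups=1))   # (32, 16, 2, 2) - regular convolution
print(grouped_weight_shape(16, 32, 2, groups=4))   # (32, 4, 2, 2)  - 4 independent convolutions
print(grouped_weight_shape(16, 16, 2, groups=16))  # (16, 1, 2, 2)  - depthwise convolution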

Examples


Basic convolution example


Necessary imports

import numpy as np
from PuzzleLib.Backend import gpuarray
from PuzzleLib.Modules import Conv2D
from PuzzleLib.Variable import Variable

Info

gpuarray is required to properly place the tensor on the GPU.

Let us set the tensor parameters so that we can clearly demonstrate the operation of the module: the number of input and output maps will be 1 and 2, respectively.

batchsize, inmaps, h, w = 1, 1, 5, 5
outmaps = 2

Synthetic tensor:

data = gpuarray.to_gpu(np.arange(batchsize * inmaps * h * w).reshape((batchsize, inmaps, h, w)).astype(np.float32))
print(data)
[[[[ 0.  1.  2.  3.  4.]
   [ 5.  6.  7.  8.  9.]
   [10. 11. 12. 13. 14.]
   [15. 16. 17. 18. 19.]
   [20. 21. 22. 23. 24.]]]]

Let us set the filter size to 2 and leave the rest of the convolution parameters at their defaults (stride=1, pad=0, dilation=1, groups=1). The bias will be explicitly disabled (although by default the bias tensor is zero and does not affect the result):

size = 2
conv = Conv2D(inmaps=inmaps, outmaps=outmaps, size=size, useBias=False)

Here we use a small hack to set the weights explicitly. Since there are two output maps and the weights tensor has shape (C_{out}, C_{in}, size_h, size_w):

def customW(size):
  # first filter: identity matrix (ones on the main diagonal)
  w1 = np.diag(np.full(size, 1)).reshape((1, 1, size, size))
  # second filter: -1..-size on the anti-diagonal
  w2 = np.flip(np.diag(np.arange(1, size + 1) * (-1)), 1).reshape((1, 1, size, size))
  # stack along the output-maps axis -> shape (2, 1, size, size)
  w = np.vstack([w1, w2]).astype(np.float32)

  return w
w = customW(size)
print(w)
[[[[ 1.  0.]
   [ 0.  1.]]]

 [[[ 0. -1.]
   [-2.  0.]]]]

Let us set the weights of the module:

conv.setVar("W", Variable(gpuarray.to_gpu(w)))

Important

We require that in all examples the module weights are set by the customW function. For brevity's sake, this step will be omitted from the code examples.

Let us perform the convolution operation on the synthetic tensor. Since no padding was specified, the maps of the output tensor are smaller than those of the input one:

conv(data)
print(conv.data)
[[[[  6.   8.  10.  12.]
   [ 16.  18.  20.  22.]
   [ 26.  28.  30.  32.]
   [ 36.  38.  40.  42.]]

  [[-11. -14. -17. -20.]
   [-26. -29. -32. -35.]
   [-41. -44. -47. -50.]
   [-56. -59. -62. -65.]]]]
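Since the first filter is an identity diagonal, each element of the first output map is simply the sum of an input element and its lower-right neighbor, which is easy to cross-check on the CPU:

ref = np.arange(25, dtype=np.float32).reshape(5, 5)
print(ref[:-1, :-1] + ref[1:, 1:])  # matches the first output map above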

Size parameter


Let us keep everything as in the previous example, but set a different filter size:

conv = Conv2D(inmaps=inmaps, outmaps=outmaps, size=3, useBias=False)
conv(data)
print(conv.data)
[[[[  18.   21.   24.]
   [  33.   36.   39.]
   [  48.   51.   54.]]

  [[ -44.  -50.  -56.]
   [ -74.  -80.  -86.]
   [-104. -110. -116.]]]]

Pad parameter


We will use the parameters from the previous example, but suppose that we want to preserve the shape of the tensor. Given that the filter size is 3 and the convolution stride is 1, preserving the 5x5 size requires a padding of 1 on each side, i.e. the padded tensor will look as follows:

[[[[ 0.  0.  0.  0.  0.  0.  0.]
   [ 0.  0.  1.  2.  3.  4.  0.]
   [ 0.  5.  6.  7.  8.  9.  0.]
   [ 0. 10. 11. 12. 13. 14.  0.]
   [ 0. 15. 16. 17. 18. 19.  0.]
   [ 0. 20. 21. 22. 23. 24.  0.]
   [ 0.  0.  0.  0.  0.  0.  0.]]]]
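The same padded tensor can be reproduced on the CPU, for instance with np.pad (zero padding is the default mode):

padded = np.pad(np.arange(25, dtype=np.float32).reshape(5, 5), 1)
print(padded.shape)
(7, 7)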

Let us reinitialize the convolution:

conv = Conv2D(inmaps=inmaps, outmaps=outmaps, size=3, pad=1, useBias=False)
conv(data)
print(conv.data)

[[[[   6.    8.   10.   12.    4.]
   [  16.   18.   21.   24.   12.]
   [  26.   33.   36.   39.   22.]
   [  36.   48.   51.   54.   32.]
   [  20.   36.   38.   40.   42.]]

  [[   0.  -17.  -22.  -27.  -32.]
   [ -11.  -44.  -50.  -56.  -57.]
   [ -26.  -74.  -80.  -86.  -82.]
   [ -41. -104. -110. -116. -107.]
   [ -56.  -59.  -62.  -65.  -48.]]]]

The padding can be set differently for the map height and width:

conv = Conv2D(inmaps=inmaps, outmaps=outmaps, size=3, pad=(1, 0), useBias=False)
conv(data)
print(conv.data)

[[[[   8.   10.   12.]
   [  18.   21.   24.]
   [  33.   36.   39.]
   [  48.   51.   54.]
   [  36.   38.   40.]]

  [[ -17.  -22.  -27.]
   [ -44.  -50.  -56.]
   [ -74.  -80.  -86.]
   [-104. -110. -116.]
   [ -59.  -62.  -65.]]]]

Stride parameter


Let us return to the default parameters and the 2x2 filter, but change the convolution stride:

conv = Conv2D(inmaps=inmaps, outmaps=outmaps, size=2, stride=2, useBias=False)
conv(data)
print(conv.data)
[[[[  6.  10.]
   [ 26.  30.]]

  [[-11. -17.]
   [-41. -47.]]]]

To preserve the shape of the initial tensor, we have to set the padding to 3, since \lfloor (5 + 2 \cdot 3 - 1 - 1) / 2 \rfloor + 1 = 5:

conv = Conv2D(inmaps=inmaps, outmaps=outmaps, size=2, stride=2, pad=3, useBias=False)
conv(data)
print(conv.data)
[[[[  0.   0.   0.   0.   0.]
   [  0.   0.   2.   4.   0.]
   [  0.  10.  18.  22.   0.]
   [  0.  20.  38.  42.   0.]
   [  0.   0.   0.   0.   0.]]

  [[  0.   0.   0.   0.   0.]
   [  0.   0.  -2.  -6.   0.]
   [  0.  -5. -29. -35.   0.]
   [  0. -15. -59. -65.   0.]
   [  0.   0.   0.   0.   0.]]]]

Like the pad parameter, the stride can be set differently for height and width:

conv = Conv2D(inmaps=inmaps, outmaps=outmaps, size=2, stride=(2, 4), pad=3, useBias=False)
conv(data)
print(conv.data)
[[[[  0.   0.   0.]
   [  0.   2.   0.]
   [  0.  18.   0.]
   [  0.  38.   0.]
   [  0.   0.   0.]]

  [[  0.   0.   0.]
   [  0.  -2.   0.]
   [  0. -29.   0.]
   [  0. -59.   0.]
   [  0.   0.   0.]]]]

Dilation parameter


The dilation parameter dilates the convolution filters by inserting zero elements between the original filter values. For more information on dilation, please see the theory in ConvND.

conv = Conv2D(inmaps=inmaps, outmaps=outmaps, size=2, stride=1, pad=0, dilation=2, useBias=False)
conv(data)
print(conv.data)
[[[[ 12.  14.  16.]
   [ 22.  24.  26.]
   [ 32.  34.  36.]]

  [[-22. -25. -28.]
   [-37. -40. -43.]
   [-52. -55. -58.]]]]
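To see why the output is 3x3: a kernel of size 2 with dilation 2 covers an effective receptive field of dil \cdot (size - 1) + 1 = 3 positions per axis, so on a 5x5 input each axis yields 5 - 3 + 1 = 3 outputs. A NumPy sketch of the dilated first filter:

kernel = np.array([[1., 0.], [0., 1.]], dtype=np.float32)
dilated = np.zeros((3, 3), dtype=np.float32)
dilated[::2, ::2] = kernel  # zeros are inserted between the original values
print(dilated)
[[1. 0. 0.]
 [0. 0. 0.]
 [0. 0. 1.]]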

The dilation can be set differently for the two axes of the convolution filter:

conv = Conv2D(inmaps=inmaps, outmaps=outmaps, size=2, stride=1, pad=0, dilation=(3, 1), useBias=False)
print(conv(data))
[[[[ 16.  18.  20.  22.]
   [ 26.  28.  30.  32.]]

  [[-31. -34. -37. -40.]
   [-46. -49. -52. -55.]]]]

Groups parameter


For this example, printing the tensors would produce very lengthy output, so we omit it.

In this example, the weights are not reinitialized with the customW function.

batchsize, inmaps, h, w = 1, 16, 5, 5
outmaps = 32
groups = 1
conv = Conv2D(inmaps, outmaps, size=2, initscheme="gaussian", groups=groups)
print(conv.W.shape)
(32, 16, 2, 2)

We can see that the result is an ordinary convolution. Let us change the number of groups:

groups = 4
conv = Conv2D(inmaps, outmaps, size=2, initscheme="gaussian", groups=groups)
print(conv.W.shape)
(32, 4, 2, 2)

It may not be obvious from the code above, but the convolution now proceeds as follows: the first \frac{inmaps}{groups}=4 input maps form \frac{outmaps}{groups}=8 output maps, and the same principle applies to each remaining group of four.
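The bookkeeping can be spelled out with plain Python (a sketch, not library code; inmaps, outmaps and groups are the values set above):

ipg, opg = inmaps // groups, outmaps // groups  # 4 input and 8 output maps per group
for g in range(groups):
    print("group %d: input maps %2d..%2d -> output maps %2d..%2d" % (g, g * ipg, (g + 1) * ipg - 1, g * opg, (g + 1) * opg - 1))
group 0: input maps  0.. 3 -> output maps  0.. 7
group 1: input maps  4.. 7 -> output maps  8..15
group 2: input maps  8..11 -> output maps 16..23
group 3: input maps 12..15 -> output maps 24..31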

To get the Depthwise Separable Convolution block:

from PuzzleLib.Containers import Sequential
batchsize, inmaps, h, w = 1, 3, 5, 5
outmaps = 32
seq = Sequential()
seq.append(Conv2D(inmaps, inmaps, size=4, initscheme="gaussian", groups=inmaps, name="depthwise"))
seq.append(Conv2D(inmaps, outmaps, size=1, initscheme="gaussian", name="pointwise"))
print(seq["depthwise"].W.shape)
(3, 1, 4, 4)
print(seq["pointwise"].W.shape)
(32, 3, 1, 1)
data = gpuarray.to_gpu(np.random.randn(batchsize, inmaps, h, w).astype(np.float32))
seq(data)
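The benefit of this block is the reduced parameter count; for the shapes above the arithmetic is easy to verify (biases excluded):

regular = outmaps * inmaps * 4 * 4    # single 4x4 convolution: 32 * 3 * 16 = 1536 weights
depthwise = inmaps * 1 * 4 * 4        # 3 * 16 = 48 weights
pointwise = outmaps * inmaps * 1 * 1  # 32 * 3 = 96 weights
print(regular, depthwise + pointwise)
1536 144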