Conv2D¶
Description¶
This module performs a two-dimensional convolution operation. For more detailed theoretical information about the convolution, please see ConvND.
For an input tensor of shape (N, C_{in}, H_{in}, W_{in}), an output tensor of shape (N, C_{out}, H_{out}, W_{out}) and a convolution kernel of size (size_h, size_w), the operation is performed as follows (we consider the i-th element of the batch and the j-th map of the output tensor): $$ out_i(C_{out_j}) = bias(C_{out_j}) + \sum_{k=0}^{C_{in} - 1}weight(C_{out_j}, k) \star input_i(k) $$
where
- N - batch size;
- C - number of maps in the tensor;
- H - tensor map height;
- W - tensor map width;
- bias - bias tensor of the convolution layer, of shape (1, C_{out}, 1, 1);
- weight - weights tensor of the convolution layer, of shape (C_{out}, C_{in}, size_h, size_w);
- \star - cross-correlation operator.
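The formula above can be checked with a naive NumPy implementation of the cross-correlation. This is a sketch for the simplest case (stride 1, no padding); `cross_correlate2d` is a hypothetical helper written for illustration, not part of the library:

```python
import numpy as np

def cross_correlate2d(inp, weight, bias=None):
    # Naive cross-correlation per the formula above (stride 1, no padding).
    # inp: (N, C_in, H, W); weight: (C_out, C_in, kh, kw); bias: (C_out,)
    N, C_in, H, W = inp.shape
    C_out, _, kh, kw = weight.shape
    out = np.zeros((N, C_out, H - kh + 1, W - kw + 1), dtype=inp.dtype)
    for n in range(N):
        for j in range(C_out):
            for y in range(out.shape[2]):
                for x in range(out.shape[3]):
                    # elementwise product of the input window with the j-th filter
                    out[n, j, y, x] = np.sum(inp[n, :, y:y + kh, x:x + kw] * weight[j])
            if bias is not None:
                out[n, j] += bias[j]
    return out

inp = np.arange(25, dtype=np.float32).reshape(1, 1, 5, 5)
kernel = np.eye(2, dtype=np.float32).reshape(1, 1, 2, 2)
print(cross_correlate2d(inp, kernel)[0, 0, 0])  # [ 6.  8. 10. 12.]
```

The printed row matches the first row of the module output in the basic example below.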
Initializing¶
def __init__(self, inmaps, outmaps, size, stride=1, pad=0, dilation=1, wscale=1.0, useBias=True, name=None, initscheme=None, empty=False, groups=1):
Parameters
Parameter | Allowed types | Description | Default |
---|---|---|---|
inmaps | int | Number of maps in the input tensor | - |
outmaps | int | Number of maps in the output tensor | - |
size | int | Convolution kernel size (the kernel is always square) | - |
stride | Union[int, tuple] | Convolution stride | 1 |
pad | Union[int, tuple] | Map padding | 0 |
dilation | Union[int, tuple] | Convolution window dilation | 1 |
wscale | float | Variance of the random layer weights | 1.0 |
useBias | bool | Whether to use the bias vector | True |
initscheme | Union[tuple, str] | Specifies the layer weights initialization scheme (see createTensorWithScheme) | None -> ("xavier_uniform", "in") |
name | str | Layer name | None |
empty | bool | If True, the weights and biases tensors are not initialized | False |
groups | int | Number of groups the maps are split into for separate processing | 1 |
Explanations
Info
For the above input (N, C_{in}, H_{in}, W_{in}) and output (N, C_{out}, H_{out}, W_{out}) tensors there is a relation between their shapes: \begin{equation} H_{out} = \frac{H_{in} + 2pad_h - dil_h(size_h - 1) - 1}{stride_h} + 1 \end{equation}
\begin{equation} W_{out} = \frac{W_{in} + 2pad_w - dil_w(size_w - 1) - 1}{stride_w} + 1 \end{equation}
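The relation above can be expressed as a small helper; `conv_out_size` is a hypothetical name used for illustration, and integer division models the floor implicit in the formula:

```python
def conv_out_size(insize, pad, dilation, size, stride):
    # out = (in + 2*pad - dilation*(size - 1) - 1) // stride + 1
    return (insize + 2 * pad - dilation * (size - 1) - 1) // stride + 1

# a 5x5 map with a 3x3 kernel, padding 1 and stride 1 keeps its size
print(conv_out_size(5, 1, 1, 3, 1))  # 5
```

This reproduces the output map sizes seen in the examples below.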
- size - filters are always square, i.e. (size_h, size_w), where size_h == size_w;
- stride - either a single value of the convolution stride for both height and width, or a tuple (stride_h, stride_w), where stride_h is the convolution stride along the height of the image and stride_w along the width;
- pad - either a single padding value for all sides of the maps, or a tuple (pad_h, pad_w), where pad_h is the padding value on each side along the height of the image and pad_w along the width. Asymmetric padding (adding elements on only one side of the tensor) is not supported by this module; please use Pad2D instead;
- dilation - either a single dilation value for all axes of the convolution kernel, or a tuple (dil_h, dil_w), where dil_h is the filter dilation along the image height and dil_w along the width;
- groups - number of groups into which the set of maps is split in order to be convolved separately.
The general rule (bear in mind that the inmaps and outmaps values must be divisible by the groups value): for every \frac{inmaps}{groups} input maps, \frac{outmaps}{groups} output maps are formed. In other words, groups independent convolutions are performed. Special cases:
- if groups=1, each output map interacts with all input maps, i.e. a regular convolution occurs;
- if inmaps == outmaps == groups, a depthwise convolution occurs: one output map is formed from each input map (please see details in ConvND).
Thus, to obtain a full Depthwise Separable Convolution block, two library convolution layers must be placed in sequence:
- a depthwise convolution with inmaps == outmaps == groups;
- a pointwise convolution with a kernel of size 1.
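The point of this block is parameter economy. As an illustration, compare the weight counts of a regular convolution and a depthwise separable block with the same input and output maps (the numbers below are illustrative, not from the library):

```python
inmaps, outmaps, size = 16, 32, 3

# regular convolution: weight tensor of shape (outmaps, inmaps, size, size)
regular = outmaps * inmaps * size * size

# depthwise part (groups == inmaps): shape (inmaps, 1, size, size)
depthwise = inmaps * 1 * size * size
# pointwise part (1x1 kernel): shape (outmaps, inmaps, 1, 1)
pointwise = outmaps * inmaps * 1 * 1

print(regular, depthwise + pointwise)  # 4608 656
```

The separable block needs roughly a seventh of the weights here.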
Examples¶
Basic convolution example¶
Necessary imports
import numpy as np
from PuzzleLib.Backend import gpuarray
from PuzzleLib.Modules import Conv2D
from PuzzleLib.Variable import Variable
Info
gpuarray is required to properly place the tensor on the GPU.
Let us set the tensor parameters so that we can clearly demonstrate the operation of the module: the number of input maps will be 1 and the number of output maps 2.
batchsize, inmaps, h, w = 1, 1, 5, 5
outmaps = 2
Synthetic tensor:
data = gpuarray.to_gpu(np.arange(batchsize * inmaps * h * w).reshape((batchsize, inmaps, h, w)).astype(np.float32))
print(data)
[[[[ 0. 1. 2. 3. 4.]
[ 5. 6. 7. 8. 9.]
[10. 11. 12. 13. 14.]
[15. 16. 17. 18. 19.]
[20. 21. 22. 23. 24.]]]]
Let us set the filter size to 2 and leave the rest of the convolution parameters at their defaults (stride=1, pad=0, dilation=1, groups=1). The use of bias will be explicitly disabled (although by default the bias tensor is zero and does not affect the result):
size = 2
conv = Conv2D(inmaps=inmaps, outmaps=outmaps, size=size, useBias=False)
Here we use a small hack to set the weights explicitly. There are two output maps, and the weights tensor is of shape (C_{out}, C_{in}, size_h, size_w):
def customW(size):
    # first filter: identity matrix (ones on the main diagonal)
    w1 = np.diag(np.full(size, 1)).reshape((1, 1, size, size))
    # second filter: values 1..size, negated and placed on the anti-diagonal
    w2 = np.flip(np.diag(np.arange(1, size + 1) * (-1)), 1).reshape((1, 1, size, size))
    # stack into a weights tensor of shape (2, 1, size, size)
    w = np.vstack([w1, w2]).astype(np.float32)
    return w
w = customW(size)
print(w)
[[[[ 1. 0.]
[ 0. 1.]]]
[[[ 0. -1.]
[-2. 0.]]]]
Let us set the weights of the module:
conv.setVar("W", Variable(gpuarray.to_gpu(w)))
Important
In all subsequent examples, the module weights are assumed to be set by the customW function. For the sake of brevity, this step is omitted from the code examples.
Let us perform the convolution operation on the synthetic tensor. Since no padding was specified, the maps of the output tensor are smaller than those of the input:
conv(data)
print(conv.data)
[[[[ 6. 8. 10. 12.]
[ 16. 18. 20. 22.]
[ 26. 28. 30. 32.]
[ 36. 38. 40. 42.]]
[[-11. -14. -17. -20.]
[-26. -29. -32. -35.]
[-41. -44. -47. -50.]
[-56. -59. -62. -65.]]]]
Size parameter¶
Let us use the same thing as in the previous example, but set a different filter size:
conv = Conv2D(inmaps=inmaps, outmaps=outmaps, size=3, useBias=False)
conv(data)
print(conv.data)
[[[[ 18. 21. 24.]
[ 33. 36. 39.]
[ 48. 51. 54.]]
[[ -44. -50. -56.]
[ -74. -80. -86.]
[-104. -110. -116.]]]]
Pad parameter¶
We will use the parameters from the previous example, but suppose that we want to preserve the shape of the tensor. Given that the filter size is 3 and the convolution stride is 1, preserving the 5x5 size requires a padding of 1 on each side, i.e. the padded tensor will look as follows:
[[[[ 0. 0. 0. 0. 0. 0. 0.]
[ 0. 0. 1. 2. 3. 4. 0.]
[ 0. 5. 6. 7. 8. 9. 0.]
[ 0. 10. 11. 12. 13. 14. 0.]
[ 0. 15. 16. 17. 18. 19. 0.]
[ 0. 20. 21. 22. 23. 24. 0.]
[ 0. 0. 0. 0. 0. 0. 0.]]]]
Let us reinitialize the convolution:
conv = Conv2D(inmaps=inmaps, outmaps=outmaps, size=3, pad=1, useBias=False)
conv(data)
print(conv.data)
[[[[ 6. 8. 10. 12. 4.]
[ 16. 18. 21. 24. 12.]
[ 26. 33. 36. 39. 22.]
[ 36. 48. 51. 54. 32.]
[ 20. 36. 38. 40. 42.]]
[[ 0. -17. -22. -27. -32.]
[ -11. -44. -50. -56. -57.]
[ -26. -74. -80. -86. -82.]
[ -41. -104. -110. -116. -107.]
[ -56. -59. -62. -65. -48.]]]]
The padding along the map height and width can be set differently:
conv = Conv2D(inmaps=inmaps, outmaps=outmaps, size=3, pad=(1, 0), useBias=False)
conv(data)
print(conv.data)
[[[[ 8. 10. 12.]
[ 18. 21. 24.]
[ 33. 36. 39.]
[ 48. 51. 54.]
[ 36. 38. 40.]]
[[ -17. -22. -27.]
[ -44. -50. -56.]
[ -74. -80. -86.]
[-104. -110. -116.]
[ -59. -62. -65.]]]]
Stride parameter¶
Let us return to the default parameters and the 2x2 filter, but change the convolution stride:
conv = Conv2D(inmaps=inmaps, outmaps=outmaps, size=2, stride=2, useBias=False)
conv(data)
print(conv.data)
[[[[ 6. 10.]
[ 26. 30.]]
[[-11. -17.]
[-41. -47.]]]]
To preserve the shape of the initial tensor, we will have to set the padding to 3:
conv = Conv2D(inmaps=inmaps, outmaps=outmaps, size=2, stride=2, pad=3, useBias=False)
conv(data)
print(conv.data)
[[[[ 0. 0. 0. 0. 0.]
[ 0. 0. 2. 4. 0.]
[ 0. 10. 18. 22. 0.]
[ 0. 20. 38. 42. 0.]
[ 0. 0. 0. 0. 0.]]
[[ 0. 0. 0. 0. 0.]
[ 0. 0. -2. -6. 0.]
[ 0. -5. -29. -35. 0.]
[ 0. -15. -59. -65. 0.]
[ 0. 0. 0. 0. 0.]]]]
Like the pad parameter, the stride parameter can be set differently for height and width:
conv = Conv2D(inmaps=inmaps, outmaps=outmaps, size=2, stride=(2, 4), pad=3, useBias=False)
conv(data)
print(conv.data)
[[[[ 0. 0. 0.]
[ 0. 2. 0.]
[ 0. 18. 0.]
[ 0. 38. 0.]
[ 0. 0. 0.]]
[[ 0. 0. 0.]
[ 0. -2. 0.]
[ 0. -29. 0.]
[ 0. -59. 0.]
[ 0. 0. 0.]]]]
Dilation parameter¶
The dilation parameter dilates the convolution filters by inserting zero elements between the original filter values. For more information on dilation, please see the theory in ConvND.
conv = Conv2D(inmaps=inmaps, outmaps=outmaps, size=2, stride=1, pad=0, dilation=2, useBias=False)
conv(data)
print(conv.data)
[[[[ 12. 14. 16.]
[ 22. 24. 26.]
[ 32. 34. 36.]]
[[-22. -25. -28.]
[-37. -40. -43.]
[-52. -55. -58.]]]]
The dilation parameter can be set differently for the two filter axes:
conv = Conv2D(inmaps=inmaps, outmaps=outmaps, size=2, stride=1, pad=0, dilation=(3, 1), useBias=False)
print(conv(data))
[[[[ 16. 18. 20. 22.]
[ 26. 28. 30. 32.]]
[[-31. -34. -37. -40.]
[-46. -49. -52. -55.]]]]
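The effect of dilation can be reproduced directly: a 2x2 kernel with dilation=2 behaves like a 3x3 kernel with zeros inserted between its values. The NumPy sketch below (not library code) builds that dilated kernel from the first customW filter and correlates it with the 5x5 map from the examples:

```python
import numpy as np

k = np.eye(2, dtype=np.float32)        # the first 2x2 filter from customW
dil = 2
side = dil * (k.shape[0] - 1) + 1      # effective kernel side: 3
kd = np.zeros((side, side), dtype=k.dtype)
kd[::dil, ::dil] = k                   # spread the original values apart with zeros

data = np.arange(25, dtype=np.float32).reshape(5, 5)
out = np.array([[np.sum(data[y:y + side, x:x + side] * kd)
                 for x in range(5 - side + 1)]
                for y in range(5 - side + 1)])
print(out[0])  # [12. 14. 16.]
```

The result matches the first map of the dilation=2 output above.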
Groups parameter¶
For this example, printing the tensors would produce very lengthy output, so we will omit it. The weights here are not reinitialized by the customW function.
batchsize, inmaps, h, w = 1, 16, 5, 5
outmaps = 32
groups = 1
conv = Conv2D(inmaps, outmaps, size=2, initscheme="gaussian", groups=groups)
print(conv.W.shape)
(32, 16, 2, 2)
We can see that the result is an ordinary convolution. Let us change the number of groups:
groups = 4
conv = Conv2D(inmaps, outmaps, size=2, initscheme="gaussian", groups=groups)
print(conv.W.shape)
(32, 4, 2, 2)
It may not be obvious from the presented code, but the convolution now proceeds as follows: from the first \frac{inmaps}{groups}=4 input maps, \frac{outmaps}{groups}=8 output maps are obtained; the same principle applies to the remaining groups of four.
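The weight-shape rule behind these printouts can be sketched as a small helper (`grouped_weight_shape` is a hypothetical name used for illustration; it only reproduces the shapes printed above):

```python
def grouped_weight_shape(inmaps, outmaps, groups, size):
    # each output map only sees inmaps // groups input maps,
    # so the C_in axis of the weight tensor shrinks by a factor of groups
    assert inmaps % groups == 0 and outmaps % groups == 0
    return (outmaps, inmaps // groups, size, size)

print(grouped_weight_shape(16, 32, 1, 2))  # (32, 16, 2, 2)
print(grouped_weight_shape(16, 32, 4, 2))  # (32, 4, 2, 2)
```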
To get the Depthwise Separable Convolution block:
from PuzzleLib.Containers import Sequential
batchsize, inmaps, h, w = 1, 3, 5, 5
outmaps = 32
seq = Sequential()
seq.append(Conv2D(inmaps, inmaps, size=4, initscheme="gaussian", groups=inmaps, name="depthwise"))
seq.append(Conv2D(inmaps, outmaps, size=1, initscheme="gaussian", name="pointwise"))
print(seq["depthwise"].W.shape)
(3, 1, 4, 4)
print(seq["pointwise"].W.shape)
(32, 3, 1, 1)
data = gpuarray.to_gpu(np.random.randn(batchsize, inmaps, h, w).astype(np.float32))
seq(data)
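Using the shape relation from the Info block above, the output shape of this sequence can be predicted by hand (a plain-Python check, not library code; both layers use stride 1 and no padding):

```python
h = w = 5

# depthwise layer: 4x4 kernel, stride 1, pad 0 shrinks each map
h_dw = (h - 4) + 1       # 2

# pointwise layer: a 1x1 kernel leaves the map size unchanged
h_pw = (h_dw - 1) + 1    # 2

print((1, 32, h_pw, h_pw))  # (1, 32, 2, 2)
```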