BatchNorm2D¶
Description¶
This module implements the batch normalization operation for four-dimensional tensors of shape (batchsize, maps, h, w), for example, after 2-dimensional convolutional layers. For more detailed theoretical information see BatchNormND.
Initialization¶
def __init__(self, numOfMaps, epsilon=1e-5, initFactor=1.0, minFactor=0.1, sscale=0.01, affine=True, name=None, empty=False, inplace=False):
Parameters
Parameter | Allowed types | Description | Default |
---|---|---|---|
numOfMaps | int | Number of feature maps in the input tensor | - |
epsilon | float | Stabilizing constant | 1e-5 |
initFactor | float | Initial factor value in the moving average | 1.0 |
minFactor | float | Minimal factor value in the moving average | 0.1 |
sscale | float | Dispersion of the Gaussian distribution used to initialize the scale parameter of batch normalization | 0.01 |
affine | bool | If True, the layer will have trainable affine parameters scale and bias | True |
name | str | Layer name | None |
empty | bool | If True, the parameter tensors of the module will not be initialized | False |
inplace | bool | If True, the output tensor will be written to the memory of the input tensor | False |
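For illustration, the module could be constructed with non-default arguments like this (the values are arbitrary and only show how the parameters are passed):

from PuzzleLib.Modules import BatchNorm2D

# Hypothetical configuration: 64 feature maps, normalization only (no trainable scale and bias)
bn = BatchNorm2D(64, epsilon=1e-5, affine=False, name="bn1")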
Explanations
Note
To understand how initFactor and minFactor are used, note that the module computes its statistical parameters with a moving average:
\begin{equation}
\hat{\mu} = \alpha\hat{\mu} + (1 - \alpha)\mu
\end{equation}
\begin{equation}
\hat{\sigma}^2 = \alpha\hat{\sigma}^2 + (1 - \alpha)\sigma^2
\end{equation}
where
$\hat{\mu}$, $\mu$ - the moving average and the batch average, respectively;
$\hat{\sigma}^2$, $\sigma^2$ - the moving variance and the batch variance, respectively;
$\alpha$ - the conservation factor.
In the module, $\alpha$ is calculated as follows: $$ \alpha = \max\left(\frac{IF}{n}, MF\right) $$
where $IF$ - initFactor, $MF$ - minFactor, $n$ - the batch number.
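To make this update rule concrete, here is a minimal NumPy sketch that follows the formulas above literally; the function name and signature are illustrative and are not part of the module:

import numpy as np

def updateRunningStats(runMean, runVar, batch, n, initFactor=1.0, minFactor=0.1):
	# Conservation factor: alpha = max(IF / n, MF), where n is the batch number
	alpha = max(initFactor / n, minFactor)

	# Per-map statistics of a batch of shape (batchsize, maps, h, w)
	mean = batch.mean(axis=(0, 2, 3))
	var = batch.var(axis=(0, 2, 3))

	# Moving average update as in the equations above
	runMean = alpha * runMean + (1 - alpha) * mean
	runVar = alpha * runVar + (1 - alpha) * var

	return runMean, runVar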
numOfMaps - the number of input feature maps, i.e. the size of the maps axis of the input tensor of shape (batchsize, maps, h, w);
epsilon - a small number added to prevent division by zero during the normalization of features (see the theory in BatchNormND);
affine - flag that determines whether the scale and bias parameters of the batch normalization layer are trainable or fixed (at 1 and 0, respectively), in which case the layer only performs normalization by the mean and the variance;
inplace - flag showing whether additional memory should be allocated for the result. If True, the output tensor is written to the memory of the input tensor, which can negatively affect the network if the input tensor also takes part in calculations on other branches of the graph.
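As a reference for what the layer computes in training mode, here is a minimal NumPy sketch of per-map batch normalization with the affine transform; it only illustrates the formula described in BatchNormND and is not the module's actual implementation:

import numpy as np

def batchNorm2dReference(data, scale, bias, epsilon=1e-5):
	# data has shape (batchsize, maps, h, w); statistics are computed per map
	mean = data.mean(axis=(0, 2, 3), keepdims=True)
	var = data.var(axis=(0, 2, 3), keepdims=True)

	# Normalize, then apply the affine parameters; with affine=False the
	# scale is fixed at 1 and the bias at 0, so only the normalization remains
	normed = (data - mean) / np.sqrt(var + epsilon)
	return scale.reshape(1, -1, 1, 1) * normed + bias.reshape(1, -1, 1, 1)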
Examples¶
Necessary imports.
import numpy as np
from PuzzleLib.Backend import gpuarray
from PuzzleLib.Modules import BatchNorm2D
Info
gpuarray is required to properly place the tensor on the GPU.
For this module, printing the tensors would not add clarity, so only the code is given in this example. For visualized examples, please see BatchNorm or BatchNorm1D.
Let us create a synthetic data tensor:
batchsize, maps, h, w = 5, 3, 10, 10
data = gpuarray.to_gpu(np.arange(batchsize * maps * h * w).reshape(batchsize, maps, h, w).astype(np.float32))
We initialize the class object with default parameters and apply it to the data:
bn = BatchNorm2D(maps)
bn(data)
We can see the calculated mean and variance:
print(bn.mean)
print(bn.var)
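For a rough cross-check, the per-map batch statistics can also be computed on the host by rebuilding the same synthetic array with NumPy; these values can then be compared with bn.mean and bn.var to see how the moving averages relate to the raw batch statistics:

hostData = np.arange(batchsize * maps * h * w).reshape(batchsize, maps, h, w).astype(np.float32)
print(hostData.mean(axis=(0, 2, 3)))
print(hostData.var(axis=(0, 2, 3)))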
As well as the initialized scale and bias parameters (the scales are drawn from a normal distribution, the biases are zero):
print(bn.scale)
print(bn.bias)