BatchNorm¶
Description¶
This module implements the batch normalization operation for two-dimensional tensors of shape (batchsize, insize), for example, after fully connected layers. For more detailed theoretical information, see BatchNormND.
Initializing¶
def __init__(self, size, epsilon=1e-5, initFactor=1.0, minFactor=0.1, sscale=0.01, affine=True, name=None,
             empty=False, inplace=False):
Parameters
Parameter | Allowed types | Description | Default |
---|---|---|---|
size | int | Number of input features | - |
epsilon | float | Stabilizing constant | 1e-5 |
initFactor | float | Initial factor value in the moving average | 1.0 |
minFactor | float | Minimal factor value in the moving average | 0.1 |
sscale | float | Dispersion of the Gaussian distribution for the scale parameter of batch normalization | 0.01 |
affine | bool | If True, the layer will have trainable affine parameters scale and bias | True |
name | str | Layer name | None |
empty | bool | If True, the tensors of the parameters of the module will not be initialized | False |
inplace | bool | If True, the output tensor will be written in memory in the place of the input tensor | False |
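For illustration, a constructor call that overrides some of the defaults might look like the following sketch (the feature count 128 is hypothetical; the argument names follow the signature above):
from PuzzleLib.Modules import BatchNorm

# Hypothetical layer for 128 input features; affine=False disables the trainable
# scale and bias, so the layer only performs normalization by the mean and the variance
bn = BatchNorm(128, affine=False)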
Explanations
Note
To understand how initFactor and minFactor are used, keep in mind that the module computes its statistical parameters using a moving average:
\begin{equation}
\hat{\mu} = \alpha\hat{\mu} + (1 - \alpha)\mu
\end{equation}
\begin{equation}
\hat{\sigma}^2 = \alpha\hat{\sigma}^2 + (1 - \alpha)\sigma^2
\end{equation}
where
\hat{\mu}, \mu - the moving average and the batch mean, respectively;
\hat{\sigma}^2, \sigma^2 - the moving variance and the batch variance, respectively;
\alpha - the conservation factor.
\alpha is calculated in the module as follows: $$ \alpha = \max\left(\frac{IF}{n}, MF\right) $$
where IF is initFactor, MF is minFactor, and n is the batch number.
size is the number of input features, i.e. the insize axis of the tensor of shape (batchsize, insize);

epsilon is a small number added to prevent division by zero during the normalization of features (see the theory in BatchNormND);

affine is a flag that determines whether the scale and bias parameters of the batch normalization layer are trained or kept fixed (at 1 and 0, respectively), in which case the layer only performs normalization by the mean and the variance;

inplace is a flag showing whether additional memory should be allocated for the result. If True, the output tensor is written to the memory of the input tensor, which can negatively affect the network if the input tensor also takes part in computations on other branches of the graph.
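To make the roles of epsilon, scale and bias concrete, here is a minimal NumPy sketch of the batch-normalization transform during training; it is a conceptual illustration, not the module's actual implementation:
import numpy as np

def batchNormForward(data, scale, bias, epsilon=1e-5):
    # data has shape (batchsize, insize); statistics are computed per feature
    mean = data.mean(axis=0)
    var = data.var(axis=0)

    # Normalize each feature, with epsilon guarding against division by zero
    normed = (data - mean) / np.sqrt(var + epsilon)

    # Affine part; with affine=False this reduces to returning normed unchanged
    return scale * normed + bias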
Examples¶
Necessary imports.
import numpy as np
from PuzzleLib.Backend import gpuarray
from PuzzleLib.Modules import BatchNorm
Info
gpuarray is required to properly place the tensor on the GPU.
Let us create a synthetic data tensor that is convenient for demonstrating the operation of the module:
batchsize, insize = 3, 6
data = gpuarray.to_gpu(np.arange(batchsize * insize).reshape(batchsize, insize).astype(np.float32))
print(data)
[[ 0. 1. 2. 3. 4. 5.]
[ 6. 7. 8. 9. 10. 11.]
[12. 13. 14. 15. 16. 17.]]
Let us initialize the class object with default parameters and apply it to the data:
bn = BatchNorm(insize)
bn(data)
We can see the calculated mean and variance:
print(bn.mean)
[[[[ 6.]]
[[ 7.]]
[[ 8.]]
[[ 9.]]
[[10.]]
[[11.]]]]
print(bn.var)
[[[[36.]]
[[36.]]
[[36.]]
[[36.]]
[[36.]]
[[36.]]]]
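These statistics can be cross-checked on the host with plain NumPy (hostData below is simply the same array kept on the CPU); for this data the stored variance coincides with the unbiased estimate (ddof=1):
hostData = np.arange(batchsize * insize).reshape(batchsize, insize).astype(np.float32)

print(hostData.mean(axis=0))         # [ 6.  7.  8.  9. 10. 11.]
print(hostData.var(axis=0, ddof=1))  # [36. 36. 36. 36. 36. 36.]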
The scales and shifts receive their default initialization (the scales are random values from a normal distribution, the shifts are zero):
print(bn.scale)
[[[[1.0097325]]
[[0.9922849]]
[[0.9869034]]
[[1.0015255]]
[[1.0024816]]
[[0.9988528]]]]
print(bn.bias)
[[[[0.]]
[[0.]]
[[0.]]
[[0.]]
[[0.]]
[[0.]]]]
The final form of the data passed through the module:
print(bn.data)
[[-1.1968222 -1.2156036 -1.2205907 -1.2226099 -1.2302258 -1.2214108]
[ 0. 0. 0. 0. 0. 0. ]
[ 1.1968222 1.2156036 1.2205907 1.2226099 1.2302258 1.2214108]]
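As a rough host-side cross-check of this last step, assuming the normalization itself uses the biased per-feature batch variance (a common batch-norm convention, not stated explicitly here) and reusing hostData from the check above:
normed = (hostData - hostData.mean(axis=0)) / np.sqrt(hostData.var(axis=0) + 1e-5)
print(normed)
# Roughly [[-1.2247 ... -1.2247], [0. ... 0.], [1.2247 ... 1.2247]];
# the module's output additionally multiplies each column by its random scale
# (close to 1) and adds the zero bias, which explains the small per-column differences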