Parent class: Module

Derived classes: -

This module implements the operation of two-dimensional instance normalization.

Instance normalization is used when training generative neural networks, because batch normalization slows their training and degrades the quality of the result. Batch normalization computes statistics over several images at once, so information specific to each individual image is lost.

Types of normalization
Figure 1. Demonstration of the operating principles of various normalization types
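The difference between the two kinds of statistics can be sketched in plain NumPy. This is only an illustration on random data, not PuzzleLib code; the array layout (batch, channels, height, width) and the variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Assumed layout: (batch, channels, height, width)
x = rng.standard_normal((5, 3, 4, 4)).astype(np.float32)

# Batch normalization: one mean per channel, shared by every image in the batch.
batch_mean = x.mean(axis=(0, 2, 3))   # shape (3,)

# Instance normalization: one mean per (image, channel) pair.
instance_mean = x.mean(axis=(2, 3))   # shape (5, 3)
```

The batch statistic is the average of the per-image statistics, which is how the information about any single image gets blended away.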

The input tensor has the shape (N, C, H, W), where N is the batch size, C is the number of feature maps (channels), H is the map height, and W is the map width. We use the following indices: t is the index of the batch element, i is the index of the feature map, m is the position of the element along the height, and n is its position along the width. Then, for each combination of the t-th batch element and the i-th feature map:

\begin{equation} \mu_{ti} = \frac{1}{HW}\sum_{m=1}^{H} \sum_{n=1}^{W}x_{timn} \end{equation}
\begin{equation} \sigma_{ti}^2 = \frac{1}{HW}\sum_{m=1}^{H} \sum_{n=1}^{W}(x_{timn} - \mu_{ti})^2 \end{equation}
\begin{equation} \hat{x}_{timn} = \frac{x_{timn} - \mu_{ti}}{\sqrt{\sigma_{ti}^2 + \epsilon}} \end{equation}
\begin{equation} y_{timn} = \gamma\hat{x}_{timn} + \beta \end{equation}


where

\mu_{ti} - mean of the feature distribution for a given combination of the t-th batch element and the i-th feature map;
\sigma_{ti}^2 - variance of the feature distribution for a given combination of the t-th batch element and the i-th feature map;
x_{timn} - feature map element;
\hat{x}_{timn} - normalized feature map element;
\epsilon - stabilizing constant that prevents division by zero;
\gamma - affine scale parameter;
\beta - affine bias parameter.

In practice, forcing every feature map to have zero mean and unit variance can greatly limit the predictive ability of the network. Two more trainable affine parameters, scale and bias, are therefore added so that the network can adjust the mean and variance of the output itself.
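The four equations above can be written as a short NumPy reference. This is a minimal sketch and not the PuzzleLib implementation: the function name `instance_norm_2d` is hypothetical, and storing one `gamma` and one `beta` value per feature map is an assumption, since the formulas do not say how the affine parameters are shaped.

```python
import numpy as np

def instance_norm_2d(x, gamma=None, beta=None, epsilon=1e-5):
    """Reference instance normalization for an (N, C, H, W) array."""
    # Mean and biased variance (1 / HW) per (batch element, feature map) pair
    mu = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + epsilon)

    # Assumption: one scale and one bias value per feature map
    if gamma is None:
        gamma = np.ones(x.shape[1], dtype=x.dtype)
    if beta is None:
        beta = np.zeros(x.shape[1], dtype=x.dtype)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 3, 4, 4)).astype(np.float32)

# Identity affine parameters: pure normalization
y = instance_norm_2d(x)

# Non-trivial affine parameters shift the mean to beta and the spread to gamma
y_affine = instance_norm_2d(
    x,
    gamma=np.full(3, 2.0, dtype=np.float32),
    beta=np.full(3, 1.0, dtype=np.float32),
)
```

With the identity parameters every feature map of `y` has approximately zero mean and unit variance; with `gamma = 2` and `beta = 1` the mean moves to about 1 and the standard deviation to about 2.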

Additional sources

  • More about choosing the type of normalization: github
  • More about instance normalization: arxiv


def __init__(self, numOfMaps, epsilon=1e-5, affine=True, name=None):


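| Parameter | Allowed types | Description | Default |
|-----------|---------------|-------------|---------|
| numOfMaps | int | Number of feature maps | - |
| epsilon | float | Small stabilizing constant added to the variance | 1e-5 |
| affine | bool | Whether to use the affine transformation | True |
| name | str | Layer name | None |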
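The role of epsilon is easiest to see on a constant feature map, whose variance is exactly zero. The sketch below uses plain NumPy rather than the module itself, and the variable names are illustrative.

```python
import numpy as np

# A constant feature map: every element equals the mean, so the variance is 0
x = np.full((1, 1, 4, 4), 2.0, dtype=np.float32)
mu = x.mean(axis=(2, 3), keepdims=True)
var = x.var(axis=(2, 3), keepdims=True)
epsilon = 1e-5

# Without epsilon the normalization computes 0 / 0, which is NaN
with np.errstate(invalid="ignore", divide="ignore"):
    unstable = (x - mu) / np.sqrt(var)

# With epsilon the denominator stays positive and the result is 0
stable = (x - mu) / np.sqrt(var + epsilon)
```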


affine - flag that controls whether the scale and bias parameters of the instance normalization layer are trained (True) or kept fixed (False), in which case the layer performs only normalization by mean and variance.


Necessary imports

import numpy as np
from PuzzleLib.Backend import gpuarray
from PuzzleLib.Modules import InstanceNorm2D


gpuarray is required to place the tensor on the GPU correctly.

batchsize, maps, h, w = 5, 3, 4, 4
data = gpuarray.to_gpu(np.random.randn(batchsize, maps, h, w).astype(np.float32))
instNorm2d = InstanceNorm2D(numOfMaps=maps)