The loss function that computes the Kullback–Leibler divergence. Like CrossEntropy, it measures the error of representing one probability density (the real one) by another (the predicted one).

It is used in classification tasks.

The error function formula is:

KL(P || Q) = \int\limits_{R^d} p(x)\log{\frac{p(x)}{q(x)}}dx


P, Q - continuous random variables in the R^d space;
KL(P || Q) - Kullback-Leibler divergence for distributions P and Q;
p(x), q(x) - distribution densities of P and Q respectively.

Connection with entropy and cross entropy:

KL(P || Q) = \int p(x)\log{\frac{p(x)}{q(x)}}dx = \int p(x)\log{p(x)}dx - \int p(x)\log{q(x)}dx = H(p, q) - H(p)


H(p) - entropy of the distribution of P;
H(p, q) - cross entropy of the distributions P and Q.
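The identity above can be checked numerically. Below is a small sketch (not part of the library) that evaluates KL divergence, entropy, and cross-entropy for a pair of hypothetical discrete distributions, where the integral becomes a sum over the support:

```python
import numpy as np

# Hypothetical discrete distributions p and q over three outcomes
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])

kl = np.sum(p * np.log(p / q))          # KL(P || Q)
entropy = -np.sum(p * np.log(p))        # H(p)
cross_entropy = -np.sum(p * np.log(q))  # H(p, q)

# The decomposition KL(P || Q) = H(p, q) - H(p) holds numerically
assert np.isclose(kl, cross_entropy - entropy)
```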


def __init__(self, maxlabels=None, normTarget=False):


Parameter | Allowed types | Description | Default
--- | --- | --- | ---
maxlabels | int | Index of the last possible class | None
normTarget | bool | Whether to normalize the target distribution | False

maxlabels - needed for an additional check when working with loaded target labels: if the target labels contain values greater than the value passed in this argument, the class will raise an error;

normTarget - when this flag is set, the values of the target tensor will be normalized with the softmax function; that is, if the target tensor arrives in "raw" form with values x_i\in{R}, then with the flag set: x_i\in[0, 1], \sum_{i=0}^N x_i = 1.
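The effect of this normalization can be sketched in NumPy (an illustration, assuming softmax is applied along the class axis, as described above; the in-library implementation may differ):

```python
import numpy as np

def softmax(x, axis=-1):
    # Shift by the row maximum for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# "Raw" target rows with arbitrary real values
raw = np.random.randn(4, 10).astype(np.float32)
normed = softmax(raw)

# After normalization every row is a valid distribution
assert np.allclose(normed.sum(axis=1), 1.0, atol=1e-5)
assert np.all(normed >= 0.0) and np.all(normed <= 1.0)
```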


Necessary imports:

import numpy as np
from PuzzleLib.Backend import gpuarray
from PuzzleLib.Cost import KLDivergence


gpuarray is required to place the tensors on the GPU.

Synthetic target and prediction tensors:

scores = gpuarray.to_gpu(np.random.randn(10, 10).astype(np.float32))
labels = gpuarray.to_gpu(np.random.randn(10, 10).astype(np.float32))


Please remember that the first dimension of the target and prediction tensors is the batch size.

Initializing the error function:

div = KLDivergence(normTarget=True)

Calculating the error and the gradient on the batch:

error, grad = div(scores, labels)
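For intuition, here is a hedged NumPy reference for the value such a call could produce (an assumption for illustration: both tensors are passed through softmax along the class axis and the per-sample divergences are averaged over the batch; the library's exact reduction and normalization should be checked against its documentation):

```python
import numpy as np

def softmax(x):
    # Row-wise softmax over the class axis, shifted for stability
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

scores = np.random.randn(10, 10).astype(np.float32)
labels = np.random.randn(10, 10).astype(np.float32)

p = softmax(labels)  # target distribution (cf. normTarget=True)
q = softmax(scores)  # predicted distribution

# Batch-averaged KL divergence between target and prediction
error = np.mean(np.sum(p * np.log(p / q), axis=1))

assert error >= 0.0  # KL divergence is non-negative
```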