Activation

Description

Info

Parent class: Module

Derived classes: -

General information

This module implements the layer activation operation.

An activation function is a mathematical operation applied to the data that introduces non-linearity into the transformation computed by a neural network.

Implemented Activation Functions


Sigmoid

It takes an arbitrary real number as input and returns a real number in the open interval (0, 1).

It is calculated by the formula:

\begin{equation}\label{eq:sigmoid} \sigma(x) = \frac{1}{1 + e^{-x}} \end{equation}

Graph of the function:

sigmoid

Pros:

  • infinitely differentiable smooth function

Cons:

  • saturation of the sigmoid leads to vanishing gradients: during backpropagation the local gradient, which is close to zero in the saturated regions, is multiplied by the upstream gradient and drives the product toward zero. As a result, almost no signal passes through the neuron to its weights and, recursively, to its inputs;

  • the sigmoid output is not centered around zero: neurons in subsequent layers receive inputs that are also not centered around zero, which affects the dynamics of gradient descent.
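The saturation effect described above can be reproduced numerically. The following is a minimal NumPy sketch, not the library's GPU implementation; the helper names `sigmoid` and `sigmoid_grad` are illustrative and are not part of PuzzleLib.

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid, expressed through its own output."""
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, 0.0, 10.0])
local_grad = sigmoid_grad(x)        # peaks at x = 0, tiny in both tails
upstream_grad = np.ones_like(x)     # gradient arriving from the next layer
downstream_grad = upstream_grad * local_grad

print(local_grad)
```

The local gradient never exceeds 0.25, and in the saturated tails it is several orders of magnitude smaller, so the product passed further back is almost zero.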


Tanh

It takes an arbitrary real number as input and returns a real number in the open interval (-1, 1).

It is calculated by the formula:

\begin{equation}\label{eq:hyptan} \tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}} \end{equation}

Graph of the function:

hyperbolic tangent

Pros:

  • infinitely differentiable smooth function
  • centered around zero

Cons:

  • saturation problem: for large |x| the derivative approaches zero, which leads to vanishing gradients
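A short NumPy sketch (an illustration only, independent of the library) compares the two properties listed above: on a grid symmetric around zero the tanh outputs average to zero while the sigmoid outputs average to 0.5, and the tanh derivative still vanishes in the tails.

```python
import numpy as np

x = np.linspace(-5.0, 5.0, 101)          # grid symmetric around zero
sig = 1.0 / (1.0 + np.exp(-x))
tanh = np.tanh(x)

# tanh is a rescaled and shifted sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
tanh_from_sigmoid = 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

# Derivative of tanh: 1 - tanh(x)^2, close to zero for large |x|
tanh_grad = 1.0 - tanh ** 2

print(tanh.mean(), sig.mean(), tanh_grad[0], tanh_grad[-1])
```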

ReLU

Implements a threshold at zero. Its output lies in the interval [0, +∞). ReLU is currently one of the most popular activation functions and has many modifications.

It is calculated by the formula:

\begin{equation}\label{eq:relu} f(x) = \operatorname{max} (0, x) \end{equation}

Graphs of this function and its two modifications:

all types of relu

Pros:

  • computational simplicity
  • fast convergence of stochastic gradient descent

Cons:

  • vanishing gradient for negative inputs, where the derivative is exactly zero (mitigated by choosing an appropriate learning rate)
  • not a smooth function: the derivative is discontinuous at zero
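The formula and the gradient behaviour behind the points above can be sketched in NumPy. This is a reference illustration rather than the library's code; `relu` and `relu_grad` are hypothetical helper names, and taking the derivative at exactly zero to be 0 is a convention assumed here.

```python
import numpy as np

def relu(x):
    """max(0, x), applied elementwise."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """1 for positive inputs, 0 otherwise (0 is used at x == 0 by convention)."""
    return (x > 0).astype(x.dtype)

x = np.array([-3.0, -0.5, 0.0, 0.5, 3.0], dtype=np.float32)
y = relu(x)
dy = relu_grad(x)

print(y)
print(dy)
```

Every negative input yields both a zero output and a zero gradient, which is why a neuron whose inputs stay negative stops updating.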

LeakyReLU

Modification of ReLU that adds a small constant slope α for negative arguments. When the argument is negative, the value of the function is close to zero but not equal to it. For α > 0 its output covers the whole real line (-∞, +∞).

It is calculated by the formula:

\begin{equation} \begin{matrix} \ f(x) & = & \left\{ \begin{matrix} \alpha x & \mbox{if } x < 0 \\ x & \mbox{otherwise} \end{matrix} \right. \end{matrix} \end{equation}

The graph is shown in the image above.

Pros:

  • computational simplicity
  • fast convergence of stochastic gradient descent

Cons:

  • vanishing gradient (gradients for negative inputs are scaled by the small factor α)
  • not a smooth function
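A minimal NumPy sketch of the piecewise formula above. The value α = 0.01 is an assumption chosen for illustration; this page does not state the library's default, and the helper names are hypothetical.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """alpha * x for x < 0, x otherwise (alpha = 0.01 is an assumed value)."""
    return np.where(x < 0, alpha * x, x)

def leaky_relu_grad(x, alpha=0.01):
    """Slope alpha on the negative side, 1 elsewhere."""
    return np.where(x < 0, alpha, 1.0)

x = np.array([-2.0, 0.0, 3.0])
y = leaky_relu(x)
dy = leaky_relu_grad(x)

print(y)
print(dy)
```

Unlike plain ReLU, the gradient on the negative side is small but never exactly zero.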

ELU

Smooth modification of ReLU and LeakyReLU (the derivative is continuous at zero when α = 1). Its output lies in the interval (-α, +∞).

It is calculated by the formula:

\begin{equation} \begin{matrix} \ f(x) & = & \left\{ \begin{matrix} x & \mbox{if } x > 0 \\ \alpha (e^x - 1) & \mbox{otherwise} \end{matrix} \right. \end{matrix} \end{equation}

The graph is shown in the image above.

Pros:

  • computational simplicity
  • fast convergence of stochastic gradient descent
  • smooth function

Cons:

  • vanishing gradient (the function saturates at -α for large negative inputs)
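The ELU formula can be checked with a short NumPy sketch. It is an illustration only; `elu` and `elu_grad` are hypothetical helpers, and α = 1 is an assumed default. The sketch shows the saturation toward -α and the slope just left of zero, which equals α and therefore matches the right-hand slope of 1 only when α = 1.

```python
import numpy as np

def elu(x, alpha=1.0):
    """x for x > 0, alpha * (exp(x) - 1) otherwise."""
    # np.minimum keeps exp() from overflowing on the unused positive branch
    return np.where(x > 0, x, alpha * (np.exp(np.minimum(x, 0.0)) - 1.0))

def elu_grad(x, alpha=1.0):
    """1 for x > 0, alpha * exp(x) otherwise."""
    return np.where(x > 0, 1.0, alpha * np.exp(np.minimum(x, 0.0)))

x = np.array([-50.0, -1.0, 0.0, 2.0])
y = elu(x)                                   # approaches -alpha on the left
slope_at_zero = elu_grad(np.array([0.0]), alpha=0.5)[0]

print(y)
print(slope_at_zero)
```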

SoftPlus

Smooth activation function. Its output lies in the interval (0, +∞).

It is calculated by the formula:

\begin{equation}\label{eq:softplus} f(x) = \ln(1 + e^x) \end{equation}

Graph of the function:

softPlus

Pros:

  • smooth infinitely differentiable function

Cons:

  • not centered around zero
  • vanishing gradient
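Evaluating ln(1 + e^x) literally overflows for large x. The NumPy sketch below shows one common way to compute the same quantity stably with `np.logaddexp`; it is an illustration and makes no claim about how the library computes SoftPlus on the GPU.

```python
import numpy as np

def softplus(x):
    """ln(1 + exp(x)) computed as logaddexp(0, x), which avoids overflow."""
    return np.logaddexp(0.0, x)

x = np.array([-1000.0, 0.0, 1000.0])

with np.errstate(over="ignore"):
    naive = np.log(1.0 + np.exp(x))   # exp(1000) overflows to inf

stable = softplus(x)

print(naive)
print(stable)
```

For large positive x the function behaves like x itself, and for large negative x it approaches zero, where its gradient vanishes.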

Additional sources

  • Wikipedia article with a large comparative characteristic table of all popular activation functions.

Initializing

def __init__(self, activation, slc=None, inplace=False, name=None, args=()):

Parameters

| Parameter | Allowed types | Description | Default |
|---|---|---|---|
| activation | str | Type of activation | - |
| slc | slice | The slice according to which the activation function will be calculated | None |
| inplace | bool | If True, the output tensor will be written in memory in the place of the input tensor | False |
| name | str | Layer name | None |

Explanations

activation - defines the selected activation function. Currently implemented activation functions are: "sigmoid", "tanh", "relu", "leakyRelu", "elu", "softPlus", "clip".


inplace - a flag that controls whether additional memory is allocated for the result. If True, the output tensor is written to memory in place of the input tensor. This can break the network if the input tensor is also used in computations on other branches of the graph.
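The hazard of in-place computation can be illustrated with plain NumPy arrays. This sketch does not use the module itself; it only mimics the aliasing that occurs when two branches share one buffer, and all variable names are hypothetical.

```python
import numpy as np

x = np.array([-1.0, 2.0, -3.0])
other_branch = x                      # a second consumer of the same buffer

# Out-of-place: a new array is allocated and x is left untouched
out_of_place = np.maximum(x, 0.0)
seen_before = other_branch.copy()

# In-place: the result overwrites the input buffer
np.maximum(x, 0.0, out=x)
seen_after = other_branch.copy()

print(seen_before)   # original values
print(seen_after)    # already activated values
```

After the in-place call the second branch no longer sees the original input, which is the situation the flag description warns about.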

Examples

Necessary imports.

>>> import numpy as np
>>> from PuzzleLib.Backend import gpuarray
>>> from PuzzleLib.Modules import Activation

Info

gpuarray is required to place the tensor on the GPU.

Let us generate some illustrative data. The values are random, so your output will differ.

>>> h, w = 3, 3
>>> data = gpuarray.to_gpu(np.random.randint(-10, 10, (h, w)).astype(np.float32))
>>> data
[[ 5.  9. -4.]
 [ 2.  5.  7.]
 [-1.  1.  2.]]

Suppose we want the sigmoid as the activation function. Let us initialize the module and pass the data to it.

>>> act = Activation('sigmoid')
>>> act(data)
[[0.9933071  0.9998766  0.01798621]
 [0.8807971  0.9933071  0.999089  ]
 [0.26894143 0.7310586  0.8807971 ]]

ReLU:

>>> act = Activation("relu")
>>> act(data)
[[5. 9. 0.]
 [2. 5. 7.]
 [0. 1. 2.]]
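The outputs shown above can be cross-checked on the CPU. The following NumPy sketch recomputes both activations for the same input matrix; it is a reference check only and does not call the library.

```python
import numpy as np

# The same matrix as in the example above, kept on the CPU
data = np.array([[5, 9, -4],
                 [2, 5, 7],
                 [-1, 1, 2]], dtype=np.float32)

sigmoid_ref = 1.0 / (1.0 + np.exp(-data))   # elementwise sigmoid
relu_ref = np.maximum(data, 0.0)            # elementwise relu

print(sigmoid_ref)
print(relu_ref)
```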