Parent class: Optimizer

Derived classes: -

This module implements a modification of the RMSProp algorithm proposed in an article by Alex Graves.

As in Adam, this algorithm maintains exponential moving averages of the gradient and of the squared gradient, which add inertia to the optimization process:

\begin{equation} m_t = \alpha{m_{t-1}} + (1 - \alpha)g_t \end{equation}

\begin{equation} \upsilon_t = \alpha{\upsilon_{t-1}} + (1 - \alpha)g_t^2 \end{equation}

where \alpha is the momentum exponential decay rate.
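
To make the role of \alpha concrete, here is a minimal NumPy sketch of the two moving averages above. The function name `update_moments` and the zero initialization of both averages are assumptions made for illustration; they are not taken from the PuzzleLib source.

```python
import numpy as np


def update_moments(m, v, grad, alpha=0.95):
    """One step of the two exponential moving averages defined above."""
    m = alpha * m + (1.0 - alpha) * grad        # running mean of the gradient
    v = alpha * v + (1.0 - alpha) * grad ** 2   # running mean of the squared gradient
    return m, v


# Feed a constant gradient and let the averages settle.
grad = np.array([2.0, -1.0])
m = np.zeros_like(grad)  # assumed zero start
v = np.zeros_like(grad)  # assumed zero start

for step in range(200):
    m, v = update_moments(m, v, grad)
```

With alpha = 0.95 each average effectively weights about the last 1 / (1 - alpha) = 20 gradients, so after 200 steps of a constant gradient m is close to the gradient itself and v is close to its square. A value of alpha nearer to 1 gives a longer memory and a slower response to changes in the gradient.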

We also introduce a moving average for parameter update values:

\begin{equation} \Delta\theta_t = \gamma\Delta\theta_{t-1} - \eta\frac{g_t}{\sqrt{\upsilon_t - m_t^2 + \epsilon}} \end{equation}

\begin{equation} \theta_{t + 1} = \theta_t + \Delta\theta_t \end{equation}

where \gamma is the parameter update exponential decay rate, \eta is the learning rate, and \epsilon is the smoothing parameter.
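
The equations above can be collected into a single update step. The following is a NumPy sketch under stated assumptions, not the PuzzleLib implementation: the names `rmsprop_graves_step` and `init_state`, the dictionary used to hold the state, and the zero-initialized state are all hypothetical. The denominator is the square root of the running mean of the squared gradient minus the square of the running mean gradient, plus epsilon, which is the formulation in Graves's article. Default values mirror those of the constructor.

```python
import numpy as np


def init_state(theta):
    """Zero-initialized optimizer state (an assumption of this sketch)."""
    return {
        "m": np.zeros_like(theta),      # running mean of the gradient
        "v": np.zeros_like(theta),      # running mean of the squared gradient
        "delta": np.zeros_like(theta),  # previous parameter update
    }


def rmsprop_graves_step(theta, grad, state, learn_rate=1e-4, alpha=0.95,
                        mom_rate=0.9, epsilon=1e-4):
    """Return the updated parameters and the new state after one step."""
    m = alpha * state["m"] + (1.0 - alpha) * grad
    v = alpha * state["v"] + (1.0 - alpha) * grad ** 2

    # Running variance estimate of the gradient, smoothed by epsilon.
    denom = np.sqrt(v - m ** 2 + epsilon)

    # Moving average of the parameter updates.
    delta = mom_rate * state["delta"] - learn_rate * grad / denom
    return theta + delta, {"m": m, "v": v, "delta": delta}


# Toy problem: minimize f(theta) = 0.5 * ||theta||^2, whose gradient is theta.
theta = np.array([1.0, -2.0, 3.0])
state = init_state(theta)
start_loss = 0.5 * np.sum(theta ** 2)

for _ in range(500):
    theta, state = rmsprop_graves_step(theta, theta, state, learn_rate=1e-3)

end_loss = 0.5 * np.sum(theta ** 2)
```

The function returns new arrays instead of modifying its inputs, which keeps the sketch easy to test; an optimizer working on GPU tensors would more likely update them in place. Note that when gradients stay nearly constant, the variance estimate shrinks toward zero and epsilon alone bounds the step size, so epsilon has more influence here than in plain RMSProp.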


def __init__(self, learnRate=1e-4, alpha=0.95, momRate=0.9, epsilon=1e-4, nodeinfo=None):


| Parameter | Allowed types | Description | Default |
|-----------|---------------|-------------|---------|
| learnRate | float | Learning rate | 1e-4 |
| alpha | float | Momentum exponential decay rate | 0.95 |
| momRate | float | Parameter update exponential decay rate | 0.9 |
| epsilon | float | Smoothing parameter | 1e-4 |
| nodeinfo | NodeInfo | Object containing information about the computational node | None |




Necessary imports:

import numpy as np
from PuzzleLib.Optimizers import RMSPropGraves
from PuzzleLib.Backend import gpuarray


gpuarray is required to place the tensors on the GPU.

Let us set up a synthetic training dataset:

data = gpuarray.to_gpu(np.random.randn(16, 128).astype(np.float32))
target = gpuarray.to_gpu(np.random.randn(16, 1).astype(np.float32))

Declare the optimizer:

optimizer = RMSPropGraves(learnRate=0.001)

Suppose a network net has already been defined, for example through Graph. To install the optimizer on the network, call:

optimizer.setupOn(net, useGlobalState=True)


You can read more about optimizer methods and their parameters in the description of the Optimizer parent class.

Moreover, suppose there is a loss function loss, inherited from Cost, that computes both the error and its gradient. The optimization process can then be implemented as follows:

for i in range(100):
    # Forward pass, then error and its gradient with respect to the predictions
    predictions = net(data)
    error, grad = loss(predictions, target)

    # Reset parameter gradients, backpropagate, and apply the optimizer step
    optimizer.zeroGradParams()
    net.backward(grad)
    optimizer.update()

    if (i + 1) % 5 == 0:
        print("Iteration #%d error: %s" % (i + 1, error))