GroupLinear

Description

Info

Parent class: Module

Derived classes: -

This module is a grouped variant of the fully connected Linear layer. A regular fully connected layer maps one input vector to one output vector, whereas a group linear layer produces several output vectors, one per group, each obtained with independent weights.

Dimension

An ordinary fully connected layer takes input of shape (N, L_{in}) and produces output of shape (N, L_{out}). The group version takes input of shape (N, G, L_{in}) and produces output of shape (N, G, L_{out}), or (G, N, L_{in}) and (G, N, L_{out}) respectively, depending on the batchDim parameter. Here N is the batch size, G is the number of groups, L_{in} is the size of the input feature vector, and L_{out} is the size of the output feature vector.
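The shape contract can be illustrated with a small NumPy reference that does not depend on PuzzleLib. This is only a sketch under assumptions: the weights are taken to have shape (G, L_{in}, L_{out}), matching what grpLinear.W.shape prints in the examples, the bias shape (G, L_{out}) is assumed, and group_linear_full is a hypothetical helper name rather than a library function.

```python
import numpy as np

def group_linear_full(data, W, b):
    """Reference for the default layout: data is (N, G, L_in), W is (G, L_in, L_out)."""
    N, G, _ = data.shape
    out = np.empty((N, G, W.shape[2]), dtype=data.dtype)
    for g in range(G):
        # Each group has its own weight matrix and bias (the bias shape is an assumption)
        out[:, g, :] = data[:, g, :] @ W[g] + b[g]
    return out

rng = np.random.default_rng(0)
N, G, L_in, L_out = 5, 2, 3, 4
data = rng.standard_normal((N, G, L_in)).astype(np.float32)
W = rng.standard_normal((G, L_in, L_out)).astype(np.float32)
b = np.zeros((G, L_out), dtype=np.float32)

out = group_linear_full(data, W, b)
print(out.shape)  # (5, 2, 4)
```

The explicit loop over groups makes the independence of the weights visible: no group's output depends on another group's input or weights.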

Initializing

def __init__(self, groups, insize, outsize, wscale=1.0, useW=True, useBias=True, initscheme=None,
                 inmode="full", wmode="full", batchDim=0, name=None, empty=False, transpW=False):

Parameters

Parameter | Allowed types | Description | Default
groups | int | Number of groups | -
insize | int | Input vector size | -
outsize | int | Output vector size | -
wscale | float | Variance of random layer weights | 1.0
useW | bool | Whether to use weights | True
useBias | bool | Whether to use biases | True
initscheme | Union[tuple, str] | Specifies the initialization scheme of the layer weights (see createTensorWithScheme) | None -> ("xavier_uniform", "in")
inmode | str | Input mode. Possible values: full and one | full
wmode | str | Weights mode. Possible values: full and one | full
batchDim | int | Batch axis position | 0
name | str | Layer name | None
empty | bool | If True, the weights and biases are not initialized | False
transpW | bool | Whether to use a transposed matrix of weights | False

Explanations

groups - controls how inputs are connected to outputs; with groups = 1 the layer reduces to a regular fully connected layer;


inmode - if one, a single input vector is mapped to groups outputs using independent weights, i.e. (N, 1, L_{in}) \to (N, G, L_{out}); if full, the module works in the normal mode: (N, G, L_{in}) \to (N, G, L_{out});


wmode - if one, all groups input vectors are transformed with the same shared weights to form groups outputs; if full, the module works in the normal mode, with separate weights per group;


batchDim - by default the batch axis comes first: (N, G, L_{in}); setting batchDim=1 swaps it with the group axis: (G, N, L_{in}).
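The interplay of these parameters can be sketched in NumPy through broadcasting. The helper group_linear_ref below is hypothetical and is not part of PuzzleLib; it omits the bias and only mirrors the shape semantics described above.

```python
import numpy as np

def group_linear_ref(data, W, groups, inmode="full", wmode="full", batchDim=0):
    """NumPy sketch of the modes described above (bias omitted for brevity)."""
    if batchDim == 1:
        # Move to the default layout: (G, N, L_in) -> (N, G, L_in)
        data = np.swapaxes(data, 0, 1)
    if inmode == "one":
        # One input vector is shared by all groups: (N, 1, L_in) -> (N, G, L_in)
        data = np.broadcast_to(data, (data.shape[0], groups, data.shape[2]))
    if wmode == "one":
        # One weight matrix is shared by all groups: (1, L_in, L_out) -> (G, L_in, L_out)
        W = np.broadcast_to(W, (groups,) + W.shape[1:])
    out = np.einsum("ngi,gio->ngo", data, W)
    # Return the result in the caller's layout
    return np.swapaxes(out, 0, 1) if batchDim == 1 else out

rng = np.random.default_rng(0)
N, G, L_in, L_out = 5, 2, 3, 4
W_full = rng.standard_normal((G, L_in, L_out))
W_one = rng.standard_normal((1, L_in, L_out))
x_full = rng.standard_normal((N, G, L_in))
x_one = rng.standard_normal((N, 1, L_in))

shape_full = group_linear_ref(x_full, W_full, G).shape
shape_inone = group_linear_ref(x_one, W_full, G, inmode="one").shape
shape_wone = group_linear_ref(x_full, W_one, G, wmode="one").shape
shape_bd1 = group_linear_ref(np.swapaxes(x_full, 0, 1), W_full, G, batchDim=1).shape

print(shape_full, shape_inone, shape_wone)  # (5, 2, 4) (5, 2, 4) (5, 2, 4)
print(shape_bd1)                            # (2, 5, 4)
```

In every mode the output carries one vector per group; only the source of the inputs and weights changes.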

Examples


Basic example


Necessary imports.

import numpy as np
from PuzzleLib.Backend import gpuarray
from PuzzleLib.Modules import GroupLinear

Info

gpuarray is required to place the tensor on the GPU.

np.random.seed(123)
batchsize, groups, insize = 1, 2, 3
data = gpuarray.to_gpu(np.random.randint(0, 9, (batchsize, groups, insize)).astype(np.float32))
print(data)
[[[2. 2. 6.]
  [1. 3. 6.]]]
print(data.shape)
(1, 2, 3)

Let us initialize the module with default parameters (useW=True, useBias=True, inmode="full", wmode="full", batchDim=0) and fill the weights tensor with custom values to make the demonstration of the module operation more convenient:

outsize = 4
grpLinear = GroupLinear(groups, insize, outsize)
print(grpLinear.W.shape)
(2, 3, 4)
grpLinear.W[0].fill(1)
grpLinear.W[1].fill(-1)

print(grpLinear(data))
[[[ 10.  10.  10.  10.]
  [-10. -10. -10. -10.]]]
print(grpLinear.data.shape)
(1, 2, 4)
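The printed values can be checked by hand: with all-ones weights each output element is the sum of the group's inputs, 2 + 2 + 6 = 10, and with all-minus-ones weights it is -(1 + 3 + 6) = -10. The same check in plain NumPy, assuming the bias contributes nothing here (the exact values of 10 and -10 in the output suggest it starts at zero):

```python
import numpy as np

# Input values copied from the example, shape (1, 2, 3)
data = np.array([[[2, 2, 6], [1, 3, 6]]], dtype=np.float32)

W = np.empty((2, 3, 4), dtype=np.float32)
W[0].fill(1)
W[1].fill(-1)

# Per-group matrix product; the bias is left out (assumed to be zero)
out = np.einsum("ngi,gio->ngo", data, W)
print(out)
```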


wmode parameter


Let us change the wmode parameter:

grpLinear = GroupLinear(groups, insize, outsize, wmode="one")
print(grpLinear.W.shape)
(1, 3, 4)
grpLinear.W.fill(1)
print(grpLinear(data))
[[[10. 10. 10. 10.]
  [10. 10. 10. 10.]]]
print(grpLinear.data.shape)
(1, 2, 4)
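With wmode="one" every group passes through the same weight matrix, so the result should match an ordinary linear map applied to all N * G vectors at once. A NumPy sketch of that equivalence follows; the bias is omitted, and the code illustrates the described semantics rather than PuzzleLib's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
N, G, L_in, L_out = 4, 2, 3, 4
data = rng.standard_normal((N, G, L_in))
W = rng.standard_normal((1, L_in, L_out))  # shape as printed for wmode="one"

# Grouped view: the single matrix is reused for every group
grouped = np.einsum("ngi,io->ngo", data, W[0])

# Flat view: treat the N * G vectors as one batch for an ordinary linear layer
flat = (data.reshape(N * G, L_in) @ W[0]).reshape(N, G, L_out)

print(np.allclose(grouped, flat))  # True
```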


inmode parameter


Let us change the inmode parameter and create new input data of the matching shape for this example:

np.random.seed(123)
data = gpuarray.to_gpu(np.random.randint(0, 9, (batchsize, 1, insize)).astype(np.float32))
print(data)
[[[2. 2. 6.]]]
print(data.shape)
(1, 1, 3)

Let us again fill the weights tensor with custom values:

grpLinear = GroupLinear(groups, insize, outsize=4, inmode="one")
print(grpLinear.W.shape)
(2, 3, 4)
grpLinear.W[0].fill(1)
grpLinear.W[1].fill(-1)
print(grpLinear(data))
[[[ 10.  10.  10.  10.]
  [-10. -10. -10. -10.]]]
print(grpLinear.data.shape)
(1, 2, 4)
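With inmode="one" the same input vector feeds every group, so the computation should be equivalent to a single wider linear layer whose output is then split into groups. A NumPy sketch of that view follows; the bias is omitted, and W_wide is a name introduced only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, G, L_in, L_out = 4, 2, 3, 4
x = rng.standard_normal((N, 1, L_in))
W = rng.standard_normal((G, L_in, L_out))

# Grouped view: the same input vector is sent through each group's matrix
grouped = np.einsum("ni,gio->ngo", x[:, 0, :], W)

# Wide view: place the group matrices side by side in one (L_in, G * L_out) matrix
W_wide = W.transpose(1, 0, 2).reshape(L_in, G * L_out)
wide = (x[:, 0, :] @ W_wide).reshape(N, G, L_out)

print(np.allclose(grouped, wide))  # True
```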


batchDim parameter


Let us create input data in the (G, N, L_{in}) layout and set batchDim=1:

data = gpuarray.to_gpu(np.random.randint(0, 9, (groups, batchsize, insize)).astype(np.float32))
print(data.shape)
(2, 1, 3)
grpLinear = GroupLinear(groups, insize, outsize, batchDim=1)
grpLinear(data)
print(grpLinear.data.shape)
(2, 1, 4)
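The group-first layout lines up with how NumPy batches matrix products over the leading axis, which gives a compact way to reason about batchDim=1. The sketch below omits the bias and assumes weights of shape (G, L_{in}, L_{out}); it compares the group-first product with the default layout after swapping axes.

```python
import numpy as np

rng = np.random.default_rng(3)
N, G, L_in, L_out = 5, 2, 3, 4
data_gn = rng.standard_normal((G, N, L_in))  # group axis first, as with batchDim=1
W = rng.standard_normal((G, L_in, L_out))

# np.matmul batches over the leading axis: (G, N, L_in) @ (G, L_in, L_out) -> (G, N, L_out)
out_gn = np.matmul(data_gn, W)

# Same result via the default layout: swap to (N, G, L_in), apply per group, swap back
out_ng = np.einsum("ngi,gio->ngo", np.swapaxes(data_gn, 0, 1), W)

print(out_gn.shape)                                     # (2, 5, 4)
print(np.allclose(out_gn, np.swapaxes(out_ng, 0, 1)))   # True
```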