GroupLinear¶
Description¶
This module is a group modification of the fully connected Linear layer: while a regular fully connected layer takes a single vector and returns a single vector, a group linear layer processes several vectors at once, each transformed by its own independent weights.
Dimensions¶
While an ordinary fully connected layer takes inputs of shape (N, L_{in}) and produces outputs of shape (N, L_{out}), the group modification takes inputs of shape (N, G, L_{in}) and produces outputs of shape (N, G, L_{out}) (or (G, N, L_{in}) and (G, N, L_{out}), respectively, depending on the batchDim parameter), where N is the batch size, G is the number of groups, L_{in} is the size of the input feature vector, and L_{out} is the size of the output feature vector.
Initializing¶
def __init__(self, groups, insize, outsize, wscale=1.0, useW=True, useBias=True, initscheme=None,
inmode="full", wmode="full", batchDim=0, name=None, empty=False, transpW=False):
Parameters
| Parameter | Allowed types | Description | Default |
|---|---|---|---|
| groups | int | Number of groups | - |
| insize | int | Input vector size | - |
| outsize | int | Output vector size | - |
| wscale | float | Variance of random layer weights | 1.0 |
| useW | bool | Whether to use weights | True |
| useBias | bool | Whether to use biases | True |
| initscheme | Union[tuple, str] | Specifies the initialization scheme of the layer weights (see createTensorWithScheme) | None -> ("xavier_uniform", "in") |
| inmode | str | Input mode. Possible values: full and one | full |
| wmode | str | Weights mode. Possible values: full and one | full |
| batchDim | int | Batch axis position | 0 |
| name | str | Layer name | None |
| empty | bool | If True, the weights and biases tensors are not initialized | False |
| transpW | bool | Whether to use a transposed matrix of weights | False |
Explanations
groups - parameter that controls the connections between inputs and outputs; when groups = 1, we get a special case of a regular fully connected layer;
inmode - if one, a single input vector is mapped to groups output vectors using independent weights, i.e. (N, 1, L_{in}) \to (N, G, L_{out}); if full, the module works in normal mode: (N, G, L_{in}) \to (N, G, L_{out});
wmode - if one, groups input vectors are mapped to groups output vectors using the same shared weights; if full, the module works in normal mode.
batchDim - by default, the batch size axis comes first: (N, G, L_{in}), however it is possible to swap it with the group axis, by setting batchDim=1: (G, N, L_{in}).
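The full-mode computation can be sketched in plain NumPy (this is only an illustration of the math, not the layer's actual GPU kernel; all names here are local to the sketch):

```python
import numpy as np

def group_linear(data, W, b=None):
    # data: (N, G, L_in), W: (G, L_in, L_out) -> out: (N, G, L_out);
    # each group g computes its own matrix product data[:, g] @ W[g]
    out = np.einsum("ngi,gio->ngo", data, W)
    if b is not None:
        out += b  # b: (G, L_out), broadcast over the batch axis
    return out

N, G, L_in, L_out = 2, 3, 4, 5
data = np.random.rand(N, G, L_in).astype(np.float32)
W = np.random.rand(G, L_in, L_out).astype(np.float32)

out = group_linear(data, W)
print(out.shape)  # (2, 3, 5)
```

With G = 1 this reduces to an ordinary fully connected layer, matching the note on the groups parameter above.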
Examples¶
Basic example¶
Necessary imports.
import numpy as np
from PuzzleLib.Backend import gpuarray
from PuzzleLib.Modules import GroupLinear
Info
gpuarray is required to properly place the tensor on the GPU.
np.random.seed(123)
batchsize, groups, insize = 1, 2, 3
data = gpuarray.to_gpu(np.random.randint(0, 9, (batchsize, groups, insize)).astype(np.float32))
print(data)
[[[2. 2. 6.]
[1. 3. 6.]]]
print(data.shape)
(1, 2, 3)
Let us initialize the module with default parameters (useW=True, useBias=True, inmode="full", wmode="full", batchDim=0) and fill the weights tensor with custom values to make the demonstration of the module operation more convenient:
outsize = 4
grpLinear = GroupLinear(groups, insize, outsize)
print(grpLinear.W.shape)
(2, 3, 4)
grpLinear.W[0].fill(1)
grpLinear.W[1].fill(-1)
print(grpLinear(data))
[[[ 10. 10. 10. 10.]
[-10. -10. -10. -10.]]]
print(grpLinear.data.shape)
(1, 2, 4)
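The output above can be verified by hand with NumPy (a sketch of the same computation, ignoring the zero-initialized biases): group 0 sums the input components with weight 1, group 1 with weight -1.

```python
import numpy as np

data = np.array([[[2., 2., 6.], [1., 3., 6.]]])    # (1, 2, 3)
W = np.stack([np.ones((3, 4)), -np.ones((3, 4))])  # (2, 3, 4)

# per-group matrix product: out[n, g] = data[n, g] @ W[g]
out = np.einsum("ngi,gio->ngo", data, W)
print(out)
# [[[ 10.  10.  10.  10.]
#   [-10. -10. -10. -10.]]]
```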
wmode parameter¶
Let us change the wmode parameter:
grpLinear = GroupLinear(groups, insize, outsize, wmode="one")
print(grpLinear.W.shape)
(1, 3, 4)
grpLinear.W.fill(1)
print(grpLinear(data))
[[[10. 10. 10. 10.]
[10. 10. 10. 10.]]]
print(grpLinear.data.shape)
(1, 2, 4)
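In NumPy terms, wmode="one" means a single weight matrix is shared by all groups (again an illustrative sketch, not the layer's kernel):

```python
import numpy as np

data = np.array([[[2., 2., 6.], [1., 3., 6.]]])  # (1, 2, 3)
W = np.ones((1, 3, 4))                           # one weight matrix for all groups

# the single matrix W[0] is applied to every group's input
out = np.einsum("ngi,io->ngo", data, W[0])
print(out)
# [[[10. 10. 10. 10.]
#   [10. 10. 10. 10.]]]
```

Both groups now produce 10 (the sums 2 + 2 + 6 and 1 + 3 + 6), because they no longer have independent weights.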
inmode parameter¶
Let us change the inmode parameter and initialize new data of the corresponding shape for this example:
np.random.seed(123)
data = gpuarray.to_gpu(np.random.randint(0, 9, (batchsize, 1, insize)).astype(np.float32))
print(data)
[[[2. 2. 6.]]]
print(data.shape)
(1, 1, 3)
Let us again fill the weights tensor with custom values:
grpLinear = GroupLinear(groups, insize, outsize=4, inmode="one")
print(grpLinear.W.shape)
(2, 3, 4)
grpLinear.W[0].fill(1)
grpLinear.W[1].fill(-1)
print(grpLinear(data))
[[[ 10. 10. 10. 10.]
[-10. -10. -10. -10.]]]
print(grpLinear.data.shape)
(1, 2, 4)
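The inmode="one" behavior can likewise be sketched in NumPy: the single input vector is broadcast against every group's independent weights, so (N, 1, L_{in}) \to (N, G, L_{out}).

```python
import numpy as np

data = np.array([[[2., 2., 6.]]])                  # (1, 1, 3), one input vector
W = np.stack([np.ones((3, 4)), -np.ones((3, 4))])  # (2, 3, 4), per-group weights

# the same input row is multiplied by each group's weight matrix
out = np.einsum("ni,gio->ngo", data[:, 0], W)
print(out.shape)  # (1, 2, 4)
```

Group 0 yields 10 and group 1 yields -10, matching the printed output above.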
batchDim parameter¶
data = gpuarray.to_gpu(np.random.randint(0, 9, (groups, batchsize, insize)).astype(np.float32))
print(data.shape)
(2, 1, 3)
grpLinear = GroupLinear(groups, insize, outsize, batchDim=1)
grpLinear(data)
print(grpLinear.data.shape)
(2, 1, 4)
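The axis swap can be sketched in NumPy as well (illustrative only): with batchDim=1 the group axis leads, so the per-group products run over (G, N, L_{in}) instead of (N, G, L_{in}).

```python
import numpy as np

G, N, L_in, L_out = 2, 1, 3, 4
data = np.random.rand(G, N, L_in).astype(np.float32)  # group axis first
W = np.random.rand(G, L_in, L_out).astype(np.float32)

# (G, N, L_in) -> (G, N, L_out): group g computes data[g] @ W[g]
out = np.einsum("gni,gio->gno", data, W)
print(out.shape)  # (2, 1, 4)
```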