GroupLinear¶
Description¶
This module is a group modification of a fully connected Linear layer: while a regular fully connected layer maps one vector to one vector, a group linear layer maps several vectors to several vectors, each group using its own independent weights.
Dimensions¶
While for an ordinary fully connected layer the input shape is (N, L_{in}) and the output shape is (N, L_{out}), for the group modification the input shape is (N, G, L_{in}) and the output shape is (N, G, L_{out}) (or (G, N, L_{in}) and (G, N, L_{out}), respectively, depending on the batchDim parameter), where N is the batch size, G the number of groups, L_{in} the size of the input feature vector, and L_{out} the size of the output feature vector.
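The shape transformation can be sketched in plain NumPy (an illustration only, not the library's implementation): a group linear layer amounts to G independent matrix multiplications carried out in one batched operation.

```python
import numpy as np

N, G, L_in, L_out = 5, 3, 8, 6                          # batch size, groups, feature sizes
data = np.random.rand(N, G, L_in).astype(np.float32)
W = np.random.rand(G, L_in, L_out).astype(np.float32)   # one weight matrix per group

# Batched form: out[n, g, :] = data[n, g, :] @ W[g]
out = np.einsum("ngi,gio->ngo", data, W)
print(out.shape)  # (5, 3, 6)

# Reference: apply each group's weights independently and stack the results
ref = np.stack([data[:, g] @ W[g] for g in range(G)], axis=1)
assert np.allclose(out, ref, atol=1e-5)
```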
Initializing¶
def __init__(self, groups, insize, outsize, wscale=1.0, useW=True, useBias=True, initscheme=None,
inmode="full", wmode="full", batchDim=0, name=None, empty=False, transpW=False):
Parameters
Parameter | Allowed types | Description | Default
---|---|---|---
groups | int | Number of groups | -
insize | int | Input vector size | -
outsize | int | Output vector size | -
wscale | float | Variance of the random layer weights | 1.0
useW | bool | Whether to use the weights matrix | True
useBias | bool | Whether to use biases | True
initscheme | Union[tuple, str] | Initialization scheme of the layer weights (see createTensorWithScheme) | None -> ("xavier_uniform", "in")
inmode | str | Input mode. Possible values: full and one | full
wmode | str | Weights mode. Possible values: full and one | full
batchDim | int | Batch axis position | 0
name | str | Layer name | None
empty | bool | Whether to leave the weights and biases matrices uninitialized | False
transpW | bool | Whether to use a transposed weights matrix | False
Explanations

groups - controls the connections between inputs and outputs; when groups=1, the module reduces to a regular fully connected layer;

inmode - if one, a single input vector is multiplied into groups outputs using independent weights, i.e. (N, 1, L_{in}) \to (N, G, L_{out}); if full, the module works in the normal mode: (N, G, L_{in}) \to (N, G, L_{out});

wmode - if one, groups input vectors form groups outputs using the same weights; if full, the module works in the normal mode;

batchDim - by default, the batch axis comes first: (N, G, L_{in}); it can be swapped with the group axis by setting batchDim=1: (G, N, L_{in}).
Examples¶
Basic example¶
Necessary imports.
import numpy as np
from PuzzleLib.Backend import gpuarray
from PuzzleLib.Modules import GroupLinear
Info
gpuarray is required to properly place the tensor on the GPU.
np.random.seed(123)
batchsize, groups, insize = 1, 2, 3
data = gpuarray.to_gpu(np.random.randint(0, 9, (batchsize, groups, insize)).astype(np.float32))
print(data)
[[[2. 2. 6.]
[1. 3. 6.]]]
print(data.shape)
(1, 2, 3)
Let us initialize the module with default parameters (useW=True, useBias=True, inmode="full", wmode="full", batchDim=0) and fill the weights tensor with custom values to make the demonstration of the module operation more convenient:
outsize = 4
grpLinear = GroupLinear(groups, insize, outsize)
print(grpLinear.W.shape)
(2, 3, 4)
grpLinear.W[0].fill(1)
grpLinear.W[1].fill(-1)
print(grpLinear(data))
[[[ 10. 10. 10. 10.]
[-10. -10. -10. -10.]]]
print(grpLinear.data.shape)
(1, 2, 4)
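The result above can be reproduced with plain NumPy (a sketch, independent of the library), treating the layer as one matmul per group:

```python
import numpy as np

# Data and weights copied from the example above
data = np.array([[[2, 2, 6], [1, 3, 6]]], dtype=np.float32)  # (N=1, G=2, L_in=3)
W = np.empty((2, 3, 4), dtype=np.float32)                    # (G, L_in, L_out)
W[0].fill(1)
W[1].fill(-1)

# out[n, g, :] = data[n, g, :] @ W[g]
out = np.einsum("ngi,gio->ngo", data, W)
print(out)
# [[[ 10.  10.  10.  10.]
#   [-10. -10. -10. -10.]]]
```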
wmode parameter¶
Let us change the wmode parameter:
grpLinear = GroupLinear(groups, insize, outsize, wmode="one")
print(grpLinear.W.shape)
(1, 3, 4)
grpLinear.W.fill(1)
print(grpLinear(data))
[[[10. 10. 10. 10.]
[10. 10. 10. 10.]]]
print(grpLinear.data.shape)
(1, 2, 4)
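With wmode="one" a single weight matrix is shared by all groups. The equivalent NumPy computation (an illustrative sketch) broadcasts the weights over the group axis:

```python
import numpy as np

data = np.array([[[2, 2, 6], [1, 3, 6]]], dtype=np.float32)  # (N=1, G=2, L_in=3)
W = np.ones((1, 3, 4), dtype=np.float32)                     # a single shared weight matrix

# Broadcast the one weight matrix to all G groups, then multiply per group
out = np.einsum("ngi,gio->ngo", data, np.broadcast_to(W, (2, 3, 4)))
print(out)
# [[[10. 10. 10. 10.]
#   [10. 10. 10. 10.]]]
```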
inmode parameter¶
Let us change the inmode parameter and initialize new data (with the corresponding shapes) for this example:
np.random.seed(123)
data = gpuarray.to_gpu(np.random.randint(0, 9, (batchsize, 1, insize)).astype(np.float32))
print(data)
[[[2. 2. 6.]]]
print(data.shape)
(1, 1, 3)
Let us again fill the weights tensor with custom values:
grpLinear = GroupLinear(groups, insize, outsize=4, inmode="one")
print(grpLinear.W.shape)
(2, 3, 4)
grpLinear.W[0].fill(1)
grpLinear.W[1].fill(-1)
print(grpLinear(data))
[[[ 10. 10. 10. 10.]
[-10. -10. -10. -10.]]]
print(grpLinear.data.shape)
(1, 2, 4)
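With inmode="one" the single input vector is routed to all groups. In NumPy terms (again a sketch), it is broadcast over the group axis before the per-group multiplication:

```python
import numpy as np

data = np.array([[[2, 2, 6]]], dtype=np.float32)  # (N=1, 1, L_in=3)
W = np.empty((2, 3, 4), dtype=np.float32)         # independent weights per group
W[0].fill(1)
W[1].fill(-1)

# Broadcast the single input vector to all G groups
out = np.einsum("ngi,gio->ngo", np.broadcast_to(data, (1, 2, 3)), W)
print(out)
# [[[ 10.  10.  10.  10.]
#   [-10. -10. -10. -10.]]]
```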
batchDim parameter¶
data = gpuarray.to_gpu(np.random.randint(0, 9, (groups, batchsize, insize)).astype(np.float32))
print(data.shape)
(2, 1, 3)
grpLinear = GroupLinear(groups, insize, outsize, batchDim=1)
grpLinear(data)
print(grpLinear.data.shape)
(2, 1, 4)
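With batchDim=1 only the axis order changes: the group axis comes first. A NumPy sketch of the (G, N, L_in) layout:

```python
import numpy as np

data = np.arange(6, dtype=np.float32).reshape(2, 1, 3)  # (G=2, N=1, L_in=3)
W = np.ones((2, 3, 4), dtype=np.float32)                # (G, L_in, L_out)

# Group axis first: out[g, n, :] = data[g, n, :] @ W[g]
out = np.einsum("gni,gio->gno", data, W)
print(out.shape)  # (2, 1, 4)
```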