GroupLinear¶
Description¶
This module is a group modification of the fully connected Linear layer: while a regular fully connected layer takes a single vector and returns a single vector, a group linear layer processes several vectors at once, each transformed by its own independent weights.
Dimensions¶
While an ordinary fully connected layer takes inputs of shape (N, L_{in}) and produces outputs of shape (N, L_{out}), the group modification takes inputs of shape (N, G, L_{in}) and produces outputs of shape (N, G, L_{out}) (or (G, N, L_{in}) and (G, N, L_{out}), respectively, depending on the batchDim parameter), where N is the batch size, G is the number of groups, L_{in} is the size of the input feature vector, and L_{out} is the size of the output feature vector.
Initializing¶
def __init__(self, groups, insize, outsize, wscale=1.0, useW=True, useBias=True, initscheme=None,
inmode="full", wmode="full", batchDim=0, name=None, empty=False, transpW=False):
Parameters
| Parameter | Allowed types | Description | Default |
|---|---|---|---|
| groups | int | Number of groups | - |
| insize | int | Input vector size | - |
| outsize | int | Output vector size | - |
| wscale | float | Variance of random layer weights | 1.0 |
| useW | bool | Whether to use weights | True |
| useBias | bool | Whether to use biases | True |
| initscheme | Union[tuple, str] | Specifies the initialization scheme of the layer weights (see createTensorWithScheme) | None -> ("xavier_uniform", "in") |
| inmode | str | Input mode. Possible values: full and one | full |
| wmode | str | Weights mode. Possible values: full and one | full |
| batchDim | int | Batch axis position | 0 |
| name | str | Layer name | None |
| empty | bool | If True, the weights and biases tensors are not initialized | False |
| transpW | bool | Whether to use a transposed matrix of weights | False |
Explanations
groups - parameter that controls the connections between inputs and outputs; when groups = 1, we get a special case of a regular fully connected layer;
inmode - if one, a single input vector is mapped to groups output vectors using independent weights, i.e. (N, 1, L_{in}) \to (N, G, L_{out}); if full, the module works in normal mode: (N, G, L_{in}) \to (N, G, L_{out});
wmode - if one, groups input vectors are mapped to groups output vectors using the same shared weights; if full, the module works in normal mode.
batchDim - by default, the batch size axis comes first: (N, G, L_{in}), however it is possible to swap it with the group axis, by setting batchDim=1: (G, N, L_{in}).
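The full-mode computation can be sketched in plain NumPy (this is only an illustration of the math, not the layer's actual GPU kernel; all names here are local to the sketch):

```python
import numpy as np

def group_linear(data, W, b=None):
    # data: (N, G, L_in), W: (G, L_in, L_out) -> out: (N, G, L_out);
    # each group g computes its own matrix product data[:, g] @ W[g]
    out = np.einsum("ngi,gio->ngo", data, W)
    if b is not None:
        out += b  # b: (G, L_out), broadcast over the batch axis
    return out

N, G, L_in, L_out = 2, 3, 4, 5
data = np.random.rand(N, G, L_in).astype(np.float32)
W = np.random.rand(G, L_in, L_out).astype(np.float32)

out = group_linear(data, W)
print(out.shape)  # (2, 3, 5)
```

With G = 1 this reduces to an ordinary fully connected layer, matching the note on the groups parameter above.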
Examples¶
Basic example¶
Necessary imports.
import numpy as np
from PuzzleLib.Backend import gpuarray
from PuzzleLib.Modules import GroupLinear
Info
gpuarray is required to properly place the tensor on the GPU.
np.random.seed(123)
batchsize, groups, insize = 1, 2, 3
data = gpuarray.to_gpu(np.random.randint(0, 9, (batchsize, groups, insize)).astype(np.float32))
print(data)
[[[2. 2. 6.]
[1. 3. 6.]]]
print(data.shape)
(1, 2, 3)
Let us initialize the module with default parameters (useW=True, useBias=True, inmode="full", wmode="full", batchDim=0) and fill the weights tensor with custom values to make the demonstration of the module operation more convenient:
outsize = 4
grpLinear = GroupLinear(groups, insize, outsize)
print(grpLinear.W.shape)
(2, 3, 4)
grpLinear.W[0].fill(1)
grpLinear.W[1].fill(-1)
print(grpLinear(data))
[[[ 10. 10. 10. 10.]
[-10. -10. -10. -10.]]]
print(grpLinear.data.shape)
(1, 2, 4)
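The output above can be verified by hand with NumPy (a sketch of the same computation, ignoring the zero-initialized biases): group 0 sums the input components with weight 1, group 1 with weight -1.

```python
import numpy as np

data = np.array([[[2., 2., 6.], [1., 3., 6.]]])    # (1, 2, 3)
W = np.stack([np.ones((3, 4)), -np.ones((3, 4))])  # (2, 3, 4)

# per-group matrix product: out[n, g] = data[n, g] @ W[g]
out = np.einsum("ngi,gio->ngo", data, W)
print(out)
# [[[ 10.  10.  10.  10.]
#   [-10. -10. -10. -10.]]]
```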
wmode parameter¶
Let us change the wmode parameter:
grpLinear = GroupLinear(groups, insize, outsize, wmode="one")
print(grpLinear.W.shape)
(1, 3, 4)
grpLinear.W.fill(1)
print(grpLinear(data))
[[[10. 10. 10. 10.]
[10. 10. 10. 10.]]]
print(grpLinear.data.shape)
(1, 2, 4)
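In NumPy terms, wmode="one" means a single weight matrix is shared by all groups (again an illustrative sketch, not the layer's kernel):

```python
import numpy as np

data = np.array([[[2., 2., 6.], [1., 3., 6.]]])  # (1, 2, 3)
W = np.ones((1, 3, 4))                           # one weight matrix for all groups

# the single matrix W[0] is applied to every group's input
out = np.einsum("ngi,io->ngo", data, W[0])
print(out)
# [[[10. 10. 10. 10.]
#   [10. 10. 10. 10.]]]
```

Both groups now produce 10 (the sums 2 + 2 + 6 and 1 + 3 + 6), because they no longer have independent weights.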
inmode parameter¶
Let us change the inmode parameter and initialize new data of the corresponding shape for this example:
np.random.seed(123)
data = gpuarray.to_gpu(np.random.randint(0, 9, (batchsize, 1, insize)).astype(np.float32))
print(data)
[[[2. 2. 6.]]]
print(data.shape)
(1, 1, 3)
Let us again fill the weights tensor with custom values:
grpLinear = GroupLinear(groups, insize, outsize=4, inmode="one")
print(grpLinear.W.shape)
(2, 3, 4)
grpLinear.W[0].fill(1)
grpLinear.W[1].fill(-1)
print(grpLinear(data))
[[[ 10. 10. 10. 10.]
[-10. -10. -10. -10.]]]
print(grpLinear.data.shape)
(1, 2, 4)
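The inmode="one" behavior can likewise be sketched in NumPy: the single input vector is broadcast against every group's independent weights, so (N, 1, L_{in}) \to (N, G, L_{out}).

```python
import numpy as np

data = np.array([[[2., 2., 6.]]])                  # (1, 1, 3), one input vector
W = np.stack([np.ones((3, 4)), -np.ones((3, 4))])  # (2, 3, 4), per-group weights

# the same input row is multiplied by each group's weight matrix
out = np.einsum("ni,gio->ngo", data[:, 0], W)
print(out.shape)  # (1, 2, 4)
```

Group 0 yields 10 and group 1 yields -10, matching the printed output above.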
batchDim parameter¶
data = gpuarray.to_gpu(np.random.randint(0, 9, (groups, batchsize, insize)).astype(np.float32))
print(data.shape)
(2, 1, 3)
grpLinear = GroupLinear(groups, insize, outsize, batchDim=1)
grpLinear(data)
print(grpLinear.data.shape)
(2, 1, 4)
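The axis swap can be sketched in NumPy as well (illustrative only): with batchDim=1 the group axis leads, so the per-group products run over (G, N, L_{in}) instead of (N, G, L_{in}).

```python
import numpy as np

G, N, L_in, L_out = 2, 1, 3, 4
data = np.random.rand(G, N, L_in).astype(np.float32)  # group axis first
W = np.random.rand(G, L_in, L_out).astype(np.float32)

# (G, N, L_in) -> (G, N, L_out): group g computes data[g] @ W[g]
out = np.einsum("gni,gio->gno", data, W)
print(out.shape)  # (2, 1, 4)
```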