Merger

Warning

Documentation for the module is under development.

Merger is derived from Provider. Its task is to return a data batch drawn from several datasets when needed, composed from the original datasets in the specified proportions.

__init__

def __init__(self, datasets, labelIds=None, numofthreads=4)

Creates a *Merger* for the specified datasets. You can specify which label to assign to each of the datasets.

Parameters

| Parameter | Allowed types | Description | Default |
|---|---|---|---|
| datasets | list | List of datasets. Each dataset should be either a list or an np.ndarray | - |
| labelIds | list | List of integers indicating which label should be assigned to each dataset | None |
| numofthreads | int | Number of data processing threads | 4 |

Return value

None.
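
To illustrate what labelIds is for, here is a minimal sketch in plain Python. It is not the library's code: the helper name `label_samples` is hypothetical, and it assumes that every sample from dataset i simply receives the label labelIds[i].

```python
def label_samples(datasets, label_ids=None):
    """Pair every sample with the label of the dataset it came from.

    Hypothetical helper: assumes dataset i maps to label_ids[i].
    """
    if label_ids is None:
        # No labels were given, so only the data is produced.
        return [sample for dataset in datasets for sample in dataset], None

    if len(label_ids) != len(datasets):
        raise ValueError("label_ids needs one entry per dataset")

    data, labels = [], []
    for dataset, label in zip(datasets, label_ids):
        data.extend(dataset)
        labels.extend([label] * len(dataset))
    return data, labels


cats = ["cat0", "cat1", "cat2"]
dogs = ["dog0", "dog1"]
data, labels = label_samples([cats, dogs], label_ids=[0, 1])
```

Here `labels` lines up with `data` element by element, which is the shape a data-and-labels batch would need.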

getNextChunk

def getNextChunk(self, chunksize, **kwargs)

Returns a data batch of size chunksize from the internal datasets of the Merger. The kwargs parameter carries the settings used to construct the data batch.

Parameters

| Parameter | Allowed types | Description | Default |
|---|---|---|---|
| chunksize | int | See the description in prepareData | - |
| **kwargs | dict | Can contain the fields "ratios", "randomize", "permutate"; see the description of these parameters in prepareData | - |

Return value

A tuple of two np.ndarray (data and labels), or the data only if labels were not provided when creating the Merger object.
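
The relationship between the kwargs fields and the two chunk methods documented below can be sketched as follows. `MergerSketch` is a stand-in class, not the library's Merger, and the dispatch on "randomize" is an assumption based on the parameter descriptions in prepareData; the stub methods only report which path was taken.

```python
class MergerSketch:
    """Stand-in class used only to show a possible dispatch."""

    def getRandomChunk(self, chunksize, ratios, permutate):
        # Stub: a real implementation would gather random samples.
        return ("random", chunksize, ratios, permutate)

    def getRationedChunk(self, chunksize, ratios, permutate):
        # Stub: a real implementation would read samples sequentially.
        return ("rationed", chunksize, ratios, permutate)

    def getNextChunk(self, chunksize, **kwargs):
        # Assumption: all three fields are present in kwargs.
        ratios = kwargs["ratios"]
        permutate = kwargs["permutate"]
        if kwargs["randomize"]:
            return self.getRandomChunk(chunksize, ratios, permutate)
        return self.getRationedChunk(chunksize, ratios, permutate)


sketch = MergerSketch()
picked = sketch.getNextChunk(4, ratios=[1, 3], randomize=False, permutate=True)
```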

getRandomChunk

def getRandomChunk(self, chunksize, ratios, permutate)

Collects data from random positions in the datasets. It does not check that the collected values are unique, so the same element may appear more than once.

Parameters

| Parameter | Allowed types | Description | Default |
|---|---|---|---|
| chunksize | int | See the description in prepareData | - |
| ratios | list | See the description in prepareData | - |
| permutate | bool | See the description in prepareData | - |

Return value

See the description in getNextChunk.
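
A rough sketch of random collection without a uniqueness check, written in plain Python rather than taken from the library. The function name `random_chunk`, the `counts` argument (per-dataset sample counts) and the `rng` argument are hypothetical.

```python
import random


def random_chunk(datasets, counts, permutate, rng=None):
    """Draw counts[i] samples from random positions of datasets[i]."""
    if rng is None:
        rng = random.Random()

    chunk = []
    for dataset, count in zip(datasets, counts):
        # Each index is drawn independently, so a sample can repeat.
        chunk.extend(dataset[rng.randrange(len(dataset))] for _ in range(count))

    if permutate:
        # Mix samples of different datasets within the batch.
        rng.shuffle(chunk)
    return chunk


a = list(range(0, 10))
b = list(range(100, 110))
chunk = random_chunk([a, b], counts=[3, 1], permutate=True, rng=random.Random(0))
```

Because indices are sampled with replacement, duplicates inside one batch are possible, which matches the note above about uniqueness.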

getRationedChunk

def getRationedChunk(self, chunksize, ratios, permutate)

Sequentially takes data in the right proportions from datasets.

Parameters

| Parameter | Allowed types | Description | Default |
|---|---|---|---|
| chunksize | int | See the description in prepareData | - |
| ratios | list | See the description in prepareData | - |
| permutate | bool | See the description in prepareData | - |

Return value

See the description in getNextChunk.
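
Sequential collection can be pictured with one read position per dataset. The sketch below is an illustration only: `SequentialReader`, its `counts` argument and the wrap-around behaviour at the end of a dataset are assumptions, not documented behaviour of the Merger.

```python
import random


class SequentialReader:
    """Hypothetical stand-in that remembers a read position per dataset."""

    def __init__(self, datasets):
        self.datasets = datasets
        self.cursors = [0] * len(datasets)

    def rationed_chunk(self, counts, permutate=False, rng=None):
        chunk = []
        for i, (dataset, count) in enumerate(zip(self.datasets, counts)):
            for _ in range(count):
                chunk.append(dataset[self.cursors[i]])
                # Assumption: wrap to the start when a dataset runs out.
                self.cursors[i] = (self.cursors[i] + 1) % len(dataset)

        if permutate:
            # Use the given generator, or the random module by default.
            (rng or random).shuffle(chunk)
        return chunk


reader = SequentialReader([[1, 2, 3], [10, 20]])
first = reader.rationed_chunk(counts=[2, 1])
second = reader.rationed_chunk(counts=[2, 1])
```

The second call continues where the first one stopped, so successive batches walk through each dataset in order.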

deriveChunkRatios

def deriveChunkRatios(ratios, chunksize)

Converts the values in ratios so that, for any i, ratios[i] is the number of elements that need to be taken from dataset number i.

Parameters

| Parameter | Allowed types | Description | Default |
|---|---|---|---|
| chunksize | int | See the description in prepareData | - |
| ratios | list | See the description in prepareData | - |

Return value

None.
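
One way such a conversion could work is shown below. This is a sketch under assumptions: the name `derive_chunk_counts` is hypothetical, the rounding rule (round down, give the remainder to the last dataset) is a guess, and unlike the documented method, which lists no return value, the sketch returns a new list.

```python
def derive_chunk_counts(ratios, chunksize):
    """Turn proportions into per-dataset element counts summing to chunksize."""
    total = sum(ratios)
    if total <= 0:
        raise ValueError("ratios must sum to a positive number")

    # Integer share of the chunk for every dataset, rounded down.
    counts = [chunksize * ratio // total for ratio in ratios]

    # Assumption: the rounding remainder goes to the last dataset.
    counts[-1] += chunksize - sum(counts)
    return counts


counts = derive_chunk_counts([1, 3], chunksize=20000)
uneven = derive_chunk_counts([1, 1, 1], chunksize=10)
```

With ratios [1, 3] and a chunk of 20000 elements, the first dataset contributes a quarter of the batch and the second the rest.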

prepareData

def prepareData(self, ratios=None, chunksize=20000, randomize=False, permutate=True)

Prepares data from its datasets. It checks that ratios are correct and then calls the parent method Provider.prepareData. The parent method uses the randomize and permutate parameters only to pass them on to the [getNextChunk](#getnextchunk) method.

Parameters

| Parameter | Allowed types | Description | Default |
|---|---|---|---|
| ratios | list | An array of int whose length equals the number of input datasets. Its elements indicate in what proportions data should be taken from the datasets. If None, an array of ones is used as ratios | None |
| chunksize | int | Size of the returned data batches | 20000 |
| randomize | bool | If True, collects data from random positions in the datasets. Otherwise, takes it sequentially | False |
| permutate | bool | If True, permutes the prepared data | True |

Return value

None.
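
The ratios check performed before the parent call is not spelled out here. A plausible sketch is given below; the name `normalize_ratios` and the specific checks (one entry per dataset, non-negative integers) are assumptions, and only the default of all ones comes from the parameter table above.

```python
def normalize_ratios(ratios, num_datasets):
    """Return usable ratios or raise if they do not fit the datasets."""
    if ratios is None:
        # Documented default: equal proportions for every dataset.
        return [1] * num_datasets

    # Assumed check: exactly one ratio per dataset.
    if len(ratios) != num_datasets:
        raise ValueError("expected one ratio per dataset")

    # Assumed check: ratios are non-negative integers.
    if any(not isinstance(r, int) or r < 0 for r in ratios):
        raise ValueError("ratios must be non-negative integers")

    return list(ratios)


default_ratios = normalize_ratios(None, num_datasets=3)
custom_ratios = normalize_ratios([2, 1, 1], num_datasets=3)
```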