Merger¶
Warning
Documentation for the module is under development.
Merger is a class derived from Provider. Its task is to return data batches assembled from several datasets, taking data from each source dataset in the specified proportions.
__init__¶
def __init__(self, datasets, labelIds=None, numofthreads=4)
Creates a *Merger* for the specified datasets. You can optionally specify which label to assign to each dataset.
Parameters
Parameter | Allowed types | Description | Default |
---|---|---|---|
datasets | list | List of datasets. Each dataset must be either a list or an np.ndarray | - |
labelIds | list | List of integers indicating which label should be assigned to each dataset | None |
numofthreads | int | Number of data processing threads | 4 |
Return value
None.
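To illustrate how one label per dataset can translate into one label per sample, here is a minimal sketch. It does not use the library itself: `assign_labels` is a hypothetical helper written for this example, and the rule "one label ID per dataset, in the same order" is an assumption based on the parameter description above.

```python
import numpy as np

def assign_labels(datasets, label_ids=None):
    """Hypothetical helper: build one label array per dataset.

    Every sample of datasets[i] receives the label label_ids[i].
    Returns None when no labels were requested.
    """
    if label_ids is None:
        return None
    if len(label_ids) != len(datasets):
        raise ValueError("expected exactly one label id per dataset")
    return [
        np.full(len(ds), label, dtype=np.int32)
        for ds, label in zip(datasets, label_ids)
    ]

# Datasets may be np.ndarray or plain lists
cats = np.zeros((3, 2), dtype=np.float32)
dogs = [[1.0, 1.0], [2.0, 2.0]]

labels = assign_labels([cats, dogs], label_ids=[0, 1])
unlabeled = assign_labels([cats, dogs])
```

With labels omitted, the sketch returns `None`, mirroring the `labelIds=None` default.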
getNextChunk¶
def getNextChunk(self, chunksize, **kwargs)
Returns a data batch of size chunksize from the internal datasets of the Merger. The kwargs parameter specifies how the data batch is constructed.
Parameters
Parameter | Allowed types | Description | Default |
---|---|---|---|
chunksize | int | See the description in prepareData | - |
**kwargs | dict | Can contain fields "ratios", "randomize", "permutate", see the description of these parameters in prepareData | - |
Return value
A tuple of two np.ndarray (data and labels), or the data array only if labels were not provided when creating the Merger object.
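The sketch below illustrates the return convention and the effect of the "permutate" option under stated assumptions. `finalize_chunk` is a hypothetical function, not part of the module, and it assumes that data and labels are shuffled with the same permutation so that each sample keeps its label.

```python
import numpy as np

def finalize_chunk(data, labels=None, permutate=True, rng=None):
    """Hypothetical helper: optionally shuffle a batch, then return it.

    Returns (data, labels) when labels are present, otherwise data only.
    """
    if permutate:
        rng = np.random.default_rng() if rng is None else rng
        perm = rng.permutation(len(data))
        data = data[perm]
        if labels is not None:
            # Same permutation, so sample/label pairs stay aligned
            labels = labels[perm]
    return data if labels is None else (data, labels)

data = np.arange(6) * 10
labels = np.arange(6)

out_data, out_labels = finalize_chunk(data, labels, rng=np.random.default_rng(0))
data_only = finalize_chunk(data, permutate=False)
```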
getRandomChunk¶
def getRandomChunk(self, chunksize, ratios, permutate)
Collects data from random positions in the datasets. It does not check the collected values for uniqueness, so the same element may appear more than once.
Parameters
Parameter | Allowed types | Description | Default |
---|---|---|---|
chunksize | int | See the description in prepareData | - |
ratios | list | See the description in prepareData | - |
permutate | bool | See the description in prepareData | - |
Return value
See the description in getNextChunk.
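A minimal sketch of this kind of random collection is shown below. It is not the module's code: `random_chunk` is a hypothetical function, and it assumes the ratios have already been converted into per-dataset sample counts (see deriveChunkRatios). Sampling is done with replacement, which is why uniqueness is not guaranteed.

```python
import numpy as np

def random_chunk(datasets, counts, rng):
    """Hypothetical helper: draw counts[i] samples from random
    positions of datasets[i]. Indices are drawn with replacement,
    so the same sample can be picked more than once."""
    parts = []
    for ds, count in zip(datasets, counts):
        ds = np.asarray(ds)
        idx = rng.integers(0, len(ds), size=count)  # may contain repeats
        parts.append(ds[idx])
    return np.concatenate(parts)

rng = np.random.default_rng(0)
a = np.arange(10)
b = np.arange(100, 105)

chunk = random_chunk([a, b], counts=[4, 2], rng=rng)
```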
getRationedChunk¶
def getRationedChunk(self, chunksize, ratios, permutate)
Sequentially takes data in the right proportions from datasets.
Parameters
Parameter | Allowed types | Description | Default |
---|---|---|---|
chunksize | int | See the description in prepareData | - |
ratios | list | See the description in prepareData | - |
permutate | bool | See the description in prepareData | - |
Return value
See the description in getNextChunk.
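Sequential, proportional collection can be sketched as follows. `rationed_chunk` and the `offsets` list are hypothetical names used for illustration. The sketch assumes per-dataset counts have already been derived from the ratios, and it deliberately leaves open what happens when a dataset runs out, since the description above does not say.

```python
import numpy as np

def rationed_chunk(datasets, counts, offsets):
    """Hypothetical helper: take counts[i] consecutive samples from
    datasets[i], starting at offsets[i], and advance the offsets."""
    parts = []
    for i, (ds, count) in enumerate(zip(datasets, counts)):
        ds = np.asarray(ds)
        start = offsets[i]
        parts.append(ds[start:start + count])
        offsets[i] = start + count  # the next chunk continues from here
    return np.concatenate(parts)

a = np.arange(10)
b = np.arange(100, 110)
offsets = [0, 0]

first = rationed_chunk([a, b], [2, 1], offsets)   # 2 from a, 1 from b
second = rationed_chunk([a, b], [2, 1], offsets)  # continues where first stopped
```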
deriveChunkRatios¶
def deriveChunkRatios(ratios, chunksize)
Converts the ratios values so that, for any i, ratios[i] is the number of elements to take from dataset number i.
Parameters
Parameter | Allowed types | Description | Default |
---|---|---|---|
chunksize | int | See the description in prepareData | - |
ratios | list | See the description in prepareData | - |
Return value
None.
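One way such a conversion could work is sketched below. This is an illustration, not the module's implementation: `derive_chunk_counts` is a hypothetical function that returns a new list instead of modifying its argument, and the rule for distributing the rounding remainder (one extra element to the first datasets) is an assumption.

```python
def derive_chunk_counts(ratios, chunksize):
    """Hypothetical helper: turn relative ratios into absolute
    per-dataset element counts that sum to chunksize."""
    total = sum(ratios)
    if total <= 0:
        raise ValueError("ratios must contain at least one positive value")
    # Integer share of each dataset, rounded down
    counts = [chunksize * r // total for r in ratios]
    # Assumption: hand leftover elements to the first datasets, one each
    remainder = chunksize - sum(counts)
    for i in range(remainder):
        counts[i % len(counts)] += 1
    return counts

even_split = derive_chunk_counts([1, 1, 2], 20000)
uneven_split = derive_chunk_counts([1, 1, 1], 10)
```

Here `[1, 1, 2]` with a chunk size of 20000 yields 5000, 5000 and 10000 elements, while `[1, 1, 1]` with a chunk size of 10 cannot be split evenly and yields 4, 3 and 3.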
prepareData¶
def prepareData(self, ratios=None, chunksize=20000, randomize=False, permutate=True)
Prepares data from the Merger's datasets. It first checks that ratios are correct and then calls the parent method Provider.prepareData. The parent method uses the randomize and permutate parameters only to pass them on to the [getNextChunk](#getnextchunk) method.
Parameters
Parameter | Allowed types | Description | Default |
---|---|---|---|
ratios | list | An array of int whose length equals the number of input datasets. Its elements indicate in what proportions data should be taken from the datasets. If None, an array of ones is used as ratios | None |
chunksize | int | Size of the returned data batches | 20000 |
randomize | bool | If True, data is collected from random positions in the datasets. Otherwise, it is taken sequentially | False |
permutate | bool | If True, the prepared data is shuffled | True |
Return value
None.
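The ratios check mentioned above might look like the following sketch. `check_ratios` is a hypothetical function. The specific rules it enforces (one non-negative integer per dataset, at least one positive value) are assumptions, because the documentation only states that ratios are checked and default to an array of ones.

```python
def check_ratios(ratios, num_datasets):
    """Hypothetical helper: validate ratios or build the default."""
    if ratios is None:
        # Default from the parameter table: equal proportions
        return [1] * num_datasets
    if len(ratios) != num_datasets:
        raise ValueError("ratios must have one entry per dataset")
    if any(not isinstance(r, int) or r < 0 for r in ratios):
        raise ValueError("ratios must be non-negative integers")
    if sum(ratios) == 0:
        raise ValueError("at least one ratio must be positive")
    return list(ratios)

default_ratios = check_ratios(None, 3)
custom_ratios = check_ratios([2, 1, 1], 3)

try:
    check_ratios([1, 1], 3)
    length_error = None
except ValueError as exc:
    length_error = str(exc)
```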