Provider

Warning

Documentation for the module is under development.

Provider is a class that converts chunks of data using the transformers stored in self.transformers. Two concrete, ready-to-use implementations of this class exist: Merger and Serial.

__init__

def __init__(self, numofthreads=4)

Creates a Provider object.

Parameters

| Parameter | Allowed types | Description | Default |
|---|---|---|---|
| numofthreads | int | Number of threads involved in data preparation | 4 |

Return value

None

__enter__

def __enter__(self)

Enables the use of Provider in a with statement.

Parameters

None.

Return value

The Provider object itself.

__exit__

def __exit__(self, exc_type, exc_value, traceback)

Called on exit from the with block. Calls the closePool method.

Parameters

| Parameter | Allowed types | Description | Default |
|---|---|---|---|
| exc_type | - | Standard argument of the with protocol; see the Python documentation on the with statement | - |
| exc_value | - | Standard argument of the with protocol; see the Python documentation on the with statement | - |
| traceback | - | Standard argument of the with protocol; see the Python documentation on the with statement | - |

Return value

None.
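The with-statement behaviour described above can be illustrated with a small stand-in class. This is only a sketch: SketchProvider, its pool attribute and its closed flag are hypothetical names introduced here, and the use of ThreadPool is an assumption about how a thread pool might be held, not a statement about the actual Provider internals.

```python
from multiprocessing.pool import ThreadPool


class SketchProvider:
    """Hypothetical stand-in that mimics the documented with-protocol."""

    def __init__(self, numofthreads=4):
        # Assumption: the pool is a standard-library ThreadPool.
        self.pool = ThreadPool(processes=numofthreads)
        self.closed = False

    def __enter__(self):
        # Return the object itself so it can be bound with `as`.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Release the pool on exit; returning None lets exceptions propagate.
        self.closePool()

    def closePool(self):
        self.pool.close()
        self.pool.join()
        self.closed = True


with SketchProvider(numofthreads=2) as provider:
    squares = provider.pool.map(lambda x: x * x, [1, 2, 3])
```

After the block ends, the pool has been closed even if the body had raised an exception, which is the main reason to prefer the with form over calling closePool by hand.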

closePool

def closePool(self)

Closes the thread pool.

Parameters

None.

Return value

None.

addTransformer

def addTransformer(self, transformer)

Adds a transformer to this object's array of transformers.

Parameters

| Parameter | Allowed types | Description | Default |
|---|---|---|---|
| transformer | Transformer | Transformer to be used for data conversion | - |

Return value

None.

getNextChunk

def getNextChunk(self, chunksize, **kwargs)

Returns a data batch of size chunksize. The kwargs parameter carries options for constructing the data batch. By default the method is empty; override it in a subclass to suit your needs.

Parameters

| Parameter | Allowed types | Description | Default |
|---|---|---|---|
| chunksize | int | Size of returned data batches | - |
| **kwargs | dict | Dictionary that can specify the parameters for constructing data batches | - |

Return value

Provider itself defines no return value. If the method is implemented in a subclass, the return value is the same as in Merger.
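Overriding getNextChunk might look roughly like the sketch below. BaseProviderSketch stands in for Provider so that the example runs on its own, ListProvider is a hypothetical subclass, and the shuffle keyword is an illustrative option invented here, not a documented one.

```python
import random


class BaseProviderSketch:
    """Hypothetical stand-in for Provider: getNextChunk is empty by default."""

    def getNextChunk(self, chunksize, **kwargs):
        pass


class ListProvider(BaseProviderSketch):
    """Serves consecutive slices of an in-memory list."""

    def __init__(self, data):
        self.data = data
        self.offset = 0

    def getNextChunk(self, chunksize, **kwargs):
        # Take the next slice; the last one may be shorter than chunksize.
        chunk = self.data[self.offset:self.offset + chunksize]
        self.offset += len(chunk)
        # `shuffle` is an illustrative keyword passed through **kwargs.
        if kwargs.get("shuffle", False):
            chunk = random.sample(chunk, len(chunk))
        return chunk


provider = ListProvider(list(range(10)))
first = provider.getNextChunk(4)
second = provider.getNextChunk(4)
third = provider.getNextChunk(4)  # only two items remain
```

How the end of the data is signalled (a short chunk, an empty chunk, or an exception) is a design decision for the subclass; this sketch simply returns whatever is left.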

prepareData

def prepareData(self, chunksize=20000, **kwargs)

Fetches the next data batch with getNextChunk and prepares the transformed data in multi-threaded mode.

Parameters

| Parameter | Allowed types | Description | Default |
|---|---|---|---|
| chunksize | int | Size of returned data batches | 20000 |
| **kwargs | dict | Dictionary that can specify the parameters for constructing data batches | - |

Return value

None.

getData

def getData(self)

Returns the processed data; call it after prepareData. If the data is not ready yet, the method waits for processing to finish.

Parameters

None.

Return value

Prepared data from self.data.
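The prepare-then-get pattern described for prepareData and getData can be sketched as follows. PrefetchSketch is a hypothetical stand-in, the doubling step is a placeholder for real transformers, and the use of ThreadPool.apply_async is an assumption made for illustration rather than a description of the actual implementation.

```python
from multiprocessing.pool import ThreadPool


class PrefetchSketch:
    """Hypothetical stand-in showing the prepareData / getData pattern."""

    def __init__(self, source, numofthreads=2):
        self.source = source
        self.offset = 0
        self.pool = ThreadPool(processes=numofthreads)
        self.pending = None
        self.data = None

    def getNextChunk(self, chunksize, **kwargs):
        chunk = self.source[self.offset:self.offset + chunksize]
        self.offset += len(chunk)
        return chunk

    def prepareData(self, chunksize=20000, **kwargs):
        chunk = self.getNextChunk(chunksize, **kwargs)
        # Start processing in the background and return immediately.
        self.pending = self.pool.apply_async(
            lambda c: [x * 2 for x in c], (chunk,)
        )

    def getData(self):
        # Block until the background job has finished, then store the result.
        self.data = self.pending.get()
        return self.data

    def closePool(self):
        self.pool.close()
        self.pool.join()


provider = PrefetchSketch(list(range(6)))
provider.prepareData(chunksize=3)
first = provider.getData()
provider.prepareData(chunksize=3)  # could overlap with consuming `first`
second = provider.getData()
provider.closePool()
```

The point of splitting the work into two calls is that the caller can start preparing the next chunk and keep working on the previous one while the pool is busy.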

worker

def worker(transformers, batch, threadidx)

Each worker runs in a separate thread and applies every transformer to its data batch.

Parameters

| Parameter | Allowed types | Description | Default |
|---|---|---|---|
| transformers | list | Transformer array self.transformers | - |
| batch | np.ndarray, list | Data batch from self.data, passed to this worker for processing | - |
| threadidx | int | Multiprocessing thread number | - |

Return value

Returns a tuple (batch, threadidx), where batch is the processed data batch.
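A minimal sketch of how such a worker could be driven from a thread pool is shown below. Several things here are assumptions: transformers are modelled as plain callables (the real Transformer interface may differ), the data is split into equal parts by hand, and ThreadPool.starmap is used only for illustration.

```python
from multiprocessing.pool import ThreadPool


def worker(transformers, batch, threadidx):
    # Apply every transformer in order to this thread's share of the data.
    for transformer in transformers:
        batch = transformer(batch)
    return batch, threadidx


# Hypothetical transformers, modelled as plain callables.
transformers = [
    lambda b: [x + 1 for x in b],
    lambda b: [x * 10 for x in b],
]

data = list(range(8))
numofthreads = 4
step = len(data) // numofthreads
parts = [data[i * step:(i + 1) * step] for i in range(numofthreads)]

with ThreadPool(processes=numofthreads) as pool:
    results = pool.starmap(
        worker, [(transformers, part, idx) for idx, part in enumerate(parts)]
    )

# Reassemble the parts in thread order using the returned index.
results.sort(key=lambda pair: pair[1])
merged = [x for part, _ in results for x in part]
```

Returning threadidx together with the batch lets the caller put the parts back in their original order regardless of which thread finishes first.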