Provider
Warning
Documentation for the module is under development.
Provider is a class that converts chunks of data using the transformers stored in self.transformers. Two useful concrete implementations of this class exist: Merger and Serial.
__init__
def __init__(self, numofthreads=4)
Creates a Provider object.
Parameters
Parameter | Allowed types | Description | Default |
---|---|---|---|
numofthreads | int | Number of threads involved in data preparation | 4 |
Return value
None
__enter__
def __enter__(self)
Allows a Provider object to be used in a with statement.
Parameters
None.
Return value
The Provider object itself (self).
__exit__
def __exit__(self, exc_type, exc_value, traceback)
Called on exit from the with block. Executes the closePool method.
Parameters
Parameter | Allowed types | Description | Default |
---|---|---|---|
exc_type | – | Type of the exception raised inside the with block, or None if no exception occurred; passed in automatically by Python | – |
exc_value | – | The exception instance, or None; passed in automatically by Python | – |
traceback | – | Traceback of the exception, or None; passed in automatically by Python | – |
Return value
None.
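A minimal sketch of the with-statement behavior described above, assuming the thread pool is a standard-library multiprocessing.pool.ThreadPool created in the constructor and that closePool closes and joins it. PooledProviderSketch and its closed flag are hypothetical names for illustration only; this is not the library's implementation.

```python
from multiprocessing.pool import ThreadPool


class PooledProviderSketch:
    """Hypothetical stand-in illustrating the documented with-statement protocol."""

    def __init__(self, numofthreads=4):
        # Assumption: the thread pool is created up front with numofthreads workers
        self.pool = ThreadPool(numofthreads)
        self.closed = False

    def __enter__(self):
        # Return the object itself so "with ... as provider" binds it
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Release the threads whether or not the block raised an exception
        self.closePool()

    def closePool(self):
        # Assumption: closing means "stop accepting tasks, then wait for running ones"
        self.pool.close()
        self.pool.join()
        self.closed = True


with PooledProviderSketch(numofthreads=2) as provider:
    squares = provider.pool.map(lambda x: x * x, [1, 2, 3])

print(squares, provider.closed)  # [1, 4, 9] True
```

Because this __exit__ returns None, an exception raised inside the with block still propagates after the pool has been closed.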
closePool
def closePool(self)
Closes the thread pool.
Parameters
None.
Return value
None.
addTransformer
def addTransformer(self, transformer)
Adds a transformer to this object's list of transformers, self.transformers.
Parameters
Parameter | Allowed types | Description | Default |
---|---|---|---|
transformer | Transformer | Transformer to be used for data conversion | – |
Return value
None.
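A minimal sketch of how a list of transformers might be stored and applied, assuming each transformer can be treated as a callable that takes a batch and returns the transformed batch, and that transformers run in the order they were added. TransformerChainSketch and its apply method are hypothetical and not part of the documented API.

```python
class TransformerChainSketch:
    """Hypothetical sketch of storing and applying self.transformers."""

    def __init__(self):
        self.transformers = []

    def addTransformer(self, transformer):
        # Append to the end; transformers are applied in insertion order (assumption)
        self.transformers.append(transformer)

    def apply(self, batch):
        # Feed the output of each transformer into the next one
        for transformer in self.transformers:
            batch = transformer(batch)
        return batch


chain = TransformerChainSketch()
chain.addTransformer(lambda batch: [x * 2 for x in batch])
chain.addTransformer(lambda batch: [x + 1 for x in batch])
print(chain.apply([1, 2, 3]))  # [3, 5, 7]
```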
getNextChunk
def getNextChunk(self, chunksize, **kwargs)
Returns a data batch of size chunksize. The kwargs parameter specifies the parameters for constructing the batch. In Provider the method is empty by default, so you have to override it to suit your needs.
Parameters
Parameter | Allowed types | Description | Default |
---|---|---|---|
chunksize | int | Size of returned data batches | – |
**kwargs | dict | Dictionary that can specify the parameters for constructing data batches | – |
Return value
Provider itself provides no return value. In a concrete implementation the return value is defined by that implementation; see, for example, Merger.
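A sketch of overriding getNextChunk in a subclass. ProviderBaseSketch stands in for Provider's empty default, ListProvider is a hypothetical subclass that serves consecutive slices of a Python list, and the wrap keyword argument is an invented example of a kwargs option; none of these names come from the library.

```python
class ProviderBaseSketch:
    """Hypothetical stand-in for the documented empty default."""

    def getNextChunk(self, chunksize, **kwargs):
        # Empty by default: returns None until a subclass overrides it
        pass


class ListProvider(ProviderBaseSketch):
    """Hypothetical subclass that serves consecutive slices of an in-memory list."""

    def __init__(self, data):
        self.data = data
        self.offset = 0

    def getNextChunk(self, chunksize, **kwargs):
        # Hypothetical kwarg: wrap=True restarts from the beginning once data runs out
        if self.offset >= len(self.data) and kwargs.get("wrap", False):
            self.offset = 0
        chunk = self.data[self.offset:self.offset + chunksize]
        self.offset += len(chunk)
        return chunk


provider = ListProvider(list(range(10)))
print(provider.getNextChunk(4))             # [0, 1, 2, 3]
print(provider.getNextChunk(4))             # [4, 5, 6, 7]
print(provider.getNextChunk(4))             # [8, 9]
print(provider.getNextChunk(4, wrap=True))  # [0, 1, 2, 3]
```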
prepareData
def prepareData(self, chunksize=20000, **kwargs)
Fetches the next data batch using getNextChunk and starts transforming it in multi-threaded mode. The result is retrieved with getData.
Parameters
Parameter | Allowed types | Description | Default |
---|---|---|---|
chunksize | int | Size of the data batch requested from getNextChunk | 20000 |
**kwargs | dict | Dictionary that can specify the parameters for constructing data batches | – |
Return value
None.
getData
def getData(self)
Returns the processed data; call it after prepareData. If the data is not ready yet, the method waits until processing finishes.
Parameters
None.
Return value
Prepared data from self.data.
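A sketch of the prepareData / getData split, assuming prepareData divides the chunk into one contiguous part per thread, submits each part to a standard-library ThreadPool without waiting, and getData blocks on the results and reassembles them in threadidx order. AsyncProviderSketch, its pending attribute, and this splitting strategy are hypothetical; transformers are modelled as plain callables.

```python
from multiprocessing.pool import ThreadPool


def worker(transformers, batch, threadidx):
    # Apply every transformer (assumed to be a callable) to this thread's part
    for transformer in transformers:
        batch = transformer(batch)
    return batch, threadidx


class AsyncProviderSketch:
    """Hypothetical sketch of prepareData/getData; not the library's code."""

    def __init__(self, source, numofthreads=4):
        self.source = source
        self.numofthreads = numofthreads
        self.pool = ThreadPool(numofthreads)
        self.transformers = []
        self.pending = None  # AsyncResult objects for work in progress
        self.data = None

    def getNextChunk(self, chunksize, **kwargs):
        # Trivial override: always serve the first chunksize items
        return self.source[:chunksize]

    def prepareData(self, chunksize=20000, **kwargs):
        chunk = self.getNextChunk(chunksize, **kwargs)
        # Assumption: one contiguous part per thread (ceiling division for the part size)
        step = -(-len(chunk) // self.numofthreads)
        parts = [chunk[i * step:(i + 1) * step] for i in range(self.numofthreads)]
        # Submit without waiting, so prepareData returns immediately
        self.pending = [
            self.pool.apply_async(worker, (self.transformers, part, idx))
            for idx, part in enumerate(parts)
        ]

    def getData(self):
        if self.pending is not None:
            # AsyncResult.get() blocks until that worker has finished
            results = sorted((r.get() for r in self.pending), key=lambda res: res[1])
            # Reassemble the parts in threadidx order
            self.data = [item for batch, _ in results for item in batch]
            self.pending = None
        return self.data


provider = AsyncProviderSketch(list(range(10)), numofthreads=4)
provider.transformers.append(lambda batch: [x * 10 for x in batch])
provider.prepareData(chunksize=6)
print(provider.getData())  # [0, 10, 20, 30, 40, 50]
provider.pool.close()
provider.pool.join()
```

Splitting into contiguous parts keeps reassembly a simple concatenation; a real implementation may split the data differently.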
worker
def worker(transformers, batch, threadidx)
Runs in a separate thread and applies each transformer from transformers to its data batch.
Parameters
Parameter | Allowed types | Description | Default |
---|---|---|---|
transformers | list | List of transformers to apply (self.transformers) | – |
batch | np.ndarray, list | Data batch from self.data, passed to this worker for processing | – |
threadidx | int | Index of the thread processing this batch | – |
Return value
A tuple (batch, threadidx), where batch is the processed data batch.
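A minimal sketch of a worker function with the documented signature, assuming transformers are callables applied in list order. The transformers and batches in the usage part are made-up data, dispatched with the standard-library ThreadPool.starmap, which returns results in submission order.

```python
from multiprocessing.pool import ThreadPool


def worker(transformers, batch, threadidx):
    # Hypothetical sketch: apply every transformer to this thread's batch in turn
    for transformer in transformers:
        batch = transformer(batch)
    # Return the index too, so the caller knows which thread produced the result
    return batch, threadidx


transformers = [lambda b: [x + 1 for x in b], lambda b: [x * x for x in b]]
parts = [[0, 1], [2, 3], [4]]

with ThreadPool(len(parts)) as pool:
    results = pool.starmap(worker, [(transformers, part, idx) for idx, part in enumerate(parts)])

print(results)  # [([1, 4], 0), ([9, 16], 1), ([25], 2)]
```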