Hi,
We are running bob.spear to train a UBM on our own collected database, which contains over 5M frames (each of the ~4000 utterances/speech samples has between 3000 and 15000 frames).
The experiment (running ./bin/verify.py -vv -d 'bigdata' -p 'energy-2gauss' -e 'mfcc-60' -a 'ivec-cosine-bigdata' -s 'ivec-cosine-bigdata' -v -parallel 8) fails with a memory error in the file ivector.py, in the function train_projector, at the lines:
#train UBM
data = numpy.vstack(train_features_flatten)
We are running on an 8-core, 32 GB RAM Ubuntu system, and the total size of our extracted-feature HDF5 files is over 32 GB (~34 GB to be more exact). The data is passed to the function successfully, but numpy.vstack appears to allocate a full in-memory copy of the data on top of the list that is already loaded, which exhausts the RAM.
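To illustrate what we think is happening (toy shapes below, not our real data): while the list of per-utterance arrays is still referenced, vstack builds one new contiguous array of the same total size, so peak usage is roughly twice the ~34 GB of features, well beyond our 32 GB of RAM.

import numpy as np

# Toy illustration of the peak-memory behaviour; the shapes here are made up,
# our real list of arrays totals ~34 GB.
chunks = [np.ones((5000, 60)) for _ in range(100)]  # stands in for train_features_flatten
list_bytes = sum(c.nbytes for c in chunks)

# vstack copies everything into one new contiguous array while the original
# list is still alive, so peak memory is roughly 2 * list_bytes.
data = np.vstack(chunks)
print(list_bytes / 1e6, data.nbytes / 1e6)  # ~240 MB each here; ~34 GB each in our case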
We are wondering whether there is a way to load the data in chunks, or whether there is some other way to work around this; the rough sketch below is the kind of thing we had in mind.
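For example (this is only a sketch under our own assumptions: the file names are hypothetical stand-ins for our ~4000 extracted-feature files, and we have not checked whether the UBM trainer accepts a numpy.memmap):

import numpy as np
import bob.io.base

# Hypothetical list standing in for our ~4000 extracted-feature HDF5 files.
feature_files = ['utt_0001.hdf5', 'utt_0002.hdf5']

# First pass: total number of frames and the feature dimensionality.
n_frames, dim = 0, None
for path in feature_files:
    feats = bob.io.base.load(path)
    n_frames += feats.shape[0]
    dim = feats.shape[1]

# Back the stacked array by a file on disk instead of RAM, then fill it
# utterance by utterance, so no second full in-memory copy is ever created.
data = np.memmap('stacked_features.dat', dtype='float64', mode='w+',
                 shape=(n_frames, dim))
offset = 0
for path in feature_files:
    feats = bob.io.base.load(path)
    data[offset:offset + feats.shape[0]] = feats
    offset += feats.shape[0]

Would something along those lines work with the UBM training in spear, or is there a supported way to handle training sets that do not fit in memory?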
Many thanks in advance,
Ziv