Custom large data set (100 GB)

ras

Feb 23, 2015, 3:01:50 PM
to caffe...@googlegroups.com
Hi,

I'd like to use the "bvlc_reference_caffenet" pretrained model architecture for my classifier. I also have around 70 GB of images to train on, so I'd like to start from the parameters learned in bvlc_reference_caffenet and fine-tune as needed. I've already preprocessed my data using "net.preprocess", so it is now in the Caffe-compatible format (i.e. 1000, 3, 227, 227). After this preprocessing, I saved my data in batches (66 batches of ~1 GB each).
However, I didn't find any example that shows how to train my model with these batches!
Should I convert my data to HDF5 and then follow one of the examples provided in the tutorials (e.g. the Flickr fine-tuning example)?
Basically, my problem is how to train my model given that I already have this large amount of data. Should I keep the batches and train on them directly (but how?), or save the data in HDF5 format?
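For reference, once the data format question is sorted out, I plan to launch fine-tuning with the standard caffe tool, roughly like this (the solver path is just a placeholder for my own solver file):

caffe train \
    --solver=my_solver.prototxt \
    --weights=models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel
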
Any help would be appreciated.  

ras

Feb 24, 2015, 9:21:13 PM
to caffe...@googlegroups.com
For future reference:
There was a similar question in this regard:

In summary,
".. given the current types Caffe blobs are capped at 2 gb (although it can be raised). Until the HDF5DataLayer learns to prefetch (https://github.com/BVLC/caffe/pull/1584#issuecomment-67596709) to have constant memory use, individual h5 files need to fit in the blob limit.
By splitting the h5 and listing each in your source list .txt Caffe will iterate through them one-by-one."

Will do the same. 
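In case it's useful to anyone else, here is a rough sketch of how I plan to write the preprocessed batches out as HDF5 files (each ~1 GB, so well under the 2 GB cap) and list them in a source .txt for the HDF5 data layer. The file names, the load_batch helper, and the dataset names 'data'/'label' are just placeholders for my own pipeline:

import h5py
import numpy as np

def load_batch(i):
    # Placeholder: load one preprocessed batch of images and its labels.
    data = np.load('batch_%02d_data.npy' % i)      # shape (N, 3, 227, 227)
    labels = np.load('batch_%02d_labels.npy' % i)  # shape (N,)
    return data, labels

with open('train_h5_list.txt', 'w') as list_file:
    for i in range(66):
        data, labels = load_batch(i)
        fname = 'train_%02d.h5' % i
        with h5py.File(fname, 'w') as f:
            # The HDF5 data layer reads datasets named after the layer's tops
            # and expects float data for both images and labels.
            f.create_dataset('data', data=data.astype(np.float32))
            f.create_dataset('label', data=labels.astype(np.float32))
        list_file.write(fname + '\n')

The corresponding data layer in the train prototxt would then point at that list (batch_size here is just an example value):

layer {
  name: "data"
  type: "HDF5Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  hdf5_data_param {
    source: "train_h5_list.txt"
    batch_size: 64
  }
}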