My understanding is that Caffe takes the batch_size parameter from your network definition and uses that many entries per iteration. If you also set iter_size in your solver.prototxt, the effective batch is batch_size multiplied by iter_size. In addition, if you train on multiple GPUs (using either -gpu 0,1,2 or -gpu all), that total is further multiplied by the number of GPUs, giving batch_size * iter_size * #GPUs entries per iteration.
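As a rough sketch, the two settings live in different files. The values below are just placeholders to show where each parameter goes, not recommendations:

```
# train_val.prototxt -- inside the data layer
layer {
  name: "data"
  type: "Data"            # or "HDF5Data", "ImageData", ...
  data_param {
    source: "train_lmdb"
    batch_size: 8         # entries per forward/backward pass
  }
}

# solver.prototxt
iter_size: 10             # gradients accumulated over 10 passes
# effective entries per iteration = batch_size * iter_size * #GPUs
```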
In my case I am using an LMDB, but the same accounting applies to HDF5 sources or to an image list (individual file paths listed in a TXT file).
I am using a batch_size of 8, an iter_size of 10, and 1 GPU, so effectively 80 files are processed per iteration.
Since you have multiple HDF5 files, I believe Caffe will start with the first one and read from it until its entries are exhausted. Roughly, once [iteration # * (batch_size * iter_size * #GPUs)] exceeds the number of entries in the first HDF5 file, the second file starts supplying the data for subsequent iterations. This continues until all files are exhausted, at which point Caffe wraps around to the first one again.
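The sequential-read-with-wraparound behavior described above can be modeled with a toy function. This is my own sketch, not Caffe code: the function name is made up, and it assumes the file supplying the first entry of an iteration is "the" source file for that iteration.

```python
def source_file(iteration, batch_size, iter_size, num_gpus, file_sizes):
    """Toy model: index of the HDF5 file supplying the first entry
    of the given iteration, assuming sequential reads that wrap
    around once every file is exhausted."""
    per_iter = batch_size * iter_size * num_gpus
    # Offset of this iteration's first entry, wrapped over all files
    start = (iteration * per_iter) % sum(file_sizes)
    for i, size in enumerate(file_sizes):
        if start < size:
            return i
        start -= size

# With batch_size=8, iter_size=10, 1 GPU (80 entries/iteration) and
# two files of 100 entries each, iteration 2 spills into the second
# file (entries 160..239), and iteration 3 wraps back to the first.
print(source_file(2, 8, 10, 1, [100, 100]))  # -> 1
print(source_file(3, 8, 10, 1, [100, 100]))  # -> 0
```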
Patrick