LMDB or HDF5 to store data?

Ilya Zhenin

Feb 1, 2016, 4:07:00 AM
to Caffe Users
Is there a big difference in convnet training speed depending on whether the image data is stored in LMDB or HDF5?
I know how to write image data to an HDF5 file with Python, but for LMDB I only have the Caffe tool "convert_imageset", which is a problem: I would first have to store about 10,000,000 crops as PNG files on the hard drive and only then create the LMDB.


Jan C Peters

Feb 1, 2016, 4:42:55 AM
to Caffe Users
Well, you can try adapting my script (https://groups.google.com/d/msg/caffe-users/2xpmLJYmt5k/ApiOQ7NnAwAJ) to grab the images directly from your source and put them into the DB, with no intermediate storage of all the images.
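
For illustration, here is a minimal sketch of that idea. It is not the linked script; it assumes the `lmdb` Python package and pycaffe are installed, and the names `make_key`, `batched`, `write_crops_to_lmdb`, `map_size` and `batch_size` are made up for this example. Crops are assumed to arrive as (uint8 array of shape C x H x W, integer label) pairs from whatever source produces them.

```python
import itertools


def make_key(index, width=8):
    """Zero-padded ASCII key, so LMDB's sorted key order matches insertion order."""
    return "{:0{}d}".format(index, width).encode("ascii")


def batched(iterable, size):
    """Yield lists of up to `size` items, so each write transaction stays small."""
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk


def write_crops_to_lmdb(crops, db_path, map_size=1 << 40, batch_size=1000):
    """Write (image, label) pairs straight into an LMDB, with no PNGs on disk.

    `crops` is any iterable of (uint8 array C x H x W, int label) pairs.
    `map_size` is the maximum DB size in bytes; 1 TB here is an assumption
    and should be adjusted to the platform and data set.
    """
    # Imported lazily so the helpers above work without these packages.
    import lmdb
    import caffe

    env = lmdb.open(db_path, map_size=map_size)
    index = 0
    for chunk in batched(crops, batch_size):
        # The transaction commits when the `with` block exits normally.
        with env.begin(write=True) as txn:
            for image, label in chunk:
                datum = caffe.io.array_to_datum(image, label)
                txn.put(make_key(index), datum.SerializeToString())
                index += 1
    env.close()
    return index  # number of records written
```

Since `crops` can be a generator, the 10,000,000 crops never have to exist as files or sit in memory all at once.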

As for the difference: I cannot give representative timing results (and I have not seen any serious benchmarks of this here either), but I think that in theory LMDB should be a bit faster. That speed difference may not actually be visible, though, depending on your network's complexity, since Caffe prefetches data. The point is just that LMDB is a "real" DB whereas HDF5 is only a "light" DB. Nevertheless, I personally prefer HDF5 because it is much easier and nicer to work with, and I have never really had speed problems.
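
For comparison, a minimal sketch of the HDF5 route, assuming `h5py` and NumPy are installed. The helper names and the chunk size are made up for this example. The dataset names "data" and "label" are an assumption too: they have to match the top blob names of the HDF5Data layer in your network definition. The data is split across several files and listed in a text file, which is what the layer's `source` parameter expects.

```python
def chunk_ranges(total, chunk_size):
    """Return (start, stop) index pairs covering range(total) in chunk_size steps."""
    return [(start, min(start + chunk_size, total))
            for start in range(0, total, chunk_size)]


def write_hdf5_chunks(images, labels, prefix, chunk_size=10000):
    """Write images (N x C x H x W) and labels (N) to several HDF5 files.

    Also writes `<prefix>_list.txt`, one file name per line, and returns
    the list of file names.
    """
    # Imported lazily so chunk_ranges works without these packages.
    import h5py
    import numpy as np

    names = []
    for i, (start, stop) in enumerate(chunk_ranges(len(images), chunk_size)):
        name = "{}_{:04d}.h5".format(prefix, i)
        with h5py.File(name, "w") as f:
            # Stored as float32 in this sketch.
            f.create_dataset(
                "data", data=np.asarray(images[start:stop], dtype=np.float32))
            f.create_dataset(
                "label", data=np.asarray(labels[start:stop], dtype=np.float32))
        names.append(name)

    with open(prefix + "_list.txt", "w") as listing:
        listing.write("\n".join(names) + "\n")
    return names
```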

Jan