Mixing LMDB and HDF5


Alex Bewley

Nov 1, 2014, 8:24:00 AM
to caffe...@googlegroups.com
I'm interested in learning a joint classification/regression network.
I have an lmdb with 2 million+ images and labels. Additionally, I have regression target data in an hdf5 file with data shape (2000000,84,1,1) and label shape (2000000,1).
The same labels are stored redundantly in both the lmdb and the hdf5 file.
I created the lmdb using the tools/convert_imageset executable and forced shuffle to false to keep its order consistent with the hdf5 file.

To verify that the order is maintained, I concatenated the labels from both the lmdb and the hdf5 file and stored them using an HDF5_OUTPUT layer.

The order of the lmdb labels appears to have somehow changed. 
Is there some shuffling going on in the batch train or test process?
I can't find any info on the order samples are selected -- I have assumed they are sequential.

What is the best way to ensure that batches drawn from two separate data layers stay in sync and in the correct order?

Alex Bewley

Nov 1, 2014, 8:36:22 AM
to caffe...@googlegroups.com
I forgot to mention that I've set the batch size to 300 for both the DATA and HDF5_DATA layers.

Mohamed Omran

Nov 1, 2014, 10:59:42 AM
to caffe...@googlegroups.com
During training the samples are indeed read sequentially from the lmdb, but they're sorted internally during storage by the key value. Given the input list, convert_imageset.cpp generates a key for each image composed of the corresponding line number followed by the filename to ensure that they're stored in the required order:

    int length = snprintf(key_cstr, kMaxKeyLength, "%08d_%s", line_id,
        lines[line_id].first.c_str());

    // Put in db
    CHECK(dataset->put(string(key_cstr, length), datum));

Are you sure the issue isn't with your hdf5 file?
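
A minimal sketch of why this key format preserves the list order, assuming lmdb compares keys lexicographically (its default). Here a std::map stands in for the database, and MakeKey and FilenamesInKeyOrder are hypothetical helper names used only for illustration; they are not part of Caffe.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Build a key in the same "%08d_%s" format as the snippet above:
// a zero-padded line number, an underscore, then the filename.
std::string MakeKey(int line_id, const std::string& filename) {
  const int kMaxKeyLength = 256;  // assumed buffer size for this sketch
  char key_cstr[kMaxKeyLength];
  int length = std::snprintf(key_cstr, kMaxKeyLength, "%08d_%s", line_id,
                             filename.c_str());
  assert(length > 0);
  // snprintf reports the untruncated length, so clamp it to the buffer.
  if (length >= kMaxKeyLength) length = kMaxKeyLength - 1;
  return std::string(key_cstr, length);
}

// Insert the entries in reverse order into a key-sorted container, then
// read them back in key order. std::map sorts std::string keys
// lexicographically, which is the behaviour assumed for lmdb here.
std::vector<std::string> FilenamesInKeyOrder(
    const std::vector<std::string>& filenames) {
  std::map<std::string, std::string> db;
  for (int i = static_cast<int>(filenames.size()) - 1; i >= 0; --i) {
    db[MakeKey(i, filenames[i])] = filenames[i];
  }
  std::vector<std::string> ordered;
  for (const auto& entry : db) {
    ordered.push_back(entry.second);
  }
  return ordered;
}
```

Calling FilenamesInKeyOrder on a list returns the filenames in their original list order, regardless of insertion order or filename, because the zero-padded line number dominates the comparison. The padding only guarantees this while line numbers fit in 8 digits; a 9-digit number would sort before shorter ones.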

Alex Bewley

Nov 3, 2014, 3:00:34 AM
to caffe...@googlegroups.com
Silly me, I had mixed up the shuffled and non-shuffled versions of my lmdb files and didn't update the DATA layer in my net.prototxt.

Nice to now know that the order is deterministic when shuffling is disabled.

Thanks for the clarification.
Alex

Zheng Shou

Jul 19, 2015, 6:37:16 PM
to caffe...@googlegroups.com
Hi,

If I want to enable "shuffle", do you know how to make sure both sources keep the same order (other than shuffling the data myself when preparing it)?

Thanks so much!

Zheng

On Monday, November 3, 2014 at 3:00:34 AM UTC-5, Alex Bewley wrote:

Prophecies

Apr 29, 2016, 4:48:05 PM
to Caffe Users
Hey, did you ever figure that out? Please let me know.