Mixing LMDB and HDF5


Alex Bewley

November 1, 2014, 08:24:00
to caffe...@googlegroups.com
I'm interested in learning a joint classification/regression network.
I have an lmdb with 2 million+ images and labels. Additionally, I have regression target data in an hdf5 file with data shape (2000000,84,1,1) and label shape (2000000,1).
The labels are stored redundantly: the same labels appear in both the lmdb and the hdf5 file.
I created the lmdb using the tools/convert_imageset executable and forced shuffle to false to maintain order with the hdf5.

To verify that the order is maintained, I concatenated the labels from both the lmdb and the hdf5 file and stored them using an HDF5_OUTPUT layer.

The order of the lmdb labels appears to have somehow changed. 
Is there some shuffling going on in the batch train or test process?
I can't find any info on the order samples are selected -- I have assumed they are sequential.

What is the best way to ensure that samples drawn from two separate data layers stay in sync, so that every batch pairs them in the correct order?
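
The label comparison described in this post can also be done offline, before training. Below is a minimal Python sketch of such a check. The helper name `first_mismatch` and the sample lists are hypothetical; in practice the two sequences would be the label columns read back from the lmdb and from the hdf5 file.

```python
def first_mismatch(labels_a, labels_b):
    """Return the index of the first position where the two label
    sequences differ, or None if they agree everywhere."""
    for i, (a, b) in enumerate(zip(labels_a, labels_b)):
        if a != b:
            return i
    # All shared positions agree; a length difference is still a mismatch.
    if len(labels_a) != len(labels_b):
        return min(len(labels_a), len(labels_b))
    return None


# Stand-in data (hypothetical), not values from the actual datasets.
lmdb_labels = [3, 1, 4, 1, 5, 9, 2, 6]
hdf5_labels = [3, 1, 4, 1, 5, 9, 2, 6]
shuffled_labels = [3, 1, 4, 5, 1, 9, 2, 6]

print(first_mismatch(lmdb_labels, hdf5_labels))      # None
print(first_mismatch(lmdb_labels, shuffled_labels))  # 3
```

Matching labels are a necessary check rather than a proof: with only a few distinct classes, two different orderings can agree by chance over short stretches, so comparing the full sequences is safer than spot-checking one batch.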

Alex Bewley

November 1, 2014, 08:36:22
to caffe...@googlegroups.com
I forgot to mention that I've set the batch size to 300 for both the DATA and HDF5_DATA layers.

Mohamed Omran

November 1, 2014, 10:59:42
to caffe...@googlegroups.com
During training the samples are indeed read sequentially from the lmdb, but the lmdb stores its entries sorted by key. Given the input list, convert_imageset.cpp generates a key for each image, composed of the corresponding line number followed by the filename, so that the entries are stored in the required order:

    int length = snprintf(key_cstr, kMaxKeyLength, "%08d_%s", line_id,
        lines[line_id].first.c_str());

    // Put in db
    CHECK(dataset->put(string(key_cstr, length), datum));

Are you sure the issue isn't with your hdf5 file?
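
To illustrate why that key format preserves the order of the input list, here is a small Python sketch. It is not Caffe code: `make_key` is a hypothetical helper that mirrors the `"%08d_%s"` format string above, and the filenames are made up. It relies on the fact that LMDB iterates keys in lexicographic byte order by default, which `sorted` on the encoded keys imitates.

```python
import random


def make_key(line_id, filename):
    # Same format as the snprintf call above: zero-padded line number,
    # an underscore, then the filename.
    return "%08d_%s" % (line_id, filename)


# Hypothetical file list, deliberately not in alphabetical order.
filenames = ["zebra.jpg", "apple.jpg", "mango.jpg", "kiwi.jpg"]
keys = [make_key(i, name) for i, name in enumerate(filenames)]

# Scramble the insertion order, then sort the encoded keys the way a
# lexicographically ordered key-value store would return them.
scrambled = keys[:]
random.Random(0).shuffle(scrambled)
db_order = sorted(k.encode("ascii") for k in scrambled)

# Strip the line-number prefix to recover the filenames.
recovered = [k.decode("ascii").split("_", 1)[1] for k in db_order]
print(recovered == filenames)  # True
```

The zero padding is what makes this work: with eight digits, string order and numeric order agree for up to 99,999,999 lines, which comfortably covers the 2 million images discussed here.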

Alex Bewley

November 3, 2014, 03:00:34
to caffe...@googlegroups.com
Silly me, I had mixed up the shuffled and non-shuffled versions of my lmdb files and hadn't updated the DATA layer in my net.prototxt.

Good to know that the order is deterministic when shuffling is disabled.

Thanks for the clarification.
Alex

Zheng Shou

July 19, 2015, 18:37:16
to caffe...@googlegroups.com
Hi,

If I want to enable "shuffle", do you know how to make sure both sources keep the same order (other than shuffling the data myself when preparing it)?

Thanks so much!

Zheng
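
For reference, the manual option mentioned in this question (shuffling during data preparation) amounts to applying one shared permutation to both sources before they are written out. The sketch below is only an illustration under that assumption; `shared_permutation` and the sample data are hypothetical, and it does not cover reshuffling between epochs.

```python
import random


def shared_permutation(n, seed):
    """Return a reproducible random ordering of the indices 0..n-1."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return order


# Hypothetical stand-ins: (filename, class label) pairs destined for the
# image list, and one regression target vector per image for the hdf5 file.
image_list = [("img_%d.jpg" % i, i % 3) for i in range(6)]
targets = [[float(i)] for i in range(6)]

# Apply the SAME ordering to both sources.
order = shared_permutation(len(image_list), seed=42)
shuffled_images = [image_list[i] for i in order]
shuffled_targets = [targets[i] for i in order]

# Each shuffled image still lines up with its own regression target.
for (name, _), target in zip(shuffled_images, shuffled_targets):
    assert name == "img_%d.jpg" % int(target[0])
```

The shuffled image list would then be converted with shuffling disabled, and the targets written to the hdf5 file in the same order, so both layers read matching samples sequentially.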

On Monday, November 3, 2014 at 3:00:34 AM UTC-5, Alex Bewley wrote:

Prophecies

April 29, 2016, 16:48:05
to Caffe Users
Hey, did you ever figure that out? Please let me know.