How to convert an XML file containing feature vectors to the LMDB format?


rsl

Jan 31, 2016, 4:31:42 PM
to Caffe Users
I'm new to Caffe and I have to implement a CNN with Caffe that takes feature vectors as input. The feature vectors have been saved in an XML file as a matrix, without any child tags. I know the input file for Caffe should be in HDF5, LevelDB, or LMDB format, but I have no idea how to convert the XML file to one of these.
Would you please help me? Thanks

Jan C Peters

Feb 1, 2016, 3:03:32 AM
to Caffe Users
Well, that calls for a custom solution. Either write a converter to turn your data from XML into some format caffe can handle, or write a new input layer class that loads the XML data directly into memory. You cannot expect caffe to handle every possible input format. XML is really not well suited for large amounts of data, but deep learning usually uses large amounts of data; maybe that's why there is no such input layer as of yet.
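The first half of such a converter (reading the XML) could look roughly like the sketch below. The thread never shows the actual file layout, so this assumes a single root element whose text holds one feature vector per line with whitespace-separated numbers; the function name `parse_feature_matrix` is made up for illustration and would need adapting to the real file.

```python
import xml.etree.ElementTree as ET


def parse_feature_matrix(xml_text):
    """Parse a matrix stored as plain text inside the root XML element.

    Assumed layout (not confirmed by the thread): one row per line,
    values separated by whitespace, no child tags.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for line in (root.text or "").splitlines():
        values = line.split()
        if values:  # skip blank lines
            rows.append([float(v) for v in values])
    # All feature vectors must have the same length to form a matrix.
    if rows and any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("rows have different lengths")
    return rows
```

The result is a plain list of rows that can then be written out in whichever format is chosen for caffe.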

Jan

rsl

Feb 1, 2016, 3:44:08 AM
to Caffe Users
Then what would be the best way to save the feature descriptors in the first place? I mean, is it possible to save the feature vectors in another format like HDF5, LevelDB, or LMDB? I have no experience working with such formats.

Jan C Peters

Feb 1, 2016, 4:33:47 AM
to Caffe Users
You can store anything in these formats that can be represented as a multidimensional array of numbers. The only thing to mind is that HDF5 works with any kind of label whereas leveldb and lmdb store "Datum" structs (as defined in the caffe.proto) that are usually better suited for one single class label (integer 0..N-1) per sample. See my post on creating an LMDB for caffe: https://groups.google.com/d/msg/caffe-users/2xpmLJYmt5k/ApiOQ7NnAwAJ.

Which of these formats you choose is up to you. And some basic reading about these formats wouldn't hurt either. I found HDF5 to be quite useful because it is really flexible and incredibly easy to use (at least in MATLAB and Python; in C++ it is a bit more difficult).
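As a rough illustration of the HDF5 route in Python, here is a minimal sketch using `h5py` and NumPy. It assumes the features are already in memory as an N x D array with one label per sample, and that the net's HDF5 data layer has tops named `data` and `label` (the dataset names have to match the layer's top names). The helper name `write_caffe_hdf5` is hypothetical.

```python
import h5py
import numpy as np


def write_caffe_hdf5(path, features, labels):
    """Write features and labels as float32 datasets named 'data' and 'label'."""
    data = np.asarray(features, dtype=np.float32)
    label = np.asarray(labels, dtype=np.float32)
    if data.shape[0] != label.shape[0]:
        raise ValueError("features and labels must have the same number of samples")
    with h5py.File(path, "w") as f:
        # Dataset names are assumed to match the top blob names in the prototxt.
        f.create_dataset("data", data=data)
        f.create_dataset("label", data=label)
```

Calling `write_caffe_hdf5("train.h5", [[0.1, 0.2], [0.3, 0.4]], [0, 1])` would produce one file. Note that the layer's `source` parameter points to a text file listing the paths of such `.h5` files, not to the `.h5` file itself.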

Jan