Hi, I am working through a bunch of challenges and examples with Caffe. Today I still have Wednesday's leftovers: multi-label regression with data in LMDB. I got it working with HDF5, and I have "something" running with LMDB. Obviously I am doing something wrong, since the loss with the HDF5 variant is around 0.02, while the LMDB one is ~0.98. The predictions made with the LMDB network don't make any sense. At the moment I would explain the difference by labels that don't match the required representation.
I have stumbled into almost every error one can stumble into, and I think I have read every article on the internet about this. I am going to document it properly.
Any help is highly appreciated. Thanks in advance.
Below are the logs, the layer configuration, and the Python code that generates the LMDBs.
Checking the logs, I see a difference in the input shapes. With HDF5 the label shape is 128 30, where 128 is the batch size and 30 the length of the output vector.
I0821 09:21:09.052621 5405 hdf5_data_layer.cpp:94] Number of HDF5 files: 1
I0821 09:21:09.093633 5405 net.cpp:155] Setting up MyData
I0821 09:21:09.093675 5405 net.cpp:163] Top shape: 128 1 96 96 (1179648)
I0821 09:21:09.093685 5405 net.cpp:163] Top shape: 128 30 (3840)
With LMDB it looks like below. Data and labels are split into two different LMDBs. The shape from train_label_lmdb is 64 30 1 1, where 64 is the batch size, but the rest looks peculiar: 30 1 1. I was expecting just 30, as with HDF5.
I0224 15:01:11.022727 3552 db_lmdb.cpp:35] Opened lmdb data/train_image_lmdb
I0224 15:01:11.045539 3545 data_layer.cpp:41] output data size: 64,1,96,96
I0224 15:01:11.052714 3545 net.cpp:150] Setting up data
I0224 15:01:11.052753 3545 net.cpp:157] Top shape: 64 1 96 96 (589824)
I0224 15:01:11.052762 3545 net.cpp:165] Memory required for data: 2359296
I0224 15:01:11.052799 3545 layer_factory.hpp:77] Creating layer label
I0224 15:01:11.052917 3545 net.cpp:100] Creating Layer label
I0224 15:01:11.052932 3545 net.cpp:408] label -> label
I0224 15:01:11.054951 3554 db_lmdb.cpp:35] Opened lmdb data/train_label_lmdb
I0224 15:01:11.055282 3545 data_layer.cpp:41] output data size: 64,30,1,1
I0224 15:01:11.057204 3545 net.cpp:150] Setting up label
I0224 15:01:11.057241 3545 net.cpp:157] Top shape: 64 30 1 1 (1920)
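For what it's worth, the 64 30 1 1 shape is what I'd expect from the Datum path: a Datum always stores channels x height x width, so a 30-vector written as (30, 1, 1) comes back as a 64x30x1x1 blob. The per-sample element count is the same as with HDF5; a quick numpy sketch (nothing Caffe-specific, just illustrating the shapes) shows the trailing singleton dimensions carry no extra data:

```python
import numpy as np

# One batch of labels as the HDF5 layer reports it: (batch, 30).
hdf5_labels = np.random.rand(64, 30).astype(np.float32)

# The LMDB/Datum path stores each label as channels x height x width,
# so the same values come out of the data layer as (batch, 30, 1, 1).
lmdb_labels = hdf5_labels.reshape(64, 30, 1, 1)

# Same number of elements, same values -- only the shape differs.
assert lmdb_labels.size == hdf5_labels.size
assert np.array_equal(lmdb_labels.squeeze(), hdf5_labels)
```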
These are the two input layers:
layer {
  name: "data"
  type: "Data"
  top: "data"
  data_param {
    source: "data/train_image_lmdb"
    batch_size: 64
    backend: LMDB
  }
  include { phase: TRAIN }
}
layer {
  name: "label"
  type: "Data"
  top: "label"
  data_param {
    source: "data/train_label_lmdb"
    batch_size: 64
    backend: LMDB
  }
  include { phase: TRAIN }
}
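In case the trailing 1 x 1 dimensions do matter for a layer further down, one option would be to flatten the label blob to 64 30 right after the data layer. This is untested on my side and the layer name is my own:

```
layer {
  name: "flat_label"
  type: "Flatten"
  bottom: "label"
  top: "flat_label"
}
```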
....
This is how I create the image and label LMDBs:
import lmdb
import numpy as np
import caffe

def writelmdb(t, data, label=None):
    # Generous map size; LMDB only reserves address space, not disk.
    map_size = data.nbytes * 10
    print("data.shape: ")
    print(data.shape)
    env = lmdb.open(t + '_image_lmdb', map_size=map_size)
    with env.begin(write=True) as txn:
        N = len(data)
        for i in range(N):
            # data[i] is already 3-D (1, 96, 96), as array_to_datum expects.
            datum = caffe.io.array_to_datum(data[i])
            str_id = '{:08}'.format(i)
            txn.put(str_id.encode('ascii'), datum.SerializeToString())
    env.close()
    if label is not None:
        print("label.shape: ")
        print(label.shape)
        env = lmdb.open(t + '_label_lmdb', map_size=map_size)
        with env.begin(write=True) as txn:
            N = len(label)  # was len(data); same count here, but iterate the labels
            for i in range(N):
                #label_dat = caffe.io.array_to_datum(np.array(label[i]).astype(float).reshape(30,1,1))
                # array_to_datum needs a 3-D (C, H, W) array, hence (30, 1, 1).
                label_dat = caffe.io.array_to_datum(label[i].reshape(30, 1, 1))
                str_id = '{:08}'.format(i)
                txn.put(str_id.encode('ascii'), label_dat.SerializeToString())
        env.close()
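Before writing, I would also sanity-check the label array itself, since caffe.io.array_to_datum stores uint8 arrays in the Datum's byte 'data' field and everything else in 'float_data', and for regression targets I want floats. A minimal sketch (plain numpy, the helper name is my own):

```python
import numpy as np

def check_labels(label, dim=30):
    """Hypothetical pre-write checks for an (N, dim) float label array."""
    label = np.asarray(label)
    # Each sample must reshape cleanly to (dim, 1, 1) for array_to_datum.
    assert label.shape[1:] == (dim,), "expected one %d-vector per sample" % dim
    # uint8 would go into Datum's byte 'data' field instead of float_data,
    # so make sure the labels are floating point.
    assert np.issubdtype(label.dtype, np.floating), "labels should be float"
    # Reshaping to (dim, 1, 1) and back must not change the values.
    sample = label[0].reshape(dim, 1, 1)
    assert np.array_equal(sample.reshape(dim), label[0])
    return True

labels = np.random.rand(10, 30).astype(np.float32)
assert check_labels(labels)
```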