lmdb creation for multilabel regression (30 outputs)


Floda

Feb 24, 2017, 3:50:11 PM
to Caffe Users
Hi, I am working through a bunch of challenges and examples with Caffe. Today I am still dealing with Wednesday's leftovers: multilabel regression with data in LMDB. I got it working with HDF5, and I have "something" running with LMDB. Obviously I am doing something wrong, since the loss with the HDF5 variant is around 0.02 while the LMDB variant sits at ~0.98, and the predictions from the LMDB network don't make any sense. At the moment my best explanation for the difference is that the labels don't match the required representation.


I have run into almost every error one can run into, and I think I have read every article on the internet about this. I am going to document it properly.

Any help is highly appreciated. Thanks in advance!

Below are the logs, the layer configuration, and the Python code that generates the LMDBs.

Checking the logs, I see a difference in the input shapes. With HDF5 the label shape is 128 30, where 128 is the batch size and 30 the output vector:

I0821 09:21:09.052621  5405 hdf5_data_layer.cpp:94] Number of HDF5 files: 1
I0821 09:21:09.093633  5405 net.cpp:155] Setting up MyData
I0821 09:21:09.093675  5405 net.cpp:163] Top shape: 128 1 96 96 (1179648)
I0821 09:21:09.093685  5405 net.cpp:163] Top shape: 128 30 (3840)

With LMDB it looks like below. Data and labels are split into two different LMDBs. The shape from train_label_lmdb is 64 30 1 1, where 64 is the batch size, but the rest looks peculiar: 30 1 1. I was expecting 30, as with HDF5.

I0224 15:01:11.022727  3552 db_lmdb.cpp:35] Opened lmdb data/train_image_lmdb
I0224 15:01:11.045539  3545 data_layer.cpp:41] output data size: 64,1,96,96
I0224 15:01:11.052714  3545 net.cpp:150] Setting up data
I0224 15:01:11.052753  3545 net.cpp:157] Top shape: 64 1 96 96 (589824)
I0224 15:01:11.052762  3545 net.cpp:165] Memory required for data: 2359296
I0224 15:01:11.052799  3545 layer_factory.hpp:77] Creating layer label
I0224 15:01:11.052917  3545 net.cpp:100] Creating Layer label
I0224 15:01:11.052932  3545 net.cpp:408] label -> label
I0224 15:01:11.054951  3554 db_lmdb.cpp:35] Opened lmdb data/train_label_lmdb
I0224 15:01:11.055282  3545 data_layer.cpp:41] output data size: 64,30,1,1
I0224 15:01:11.057204  3545 net.cpp:150] Setting up label
I0224 15:01:11.057241  3545 net.cpp:157] Top shape: 64 30 1 1 (1920)
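For what it's worth, if I count the label values per sample the way Caffe does (the product of all dimensions after the batch dimension, i.e. blob.count(1)), the two shapes should be equivalent: the trailing 1x1 comes from the Datum's fixed channels x height x width layout. A minimal check, no Caffe needed:

```python
def per_sample_count(shape):
    """Number of label values per sample: the product of all
    dimensions after the batch dimension (Caffe's blob.count(1))."""
    n = 1
    for d in shape[1:]:
        n *= d
    return n

hdf5_label_shape = (128, 30)        # from the HDF5 log: "Top shape: 128 30"
lmdb_label_shape = (64, 30, 1, 1)   # from the LMDB log: "Top shape: 64 30 1 1"

print(per_sample_count(hdf5_label_shape))  # 30
print(per_sample_count(lmdb_label_shape))  # 30
```

Both give 30 values per sample, which also matches the element counts the logs print (128*30 = 3840 and 64*30*1*1 = 1920). So I suspect the shape itself is not the real problem.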

These are the two input layers:

layer {
  name: "data"
  type: "Data"
  top: "data"
  data_param {
    source: "data/train_image_lmdb"
    batch_size: 64
    backend: LMDB
  }
  include { phase: TRAIN }
}


layer {
  name: "label"
  type: "Data"
  top: "label"
  data_param {
    source: "data/train_label_lmdb"
    batch_size: 64
    backend: LMDB
  }
  include { phase: TRAIN }
}
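I don't know whether the trailing 1x1 dimensions actually matter for the loss layer, but in case they do, my understanding is that a Flatten layer should collapse the 64 30 1 1 label blob into 64 30 (a sketch, untested; the "label_flat" name is just my own choice):

```
layer {
  name: "label_flat"
  type: "Flatten"
  bottom: "label"
  top: "label_flat"
  include { phase: TRAIN }
}
```

The loss layer would then take "label_flat" as its label bottom instead of "label".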

....


This is how I create the image and label LMDBs:

def writelmdb(t, data, label=None):
    map_size = data.nbytes * 10

    print("data.shape: ")
    print(data.shape)

    env = lmdb.open(t + '_image_lmdb', map_size=map_size)
    with env.begin(write=True) as txn:
        N = len(data)
        for i in range(N):
            datum = caffe.io.array_to_datum(data[i])
            str_id = '{:08}'.format(i)
            txn.put(str_id.encode('ascii'), datum.SerializeToString())
    env.close()

    if label is not None:
        print("label.shape: ")
        print(label.shape)

        env = lmdb.open(t + '_label_lmdb', map_size=map_size)
        with env.begin(write=True) as txn:
            N = len(label)  # was len(data); same length here, but labels are what we iterate
            for i in range(N):
                # array_to_datum expects a 3D array, hence the (30, 1, 1) reshape
                # label_dat = caffe.io.array_to_datum(np.array(label[i]).astype(float).reshape(30,1,1))
                label_dat = caffe.io.array_to_datum(label[i].reshape(30, 1, 1))
                str_id = '{:08}'.format(i)
                txn.put(str_id.encode('ascii'), label_dat.SerializeToString())
        env.close()
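One thing I double-checked while writing this: the image and label LMDBs are only matched up by iteration order, and as far as I know LMDB iterates keys lexicographically (and the Data layer does not shuffle them). The zero-padded '{:08}' keys keep that order identical to the numeric write order, which plain str(i) keys would not. A quick check:

```python
# Zero-padded keys: lexicographic order equals numeric write order.
padded = ['{:08}'.format(i) for i in range(12)]
# Plain decimal keys: '10' and '11' would sort before '2'.
plain = [str(i) for i in range(12)]

print(sorted(padded) == padded)  # True
print(sorted(plain) == plain)    # False
```

So the key scheme itself should be fine; both databases are written with the same keys in the same loop order.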








