converting multiclass image data set to lmdb


spuran yarram

Apr 24, 2016, 9:20:16 PM
to Caffe Users
I have an image dataset of 8 classes, in which I have a train set, a test set, and a CSV file containing the class of each image in the train set, and I want to use the GoogLeNet CNN. Could someone explain how to convert the image data I have to LMDB so that I can run the CNN?

spuran yarram

Apr 25, 2016, 2:52:04 PM
to Caffe Users
I have resized the images to 256x256 and now want to use convert_imageset to convert them to LMDB. I have set the train root directory correctly, but no images are written to the database. Could anyone please help?

spuran@rustum:~/Caffe/caffe$ ./examples/imagenet/create_imagenet.sh
Creating train lmdb...
I0425 14:45:40.831051 12500 convert_imageset.cpp:83] Shuffling data
I0425 14:45:40.831655 12500 convert_imageset.cpp:86] A total of 0 images.
I0425 14:45:40.865133 12500 db_lmdb.cpp:38] Opened lmdb examples/imagenet/new_train_lmdb
Creating val lmdb...
I0425 14:45:41.095629 12505 convert_imageset.cpp:83] Shuffling data
I0425 14:45:41.096097 12505 convert_imageset.cpp:86] A total of 0 images.
I0425 14:45:41.096269 12505 db_lmdb.cpp:38] Opened lmdb examples/imagenet/new_val_lmdb
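
"A total of 0 images" in the log above usually means convert_imageset could not parse the listing file: the tool expects one image per line in the form "relative/path.jpg <integer label>", separated by a single space, not a CSV. A minimal sketch of turning a "filename,label" CSV into such a listing (the file names train.csv and train.txt here are placeholders, not files from the original post):

```python
import csv

def csv_to_listing(csv_path, listing_path):
    """Convert a 'filename,label' CSV into the space-separated
    'filename label' listing file that Caffe's convert_imageset
    tool expects (one image per line, label as an integer index)."""
    with open(csv_path) as src, open(listing_path, "w") as dst:
        for row in csv.reader(src):
            if not row:
                continue  # skip blank lines
            filename, label = row[0].strip(), row[1].strip()
            dst.write("%s %s\n" % (filename, label))
```

The resulting train.txt is what gets passed to convert_imageset together with the image root directory.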



Ahmed Ibrahim

Apr 25, 2016, 3:38:46 PM
to Caffe Users
You cannot create a multiclass dataset with create_imagenet.sh or the command-line tool provided by Caffe directly. You have to use some tricks, such as creating two separate LMDBs, one for the images and one for the labels. Alternatively, you can write a Python script to create your LMDB, or switch to another format such as HDF5 (which can also be created with a simple Python script).
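
The two-LMDB trick hinges on one detail: the image datum and the label datum for record i must be written under identical keys, so that Caffe's two Data layers iterate over the databases in lockstep. A minimal illustration of the key scheme (the zero-padded format matches the one used in the script later in this thread):

```python
def lmdb_key(index):
    """Zero-padded, fixed-width key used for BOTH the image LMDB and
    the label LMDB. LMDB iterates records in lexicographic key order,
    so identical fixed-width keys keep the two databases aligned
    record-for-record."""
    return "{:0>10d}".format(index)

# Both databases must be written with the same keys in the same order:
image_keys = [lmdb_key(i) for i in range(3)]
label_keys = [lmdb_key(i) for i in range(3)]
assert image_keys == label_keys
```

Without the fixed width, key "10" would sort before key "2" and images would be paired with the wrong labels.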

Jan

Apr 26, 2016, 5:28:31 AM
to Caffe Users
What Ahmed said. And: please use full sentences and punctuation; otherwise it is really hard to understand what you are trying to say or ask.

Jan

spuran yarram

Apr 27, 2016, 12:18:32 PM
to Jan, Caffe Users
Sorry for the inconvenience. I have created two LMDB files for the training data: one with the labels and one with the data (images as ndarrays). I am unable to figure out how to proceed further. Also, my LMDB file is 45 GB, although my input data was only 1.1 GB. I have attached my code below. Could you suggest how to proceed?


import lmdb
import sys

import re, fileinput, math
import numpy as np

# Make sure that caffe is on the python path:
# caffe_root = '/home/spuran/Caffe/caffe'  # this file is expected to be in {caffe_root}/examples
# import sys
# sys.path.insert(0, caffe_root + 'python')

import caffe

# Command line to check created files:
# python -mlmdb stat --env=./Downloads/caffe-master/data/liris-accede/train_score_lmdb/

data = 'test.txt'
lmdb_data_name = 'test_data_lmdb'
lmdb_label_name = 'train_score_lmdb'

Inputs = []
Labels = []
error_catch_label = []

for line in fileinput.input(data):
    entries = re.split(',', line.strip())
    a = entries[0]
    Inputs.append(a)
    # print(a[1:-1])
    # b = entries[1]
    # Labels.append(b[1:-1])

print('Writing labels')

# Size of buffer: 1000 elements to reduce memory consumption
# for idx in range(int(math.ceil(len(Labels)/1000.0))):
#     in_db_label = lmdb.open(lmdb_label_name, map_size=int(1e12))
#     with in_db_label.begin(write=True) as in_txn:
#         try:
#             for label_idx, label_ in enumerate(Labels[(1000*idx):(1000*(idx+1))]):
#                 im_dat = caffe.io.array_to_datum(np.array(label_).astype(float).reshape(1, 1, 1))
#                 in_txn.put('{:0>10d}'.format(1000*idx + label_idx), im_dat.SerializeToString())
#
#                 string_ = str(1000*idx + label_idx + 1) + ' / ' + str(len(Labels))
#                 sys.stdout.write("\r%s" % string_)
#                 sys.stdout.flush()
#         except (ValueError, IOError, TypeError, AttributeError):
#             print("problem with")
#             print(label_idx)
#             error_catch_label.append(label_idx)
#             continue
#     in_db_label.close()
# print('')

print('Writing image data')
error_catch = []

for idx in range(int(math.ceil(len(Inputs)/1000.0))):
    in_db_data = lmdb.open(lmdb_data_name, map_size=int(1e12))
    with in_db_data.begin(write=True) as in_txn:
        try:
            for in_idx, in_ in enumerate(Inputs[(1000*idx):(1000*(idx+1))]):
                im = caffe.io.load_image(in_)
                print(type(im), "im")
                im_dat = caffe.io.array_to_datum(im.astype(float).transpose((2, 0, 1)))
                # print(im_dat, "im_dat")
                in_txn.put('{:0>10d}'.format(1000*idx + in_idx), im_dat.SerializeToString())
                string_ = str(1000*idx + in_idx + 1) + ' / ' + str(len(Inputs))
                sys.stdout.write("\r%s" % string_)
                sys.stdout.flush()
        except (ValueError, IOError, TypeError, AttributeError):
            print("problem with")
            print(in_db_data)
            error_catch.append(string_)
            continue
    in_db_data.close()

print(error_catch)
print('')

Jan

Apr 28, 2016, 3:39:51 AM
to Caffe Users, jcpet...@gmail.com
Concerning the sizes: LMDB uses sparse files, a special kind of file for which regular file managers may report misleading sizes. The database probably does not really occupy 45 GB of disk space.
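
The apparent-vs-actual size difference can be demonstrated without LMDB at all: on filesystems that support sparse files (most Linux filesystems do; behaviour on others may differ), seeking far into a file and writing a single byte produces a huge st_size while only a few blocks are actually allocated. A small sketch of checking this from Python:

```python
import os
import tempfile

# Create a sparse file: seek 100 MB in and write one byte. The apparent
# size (st_size) is then 100 MB + 1 byte, but the blocks actually
# allocated on disk (st_blocks * 512) are usually only a few KB -- the
# same effect that makes an LMDB directory look far larger than the
# data it holds.
fd, path = tempfile.mkstemp()
try:
    os.lseek(fd, 100 * 1024 * 1024, os.SEEK_SET)
    os.write(fd, b"x")
    st = os.fstat(fd)
    apparent = st.st_size        # what a naive file manager reports
    actual = st.st_blocks * 512  # space really allocated on disk
    print(apparent, actual)
finally:
    os.close(fd)
    os.remove(path)
```

On the command line, `du -h` (actual) versus `ls -l` (apparent) on the LMDB directory shows the same distinction.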

Well, the next step is usually to write up a network definition (a .prototxt file) similar to those you see in the examples, and a solver definition which specifies the training parameters. Then train the network with
./build/tools/caffe train -solver yoursolverfile.prototxt

What these files can look like is documented in the many examples Caffe ships with. If you have questions about individual parameters in the prototxt files, your first stop should be the proto definition file https://github.com/BVLC/caffe/blob/master/src/caffe/proto/caffe.proto, which is comprehensively commented. The content of the network definition file is a NetParameter; the solver file contains a SolverParameter instance.
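
As a rough orientation, a minimal solver definition along the lines of the shipped examples might look like this. Every file name and numeric value here is a placeholder to adapt, not something from the original thread; the field names are the SolverParameter fields from caffe.proto:

```protobuf
net: "train_val.prototxt"        # your network definition (NetParameter)
base_lr: 0.01                    # starting learning rate
lr_policy: "step"                # drop the rate in steps
stepsize: 10000                  # every 10000 iterations...
gamma: 0.1                       # ...multiply the rate by 0.1
momentum: 0.9
weight_decay: 0.0005
max_iter: 50000                  # total training iterations
snapshot: 10000                  # save intermediate models
snapshot_prefix: "snapshots/mynet"
solver_mode: GPU                 # or CPU
```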

Jan