Image segmentation - pixel wise labeling


eran paz

Jul 5, 2015, 7:35:04 AM
to caffe...@googlegroups.com
Hi
I'm trying to run a fully convolutional network for semantic segmentation.
I've created my own dataset: an image (HxWx3) with embedded items in it and a corresponding mask (HxW) where each pixel denotes a class (0=background, 1=item #1, 2=item #2, etc.).
I've encountered 2 issues:
1. When I use the script from PR #1698 to create the lmdb, it automatically reads the mask with 3 channels, and when I try to run the net I get this error:
F0705 13:31:39.501696 13484 softmax_loss_layer.cpp:42] Check failed: outer_num_ * inner_num_ == bottom[1]->count() (226800 vs. 680400) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be N*H*W, with integer values in {0, 1, ..., C-1}.
which means my labels vector is 3 times longer than it should be (because it's supposed to have just 1 channel and it has 3).


2. When I force the mask to be grayscale (inside the lmdb script), I get an error that caffe.io.array_to_datum is expecting a 3-D array.

I'd appreciate any help
THX
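For reference, both constraints can be met at once by reading the mask as a single channel and then adding a channel axis, so the label blob is 1 x H x W rather than 3 x H x W. A numpy-only sketch (the fabricated array stands in for the actual grayscale image read):

```python
import numpy as np

# stand-in for a grayscale mask read (e.g. cv2.imread(path, 0)): an H x W array
h, w = 6, 8
msk = (np.arange(h * w, dtype=np.uint8) % 3).reshape(h, w)  # 0=background, 1-2=items

# caffe.io.array_to_datum wants a 3-D (C x H x W) array, so add a channel axis
msk = msk[np.newaxis, :, :]
```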


eran paz

Jul 6, 2015, 6:49:57 AM
to caffe...@googlegroups.com
Hi Carlos
I think I figured out how to solve my problem (I think your problem is different from mine).
I can't post all of my code, but it's very similar to yours.
What I think resolves the issue is the way you reduce the mask to a single channel while maintaining the ndim=3 constraint from array_to_datum.

This is how I generate the label lmdb:

msk = np.array(cv2.imread(in_msk))                # or load whatever ndarray you need
msk = msk[:,:,::-1]                               # RGB -> BGR
msk = msk.transpose((2,0,1))                      # H x W x C -> C x H x W
msk = msk[0,:,:]                                  # keep a single channel
msk = msk.reshape(1,msk.shape[0],msk.shape[1])    # back to 1 x H x W for array_to_datum
#print msk.shape, msk.ndim
msk_dat = caffe.io.array_to_datum(msk)
msk_txn.put('{:0>10d}'.format(img_idx), msk_dat.SerializeToString())

in_msk is the path to the image of the mask.
Some of this code might be redundant, but I'd rather leave it there.
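Since the three channels of a class-index mask carry identical values, the flip/transpose/slice/reshape sequence above collapses to a single slice plus a new axis. A sketch of the equivalence, using a fabricated mask in place of the file read:

```python
import numpy as np

# fabricated H x W x 3 mask whose three channels hold the same class indices,
# as cv2.imread produces when it reads a grayscale label image as color
h, w = 4, 5
mask_img = np.repeat((np.arange(h * w, dtype=np.uint8) % 3).reshape(h, w, 1), 3, axis=2)

# the sequence above: channel flip, transpose to C x H x W, keep channel 0, reshape
a = mask_img[:, :, ::-1].transpose((2, 0, 1))[0, :, :].reshape(1, h, w)

# equivalent: keep one channel and add a leading channel axis
b = mask_img[:, :, 0][np.newaxis, :, :]

assert (a == b).all() and a.shape == (1, h, w)
```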

BTW, I'm running into a different problem now; I'm getting this error:
Check failed: status == CUBLAS_STATUS_SUCCESS (14 vs. 0)  CUBLAS_STATUS_INTERNAL_ERROR
But I don't think it's something with the code, probably some build issues.

good luck



On Monday, July 6, 2015 at 11:57:58 AM UTC+3, Carlos Treviño wrote:
Hi

Could you share your files? I'm working on the same topic but I get another error:
F0706 10:29:05.290323  6352 db_lmdb.hpp:17] Check failed: mdb_status == 0 (123 vs. 0) D

This is how I generate the lmdb:

f = open('dataset.txt','r')
inputs = f.read().splitlines()
f.close()

in_db = lmdb.open('dataset', map_size=int(262144000))
with in_db.begin(write=True) as in_txn:
    for in_idx, in_ in enumerate(inputs):
        # load image:
        # - as np.uint8 {0, ..., 255}
        # - in BGR (switch from RGB)
        # - in Channel x Height x Width order (switch from H x W x C)
        im = np.array(Image.open(in_)) # or load whatever ndarray you need
        im = im[:,:,::-1]
        im = im.transpose((2,0,1))
        im_dat = caffe.io.array_to_datum(im)
        in_txn.put('{:0>10d}'.format(in_idx), im_dat.SerializeToString())
in_db.close()

print 'dataset done'

# color code for ground truth images
label_colors = [(64,128,64),(192,0,128),(0,128,192),(0,128,64),(128,0,0),(64,0,128),(64,0,192),(192,128,64),(192,192,128),(64,64,128),(128,0,192),(192,0,64),(128,128,64),(192,0,192),(128,64,64),(64,192,128),(64,64,0),(128,64,128),(128,128,192),(0,0,192),(192,128,128),(128,128,128),(64,128,192),(0,0,64),(0,64,64),(192,64,128),(128,128,0),(192,128,192),(64,0,64),(192,192,0),(0,0,0),(64,192,0)]

f = open('groundtruth.txt','r')
inputs = f.read().splitlines()
f.close()

in_db = lmdb.open('groundtruth', map_size=int(94371840))
with in_db.begin(write=True) as in_txn:
    for in_idx, in_ in enumerate(inputs):
        # load image:
        # - as np.uint8 {0, ..., 255}
        im = np.array(Image.open(in_)) # or load whatever ndarray you need
        # convert to one-dimensional ground truth labels
        tmp = np.uint8(np.zeros(im[:,:,0:1].shape))
        for i in range(0,len(label_colors)):
            tmp[:,:,0] = tmp[:,:,0] + i*np.prod(np.equal(im,label_colors[i]),2)
        # - in Channel x Height x Width order (switch from H x W x C)
        tmp = tmp.transpose((2,0,1))
        im_dat = caffe.io.array_to_datum(tmp)
        in_txn.put('{:0>10d}'.format(in_idx), im_dat.SerializeToString())
in_db.close()
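The inner loop above maps each ground-truth color to its palette index by multiplying the index into a per-pixel channel-equality product. The same mapping written with boolean masks (shown with an abbreviated, made-up palette and image) can be easier to sanity-check:

```python
import numpy as np

# abbreviated, made-up palette for illustration (list index = class label)
label_colors = [(64, 128, 64), (192, 0, 128), (0, 0, 0)]

# hypothetical 2 x 2 ground-truth image painted with palette entries 1, 0, 2, 1
im = np.array([[(192, 0, 128), (64, 128, 64)],
               [(0, 0, 0), (192, 0, 128)]], dtype=np.uint8)

# assign class i wherever all three channels match palette color i
labels = np.zeros(im.shape[:2], dtype=np.uint8)
for i, color in enumerate(label_colors):
    labels[(im == color).all(axis=2)] = i
```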


Carlos Treviño

Jul 6, 2015, 9:57:36 AM
to caffe...@googlegroups.com
Thanks for sharing your code, but the error is still there:
F0706 10:29:05.290323  6352 db_lmdb.hpp:17] Check failed: mdb_status == 0 (3 vs. 0) D

I'm starting to think the problem is in my train_val.prototxt file.

I think PRs #440 and #507 deal with this issue: https://github.com/BVLC/caffe/pull/507

Mansi Rankawat

Jul 18, 2015, 4:01:14 PM
to caffe...@googlegroups.com
Hi Eran,
I am training the FCN-32 network using pretrained weights from the 16-layer ILSVRC VGG net and finetuning on the PASCAL VOC 11 dataset. Even after 17000 iterations the loss remains constant at 3.04452. Could you suggest what might be keeping the loss from decreasing at all? I am using the code here to create the lmdb files (https://github.com/BVLC/caffe/issues/1698).

Thanks,
Mansi
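One observation (mine, not from the thread): PASCAL VOC has 21 classes (20 objects plus background), and a loss pinned at 3.04452 is exactly ln(21), the softmax loss of a network that outputs a uniform distribution over classes. That usually points at the newly added layers never learning anything (e.g. zero-initialized score layers or a learning rate of zero on them) rather than at the data:

```python
import math

# softmax cross-entropy of a uniform prediction over C classes is ln(C)
num_classes = 21  # PASCAL VOC: 20 object classes + background
uniform_loss = math.log(num_classes)
assert abs(uniform_loss - 3.04452) < 1e-5
```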

王勇翔

Aug 11, 2015, 11:48:30 PM
to Caffe Users
Hi, I'm trying to work with this model and I'm stuck on this problem as well.
F0812 11:43:03.424506 32702 softmax_loss_layer.cpp:42] Check failed: outer_num_ * inner_num_ == bottom[1]->count() (262144 vs. 786432) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be N*H*W, with integer values in {0, 1, ..., C-1}.
However, I'm quite sure my label is in shape (1,H,W) and ndim is 3.

Here is my code to build the lmdb; I hope someone can point out the bug.

with open('groundtruth/'+inputfiles[i]) as file:
    # each line of the ground-truth file holds whitespace-separated integer labels
    arrlist = [[int(digit) for digit in line.split()] for line in file]
    arr = np.asarray(arrlist)
    arr = arr[np.newaxis,:,:]   # 1 x H x W
    g_dat = caffe.io.array_to_datum(arr)
    in_txn.put('{:0>10d}'.format(i), g_dat.SerializeToString())
    i += 1

eran paz

Aug 12, 2015, 1:12:34 AM
to Caffe Users
Hi
I would first debug to make sure that ndim=3 and shape=1xHxW, because it looks like your shape is 3xHxW.
When you load the image it automatically loads with 3 channels (even if it's saved with 1), so you need to explicitly reshape it to a single channel.

Look at my code: the line
msk=msk.reshape(1,msk.shape[0],msk.shape[1])
explicitly reshapes to 1xHxW.
BTW, I saw you're not transposing to CxHxW, which might cause you problems down the line...

hope it helps
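The debugging step eran suggests can be made explicit with a couple of assertions right before calling array_to_datum (the shape here is hypothetical):

```python
import numpy as np

# hypothetical label mask about to be written to the lmdb
msk = np.zeros((1, 480, 640), dtype=np.uint8)

assert msk.ndim == 3, "caffe.io.array_to_datum requires a 3-D array"
assert msk.shape[0] == 1, "label blob must be 1 x H x W, not 3 x H x W"
```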

王勇翔

Aug 12, 2015, 2:09:38 AM
to Caffe Users
Hi, thank you for the advice, and sorry for... my stupidity XD...
I just wrote the wrong lmdb path in my train.prototxt....

On Wednesday, August 12, 2015 at 1:12:34 PM UTC+8, eran paz wrote: