How to create the ground truth label mask for the PASCAL VOC 11 segmentation set


beng...@gmail.com

Jul 8, 2015, 10:35:00 PM
to caffe...@googlegroups.com
Hi, can anyone tell me how to create the ground truth label mask for implementing fully convolutional networks?
In the PASCAL VOC dataset, the segmentation data is annotated with a different color for each class of object.

But how can I know which color represents which class?

Gavin Hackeling

Jul 9, 2015, 7:28:44 PM
to caffe...@googlegroups.com
The colors should be converted into integer class indices. If the segmentation masks are RGB images with the shape 3 x height x width, you should produce an array with the shape 1 x height x width where each pixel is an integer label.
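
For concreteness, a minimal sketch of that conversion (the VOC_COLORS table below is illustrative and lists only the first few classes; the full 21-entry table comes from the dataset's colormap, discussed further down this thread):

import numpy as np
from PIL import Image

# Illustrative subset of the VOC colormap: (R, G, B) -> class index.
VOC_COLORS = {
    (0, 0, 0): 0,          # background
    (128, 0, 0): 1,        # aeroplane
    (0, 128, 0): 2,        # bicycle
    # ... remaining 18 classes ...
    (224, 224, 192): 255,  # void/boundary, ignored in evaluation
}

def rgb_mask_to_labels(path):
    # Map an RGB annotation image to a 1 x H x W array of class indices.
    rgb = np.array(Image.open(path).convert('RGB'))
    labels = np.zeros(rgb.shape[:2], dtype=np.uint8)
    for color, idx in VOC_COLORS.items():
        labels[np.all(rgb == color, axis=-1)] = idx
    return labels[np.newaxis, ...]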

Hope that helps,
Gavin

Ben

Jul 9, 2015, 9:35:37 PM
to caffe...@googlegroups.com
Thanks, Gavin, that's exactly right: I should produce a label mask with shape 1 x height x width, where each pixel is a label in the range {0, 1, 2, ..., 20}.
And I've got another question: in deploy.prototxt, the input data dimension is 1 x 3 x 500 x 500, but the images are actually not 500 x 500. How should I handle this? Do I have to reshape the images to 500 x 500? Does this influence the final evaluation?

Gavin Hackeling

Jul 9, 2015, 9:45:42 PM
to Ben, caffe...@googlegroups.com
Yes, I would reshape the images to 500x500. The predictions should not be degraded so long as you preserve the images' aspect ratios.


Ben

Jul 9, 2015, 10:07:04 PM
to caffe...@googlegroups.com, beng...@gmail.com
Thanks, I'll try. 

Evan Shelhamer

Jul 25, 2015, 2:05:14 AM
to Ben, caffe...@googlegroups.com
There's no need to resize the input images. The dimensions in the deploy net are purely for the sake of example. When the data layer batch size == 1 the net is reshaped for each input, so that each training image and ground truth pair are taken at their original size. This reshaping operation is essentially instantaneous in most cases (although it needs to reallocate at times for larger inputs).

The network and solver definitions and inference and solving scripts for this model zoo FCN show how: https://gist.github.com/shelhamer/80667189b218ad570e82#file-readme-md
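
In code, the per-image reshape looks like this (a sketch following the inference script in that gist; 'image.jpg' is a placeholder, and the blob name and mean values are those of the PASCAL models):

import numpy as np
import caffe
from PIL import Image

net = caffe.Net('deploy.prototxt', 'fcn-32s-pascal.caffemodel', caffe.TEST)

# Load an image at its native size; H and W may differ per image.
im = np.array(Image.open('image.jpg'), dtype=np.float32)
im = im[:, :, ::-1]                                          # RGB -> BGR
im -= np.array((104.00698793, 116.66876762, 122.67891434))   # mean subtraction
im = im.transpose((2, 0, 1))                                 # HWC -> CHW

# With batch size 1 the net is reshaped to each input's own size,
# so there is no need to resize to 500 x 500.
net.blobs['data'].reshape(1, *im.shape)
net.blobs['data'].data[...] = im
net.forward()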

I hope that helps you brew your fully convolutional network,

Evan Shelhamer

Ben Gee

Jul 25, 2015, 8:34:02 PM
to Caffe Users, evan.sh...@gmail.com
Thanks, that's right. Actually, for some tasks resizing would sacrifice information. And when creating the label for each pixel, how do you treat the boundary? I mean the white pixels in the segmentation ground truth annotations. Did you treat them as background? I evaluated your model and didn't obtain the same mean IU; I'm looking for the problem.

Evan Shelhamer

Jul 26, 2015, 1:24:05 AM
to Ben Gee, Caffe Users
The void pixels in PASCAL VOC -- the boundary region you mentioned -- are ignored in the loss since they are ignored in the evaluation. That is, the loss is defined as

layer {
  name: "prob"
  type: "SoftmaxWithLoss"
  bottom: "score"
  bottom: "label"
  top: "loss"
  loss_param {
    ignore_label: 255
    normalize: false
  }
}
since void is coded as value 255. The loss is not computed at these pixels.

Evan Shelhamer

Evan Shelhamer

Aug 2, 2015, 9:14:30 PM
to Steve Bengi, caffe...@googlegroups.com
The model zoo FCNs for PASCAL VOC are trained on SBD and evaluated on what we call "seg11valid," which is part of the PASCAL VOC 11 segval split that does not intersect with SBD train, as explained in footnote 7 of the paper.

For convenience of replication, see this gist with seg11valid PASCAL IDs: https://gist.github.com/shelhamer/edb330760338892d511e
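
A sketch of restricting an evaluation to that split (assuming the gist's ID list has been saved as seg11valid.txt, and files is the list of candidate image IDs):

# Keep only images in seg11valid so evaluation does not overlap SBD train.
with open('seg11valid.txt') as f:
    seg11valid = set(line.strip() for line in f)
files = [l for l in files if l in seg11valid]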

Evan Shelhamer

On Fri, Jul 31, 2015 at 8:50 PM, Steve Bengi <beng...@gmail.com> wrote:
Hi, shelhamer, I did what you suggested, and I don't get the reported mIU on the PASCAL VOC 2011 seg val data. The reported mIU is 64.0, and I get about 72; there must be something wrong with my code, but I don't know where. I searched many concepts, like the mIU metric, but I'm still confused.

Below is the code for testing. Can you take a look and give me some suggestions?



import numpy as np
import cv2
import caffe

net = caffe.Net("deploy.prototxt", "fcn-32s-pascal.caffemodel", caffe.TEST)

# Image IDs of the val set, one per line.
files = []
with open("val.txt", "r") as f:
    for line in f:
        files.append(line.strip("\n"))

# Class colors, stored in BGR order to match cv2.imread.
colors = []
with open("color.txt", "r") as cf:
    for line in cf:
        tmp = line.strip("\n").split()
        colors.append((int(tmp[3]), int(tmp[2]), int(tmp[1])))

bound = (192, 224, 224)  # void/boundary color (224, 224, 192) in BGR
total = 0.
ct = 0  # counter

for l in files:
    ct += 1
    print ct, l
    # Preprocess: mean subtraction and HWC -> CHW (cv2 already loads BGR).
    img = cv2.imread("data/images/%s.jpg" % l)
    in_ = np.array(img, dtype=np.float32)
    in_ -= np.array((104.00698793, 116.66876762, 122.67891434))
    in_ = in_.transpose((2, 0, 1))

    # Reshape the net to this image's size and run inference.
    net.blobs['data'].reshape(1, *in_.shape)
    net.blobs['data'].data[...] = in_
    net.forward()
    out = net.blobs['upscore'].data[0].argmax(axis=0)

    # Build the integer label mask from the color annotation.
    mask = cv2.imread("data/mask/%s.png" % l)
    label = np.zeros(mask[:, :, 0].shape)

    obj = []   # class indices present in this image
    value = 0.
    num = 0
    for i in range(len(colors)):
        eq = np.equal(mask, colors[i])
        eq = np.prod(eq, 2).astype(int)  # 1 where all three channels match
        if (eq > 0).any():
            num += 1
            label += (i + 1) * eq
            obj.append(i + 1)
    if num == 0:
        raise ValueError("no known class color found in %s" % l)

    # Mark void/boundary pixels as 255 in both label and prediction
    # so they do not count toward the IU.
    eq = np.equal(mask, bound)
    eq = np.prod(eq, 2).astype(int)
    label[np.nonzero(eq > 0)] = 255
    out[np.nonzero(eq > 0)] = 255

    # Include background if present.
    if (label == 0).any():
        obj.append(0)
        num += 1

    # Per-image IU, averaged over the classes present in this image.
    for idx in obj:
        count1 = np.count_nonzero((out == idx) & (label == idx))  # intersection
        count2 = np.count_nonzero((out == idx) | (label == idx))  # union
        print count1, count2
        value += float(count1) / count2
    value = value / num
    total += value

print len(files), total / len(files)
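
For reference, the scoring code released with these models computes mean IU from one confusion matrix accumulated over the entire val set, rather than averaging per-image IUs over the classes present in each image, and that difference alone shifts the number. A sketch of the histogram approach (label and out are the per-image arrays computed as in the script above):

import numpy as np

def fast_hist(label, pred, n):
    # Accumulate an n x n confusion matrix; void (255) pixels fall
    # outside [0, n) and are skipped.
    k = (label >= 0) & (label < n)
    return np.bincount(n * label[k].astype(int) + pred[k],
                       minlength=n ** 2).reshape(n, n)

n_cl = 21  # 20 object classes + background
hist = np.zeros((n_cl, n_cl))
for l in files:
    # ... compute label and out for image l as in the script above ...
    hist += fast_hist(label.flatten(), out.flatten(), n_cl)

# Per-class IU: true positives / (ground truth + predicted - overlap).
iu = np.diag(hist) / (hist.sum(1) + hist.sum(0) - np.diag(hist))
print 'mean IU', np.nanmean(iu)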

lu qi

Nov 24, 2015, 9:00:00 PM
to Caffe Users
Hi, I want to know the answer to your original question: how can I know which color represents which class? I am very confused about how to make the label based on the color image. Thank you.
On Thursday, July 9, 2015 at 10:35:00 AM UTC+8, Ben wrote:

侯文博

Dec 12, 2015, 10:28:08 AM
to Caffe Users
I also encountered the same problem. Have you found the correspondence between labels and colors?

On Wednesday, November 25, 2015 at 10:00:00 AM UTC+8, lu qi wrote:

Martin Keršner

Jan 24, 2016, 10:22:33 PM
to Caffe Users
Check VOClabelcolormap.m in the development kit code.

meln...@gmail.com

Oct 19, 2016, 7:26:46 AM
to Caffe Users
Yep, it seems there is a convention for creating these colours. The convention is implemented in http://vision.cs.utexas.edu/voc/VOCcode/VOClabelcolormap.m; it is designed so that adjacent indices have different colours.
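
For reference, a direct Python port of that generation scheme (each class color is built by spreading the bits of the label index across the high bits of R, G and B):

import numpy as np

def voc_colormap(N=256):
    # Port of VOClabelcolormap.m: returns an N x 3 array of (R, G, B).
    def bitget(val, idx):
        return (val >> idx) & 1

    cmap = np.zeros((N, 3), dtype=np.uint8)
    for i in range(N):
        r = g = b = 0
        c = i
        for j in range(8):
            # Bits 0, 1, 2 of the index go to the high bits of R, G, B.
            r |= bitget(c, 0) << (7 - j)
            g |= bitget(c, 1) << (7 - j)
            b |= bitget(c, 2) << (7 - j)
            c >>= 3
        cmap[i] = (r, g, b)
    return cmap

cmap = voc_colormap()
print cmap[1]    # [128   0   0]: class 1, aeroplane
print cmap[255]  # [224 224 192]: void/boundary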

Mateo Villa

Mar 26, 2017, 7:38:33 AM
to Caffe Users
Hi, I'm new to Caffe. I want to fine-tune any of the voc-fcns on my own dataset. My images are grayscale, in BMP format. I only have one class (two classes, if we include the background). Should I convert them to .jpg? Is it necessary to store the segmentation ground truth images in PNG format, or can I create them in BMP? When I stored them in .png format with the OpenCV library, the images lost quality.

Thank you for your answers,

Mateo

zhe...@gmail.com

Mar 28, 2017, 11:04:09 AM
to Caffe Users
Hi:
   Can you tell me how to make the label image (ground truth) for FCN? What tools or procedures?
   Thanks very much.

On Sunday, March 26, 2017 at 7:38:33 PM UTC+8, Mateo wrote:

Evan Shelhamer

Apr 12, 2017, 2:03:39 AM
to zhe...@gmail.com, Caffe Users
To train an FCN with the softmax loss, all you need to do is turn your annotation into a 1 x H x W array of class indices from {0, 1, ..., K - 1} for K classes. This is done for annotations stored in PNG in this example from fcn.berkeleyvision.org: https://github.com/shelhamer/fcn.berkeleyvision.org/blob/master/voc_layers.py#L108-L116

The data can be stored however you like, as long as you can transform it into such an array.
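
Note that the VOC annotation PNGs are palettized ('P' mode), so PIL returns the class indices directly and no color lookup is needed; condensed from the load_label method in voc_layers.py (the file name below is illustrative):

import numpy as np
from PIL import Image

# Pixel values are already class indices {0, ..., 20}, with 255 for void.
im = Image.open('SegmentationClass/2007_000032.png')
label = np.array(im, dtype=np.uint8)
label = label[np.newaxis, ...]  # 1 x H x W array of class indices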

Evan Shelhamer






greeny

Nov 14, 2017, 11:40:39 PM
to Caffe Users
Excuse me, I am curious: if I want to segment a crack class, I need to generate the mask, but if an image's quality is not good, can I mask just one part of the target, or do I still need to mask the whole crack?
Another question is about class 0 (background): if I want to segment dogs and cats against different backgrounds, and I mask the cat images (gray value = 1) and dog images (gray value = 2), will the class 0 background (gray value = 0) also be trained on?
On Wednesday, April 12, 2017 at 2:03:39 PM UTC+8, Evan Shelhamer wrote: