Fully Convolutional Network and unbalanced label distribution


Nicolai Harich

unread,
Sep 1, 2015, 6:23:23 AM9/1/15
to Caffe Users
Hello,

I want to train a fully convolutional net for semantic segmentation with (for now) only 2 target labels.
In total I have 3 labels: "uncategorized", "ground plane" and my object.

The object is quite thin, so its label is heavily underrepresented in the training data.

First I tried to train only on the 2 target labels by ignoring the "uncategorized" label in my SoftmaxWithLoss layer.
This gives me an accuracy of 0.9, because the net always predicts "ground plane", which is true most of the time... but of course this is not what I want ;-)

Second, I trained on all the labels (including "uncategorized"). This results in a (slightly) more meaningful segmentation, but the output is very coarse. The reason is that I upsampled a very coarse 16-pixel-stride output. To improve this I will fuse in the score from an earlier layer (as suggested in the FCN paper).

So my questions are:
- Which strategy makes more sense: incorporating "uncategorized" in the training or not?
- How can I do class balancing for FCNs? Would a custom loss function help, one that gives a higher penalty when the minority label is misclassified? Or do you have other ideas?

Thanks in advance!
Regards
Nicolai

Etienne Perot

unread,
Sep 12, 2015, 12:11:20 PM9/12/15
to Caffe Users
Hello!

The FCN paper tends to say that class balancing was not that big of a deal, but there are a few options:

(1) a re-weighted SoftmaxWithLoss, like you suggest
(2) dropout between the output and the SoftmaxWithLoss
(3) set some (not all) of your majority-label pixels to the ignore label, which should have the same effect as (1) (see the sketch after this list)
(4) copy-paste your thin object everywhere
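
For (3), a minimal numpy sketch (majority_label, keep_fraction and ignore_label=255 are assumptions here, adjust them to your data):

import numpy as np

def subsample_majority(gt, majority_label=1, keep_fraction=0.2, ignore_label=255):
    # randomly relabel most of the majority-class pixels as ignore_label,
    # so the SoftmaxWithLoss layer skips them during training
    gt = gt.copy()
    ys, xs = np.where(gt == majority_label)
    n_drop = int((1.0 - keep_fraction) * ys.size)
    drop = np.random.choice(ys.size, size=n_drop, replace=False)
    gt[ys[drop], xs[drop]] = ignore_label
    return gt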

And here is an example for (4): it mirrors patterns of interest, but you could also scale them, rotate them, or copy-paste them onto different images...

import numpy as np

def add_fake_instances(im, gt, label=2):
    # mirror every pixel of the given label across the vertical axis
    y, x = np.where(gt == label)
    if x.size == 0:
        return None, None
    h, w, c = im.shape
    img = im.copy()
    mask = gt.copy()
    # first we just add mirrors
    y3 = y
    x3 = w - 1 - x  # note: w - x would go out of bounds for x == 0
    mask[y3, x3] = label
    img[y3, x3, ...] = im[y, x, ...]
    return img, mask
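
Quick usage check on toy data (a hypothetical 4x4 image with a single pixel of the target label):

im = np.zeros((4, 4, 3), dtype=np.uint8)
gt = np.zeros((4, 4), dtype=np.uint8)
im[1, 0] = 255
gt[1, 0] = 2
img2, mask2 = add_fake_instances(im, gt, label=2)
assert mask2[1, 3] == 2  # mirrored copy lands at x = w - 1 - 0 = 3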

From my (fairly recent) experience training AlexNet in FCN mode on the MS COCO dataset, it worked reasonably well without ignoring anything...

Also, maybe it would make sense to write another accuracy layer for pixel prediction, so the 0.9 overall accuracy doesn't hide the minority class?
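
For example, per-class accuracy instead of overall pixel accuracy (a sketch; assumes pred and gt are integer label maps of the same shape):

import numpy as np

def per_class_pixel_accuracy(pred, gt, num_classes, ignore_label=255):
    # mean over classes of the fraction of that class's pixels predicted
    # correctly, so always answering "ground plane" no longer scores 0.9
    accs = []
    valid = (gt != ignore_label)
    for c in range(num_classes):
        mask = (gt == c) & valid
        if mask.any():
            accs.append(float((pred[mask] == c).mean()))
    return sum(accs) / len(accs) if accs else 0.0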

ath...@ualberta.ca

unread,
Sep 13, 2015, 7:14:05 AM9/13/15
to Caffe Users
Hi Nicolai,

Convolutional nets perform poorly on "thin" objects (p. 22 of http://arxiv.org/pdf/1409.0575.pdf) by the nature of their design. Think of a thin object's signal becoming comparatively sparse as it is passed through each convolution up the hierarchy.

I would first try to solve the much simpler CNN problem of two-way classification, with one class as background and the other as your thin object, with appropriate rotation/flip augmentation (see the sketch below). It's hard to know without seeing your thin object, but this might be worth your time, as CNNs are currently quite thin-blind. This should give you more insight into what works and what doesn't than looking at semantic segmentation output.
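
A minimal sketch of such augmentation in numpy (applied identically to image and mask; nothing here is specific to your data):

import numpy as np

def rot_flip_augment(im, gt):
    # random 90-degree rotation plus random horizontal/vertical flips,
    # applied identically to the image and its label mask
    k = np.random.randint(4)
    im, gt = np.rot90(im, k), np.rot90(gt, k)
    if np.random.rand() < 0.5:
        im, gt = im[:, ::-1], gt[:, ::-1]
    if np.random.rand() < 0.5:
        im, gt = im[::-1], gt[::-1]
    return np.ascontiguousarray(im), np.ascontiguousarray(gt)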

Regards,
Andy Hess

Nicolai Harich

unread,
Sep 14, 2015, 3:53:53 AM9/14/15
to Caffe Users
Hi Etienne,

thank you for sharing your thoughts and suggestions. I will test some of them as soon as possible and will report my results here.

Thanks!

Nicolai Harich

unread,
Sep 14, 2015, 4:07:13 AM9/14/15
to Caffe Users
Hi Andy,

thank you for your response. I was able to improve my segmentation results a bit. The thin structures are not recognized as well as large objects, but it seems to be manageable in most cases.
I believe that a fully connected CRF (like here: http://arxiv.org/abs/1412.7062) will improve the spatial accuracy dramatically. I think this is the way I want to go now... (rough sketch below)
Btw. my "thin" object is a pallet in a warehouse environment. The camera has a wide field of view. In my first tests I created a very detailed ground-truth mask. It's probably better to label the pallet more coarsely to avoid the thin structures.
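
Something along these lines, using the pydensecrf package (a sketch, not tested yet: probs is assumed to be the (num_labels, H, W) float32 softmax output of the net, img the HxWx3 uint8 image):

import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(img, probs, num_labels=3, iters=5):
    h, w = img.shape[:2]
    d = dcrf.DenseCRF2D(w, h, num_labels)
    d.setUnaryEnergy(unary_from_softmax(probs))  # -log(p) unary potentials
    d.addPairwiseGaussian(sxy=3, compat=3)  # location-only smoothness kernel
    d.addPairwiseBilateral(sxy=80, srgb=13, rgbim=np.ascontiguousarray(img),
                           compat=10)  # appearance (color + location) kernel
    q = np.array(d.inference(iters))  # (num_labels, h*w) marginals
    return q.argmax(axis=0).reshape(h, w).astype(np.uint8)  # refined labels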

Best regards
Nicolai

mrgloom

unread,
Sep 18, 2015, 12:12:08 PM9/18/15
to Caffe Users
What caffe fork was used for FCN?


Ben

unread,
Sep 21, 2015, 2:33:57 AM9/21/15
to Caffe Users
Hi Perot, I don't quite understand why (3) has the same effect as (1), as you mentioned.
Say you have 4 classes: 0, 1, 2, 3, and class 3 has the majority, maybe more than 80 percent. If you put class 3 into ignore_label, what will happen?
Actually, the network will then never learn when to classify an object into class 3. Am I right?

Etienne Perot

unread,
Sep 23, 2015, 11:26:48 AM9/23/15
to Caffe Users
Hello to you all!

sorry for my late answer,

Ben : I was proposing to set *some* of the pixels from the majority class to the ignore label, not all of them. If you put an entire class to ignore_label, then you are right, it will not learn that class at all.

mrgloom : I personally use caffe master, making sure that upscoring builds blobs of the same width & height as the original image (you can add extra padding, for example); otherwise you need to use caffe-future and its weird crop layers.
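
The size arithmetic, as a sanity check (assuming the standard Caffe Deconvolution output formula and that the downsampling path divides sizes exactly):

def deconv_out(in_size, kernel, stride, pad):
    # Caffe Deconvolution output size: stride*(in-1) + kernel - 2*pad
    return stride * (in_size - 1) + kernel - 2 * pad

# e.g. a 32x upscore layer with kernel=64, stride=32, pad=16 yields
# exactly 32 * in_size, so no crop layer is needed
assert deconv_out(16, kernel=64, stride=32, pad=16) == 32 * 16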

ath...@ualberta.ca : you are absolutely right that a very sparse signal will have trouble making its way to the top, after all the downsampling & striding in the network (around 32 for AlexNet). But I think it is possible to either:

1. remove some layers and relearn
2. use shift-and-stitch (or filter rarefaction, as explained in the FCN paper)
3. use skip layers at different pooling levels, so as to extract the fine-grained structure (see the sketch after this list)
4. use a deconvolution network
5. use a CRF, as mentioned by Nicolai...
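
For the skip / deconvolution options, the usual trick is to initialize the upscore Deconvolution layers with bilinear interpolation weights; a numpy sketch of the standard kernel (identity across classes, bilinear across space):

import numpy as np

def bilinear_upsample_weights(factor, num_classes):
    # bilinear interpolation kernel for a (num_classes -> num_classes)
    # Deconvolution layer: no cross-class mixing, bilinear spatially
    size = 2 * factor - factor % 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    filt = (1 - abs(og[0] - center) / factor) * (1 - abs(og[1] - center) / factor)
    weights = np.zeros((num_classes, num_classes, size, size), dtype=np.float32)
    weights[range(num_classes), range(num_classes), :, :] = filt
    return weights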
