fully convolutional training for bounding box


Etienne Perot

Jul 31, 2015, 5:29:38 AM
to Caffe Users
Hello,

I'm using CaffeNet to fine-tune a CNN to segment (coarsely) pedestrians using the INRIA Person dataset.

The steps I have so far:

1. Building a segmentation dataset with an input image blob (N, 3, Height, Width) (RGB format, preprocessed by the Caffe transformer).

2. Building a mask of pedestrians out of the ground truth, made much smaller so that the output of the CNN matches this map. The 2D ground truth is therefore (N, Height/32, Width/32) (the stride of the whole convnet is exactly 32; I made sure of that by adding an ugly padding of 4 at conv1 so that every downscale rounds exactly to the product of the strides). I got the mask by computing the Intersection over Union overlap of 32 x 32 boxes with the actual ground-truth bounding boxes.

3. The input image & the mask are downscaled several times and stitched into the same plane like this:



(Data is written using HDF5, which is so much simpler than LMDB; a small writing sketch is shown after this list.)

4. Finally, the prototxt uses conv1 to conv5 and redefines fc6 to fc8 as fc6-conv to fc8-conv, changing the type from "InnerProduct" to "Convolution" (this way you do not need the net_surgery, which is a bit artificial; after all, "a fully connected layer is just a 1x1 convolution", as master LeCun once said).
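For reference, here is roughly how the HDF5 writing can look with h5py. The dataset names 'data' and 'label', and the singleton channel axis on the label, are just my assumptions about what the HDF5Data layer expects; the names only have to match the top blob names in the prototxt:

import h5py
import numpy as np

# Hypothetical arrays:
#   images: (N, 3, Height, Width) float32, already preprocessed by the transformer
#   masks:  (N, Height/32, Width/32) float32 coarse pedestrian labels
def write_hdf5(filename, images, masks):
    with h5py.File(filename, 'w') as f:
        f.create_dataset('data', data=images.astype(np.float32))
        # the label may need a singleton channel axis to make a 4-D blob
        f.create_dataset('label', data=masks[:, np.newaxis, :, :].astype(np.float32))

The layer's "source" parameter is then a text file listing one .h5 path per line.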

I obtain fair, not-too-bad results and, as stated in the FCN paper, training is much faster and the database is much smaller than for patch training:


OK, cool. Now what I would like is to also train a bounding box regression, but here I get a bit confused. My idea was to build a map of (N, 4, Height/32, Width/32) with the channels being simply x_shift, y_shift, width, height. But for each pixel that is negative (meaning the Intersection over Union is smaller than a threshold), what kind of values should I put? Isn't it going to harm the training with irrelevant output?

I'm quite new to Caffe, so I don't know which additional layer I should use... WindowData? Or is there some value that prevents the net from backpropagating on marked positions?

Otherwise, I guess I can always continue to do patch training for this particular fine-tuning?
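To make the layout concrete, here is roughly how I would fill such a target map together with a validity mask; the zeros at the negative positions are exactly the values I'm unsure about (gt_box, the 0.5 threshold and the shift/size parametrization are placeholders):

import numpy as np

def build_regression_targets(iou, gt_box, cell=32, thresh=0.5):
    """iou: (Height/32, Width/32) IoU of each cell with the ground-truth box
    gt_box = (x1, y1, x2, y2). Returns a (4, H/32, W/32) target map and a
    (1, H/32, W/32) mask; targets at masked-out positions are just zeros."""
    h, w = iou.shape
    targets = np.zeros((4, h, w), dtype=np.float32)
    mask = (iou >= thresh).astype(np.float32)[np.newaxis, ...]
    # cell centres in image coordinates
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = (xs + 0.5) * cell, (ys + 0.5) * cell
    gx = 0.5 * (gt_box[0] + gt_box[2])   # ground-truth box centre
    gy = 0.5 * (gt_box[1] + gt_box[3])
    targets[0] = gx - cx                 # x shift
    targets[1] = gy - cy                 # y shift
    targets[2] = gt_box[2] - gt_box[0]   # width
    targets[3] = gt_box[3] - gt_box[1]   # height
    targets *= mask                      # zero out positions that should be ignored
    return targets, mask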



Evan Shelhamer

Aug 3, 2015, 3:56:44 PM
to Etienne Perot, Caffe Users
To train the regression with missing data / ignored outputs, you'll need to extend the EuclideanLossLayer to, for instance, take a mask of ignored instances or loss weights, and zero out the loss for the missing targets (the occluded or under-threshold bounding boxes).
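As a rough illustration rather than a drop-in implementation, such a masked loss could also be written as a Python layer instead of a C++ extension; the sketch below assumes three bottoms of the same shape: predictions, targets, and a 0/1 mask.

import caffe
import numpy as np

class MaskedEuclideanLossLayer(caffe.Layer):
    """Euclidean loss that zeroes out the contribution of positions where mask == 0."""

    def setup(self, bottom, top):
        if len(bottom) != 3:
            raise Exception("Need three bottoms: predictions, targets, mask.")

    def reshape(self, bottom, top):
        if bottom[0].count != bottom[1].count:
            raise Exception("Predictions and targets must have the same count.")
        self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
        top[0].reshape(1)  # the loss is a scalar

    def forward(self, bottom, top):
        # masked difference: ignored positions contribute nothing to the loss
        self.diff[...] = (bottom[0].data - bottom[1].data) * bottom[2].data
        top[0].data[...] = np.sum(self.diff ** 2) / bottom[0].num / 2.

    def backward(self, top, propagate_down, bottom):
        for i in range(2):
            if not propagate_down[i]:
                continue
            sign = 1 if i == 0 else -1
            bottom[i].diff[...] = sign * self.diff / bottom[i].num

It would then be hooked up with a layer of type "Python" in the prototxt.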

For the fully convolutional conversion:

the prototxt uses conv1 to conv5 and redefines fc6 to fc8 as fc6-conv to fc8-conv, changing the type from "InnerProduct" to "Convolution" (this way you do not need the net_surgery, which is a bit artificial; after all, "a fully connected layer is just a 1x1 convolution", as master LeCun once said)

did you not inherit the weights of the previously defined fully connected layers? The point of the net surgery is to transfer the weights over in a way that keeps the Caffe parameter checking happy (since the dimensions are different for convolution vs. fully connected weights, even though in memory they are the same in Caffe). However, you might have disabled this shape checking.

If not, and you did not transfer these weights, you will likely achieve better results if you initialize from the fully connected params too, and not just the original convolutions.

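For reference, the transfer itself is only a few lines of pycaffe, along the lines of the net_surgery example; the prototxt / caffemodel paths and the fc6-conv style names below are placeholders:

import caffe

# the original fully connected net and the fully convolutional one
net_fc = caffe.Net('caffenet_deploy.prototxt', 'caffenet.caffemodel', caffe.TEST)
net_conv = caffe.Net('caffenet_full_conv.prototxt', 'caffenet.caffemodel', caffe.TEST)

fc_layers = ['fc6', 'fc7', 'fc8']
conv_layers = ['fc6-conv', 'fc7-conv', 'fc8-conv']

for fc, conv in zip(fc_layers, conv_layers):
    # weights: same values, different shape, so copy through .flat
    net_conv.params[conv][0].data.flat = net_fc.params[fc][0].data.flat
    # biases: shapes already match
    net_conv.params[conv][1].data[...] = net_fc.params[fc][1].data

net_conv.save('caffenet_full_conv.caffemodel')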
Hope that helps. If you extend the EuclideanLossLayer for missing values or per-instance loss weights, send a PR!

Evan Shelhamer


Etienne Perot

Aug 4, 2015, 1:32:16 PM
to Caffe Users, et.p...@gmail.com
Thanks for your answer!

1. The shape checking did not occur as the layer was named differently (fc6-toto), but you are probably right! I will try with the original layers, and fine-tuning might be faster too.
2. Thanks for your idea about the mask! Do you know if https://github.com/BVLC/caffe/pull/1020 works?

Etienne

Carlos Treviño

Aug 21, 2015, 5:15:51 AM
to Caffe Users, et.p...@gmail.com
Hi,

I saw that PR #1020 was replaced by https://github.com/BVLC/caffe/pull/1703, which is already merged. How do you calculate the Intersection over Union for the segmented images? So far I created a script that changes the colors of my segmentation to be the same as in the ground-truth images.

Carlos

Etienne Perot

Aug 21, 2015, 5:56:05 AM
to Caffe Users, et.p...@gmail.com
Hello Carlos!

Here is the small function to "rapidly" compute Intersection over Union:

import numpy as np

# w, h: size of the reference bounding box (e.g. 32 x 32)
# pd: ground-truth bounding box as (x1, y1, x2, y2)
# the return values are matrices; you can get the IoU map with SI / SU
def compute_pascal_mapping(pd, w, h, imwidth, imheight):
    # corners of a w x h reference box centred at every pixel of the image
    boxes = np.mgrid[0:imheight, 0:imwidth]
    bx1 = boxes[1] - w / 2
    by1 = boxes[0] - h / 2
    bx2 = boxes[1] + w / 2
    by2 = boxes[0] + h / 2
    SA = h * w                              # area of the reference box
    SB = (pd[3] - pd[1]) * (pd[2] - pd[0])  # area of the ground-truth box
    # intersection rectangle, clipped to zero where the boxes do not overlap
    min2 = np.minimum(bx2, pd[2])
    max0 = np.maximum(bx1, pd[0])
    min3 = np.minimum(by2, pd[3])
    max1 = np.maximum(by1, pd[1])
    SI = np.maximum(0, min2 - max0) * np.maximum(0, min3 - max1)
    SI = SI.astype(float)
    SU = SA + SB - SI                       # union by inclusion-exclusion
    return SI, SU, SA, SB
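For example, to build the coarse (Height/32, Width/32) map of step 2, I call it once per ground-truth box and threshold the per-pixel IoU; the stride-32 sampling and the 0.5 threshold below are just one way to do it:

# pd: one ground-truth pedestrian box (x1, y1, x2, y2)
SI, SU, SA, SB = compute_pascal_mapping(pd, 32, 32, imwidth, imheight)
iou = SI / SU
# sample one value per 32 x 32 cell (roughly at the cell centres) and threshold
coarse_label = (iou[16::32, 16::32] >= 0.5).astype(np.float32)

With several pedestrians in one image, taking the element-wise maximum of the IoU maps before thresholding is one option.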


Actually, I'm not quite sure this is the best way; one could just fit a Gaussian with the bounding box parameters... :-S
Keep in mind, this was an ugly trick made just because INRIA did not have any segmentation ground truth. You don't need it if you use, for example, the MS COCO or SIFT Flow datasets... Also, it does not always highlight small limbs well, so accuracy during training is fairly low (around 93%).

Carlos Treviño

Aug 27, 2015, 9:52:16 AM
to Caffe Users, et.p...@gmail.com
Thanks Etienne!

The code is very clear, but I still don't get what pd means. If you can clarify that, it would be awesome.

I was looking at this task, trying to do

true positive / (true positive + false positive + false negative)

but I see you subtract the true positive from the denominator.

Carlos