Fully convolutional classifier


eran paz

Sep 30, 2015, 3:43:33 AM
to Caffe Users
Hi
I'm trying to build a classifier based on a fully convolutional net; it needs to be fully convolutional so it can run on inputs of different sizes.
Assuming I have K classes, the layer before the softmax should be 1xKx1x1 (NxCxHxW).
My problem is how to make sure that H and W come out as 1x1: the shape of the last feature map depends on the input size.

Any ideas? Has anyone tried this kind of classifier?
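
To make the mismatch concrete, here is a minimal hypothetical head (the layer/blob names and K=10 are made up for illustration):

# "feat" stands in for whatever the last feature map in the net is called.
layer {
  name: "score"
  type: "Convolution"
  bottom: "feat"
  top: "score"
  convolution_param {
    num_output: 10   # K = 10 classes
    kernel_size: 1
  }
}
# For one input size this top may come out 1x10x1x1, but for a larger
# input it comes out, say, 1x10x3x3, which no longer matches a single
# per-image label.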

THX



Youssef Kashef

Sep 30, 2015, 4:49:57 AM
to Caffe Users
Hello Eran,

You might want to look up the FCN model definitions on the Model Zoo for reproducing the experiments in this paper. The network definition and solver for one of the FCN networks are provided here.
This particular network is trained to predict a pixel-level 60-way classification: 59 classes, with the 60th standing for background. The last few layers in the train_val.prototxt are worth looking at (sketched after the list):
  • A convolution layer "score59" with 60 kernels, each of size 1x1. This is one kernel per class.
  • A deconvolution layer, also with 60 kernels, one for each class, except that the kernel size is much larger (64). As you can imagine, the feature maps it produces are much larger than its input.
  • A crop layer will strip away elements of the feature map, so that you end up with a feature map for each output class that matches the dimensions of the input image.
  • The SoftmaxWithLoss layer computes the loss. No need to specify any dimensions here.
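
Roughly, those layers look like this in prototxt (a sketch only; the bottom/top names and the deconvolution stride are assumptions, and the actual file may differ):

layer {
  name: "score59"
  type: "Convolution"
  bottom: "fc7"            # last fully-convolutionalized feature map
  top: "score59"
  convolution_param {
    num_output: 60         # one 1x1 kernel per class (59 + background)
    kernel_size: 1
  }
}
layer {
  name: "upscore"
  type: "Deconvolution"
  bottom: "score59"
  top: "upscore"
  convolution_param {
    num_output: 60
    kernel_size: 64        # large kernel upsamples the coarse score maps
    stride: 32
  }
}
layer {
  name: "score"
  type: "Crop"
  bottom: "upscore"
  bottom: "data"           # reference blob: trim back to the input size
  top: "score"
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "score"
  bottom: "label"
  top: "loss"
}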
Hope this helps,

Youssef

eran paz

Sep 30, 2015, 5:12:35 AM
to Caffe Users
Hi Youssef
Thanks for your reply, I'm actually using your version of the FCN, so thanks again :)
I'm familiar with everything you've sent; I've already trained my FCN for segmentation successfully, but here I'm facing a different problem.
I don't want pixel-level prediction, I just want a single prediction per image; think ImageNet without scaling or cropping the input...

So my problem is really about getting the width and height of the last feature map down to 1x1, so that I get a single value to compare against my single-value label...
Otherwise I get an NxN map compared against a single value, which raises an error...

If you have any ideas I'd be happy to hear.

Thanks again

Youssef Kashef

Sep 30, 2015, 5:25:19 AM
to Caffe Users
Hello Eran,

Very interesting. I'm also curious to know how best to support variable-sized input while keeping the size of the output vector fixed. I assumed warping or cropping the input image to a fixed size was the only way, but I haven't done enough reading to know whether there's a better solution that doesn't fiddle with the input.

Something I was thinking of trying is appending a max-pooling layer at the end that does not have a fixed kernel size. Instead of pooling over a window, it would pool over the entire feature map. That way you end up with a fixed-size output. It's still an open question whether this would perform better than classifying after warping or scaling the input.
Do you know if the current pooling layer in Caffe already supports this? Or do you always have to specify a kernel size?

Thanks,

Youssef

eran paz

Sep 30, 2015, 6:44:00 AM
to Caffe Users
Youssef
I was thinking along very similar lines, but instead of appending a pooling layer with a varying kernel size, I was thinking of adding a convolutional layer with a varying kernel size.
That way you reduce the feature map to 1xKx1x1, which supports single-value labels.
I don't think it's implemented in Caffe's layers (neither pooling nor convolution); it might be a nice contribution....
I'll let you know how I progress

THX

Youssef Kashef

Sep 30, 2015, 6:56:48 AM
to Caffe Users
Hi Eran,

Not sure I follow the idea of a conv layer with a variable kernel size. How would you learn the weights if you don't always know how many you need?
Or would you add new randomly initialized weights once they're needed?

eran paz

Sep 30, 2015, 7:01:03 AM
to Caffe Users
I guess you're right, I hadn't thought it through...
I initially thought about a pooling layer, but I'm afraid that pooling this late in the network would lose too much information...
Anyway, I think it's worth investigating; enabling a network to handle different input sizes is highly important.

Thanks for your inputs, they really help.
Eran

Evan Shelhamer

Sep 30, 2015, 2:06:47 PM
to eran paz, Caffe Users
One can pool over all spatial dimensions with "global" pooling: https://github.com/BVLC/caffe/blob/master/src/caffe/proto/caffe.proto#L768-L770
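
In prototxt that looks something like this (the layer and blob names here are illustrative):

layer {
  name: "global_pool"
  type: "Pooling"
  bottom: "score"          # N x K x H x W class-score map
  top: "pooled_score"
  pooling_param {
    pool: AVE              # MAX is the other common choice
    global_pooling: true   # kernel spans the full HxW, whatever it is
  }
}

The output is N x K x 1 x 1 regardless of the input size, so it can feed SoftmaxWithLoss against a single per-image label.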

Evan Shelhamer




eran paz

Sep 30, 2015, 2:16:02 PM
to Caffe Users, era...@gmail.com
Hi Evan
Thanks! Exactly what I was looking for.
I'll give it a try.

Thanks
Eran