Fully Convolutional Network (FCN-32) loss remains constant while training

Mansi Rankawat

Jul 18, 2015, 3:56:33 PM
to caffe...@googlegroups.com
Hi,

I am training the FCN-32 network using pretrained weights from the 16-layer ILSVRC VGG net and fine-tuning on the PASCAL VOC 2011 dataset. Even after 17,000 iterations the loss remains constant at 3.04452. Kindly help me understand what could cause the loss to not decrease at all. I am using the code here to create the lmdb files: https://github.com/BVLC/caffe/issues/1698.

Thanks,
Mansi

Evan Shelhamer

Jul 20, 2015, 1:17:46 PM
to Mansi Rankawat, caffe...@googlegroups.com
How are you initializing the deconvolution and score layer parameters? See for instance https://gist.github.com/shelhamer/80667189b218ad570e82#file-solve-py, which sets up the bilinear interpolation params for the `Deconvolution` layer and zero-initializes the 1x1 kernel `Convolution` layer that computes the class scores.

Leaving both layers at their default zero parameters breaks training, since only zeros will be propagated.
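
For reference, here is a minimal sketch of the kind of surgery that script performs (pure NumPy; the layer name 'upscore' and the 21-class FCN-32s shapes are assumptions, adapt them to your prototxt):

import numpy as np

def upsample_filt(size):
    # 2D bilinear interpolation kernel of shape (size, size)
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))

def interp_surgery(weights):
    # Fill a (C, C, k, k) Deconvolution blob so each channel is
    # upsampled independently: a bilinear kernel on the channel
    # diagonal, zeros everywhere else.
    weights[...] = 0
    filt = upsample_filt(weights.shape[2])
    for c in range(weights.shape[0]):
        weights[c, c, :, :] = filt
    return weights

# In pycaffe this blob would be net.params['upscore'][0].data; here a
# NumPy stand-in with FCN-32s-like dimensions (21 classes, 64x64
# kernel for the stride-32 upsampling):
w = interp_surgery(np.zeros((21, 21, 64, 64), dtype=np.float32))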

Evan Shelhamer


Mansi Rankawat

Jul 20, 2015, 7:15:44 PM
to caffe...@googlegroups.com, mansira...@gmail.com
Thanks a lot, Evan. I was not initializing the Deconvolution layer to bilinear upsampling. The loss is decreasing now instead of remaining constant.

Youssef Kashef

Aug 4, 2015, 10:43:56 AM
to Caffe Users, mansira...@gmail.com
Hello Evan,

I'm trying to figure out why my trained FCN has a very large loss and predicts all pixels as background.
Does the initialization through bilinear interpolation set all parameters in the Deconvolution layer to non-zero values? When viewing the Deconvolution parameters I see blocks of parameters that are either all-zeros or all-non-zeros. Is this expected?
Also, is the zero initialization of the convolution layer that generates score59 from fc7 implicit? I can't see where the script initializes its parameters to zero.

Thank you very much,

Youssef

Etienne Perot

Aug 4, 2015, 11:09:16 AM
to Caffe Users
Hello there,

I'm a bit confused about the methods to initialize the deconvolution.

Also, I'm not sure about the "group" term for deconvolutions: when using this layer, does it take all input channels and somehow output the specified number of channels (which is not really what I want)? Or does it just upsample the input blob?

Bjørn Rustad

Aug 6, 2015, 11:42:59 AM
to caffe...@googlegroups.com
On Tue, Aug 4, 2015, at 16:43, Youssef Kashef wrote:
> Hello Evan,
>
> I'm trying to figure out why my trained fcn has a very large loss and
> predicts all pixels as background.
> Does the initialization through bilinear interpolation initialize all
> parameters in the Deconvolution layer to non-zero values? When viewing
> the Deconv. parameters I see blocks of parameters that are either
> all-zeros or all-non-zeros. Is this expected?

As far as I understand, this is expected. Say your last layer before the
deconvolution has two channels (feature maps), i.e. it is 32x32x2 or so.
You want to upscale the two channels separately, so that they become the
two channels of the deconvolution output. You do not want to mix the two
channels, which is why half of the filter parameters will be zero.
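
To make the block pattern concrete, a quick NumPy check (illustrative only, not Caffe code) of a 2-channel deconvolution blob initialized this way:

import numpy as np

def upsample_filt(size):
    # bilinear kernel, as in the net surgery script
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))

# 2 channels, 4x4 kernels: the bilinear filter sits only on the
# channel diagonal, so diagonal blocks are all-non-zero and
# off-diagonal blocks are all-zero.
w = np.zeros((2, 2, 4, 4))
w[0, 0] = w[1, 1] = upsample_filt(4)
print([(i, j, bool(w[i, j].any())) for i in range(2) for j in range(2)])
# -> [(0, 0, True), (0, 1, False), (1, 0, False), (1, 1, True)]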

--
Bjørn Rustad
bj...@rustad.me

Evan Shelhamer

Aug 6, 2015, 11:47:36 AM
to Bjørn Rustad, caffe...@googlegroups.com
Yes, in the current arch / init the channels are interpolated separately, either by setting the weights off the channel diagonal to zero, or by setting group == number of channels to accomplish the same effect with less memory.
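
In layer-definition terms, the group variant can be sketched with pycaffe's NetSpec as below (the 21-class shapes and layer names are illustrative assumptions, not the model zoo definitions):

from caffe import layers as L, NetSpec

n = NetSpec()
n.score = L.DummyData(shape=dict(dim=[1, 21, 16, 16]))  # coarse class scores
# group == num_output gives one filter per channel, so each channel is
# upsampled on its own: no cross-channel mixing, and no off-diagonal
# zero weights to store. lr_mult: 0 keeps the interpolation fixed.
n.upscore = L.Deconvolution(
    n.score,
    convolution_param=dict(num_output=21, group=21, kernel_size=64,
                           stride=32, bias_term=False),
    param=[dict(lr_mult=0)])
print(n.to_proto())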

Youssef Kashef

Aug 6, 2015, 11:51:13 AM
to Caffe Users, bj...@rustad.me
Hello Bjørn,

This makes a lot of sense.
In this particular network, the layer preceding the Deconvolution is a fully convolutional layer with 60 feature maps, which is the same number of classes I want to target. The Deconvolution is then basically an upsampling of each of these 60 channels, independent of the others.
Thank you very much for clarifying this.

Youssef

Youssef Kashef

Aug 6, 2015, 11:55:05 AM
to Caffe Users, bj...@rustad.me
Hi Evan,

Thanks for weighing in. I think I get it now. Do you know whether using groups also speeds things up, or is it only a matter of less memory?

Youssef

Evan Shelhamer

Aug 6, 2015, 11:55:49 AM
to Youssef Kashef, Caffe Users, bj...@rustad.me
Note that the deconvolution layer is more general than that: it can hold arbitrary weights and learn its filters just like the convolution layer. But yes, in this instance we are configuring the deconvolution to compute interpolation.

Md. Atiqur Rahman

Aug 7, 2015, 9:26:19 PM
to Caffe Users, mansira...@gmail.com
Hi Evan,

As Etienne Perot already mentioned below in the thread, can we initialize the Deconvolution layer weights by using this filler (http://caffe.berkeleyvision.org/doxygen/classcaffe_1_1BilinearFiller.html)?

Thanks.
Atique

Evan Shelhamer

Aug 12, 2015, 2:55:38 PM
to Md. Atiqur Rahman, Caffe Users, mansira...@gmail.com
Yes, you can use the bilinear filler. The model zoo FCN definitions were written before that filler existed, so their weights are filled by net surgery through Python instead.
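
With the filler, the Python surgery step goes away; here is a hedged sketch of such a layer definition via NetSpec (shapes and names are illustrative, following the BilinearFiller usage notes):

from caffe import layers as L, NetSpec

n = NetSpec()
n.score = L.DummyData(shape=dict(dim=[1, 21, 16, 16]))
# The filler writes a bilinear kernel into every filter; with
# group == num_output each channel gets its own fixed interpolation
# kernel, and lr_mult: 0 / decay_mult: 0 freeze the weights.
n.upscore = L.Deconvolution(
    n.score,
    convolution_param=dict(num_output=21, group=21, kernel_size=64,
                           stride=32, bias_term=False,
                           weight_filler=dict(type='bilinear')),
    param=[dict(lr_mult=0, decay_mult=0)])
print(n.to_proto())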

Md. Atiqur Rahman

Aug 12, 2015, 4:04:14 PM
to Caffe Users, rat.c...@gmail.com, mansira...@gmail.com
Dear Evan,

Thank you so much for replying. But running caffe-master for FCN semantic segmentation causes an error about a missing crop layer. And if I try to merge pull request #1976, which adds this crop layer, it creates conflicts and so cannot be compiled.

Could you please advise how I can compile Caffe with all the resources required for FCN semantic segmentation?

Regards
Atique

Oscar Beijbom

Aug 13, 2015, 12:22:03 PM
to Caffe Users, rat.c...@gmail.com, mansira...@gmail.com
Hi Atique,

Yes, the current caffe-master branch does not support FCN. Instead, use this branch: https://github.com/longjon/caffe/tree/future. It's probably easiest to compile this branch as-is, rather than merging it with master.

hope this helps.
Oscar

Md. Atiqur Rahman

Aug 13, 2015, 1:33:08 PM
to Caffe Users, rat.c...@gmail.com, mansira...@gmail.com
Thanks a lot, Oscar. So what you mean is that I need to use this script https://gist.github.com/shelhamer/80667189b218ad570e82#file-solve-py to initialize the deconvolution layer filter weights. I was actually hoping to use the command-line interface, for which the bilinear filler option available in caffe-master would be required. But as you mentioned, caffe-master doesn't come with the crop layer, nor does it allow the PR (#1976) to be merged with the FCN code on the caffe-future branch.

Thanks for your reply.

Regards
Atique

Carlos Treviño

Aug 21, 2015, 5:34:58 AM
to Caffe Users, rat.c...@gmail.com, mansira...@gmail.com
Hi, Atique

It is easy to get fully convolutional semantic segmentation working from the caffe master branch. To get around the auto-merge issue, you have to add the crop layer from PR #1976 to the vision_layers file. I did that and it works just fine.

Carlos

Md. Atiqur Rahman

Aug 21, 2015, 7:17:59 PM
to Caffe Users, rat.c...@gmail.com, mansira...@gmail.com
Dear Carlos,

Thanks a lot for the help with this issue. But after cloning caffe-master, if I run the following commands

git checkout master
hub merge https://github.com/BVLC/caffe/pull/1976

they fail with a merge conflict.

I would highly appreciate it if you could describe the exact steps you followed.

Thanks
Atique

feiyi...@gmail.com

May 24, 2016, 4:00:45 AM
to Caffe Users, rat.c...@gmail.com, mansira...@gmail.com
Dear Md. Atiqur Rahman,

Did you solve the merge conflict problem? Can you give me some advice on how to solve it?

On Saturday, August 22, 2015 at 7:17:59 AM UTC+8, Md Atiqur Rahman wrote:

Vignesh Ungrapalli

Nov 23, 2016, 1:39:10 AM
to Caffe Users, mansira...@gmail.com
Hi Evan, I get the point that initialization with zeros will result in propagating zeros. But with a truncated normal initialization I see similar issues: the loss remains constant and the output is just background. Is there any particular reason why the initialization needs to be bilinear?

Youssef

Nov 23, 2016, 3:47:41 AM
to Caffe Users, mansira...@gmail.com
Hello Vignesh,

Initializing the Deconvolution filters to bilinear weights and keeping them constant lets the layer upsample its direct input through simple linear interpolation.
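
As a 1D toy illustration (pure NumPy, not Caffe code) of why a fixed bilinear kernel amounts to linear interpolation:

import numpy as np

# A stride-2 transposed convolution ("deconvolution") with the fixed
# kernel [0.5, 1, 0.5] doubles the resolution by linear interpolation.
x = np.array([0.0, 4.0, 8.0])        # coarse input samples
k = np.array([0.5, 1.0, 0.5])        # 1D bilinear kernel
out = np.zeros(2 * len(x) + 1)
for i, v in enumerate(x):
    out[2 * i:2 * i + 3] += v * k    # scatter-add = conv transpose
print(out)
# -> [0. 0. 2. 4. 6. 8. 4.]: the inputs land at the odd positions and
# each even position between them is the average of its two neighbors
# (the ends show boundary falloff).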