Fully Convolutional Network Only Predicts One Class


Gavin Hackeling

Jul 6, 2015, 6:35:22 PM
to caffe...@googlegroups.com
Hi all,

I am trying to use a fully convolutional network for semantic image segmentation.

I am using the same network architecture as FCN-32s https://gist.github.com/shelhamer/80667189b218ad570e82#file-readme-md, with the exception that I have changed the `num_output` in the `upscore` and `score59` layers and that I am training on images with dimensions 300x300. Here are my train_val.prototxt and my deploy.prototxt.
I am not fine-tuning as my GPU has insufficient RAM. I am training the network with solve.py, except that I am not loading the VGG16FC weights.

I am using a web app with this library to produce the training data. I am following these instructions to export the segmentation data as JSON, and then converting the JSON to .mat with this script. I then use this script to produce the LMDBs. Before being serialized to datum strings, the label images are uint8 NumPy arrays containing the integers 0 through 8, with shape (1, 300, 300).
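A quick NumPy sanity check of the invariants described above (shape, dtype, and label range) can rule out a malformed label LMDB; the array below is synthetic, standing in for one label image about to be serialized:

```python
import numpy as np

# Synthetic stand-in for one label image about to be written to the LMDB.
labels = np.random.randint(0, 9, size=(1, 300, 300)).astype(np.uint8)

assert labels.dtype == np.uint8                  # Caffe expects uint8 labels
assert labels.shape == (1, 300, 300)             # (channels, height, width)
assert 0 <= labels.min() and labels.max() <= 8   # classes 0..8
n_classes = len(np.unique(labels))               # should not exceed num_output
```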

I've tried evaluating the network after 100-40,000 iterations with eval.py, modified to use the correct deploy.prototxt and weights; only zeros are predicted. I believe that I am producing the LMDBs correctly; I suspect that the convolution parameters of my [de]convolution layers need to be changed to accommodate the change in the image sizes.
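For reference, the standard Caffe size arithmetic can be checked with a few lines of Python; the kernel/stride values below are illustrative, not taken from any particular prototxt:

```python
# Standard Caffe spatial-size formulas, as a sketch for checking whether
# [de]convolution parameters fit a new input size.

def conv_out(size, kernel, stride=1, pad=0):
    # Output size of a convolution or pooling layer.
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel, stride=1, pad=0):
    # Output size of a deconvolution (transposed convolution) layer.
    return stride * (size - 1) + kernel - 2 * pad

s = 300
for _ in range(5):                        # five 2x2, stride-2 max pools
    s = conv_out(s, kernel=2, stride=2)   # 300 -> 150 -> 75 -> 37 -> 18 -> 9

up = deconv_out(s, kernel=64, stride=32)  # a stride-32 upsampling deconv
# up is 320: larger than 300, so the output must be cropped to match.
```

Mismatches like the 320-vs-300 one above are normally absorbed by a crop layer; if the crop offsets or [de]convolution parameters are wrong for a new input size, degenerate outputs are plausible.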

Any help would be appreciated.

Thanks,
Gavin


Gavin Hackeling

Jul 6, 2015, 6:44:49 PM
to caffe...@googlegroups.com
I should also note that I am using https://github.com/longjon/caffe/tree/future.

Carlos Treviño

Jul 15, 2015, 10:35:43 AM
to caffe...@googlegroups.com
Have you solved this issue? I'm also training an FCN for semantic image segmentation, and so far the snapshots only show one class.

Gavin Hackeling

Jul 15, 2015, 11:02:04 AM
to Carlos Treviño, caffe...@googlegroups.com

Nope, I have not. I've tried creating synthetic 2-class datasets, which also resulted in predictions of only one class.

--
You received this message because you are subscribed to the Google Groups "Caffe Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to caffe-users...@googlegroups.com.
To post to this group, send email to caffe...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/caffe-users/7ecaa76d-3e17-49a9-b83e-de5f6178125c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Carlos Treviño

Jul 15, 2015, 12:17:13 PM
to caffe...@googlegroups.com, trevino....@gmail.com
I created a vector of label colors and used it to map the ground truth to the corresponding label values. But since my loss function still has a high value, I'm not sure whether my result is due to that mapping or simply to an incorrect creation of the ground-truth LMDB file.

Carlos Treviño

Jul 16, 2015, 1:14:59 PM
to caffe...@googlegroups.com, trevino....@gmail.com
Did you check the RGB values of your ground truth? I realized that, due to resizing, my RGB values are no longer the same, and therefore they no longer match my mask. I will try with a fixed mask or with the original images to see if the problem lies there.

-Carlos


On Wednesday, July 15, 2015 at 17:02:04 (UTC+2), Gavin Hackeling wrote:

Gavin Hackeling

Jul 16, 2015, 1:53:19 PM
to Carlos Treviño, caffe...@googlegroups.com
I'll verify that tonight. I am loading and resizing the RGB images as follows:


Gavin Hackeling

Jul 16, 2015, 1:54:30 PM
to Carlos Treviño, caffe...@googlegroups.com
Sorry, the cat got my keyboard.

I'll verify that tonight. I am loading and resizing the RGB images as follows:

im = np.array(Image.open(in_).resize((HEIGHT, WIDTH), Image.ANTIALIAS))
im = im[:, :, ::-1]
im = im.transpose((2, 0, 1))
im_dat = caffe.io.array_to_datum(im)
in_txn.put('{:0>10d}'.format(in_idx), im_dat.SerializeToString())

Evan Shelhamer

Jul 16, 2015, 6:21:04 PM
to Gavin Hackeling, Carlos Treviño, caffe...@googlegroups.com
Learning these models without fine-tuning could be problematic. It could take more careful initialization of the params and solver. Note that the original VGG-16 model could not even be trained all-at-once and had to be iteratively fine-tuned. It could be useful to try the MSRA initialization merged in PR #1946.

Have you checked that the model weights are initialized as desired, and for instance the conv. layers all produce non-zero output and the deconv. layer parameters are set to do bilinear interpolation? You can check the weights through pycaffe in the manner shown in the editing model parameters notebook.
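For the bilinear check, the commonly used filler computes deconvolution weights as sketched below (the recipe is the standard one for FCN upsampling layers; the specific layer it would be copied into, e.g. the gist's `upscore`, depends on your prototxt):

```python
import numpy as np

def bilinear_kernel(size):
    # Weights that make a deconvolution layer perform bilinear
    # interpolation -- the usual initialization for FCN upsampling.
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))

k = bilinear_kernel(4)   # kernel for 2x upsampling
```

In pycaffe this kernel would be copied into each (class, class) diagonal entry of the deconvolution layer's weight blob; if those weights are left at zero, the upsampled output is all zeros no matter what the score layers below produce.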

Good luck,

Evan Shelhamer

Carlos Treviño

Jul 17, 2015, 1:17:53 PM
to caffe...@googlegroups.com, trevino....@gmail.com
Gavin,

I just ran some tests to see what the resized images look like, and I found that some noise is generated by ANTIALIAS; I suggest you try nearest neighbor instead.
However, even after checking the resizing I'm still stuck. I will train again, but not from scratch.
im = np.array(Image.open(in_).resize((HEIGHT, WIDTH), Image.NEAREST))
im = im[:, :, ::-1]
im = im.transpose((2, 0, 1))
im_dat = caffe.io.array_to_datum(im)
in_txn.put('{:0>10d}'.format(in_idx), im_dat.SerializeToString())
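The effect is easy to reproduce with plain NumPy: below, a hypothetical 4x4 mask with classes 0 and 5 is downsampled two ways, with block averaging standing in for what a smoothing resampler like ANTIALIAS does.

```python
import numpy as np

mask = np.array([[0, 5, 0, 5],
                 [5, 0, 5, 0],
                 [0, 5, 0, 5],
                 [5, 0, 5, 0]], dtype=np.uint8)

# Nearest neighbor keeps only values that already exist in the mask.
nearest = mask[::2, ::2]

# Averaging (what smoothing filters do) invents the value 2,
# which is not a valid class label at all.
avg = mask.reshape(2, 2, 2, 2).mean(axis=(1, 3)).astype(np.uint8)
```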

Evan,
Thanks for your answer. I was thinking of doing net surgery on AlexNet to base my semantic segmenter on the last network you uploaded (FCN-AlexNet PASCAL).
Are there other considerations regarding this model?

Carlos

Caio Mendes

Jul 17, 2015, 4:01:55 PM
to caffe...@googlegroups.com
Hello guys.

So, I have the same problem here. Could anyone check my prototxt (attached) to see if I'm missing something?

I'm trying to train a smaller network based on FCN-32s from scratch. It's a binary problem. I have tried all learning rates, batch sizes, a non-normalized softmax, and so on...

After analyzing the blobs and params, here is what happens: the layer before the last one (conv4_2) outputs only zeros, and it does so because it learns only negative params, which are zeroed by the ReLU operation.
The image input and the labels are OK.

The actual output of the network is only the bias of the last layer, so it is the same value everywhere. The bias favors the most frequent class (I tried switching the class values).
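That failure mode can be sketched in a few lines of NumPy (shapes and values are made up; `w_hidden` plays the role of the all-negative penultimate-layer weights):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(16)                        # positive input activations
w_hidden = -np.abs(rng.random((32, 16)))  # all-negative penultimate weights
h = np.maximum(w_hidden @ x, 0.0)         # ReLU zeroes every unit

w_last = rng.standard_normal((2, 32))
b_last = np.array([0.3, -0.1])            # last-layer bias
scores = w_last @ h + b_last              # equals the bias exactly
# Zero activations also mean zero gradients through the ReLU,
# so the layer cannot recover once it reaches this state.
```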

Could those of you having the same problem check whether this happens with your networks too? I have a script (attached) that I use to check the blobs and weights; you can adapt it to your network.

Best regards,
stage_1_train.prototxt
pytrain.py

Caio Mendes

Jul 17, 2015, 7:08:31 PM
to caffe...@googlegroups.com
I was finally able to train an FCN; the problem with my net was the weight initialization. I changed it to xavier and it worked. I have attached a working model.
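For reference, Caffe's "xavier" filler (with its default FAN_IN normalization) draws weights uniformly as sketched below; the point is that the scale shrinks with fan-in, so activations neither vanish nor explode. The blob shape here is illustrative.

```python
import numpy as np

# Sketch of the "xavier" filler: uniform in [-sqrt(3/fan_in), sqrt(3/fan_in)].
fan_in = 64 * 3 * 3                    # input channels * kernel_h * kernel_w
scale = np.sqrt(3.0 / fan_in)
w = np.random.uniform(-scale, scale, size=(128, 64, 3, 3))
```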

However, I use a modified Caffe, originally from https://github.com/HyeonwooNoh/caffe. I may be able to help others; just reach me via email.

Best regards, 
stage_1_train.prototxt

Mansi Rankawat

Jul 18, 2015, 3:35:24 PM
to caffe...@googlegroups.com
Hi all,

I am using the FCN-32s architecture, training on the PASCAL VOC 2011 dataset and fine-tuning from the ILSVRC 16-layer VGG net. I let the training run to 17,000 iterations and still see no decrease in the loss; it just remains constant throughout. Can anyone suggest what could be going wrong here?

Thanks
Mansi


On Monday, July 6, 2015 at 6:35:22 PM UTC-4, Gavin Hackeling wrote:

Gavin Hackeling

Jul 19, 2015, 5:53:53 PM
to caffe...@googlegroups.com
@Evan @Caio
Yes, my problem is that some of the convolutional layers only output zeros. Thanks for the advice regarding initialization; this appears to be the cause.

@Caio
Can you describe the IMAGE_SEG_DATA layer type? What is the format of the imglist_train.txt file? Is it a list of RGB images and their corresponding grayscale masks? Should the labels be consecutive integer values starting from 0?

Carlos Treviño

Jul 20, 2015, 3:31:39 AM
to caffe...@googlegroups.com
Hi everybody,

I finally managed to train my own conv-net from scratch, basing my deploy.prototxt and train_val.prototxt on this architecture: https://gist.github.com/shelhamer/3f2c75f3c8c71357f24c#file-readme.md . The key lies in the weight_filler parameters; in my case it worked well with a Gaussian filler.

@Gavin
Just for you to consider: when you use Image.resize, the order of the values is (width, height), so you have to swap them.
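A quick check of the convention (PIL sizes are (width, height), while NumPy shapes are (height, width)):

```python
import numpy as np
from PIL import Image

im = Image.new('L', (300, 200))   # PIL: (width, height)
arr = np.array(im)                # NumPy: (height, width) == (200, 300)

# resize() takes (width, height) as well:
small = np.array(im.resize((150, 100), Image.NEAREST))
```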

Carlos

Gonçalo Cruz

Mar 10, 2016, 7:14:55 PM
to Caffe Users
Hello everyone,

I guess this thread is quite old, but I would really appreciate any help with fine-tuning FCN-AlexNet.

I am currently using the future branch, and I have been able to run the eval.py script and obtain a segmented image (using this model).
Now I would like to adapt the network to perform segmentation for a binary problem.
I suspect the problem is in my train_val.prototxt. I think I just have to change the number of outputs from 21 (for the PASCAL case) to 2 (for my binary problem). I am also following the eval_fcn-alex.py script for this fine-tuning.

I am getting the following error:
"...
upsample
m and k:  2 1
input + output channels need to be the same
"

Can anyone point out errors that I might have in the train_val.prototxt?

Many thanks.
Best regards,
Gonçalo Cruz
fcn_alex_train_val.prototxt
fcn-alex_solve.py

Christos Apostolopoulos

Mar 19, 2016, 4:22:40 PM
to Caffe Users
@Gonçalo Cruz

Did you manage to solve this? I'm stuck at the same point...

Gonçalo Cruz

Mar 21, 2016, 5:02:40 AM
to Caffe Users
Hello Christos,

Yes, I did.
Actually, there were several issues.

The first and most obvious was that fcn_alex_train_val.prototxt pointed to a dataset with 21 classes (instead of the 2 classes that I wanted).
The second was that in the deconvolution layer, group had to be changed from 2 to 1.

Please check if these solve your problem.

Best regards,
Gonçalo Cruz

S. Majid Azimi

Mar 21, 2016, 11:01:09 AM
to Gonçalo Cruz, Caffe Users
Hi,

That's right. You have to change the number of classes in two places: the deconvolution layer and the last fully convolutional layer. You also have to decrease the learning rate by at least a factor of ten, rename the two layers just mentioned, and initialize them with Gaussian noise; it would also be better to increase their per-layer learning rate multiplier (blobs_lr) by a factor of ten.
In addition, the input path should be changed to your file path.

Best
Majid


Christos Apostolopoulos

Mar 21, 2016, 11:09:22 AM
to Caffe Users
Gonçalo, Majid,

Thanks. Actually, I also tried changing the group to one, but I wasn't sure if that was a good fix or just a poor workaround. It's just that my loss stays around 60-80 the whole time, when it should usually be a few thousand. When creating the LMDB files, I threshold my ground-truth grayscale image to contain only zeros and ones so that the net can classify it correctly. I think my issue is there, or in the solver.prototxt. If you have yours available and could post it, I would really appreciate it :)

Gonçalo Cruz

Mar 21, 2016, 1:21:15 PM
to Caffe Users
Regarding the creation of the LMDBs, you can check my procedure here.
As for the solver, I am sending you the one that I am using. I'm not really sure I have the best parameters, though.
fcn-alex_solver_test.prototxt

Evan Shelhamer

Apr 14, 2016, 4:40:09 AM
to Gonçalo Cruz, Caffe Users
Instead of making a pair of LMDBs, you can load the data through a Python data layer as done here: https://github.com/shelhamer/fcn.berkeleyvision.org/blob/master/layers.py. This is more flexible and doesn't need the additional storage. Since FCNs are more computationally intense, the need for fast IO is less dire.

Please see the master edition of our reference FCNs and related code: http://fcn.berkeleyvision.org.

Evan Shelhamer

Majid

Nov 5, 2016, 12:22:24 PM
to Caffe Users, g.chart...@gmail.com
Hi,

Thanks @Evan. I have a problem with the Python data layer you mentioned: it reads the whole dataset at once and only then begins to train the net. My dataset is big, and even my machine with 256 GB of RAM cannot hold it all. It also takes a while to read the samples. Is there a way to read only batch_size images per iteration?
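One way to avoid holding everything in memory is to keep only the list of paths and load batch_size samples per iteration; a minimal sketch, where load_sample is a hypothetical stand-in for reading and preprocessing one image from disk:

```python
import numpy as np

def load_sample(path):
    # Hypothetical: read and preprocess one image from disk.
    return np.zeros((3, 300, 300), dtype=np.float32)

def batches(paths, batch_size):
    # Yield one batch at a time; only batch_size images are in memory.
    for i in range(0, len(paths), batch_size):
        yield np.stack([load_sample(p) for p in paths[i:i + batch_size]])

paths = ['img_%d.png' % i for i in range(10)]
first = next(batches(paths, batch_size=4))   # shape (4, 3, 300, 300)
```

In a Caffe Python data layer, the same idea means doing the image reads inside forward() rather than loading everything in setup().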

I have another problem as well. My dataset has 5 classes including background, but the output always contains only class 2 and the background. I am fine-tuning FCN-8s.

Best,
Majid