Fully convolutional neural net

Kien Nguyen Thanh

Jun 7, 2015, 10:00:23 PM

to caffe...@googlegroups.com

Hi all,

Has anyone managed to run the semantic segmentation FCN models on the future branch of Caffe? I have used the previous version of Caffe for some time, but I am now having trouble installing it and running and testing the model provided in the Model Zoo.

1) When installing with the same procedure as before, the make commands (make pycaffe, make all and make test) return errors.
2) How should I prepare image data for segmentation? Do we use the same Python script, "classify.py", to segment the probe images?

I appreciate any ideas. Thanks in advance.

Christopher Catton

Jun 7, 2015, 10:23:48 PM

to caffe...@googlegroups.com

1) Could you provide more detail on the errors? I'm guessing either you do not have a dependency installed or your graphics driver needs to be updated (the latest drivers from the nvidia website should work)

2) You can use a single image if you are just testing the model. Eval.py in the PASCAL-Context models shows how to test the model. If you are looking to train the model on your own dataset, then you are probably about where I am. I'm still having trouble getting that bit going.

Kien Nguyen Thanh

Jun 8, 2015, 9:05:01 PM

to caffe...@googlegroups.com

Thanks for the response Chris.
1) I got an error on the math_functions file: "make: *** [build/src/caffe/util/math_functions.cuo] Error 2". After modifying that file, the make commands work well now.
2) While using Eval.py to segment an image, I got the following error:
  File "python/eval.py", line 15, in <module>
    net = caffe.Net('examples/FCN/deploy.prototxt', 'examples/FCN/fcn-32s-pascalcontext.caffemodel', caffe.TEST)
AttributeError: 'module' object has no attribute 'TEST'
Did you have this problem before?
3) By the way, will Eval.py output a segmented image, or will we need to add extra code to present it?

Thanks.

Christopher Catton

Jun 9, 2015, 1:51:50 AM

to caffe...@googlegroups.com

1) I've never needed to modify any of the Caffe code to build it. Are you using ATLAS? I think I might have had a similar error using OpenBLAS.

2) Are you exporting the Python path as described in the installation guide?

3) You'll need to add code to the script to present the result or store it as you want.

Do you have any problems building the master branch of the Caffe repository https://github.com/BVLC/caffe ? If you don't have a problem with that branch, then you may want to run "git branch" and make sure that you have cloned the correct branch.
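
An AttributeError like the one on caffe.TEST can be a sign that Python is importing a different caffe module than the one just built (for example an older install that comes first on the Python path). The thread does not confirm that this was the cause here, so treat the following as a diagnostic sketch only; the helper name inspect_module is hypothetical.

```python
import importlib


def inspect_module(name, attr):
    """Return (file path, has_attr) for an importable module."""
    module = importlib.import_module(name)
    path = getattr(module, "__file__", None)
    return path, hasattr(module, attr)


if __name__ == "__main__":
    try:
        path, has_test = inspect_module("caffe", "TEST")
        print("caffe imported from:", path)
        print("caffe.TEST available:", has_test)
    except ImportError as exc:
        # Happens when no caffe module is on the Python path at all.
        print("could not import caffe:", exc)
```

If the printed path does not point into the checkout of the branch you built, the Python path is the first thing to fix.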

Kien Nguyen Thanh

Jun 9, 2015, 9:13:09 PM

to caffe...@googlegroups.com

1) I am using Atlas
2) I have no problem installing master branch so far, all working perfectly.

I still haven't found a way to work around the "AttributeError: 'module' object has no attribute 'TEST'" issue in the eval.py file.

Cheers

Kien Nguyen Thanh

Jun 9, 2015, 9:32:20 PM

to caffe...@googlegroups.com

Got it done. Thanks Chris for all your useful discussion.

Carlos Treviño

Jun 10, 2015, 4:53:57 AM

to caffe...@googlegroups.com

The following PR explains a little about how to generate the data for semantic segmentation; nevertheless, I'm still stuck on that part.

https://github.com/BVLC/caffe/issues/1698

eran paz

Jul 5, 2015, 3:01:06 AM

to caffe...@googlegroups.com

Hi

Were you able to run the network?

I'm having some trouble with the label matrix.

I've created an image with 0 as background and 1...K marking pixels belonging to each class.

I've created the lmdb according to PR#1698 for both images and labels.

When I run the net, I get this error:

Check failed: outer_num_ * inner_num_ == bottom[1]->count() (187500 vs. 562500) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be N*H*W, with integer values in {0, 1, ..., C-1}.

As far as I can tell, the problem is that my labels are saved with 3 channels and not 1, but I couldn't figure out how to save them with 1 channel.

Any help would be appreciated

THX

Gavin Hackeling

Jul 7, 2015, 11:53:34 PM

to caffe...@googlegroups.com

Yes, the problem appears to be that your labels have three channels instead of one channel. Assuming that your image has the shape (C, H, W) and that the channel containing your integer class labels is c, you can index that channel using "img = img[c, :, :]".
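
The numbers in the error message fit this explanation: 562500 is exactly 3 x 187500, and 187500 would be one 500 x 375 image (the image size is my assumption; only the two counts come from the log). A minimal NumPy sketch of reducing a label array to one channel, assuming the wanted channel holds the integer class indices; the function name is hypothetical.

```python
import numpy as np


def to_single_channel(label, channel=0):
    """Return a (1, H, W) label array from a (C, H, W) or (H, W) input."""
    label = np.asarray(label)
    if label.ndim == 2:
        # Already (H, W): just add the leading channel axis.
        return label[np.newaxis, :, :]
    if label.ndim != 3:
        raise ValueError("expected a 2-D or 3-D label array, got %d-D" % label.ndim)
    # Slicing with channel:channel+1 keeps the channel axis, giving (1, H, W).
    return label[channel:channel + 1, :, :]


# Sizes chosen to match the counts in the error message.
rgb_label = np.zeros((3, 375, 500), dtype=np.uint8)
single = to_single_channel(rgb_label)
print(rgb_label.size, single.size)
```

Keeping the leading axis of length 1 matches the (N, 1, H, W) label layout that the loss layer's check implies.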

Mansi Rankawat

Jul 18, 2015, 4:01:47 PM

to caffe...@googlegroups.com

Hi,

I am training the FCN-32 network using pretrained weights from the ILSVRC 16-layer VGG net and fine-tuning on the PASCAL VOC 2011 dataset. Even after 17,000 iterations the loss remains constant at 3.04452. What could be the reason for the loss not decreasing at all? I am using the code here to create the lmdb files (https://github.com/BVLC/caffe/issues/1698).

Thanks,

Mansi
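
A loss pinned at 3.04452 is suggestive. For a softmax loss averaged per pixel over the 21 PASCAL VOC classes, a network that gives every class the same score has a loss of ln(21), which is about 3.0445. This is an observation rather than a diagnosis confirmed in the thread, and it assumes the loss is normalized; the check itself is short.

```python
import math


def chance_level_loss(num_classes):
    """Softmax cross-entropy when every class gets probability 1/num_classes."""
    return -math.log(1.0 / num_classes)


print(round(chance_level_loss(21), 5))
```

If the reported loss matches this value, all-zero (or otherwise constant) scores are worth ruling out before looking at the data.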

Ben Gee

Jul 31, 2015, 5:48:15 AM

to Caffe Users, mansira...@gmail.com

Hi Mansi, have you solved your problem? I think you might have a problem with the lmdb data.

Have you tested the model on the PASCAL VOC 2011 or PASCAL-Context dataset?

I'm having trouble obtaining the same results as reported.

Youssef Kashef

Aug 4, 2015, 6:34:06 AM

to Caffe Users

Hello everyone,

I've been trying to train FCN-32s (Fully Convolutional Semantic Segmentation) on PASCAL-Context but keep getting very high loss values regardless of the number of iterations: "Train net output #0: loss = 767455 (* 1 = 767455 loss)". Sometimes it goes as low as 440K, but then it jumps back up to something higher and oscillates.

Ignoring the high loss and letting it run for 80K iterations, I still end up with a network that produces all-zero output.

I can't tell what's throwing it off like that.

My procedure in detail:

1. Follow the instructions in future.sh from longjon:future, except that I apply the PR merges to BVLC:master instead of longjon:master. Building off of longjon:future results in cuDNN build errors like here. Applying some of the PR merges to BVLC:master is redundant since they've already been merged into the master branch.
2. Build Caffe with CUDA 6.5 and cuDNN. I've tried a CPU-only build and got the same high-loss behavior, so I don't think it's related to the GPU or the driver (then again, I only let it run for 5K iterations).
3. Generate LMDB for the PASCAL-Context database. The lmdb generation script is built around Evan Shelhamer's Python snippet in this comment in PR#1698.

   - The images are stored as numpy.uint8 with shape C x H x W, with C=3.
   - The ground truth is stored as numpy.int64 with shape C x H x W, with C=1.
   - The order in which the images are stored is the same as the ground truth. One lmdb for the images and one for the labels. I have two pairs of each to reflect the train/val split.

4. Use net surgery to turn VGG-16 into a fully convolutional model. For this I pretty much followed the net surgery example and used the layer definitions from the FCN-32s trainval.prototxt in the Model Zoo.

   - Not sure I did this one right, though. The output I get for the cat image is still a single-element 2D matrix.
   - I've tried using the VGG-16 fcn weights from HyeonwooNoh/DeconvNet/model/get_model.sh but still get the same behavior.
   - How can I better verify my fully convolutional VGG-16?

5. Apply the solve.py step for initializing the deconv. parameters. According to Shelhamer's post here, not doing this could leave things stuck at zero.

   - What's a good way of verifying that the initialization is correct? I'm worried that the problem is there.

6. solver.prototxt and trainval.prototxt are identical to those shared on the Model Zoo. They only differ in the paths to the lmdbs.
7. I start the training and start getting "Train net output #0: loss = 767455 (* 1 = 767455 loss)". Sometimes it will go down by several 100K, but I never see the values < 10.0 that I've seen some people report.
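
Before suspecting the solver, the image and label arrays described in step 3 can be checked for consistency. The sketch below works on plain NumPy arrays read back from storage; the function name and the ignore value of 255 are my assumptions, not something stated in the thread.

```python
import numpy as np


def check_pair(image, label, num_classes, ignore_label=255):
    """Return a list of problems found in one (image, label) pair."""
    image = np.asarray(image)
    label = np.asarray(label)
    problems = []
    if image.ndim != 3 or image.shape[0] != 3:
        problems.append("image should have shape (3, H, W), got %s" % (image.shape,))
    if label.ndim != 3 or label.shape[0] != 1:
        problems.append("label should have shape (1, H, W), got %s" % (label.shape,))
    if image.ndim == 3 and label.ndim == 3 and image.shape[1:] != label.shape[1:]:
        problems.append("image and label differ in height or width")
    # Every label value must be a class index or the (assumed) ignore value.
    bad = [int(v) for v in np.unique(label)
           if not (0 <= v < num_classes or v == ignore_label)]
    if bad:
        problems.append("unexpected label values: %s" % bad)
    return problems


image = np.zeros((3, 375, 500), dtype=np.uint8)
label = np.zeros((1, 375, 500), dtype=np.int64)
print(check_pair(image, label, num_classes=60))
```

An empty list means the pair passed these particular checks; it does not prove that the images and labels are paired in the right order.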

I could really use some help in figuring out what I'm doing wrong and understanding why the loss is so high. It seems that people have figured out how to train these fcn's without the need of a detailed guide, so it seems I'm missing a critical step somewhere.

Thank you

Etienne Perot

Aug 4, 2015, 12:42:00 PM

to Caffe Users

Thanks, Youssef Kashef, for sharing your detailed procedure.

1. i just built it without cudnn, not sure why we need to fuse?

3. I found that HDF5 was pretty easy to use. You create the dataset this way:

import h5py
import numpy as np
import caffe

# maxSamples, the image/mask sizes, images and masks stand for your own data.
f = h5py.File("train.h5", "w")  # output file name is a placeholder
f.create_dataset("Images", (maxSamples, 3, image_height, image_width), dtype='float32')
f.create_dataset("Mask", (maxSamples, 1, mask_height, mask_width), dtype='float32')  # in practice you want mask_height to be equal to image_height
# Write your data as numpy arrays (use a transformer for the image).
shape = (1, 3, image_height, image_width)
transformer = caffe.io.Transformer({'data': shape})
transformer.set_mean('data', np.array([100, 109, 113]))
transformer.set_transpose('data', (2, 0, 1))
transformer.set_raw_scale('data', 255.0)
for n, (img, mask) in enumerate(zip(images, masks)):
    # preprocess() takes the input blob name first, then the image.
    f["Images"][n] = transformer.preprocess('data', img)
    f["Mask"][n] = mask.reshape((1, mask_height, mask_width))
f.close()

4. You do not need this step! You can just fine-tune from your model and replace the "InnerProduct" layers with "Convolution". Of course, by doing so, all weights in the fully connected part will be gone, but they are fast to train.

5. Here I'm not sure we need this: it seems this initialization is possible:

layer {
  type: "Deconvolution"
  ...
  convolution_param {
    ...
    weight_filler {
      type: "bilinear"
    }
  }
}

6. About the solver: it seems it uses a very high momentum and a very small base learning rate (10^-10) for the unnormalized softmax. I'm not sure I understand why...
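
For reference on point 5, a bilinear interpolation kernel is easy to build by hand, which also makes it possible to compare against whatever ends up in the deconvolution weights. The sketch below follows the construction commonly used for FCN upsampling layers; the function name is hypothetical and the formula should be read as my understanding, not as a quote of solve.py or of Caffe's filler.

```python
import numpy as np


def bilinear_kernel(size):
    """Return a (size, size) bilinear interpolation kernel."""
    factor = (size + 1) // 2
    if size % 2 == 1:
        center = factor - 1
    else:
        center = factor - 0.5
    rows, cols = np.ogrid[:size, :size]
    # Separable triangle function: weight falls off linearly from the center.
    return ((1 - abs(rows - center) / factor) *
            (1 - abs(cols - center) / factor))


kernel = bilinear_kernel(4)
print(kernel[0])
print(kernel.sum())
```

For a 4 x 4 kernel the one-dimensional profile is 0.25, 0.75, 0.75, 0.25, so the two-dimensional weights sum to 4, the square of an upsampling factor of 2.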

Youssef Kashef

Aug 4, 2015, 12:51:03 PM

to Caffe Users

Some indication of progress:

I think my problem of very high loss was due to all-zero parameters in the fc6 and fc7 layers.

The FCN32s model is trained by fine-tuning the fully convolutional variant of VGG-16 model (vgg16fc).

The VGG-16 model is made fully convolutional by following the net_surgery notebook example.

solve.py describes how to set things up for training, specifically how to initialize the weights of fc7 and all preceding layers. Then it shows how to initialize the weights of the remaining Deconvolution layer.

The initialization of all layers up to and including fc7 is done by calling solver.net.copy_from(base_weights), where base_weights is the path to vgg16fc.caffemodel.

For some reason that step is not copying the weights of the fc6 and fc7 layers, leaving them all zero. All earlier conv layer weights are copied correctly.

I solved it by copying the remaining weights in Python, similar to the net_surgery notebook example.

I still need to find out what's keeping it from copying all layers.

Might be too early to claim victory, but I'm already seeing the loss drop after the first 200 iterations. A significant improvement over the earlier behavior.
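
One possible reason for the missing fc6 and fc7 weights, offered as an assumption and not confirmed in this thread: Caffe copies pretrained weights by layer name, and target layers without a same-named source layer keep their initial values. The net_surgery example deliberately renames the converted layers (for example "fc6-conv"), so a model saved that way would not match a prototxt that calls them "fc6" and "fc7". The name-matching rule can be sketched in plain Python; the function and the layer lists are hypothetical.

```python
def layers_left_uninitialized(source_layers, target_layers):
    """Target layers that have no same-named layer in the source model."""
    source = set(source_layers)
    return [name for name in target_layers if name not in source]


# Hypothetical layer names for illustration only.
saved_model = ["conv5_3", "fc6-conv", "fc7-conv"]
fcn_prototxt = ["conv5_3", "fc6", "fc7", "score"]
print(layers_left_uninitialized(saved_model, fcn_prototxt))
```

Comparing the layer names in the saved caffemodel's prototxt with those in trainval.prototxt is a cheap way to test this hypothesis.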

Youssef Kashef

Aug 4, 2015, 12:59:52 PM

to Caffe Users

Hello Etienne,

Thanks for weighing in.

re-4: I'm not sure I can do without this step; otherwise the weights of my last two convolutional layers (fc6 and fc7) are all zeros and just stay zeros regardless of how many iterations.

re-6: Maybe the small learning rate is because it assumes you're fine-tuning VGG-16. Maybe this also contributed to the weights not changing in the last two conv. layers.

Will let it run some more and see if my earlier assumption was valid.

Thanks

Youssef

Fatemeh Saleh

Aug 6, 2015, 4:56:47 AM

to Caffe Users

Hi Youssef,

I followed exactly the same steps as you, and the loss value is about 623345 after about 4000 iterations! I would appreciate it if you could help me with any successful solution.

Thank you

Youssef Kashef

Aug 6, 2015, 5:18:22 AM

to Caffe Users

Hey Fatemeh,

It seems there's still something I'm doing wrong.

One problem I had was that in solve.py the weights for fc6 and fc7 were not being copied correctly from my fully convolutional variant of the VGG-16 model. They were all zeros.

The weights of the earlier layers were copied correctly.

I ended up copying the weights for fc6 and fc7 using step 9 in the net_surgery notebook example.

That step is just about transplanting a set of parameters from one network to another.

My fc6 and fc7 are no longer zero.
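
The transplant step amounts to copying the flattened fully connected weights into the convolutional layer's 4-D weight array. A NumPy sketch of that reshaping, under the assumption that both arrays use the same row-major ordering (for VGG-16 fc6 this would be 4096 x 25088 going to 4096 x 512 x 7 x 7); the function name is hypothetical and small shapes are used below.

```python
import numpy as np


def fc_to_conv_weights(fc_weights, conv_shape):
    """Reshape (outputs, inputs) weights to a 4-D convolution weight array."""
    fc_weights = np.asarray(fc_weights)
    if fc_weights.size != int(np.prod(conv_shape)):
        raise ValueError("parameter counts differ: %d vs %d"
                         % (fc_weights.size, int(np.prod(conv_shape))))
    # Row-major reshape keeps the flat order of the parameters unchanged.
    return fc_weights.reshape(conv_shape).copy()


fc = np.arange(2 * 12, dtype=np.float32).reshape(2, 12)
conv = fc_to_conv_weights(fc, (2, 3, 2, 2))
print(conv.shape)
```

Biases need no reshaping since they are one value per output in both layer types.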

Within 200 iterations of training FCN-32s with that initialization, the loss dropped from 600K+ to the 100K-300K range. I also noticed that the "Train net output #0: loss" and "Iteration X, loss = " values were no longer the same. Not sure why.

Unfortunately, I'm now at iteration 44K and the loss is still oscillating in that same range. So there's still something wrong with my setup.

Something I found odd was that the step for initializing the deconvolution layers in solve.py still leaves large blocks of all-zero parameters. I don't know if that's intended. Still waiting on a response from Evan Shelhamer in another thread.
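
A possible explanation for the zero blocks, stated as an assumption since the thread does not settle it: if the deconvolution weights have shape (C, C, k, k) and the initialization places the bilinear kernel only where the input and output channel match, every off-diagonal channel pair stays zero by design, so each class score map is upsampled independently. The sketch below reproduces that pattern on a small array; all names are hypothetical.

```python
import numpy as np


def init_diagonal(weights, filt):
    """Zero a (C, C, k, k) array and put `filt` where out channel == in channel."""
    c_out, c_in, height, width = weights.shape
    if c_out != c_in:
        raise ValueError("expected equal input and output channel counts")
    if filt.shape != (height, width):
        raise ValueError("filter shape must match the kernel size")
    weights[...] = 0
    idx = np.arange(c_out)
    weights[idx, idx, :, :] = filt
    return weights


def zero_block_fraction(weights):
    """Fraction of (out, in) channel pairs whose kernel is entirely zero."""
    blocks = weights.reshape(weights.shape[0] * weights.shape[1], -1)
    return float(np.mean(~blocks.any(axis=1)))


w = init_diagonal(np.empty((3, 3, 4, 4)), np.ones((4, 4)))
print(zero_block_fraction(w))
```

With C channels only C of the C x C kernels are non-zero, so for 60 classes roughly 98% of the blocks would be zero under this scheme.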

Fatemeh Saleh

Aug 6, 2015, 5:34:47 AM

to Caffe Users

Thank you very much for your complete answer.

So, it seems that I should also wait for the response from Evan Shelhamer. I will also try your solution of copying the weights for fc6 and fc7.

Thanks.

Evan Shelhamer

Aug 6, 2015, 11:53:43 AM

to Caffe Users, Youssef Kashef

You are doing the right thing checking the weights as you go and should find your misstep this way.

The loss does oscillate quite a lot with whole image mini-batches but it should descend. Are you making use of gradient accumulation or high momentum as suggested by our paper and the model zoo models?

Are you starting with FCN-32s or directly training a skip architecture? The skips can be more sensitive to hyperparams if learned all at once than through stages of fine-tuning.
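
To make the "high momentum" remark concrete: Caffe's SGD keeps a velocity that is updated as momentum * velocity - lr * gradient and then added to the weights. Under a constant gradient the step size settles at lr / (1 - momentum), so momentum 0.99 behaves roughly like averaging over about 100 recent single-image gradients. A small sketch of that arithmetic, with hypothetical function names:

```python
def momentum_step(weight, velocity, gradient, lr, momentum):
    """One SGD-with-momentum update; returns (new_weight, new_velocity)."""
    velocity = momentum * velocity - lr * gradient
    return weight + velocity, velocity


def steady_state_step(lr, momentum):
    """Size of the step reached under a constant unit gradient."""
    return lr / (1.0 - momentum)


weight, velocity = 0.0, 0.0
for _ in range(500):
    weight, velocity = momentum_step(weight, velocity, gradient=1.0,
                                     lr=0.1, momentum=0.9)
print(velocity, steady_state_step(0.1, 0.9))
```

With the Model Zoo values mentioned in this thread (base_lr 1e-10, momentum 0.99) the steady-state step is about 1e-8 per unit of gradient, which is small only until one remembers that the unnormalized loss sums gradients over every pixel.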

--
You received this message because you are subscribed to the Google Groups "Caffe Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to caffe-users...@googlegroups.com.
To post to this group, send email to caffe...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/caffe-users/ed48deaa-8285-433c-b477-ae61115895b9%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Youssef Kashef

Aug 6, 2015, 12:16:36 PM

to Caffe Users, youssef...@gmail.com

Hello Evan,

That's reassuring. I'm 50K iterations in and see oscillations around 90K-110K (train loss = 36K-145K). The loss on the training set varies much more strongly. I'd like to think that the average loss is decreasing. I wasn't sure what magnitude to expect because I've read posts reporting a loss of < 10.0 less than 20K iterations into training. What kind of loss magnitude should I expect?

Maybe it's worth computing the accuracy to compare with values in the paper.

I haven't gotten to trying out any other hyperparameters yet. This is pretty much just taking trainval.prototxt from the Model Zoo. Isn't that the definition for FCN-32s (non-fixed) in the paper?

It doesn't use grad. accumulation, but it does use a very low learning rate (base_lr: 1e-10) and a very high momentum (momentum: 0.99).

Haven't looked into the skipping archs yet. I figured it would make more sense to train those by fine-tuning.

I guess I'm still at a point where I want to set a baseline for my experiments. After that I'll start reading the paper more closely with regard to the recommended learning params and fiddling with the hyperparameters. Then take on the skipping archs. Do you think that's a reasonable order for going about things? I do want to get a feel for the parameters before doing any crazy stuff.

Thanks,

Youssef

Vladimir Nekrasov

Aug 7, 2015, 4:43:21 AM

to Caffe Users

Hello Youssef,

I have tried to follow the same steps as you in 1), but unfortunately I got a 'merge conflict' message when merging the first PR:

CONFLICT (content): Merge conflict in include/caffe/vision_layers.hpp

Have you encountered the same problem?
If so, how have you solved it?

Vladimir

On Tuesday, August 4, 2015 at 13:34:06 UTC+3, Youssef Kashef wrote:

Youssef Kashef

Aug 7, 2015, 5:33:01 AM

to Caffe Users

Hello Vladimir,

I manually resolved the conflict. The conflict was between the SPPLayer and CropLayer class definitions, so I basically disentangled the definitions. More details in this PR. Still pending feedback.

Youssef

Vladimir Nekrasov

Aug 7, 2015, 9:54:49 AM

to Caffe Users

Youssef,

Thank you very much!
Everything has worked fine with your PR.

Vladimir

On Friday, August 7, 2015 at 12:33:01 UTC+3, Youssef Kashef wrote:

zzz

Aug 17, 2015, 5:32:01 PM

to Caffe Users

Hi Youssef,

Thanks for the details.

For the PASCAL-Context database, there are 5105 training images. May I ask how you split the train/val data?

Thanks in advance for helping!

Zizhao

Youssef Kashef

Aug 18, 2015, 12:05:04 PM

to Caffe Users

Hello Zizhao,

I sort of guessed. My train/val split is based on the 59-category segmentation results reported on the PASCAL-Context webpage. When you scroll down to the "Project Specific Downloads" section, you can download the segmentation results generated for Mottaghi et al.'s CVPR 2014 paper. They generated segmentations for 5105 images. Those are the ones I grouped into the validation set. I don't know what their splitting strategy was, but I'm curious to learn what it is.

Here's a link to a text file with those 5105 image names (excluding extension).

Youssef

zzz

Aug 18, 2015, 1:47:12 PM

to Caffe Users

Hi Youssef,

I am kind of confused. You said you split train/val based on the 59-category segmentation results (5105 labeled images in total), and you also said you use the 5105 as validation data. I want to make sure: you are not using the full labeled PASCAL training data with 400+ categories and 10,000+ images, right? I think you split these 5105 images into train/val, right?

Thanks for your help !

Zizhao

zzz

Aug 18, 2015, 2:07:08 PM

to Caffe Users

Hi Youssef,

I followed these instructions: https://gist.github.com/shelhamer/80667189b218ad570e82#file-solve-py

to train FCN. My train/val data comes entirely from those 5105 images from the PASCAL-Context webpage.

I downloaded the VGG 16-layer model from https://gist.github.com/ksimonyan/211839e770f7b538e2d8#file-readme-md

When I had everything set up and ran solve.py, I got an error in fc6:

I0818 13:46:41.312158 3458 net.cpp:703] Copying source layer relu5_1

I0818 13:46:41.312182 3458 net.cpp:703] Copying source layer conv5_2

I0818 13:46:41.331995 3458 net.cpp:703] Copying source layer relu5_2

I0818 13:46:41.332023 3458 net.cpp:703] Copying source layer conv5_3

I0818 13:46:41.351951 3458 net.cpp:703] Copying source layer relu5_3

I0818 13:46:41.351976 3458 net.cpp:703] Copying source layer pool5

I0818 13:46:41.351981 3458 net.cpp:703] Copying source layer fc6

F0818 13:46:41.351986 3458 blob.cpp:454] Check failed: ShapeEquals(proto) shape mismatch (reshape not set)

This error is from blob.cpp. It looks like the reshape variable is set to zero. Have you met this problem before? I thought it might be caused by how my data is organized in the lmdb, but I have checked that and it is fine.

void Blob<Dtype>::FromProto(const BlobProto& proto, bool reshape) {
  if (reshape) {
    vector<int> shape;
    if (proto.has_num() || proto.has_channels() ||
        proto.has_height() || proto.has_width()) {
      // Using deprecated 4D Blob dimensions --
      // shape is (num, channels, height, width).
      shape.resize(4);
      shape[0] = proto.num();
      shape[1] = proto.channels();
      shape[2] = proto.height();
      shape[3] = proto.width();
    } else {
      shape.resize(proto.shape().dim_size());
      for (int i = 0; i < proto.shape().dim_size(); ++i) {
        shape[i] = proto.shape().dim(i);
      }
    }
    Reshape(shape);
  } else {
    CHECK(ShapeEquals(proto)) << "shape mismatch (reshape not set)";
  }
  // ...
}

Thanks for your help

Youssef Kashef

Aug 18, 2015, 3:11:22 PM

to Caffe Users

Hi Zizhao,

Out of the 10,103 annotated images in the PASCAL-Context dataset, I set aside the 5105 images used by the authors to demonstrate their segmentation results as the validation set and kept the rest for training. It's a near 50-50 split with no overlap between the two subsets.

Does this clarify things? Happy to discuss more. Unfortunately, I haven't gotten around to inspecting the distribution of the labels in both subsets.
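
The split described above is a set difference on image names. A minimal sketch, assuming the names are available as plain lists of strings; the function name and the example names are hypothetical.

```python
def split_by_validation_list(all_names, val_names):
    """Split names into (train, val) where val is given explicitly."""
    val_set = set(val_names)
    train = [name for name in all_names if name not in val_set]
    val = [name for name in all_names if name in val_set]
    return train, val


# Hypothetical image names for illustration only.
all_names = ["img_0001", "img_0002", "img_0003", "img_0004"]
val_names = ["img_0002", "img_0004"]
train, val = split_by_validation_list(all_names, val_names)
print(len(train), len(val))
```

Checking that the two lists are disjoint and together cover every annotated image guards against evaluating on training data.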

Youssef Kashef

Aug 18, 2015, 3:18:42 PM

to Caffe Users

Hi Zizhao,

What do you mean by "My train/val data is totally from those 5105 images from PASCAL-Context webpage." Are you training and validating on the same data? If yes, I think you need to split your train and val so that there's no overlap and you don't get misleading evaluations due to overfitting.

You need to turn the fc6 and the fc7 layers of the VGG-16 model from fully connected into convolutional layers. I think the error you're getting in solve.py is because you're loading VGG-16 before making it fully convolutional. In that case, caffe will encounter a mismatch in the number of parameters it expects for layers fc6 and fc7. The trainval.prototxt for the FCN model expects them to be convolutional layers.

Hope this helps.
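
The mismatch can be seen from the shapes alone. As I understand VGG-16, fc6 takes the 512 x 7 x 7 output of pool5 and produces 4096 values, so the fully connected and the convolutional version hold the same number of weights but in differently shaped blobs, which is what the shape check rejects. The shapes below are my assumption of the standard VGG-16 layout, not values quoted in the thread.

```python
from functools import reduce
import operator


def count(shape):
    """Number of elements in an array of the given shape."""
    return reduce(operator.mul, shape, 1)


fc6_inner_product = (4096, 25088)      # outputs x flattened inputs
fc6_convolution = (4096, 512, 7, 7)    # outputs x channels x kernel h x kernel w

print(count(fc6_inner_product) == count(fc6_convolution))
print(fc6_inner_product == fc6_convolution)
```

Equal counts are what make the net surgery transplant possible; unequal shapes are why loading the original caffemodel directly fails.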

zzz

Aug 19, 2015, 3:58:24 PM

to Caffe Users

(Let's move the discussion back to the Google group.)

Hi Youssef,

This time I understand your data preparation method.

I think setting test_iter = 5105 is meant to test all val images in one test pass.

If test_interval is higher than max_iter, the test should never be carried out. I think you are correct.

I haven't successfully trained the FCN yet. I will discuss it with you when I reach the next step.

Thanks so much!!

On Wed, Aug 19, 2015 at 3:55 AM, Youssef Kashef <youssef...@gmail.com> wrote:

Hello Zizhao,

Yes, I'm doing the 59-category scenario where I group all categories outside the 59-set into the background class. So my outputs have dimensions 60xHxW.
Something I still don't understand when it comes to the number 5105:
In solver.prototxt, you see these two lines:

test_iter: 5105
# make test net, but don't invoke it from the solver itself
test_interval: 1000000

According to the solver of caffe's MNIST tutorial:
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500

I don't get it. The batch size in the FCN trainval.prototxt is exactly 1. I'm guessing due to memory restrictions.
This is what I think is going on, can you please correct me if I'm wrong:
test_iter and test_interval don't have anything to do with the batch size.
Since batch size is 1, the solver will perform a fwd pass on 5105 batches and calculate the gradients for each iteration. 1 iteration is equal to 1 image.
test_interval is so high that the solver never carries out testing. But we still see it printing the loss for each iteration, once on the training subset and once on val.

Is this correct?

Thanks,

Youssef
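
On the test_iter question: the number of images seen in one test pass is test_iter times the test batch size, which reproduces both the MNIST figure and the 5105 here. Test passes are forward-only, so no gradients are computed during them. The arithmetic, with a hypothetical helper name:

```python
def images_per_test_pass(test_iter, batch_size):
    """Images covered by one test pass of `test_iter` forward passes."""
    return test_iter * batch_size


print(images_per_test_pass(100, 100))   # MNIST tutorial settings
print(images_per_test_pass(5105, 1))    # FCN solver with batch size 1
```

So test_iter does depend on the batch size in the sense that the two together determine how much of the validation set one test pass covers.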

On Tue, Aug 18, 2015 at 10:26 PM, Zizhao Zhang <mr.zizh...@gmail.com> wrote:

Hi Youssef,

I am quite new to the segmentation task, but things are clearer to me now.

I thought you split the 5105 images into train and val (e.g., 3000 for train and 2105 for val). The reason I ask is that if you follow Evan's FCN training instructions, the output layer is 60 x H x W (59 categories + 1 background). That's why I inferred that you use all 5105 as train/val with a non-overlapping split.
However, the segmentation masks of the full PASCAL-Context dataset (10,103 images) have 400+ categories. If you train on that, the output of your last layer should be 400+ x H x W. Are you training this way? Or do you set the labels outside the 59 categories to background?

For the bug, it is totally clear now. Thank you so much for your great help.
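
Mapping the 400+ original categories onto 59 classes plus background can be done with a lookup table. The sketch below assumes non-negative integer labels and a list of the original IDs to keep; the actual 59 IDs are not given in this thread, so the ones used here are made up.

```python
import numpy as np


def remap_to_subset(label, kept_ids):
    """Map kept_ids to 1..len(kept_ids) and every other label to 0."""
    label = np.asarray(label)
    lookup = np.zeros(int(label.max()) + 1, dtype=np.uint8)  # default: background
    for new_id, old_id in enumerate(kept_ids, start=1):
        if old_id < lookup.size:
            lookup[old_id] = new_id
    return lookup[label]


# Made-up original IDs; only 2 and 9 are kept.
original = np.array([[2, 7], [431, 9]])
print(remap_to_subset(original, kept_ids=[2, 9]))
```

With 59 kept IDs the result takes values 0 to 59, which matches an output of 60 x H x W.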

--
Best Regards,
Zizhao

Fatemeh Saleh

Aug 20, 2015, 7:33:32 AM

to Caffe Users, youssef...@gmail.com

Hi Youssef,

Did you compare your accuracy with the paper? I have obtained the pixel accuracy and mean accuracy for 5105 test images and my results are 55.48 and 45.17 for FCN-32s and 58.57 and 47.01 for FCN-16s which are different from the paper.

I am now trying to fine-tune on PASCAL VOC with 21 classes, with the train and validation sets as mentioned in the paper, and the loss is also very large.

I did another experiment using (http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/03-fine-tuning.ipynb) to see the fine-tuning loss and the scratch loss. The fine-tuning loss values are much bigger than the scratch loss values and I'm now wondering if I am doing something wrong !!!

Would you please help me with your new information around training this network?

Thank you very much in advance.
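
When comparing against the paper it helps to be sure the metrics are computed the same way. As I read the FCN paper, pixel accuracy is the number of correctly labeled pixels over all labeled pixels, and mean accuracy is the per-class accuracy averaged over classes. A NumPy sketch under those definitions; the function names are hypothetical, and predictions are assumed to lie in 0..num_classes-1.

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    gt = np.asarray(gt).ravel().astype(np.int64)
    pred = np.asarray(pred).ravel().astype(np.int64)
    valid = (gt >= 0) & (gt < num_classes)   # drops ignore labels such as 255
    index = num_classes * gt[valid] + pred[valid]
    counts = np.bincount(index, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def pixel_accuracy(hist):
    return float(np.diag(hist).sum()) / float(hist.sum())


def mean_accuracy(hist):
    totals = hist.sum(axis=1)
    present = totals > 0                     # skip classes absent from gt
    return float(np.mean(np.diag(hist)[present] / totals[present]))


hist = confusion_matrix([0, 0, 0, 1], [0, 0, 0, 0], num_classes=2)
print(pixel_accuracy(hist), mean_accuracy(hist))
```

Accumulating one confusion matrix over the whole validation set and computing the metrics once at the end gives different numbers than averaging per-image scores, which is one possible source of a gap to published results.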

Youssef Kashef

Aug 20, 2015, 8:00:11 AM

to Caffe Users, youssef...@gmail.com

Hello Fatemeh,

I only got as far as training FCN32s for 80K iterations and with only a qualitative assessment of generated segmentations. They looked similar to those generated by the pretrained FCN32s model in the Model Zoo. I haven't performed any quantitative evaluations yet. I did notice that the loss was still in the 10K range after 80K iterations. Although lower than what I started off with, I was hoping for lower values. The paper doesn't mention the magnitudes of loss values except in Figure 5, but the context of that is different. They're pretty low and I don't think those are the values one should expect to get. Different context.

You said your loss is very big, but does it decrease after 10K iterations?

BenG

Aug 20, 2015, 8:57:24 AM

to Caffe Users, youssef...@gmail.com

Hi, what data are you using? I mean for the experiments on PASCAL VOC 2011 and 2012. How many training and validation images?

I use the data from Berkeley SDS in addition to the PASCAL VOC data, about 10K images for training, and got a high loss of around 500K. I don't know why.

Youssef Kashef

Aug 20, 2015, 9:16:00 AM

to Caffe Users, youssef...@gmail.com

Hello Ben,

Currently only using the PASCAL-Context dataset with the 59-category subset. It's basically full image annotations added to images from VOC 2010. It has about 10K images, approx. 5x that of the VOC segmentation challenge and fully annotated. The train/val split is commonly 50/50.

Are you training from scratch or fine-tuning from another network?

How many iterations did it go through when it reached loss 500K?

Steve Bengi

Aug 20, 2015, 9:33:11 AM

to Youssef Kashef, Caffe Users

Hi, Youssef, I'm finetuning on Pascal voc 2011 and 2012 21-category segmentation task.

1. For voc2011, I use the data only from pascal voc 2011 with the given split: 1112/1111 train/val.

The error just keeps oscillating around 500k from the beginning, and doesn't decrease much after 10k iterations.

I'm looking for the reason.

2. For voc2012, I use the Semantic Boundaries Dataset and Benchmark plus the voc2012 training data for training, about 10k images. And the error is high too.

I'm looking for the reason.

Youssef Kashef

Aug 20, 2015, 9:49:02 AM

to Caffe Users, youssef...@gmail.com

Hi Ben,

If you're not getting any decrease in loss for 10K iterations, I suggest you take a few steps back and:

1. Check the data and labels you're feeding into your network. Load your solver in Python and run a single step, inspect the dimensions of your data and label blobs, and display the images. You can also display the labels as images using pyplot's imshow().
2. Check the initial weights of your network. Are there zero weights where there shouldn't be any? My problem was that I had conv. layers with all-zero weights that stayed zero. The network was basically not learning anything. You can display the weights in Python (e.g. print solver.net.params['fc6'][0].data). Please see my earlier post for details. The all-zero problem can also be verified if your network always produces all-zero predictions (e.g. print solver.net.blobs['score'].data).
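
Printing whole weight arrays gets unwieldy for a network of this size, so the zero check can be summarized per layer. The sketch below works on a plain dictionary of NumPy arrays; with pycaffe such a dictionary could be built from solver.net.params, as hinted in the comment. The function names are hypothetical.

```python
import numpy as np


def zero_report(params):
    """Map each layer name to the fraction of exactly-zero weights."""
    return {name: float(np.mean(np.asarray(arr) == 0))
            for name, arr in params.items()}


def all_zero_layers(params):
    """Names of layers whose weights are entirely zero."""
    return sorted(name for name, frac in zero_report(params).items()
                  if frac == 1.0)


# With pycaffe one could collect the weights like this (not run here):
#   params = {name: blobs[0].data for name, blobs in solver.net.params.items()}
params = {"conv1_1": np.ones((2, 3, 3, 3)), "fc6": np.zeros((4, 2, 7, 7))}
print(all_zero_layers(params))
```

Running the same report before and after a few solver steps shows whether layers that start at zero ever move.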

Fatemeh Saleh

Aug 20, 2015, 9:38:50 PM

to Caffe Users, youssef...@gmail.com

Hi,

I used the SBD training samples mentioned in the paper (8498 images) and the validation set of 736 images mentioned in a footnote of the paper. The loss is around 500K. It decreases after 10K iterations but is still high, with strong variations.

Steve Bengi

Aug 21, 2015, 2:44:57 AM

to Fatemeh Saleh, Caffe Users, Youssef Kashef

It seems that we're facing similar problems, I'll update if any progress.

Zizhao Zhang

Aug 21, 2015, 9:50:23 AM

to Fatemeh Saleh, Caffe Users, Youssef Kashef

Hi Fatemeh,

You mentioned that you think you may have done the net surgery wrong. Could you specify how you did it? Or post the prototxt of the model architecture of the fully convolutional VGG-16?

Thanks

--

Best Regards,

Zizhao

zzz

Aug 24, 2015, 2:40:30 PM

to Caffe Users, fateme...@gmail.com, youssef...@gmail.com

Hi,

Has anyone made progress training FCN? How did you solve the high loss issue?

Youssef Kashef

Aug 24, 2015, 2:58:11 PM

to Caffe Users, fateme...@gmail.com, youssef...@gmail.com

How high of a loss is too high? The paper doesn't seem to mention much on loss values for the different datasets.

Is it correct to assume that the Euclidean loss generated is normalized and independent of the number of classes and image dimensions?

Zizhao Zhang

Aug 24, 2015, 3:48:01 PM

to Youssef Kashef, Caffe Users, Fatemeh Saleh

I still have a loss of around 10K with really large variation.

One difference is that for the label lmdb conversion I use numpy.uint8. There are only 60 categories, so the range is sufficient.

Does that make a difference?

I am going to fine-tune from the already trained FCN-32 model provided by Evan and see what the loss looks like.


--

Best Regards,

Zizhao

eran paz

unread,

Aug 24, 2015, 4:22:39 PM8/24/15

to Caffe Users

I'm still getting all-zero output. I'm using the bilinear weight_filler (with group equal to the num_output of the previous layer), but it still doesn't work.

BTW, I'm using my own dataset and training from scratch (not fine-tuning). Should I use a step learning rate policy or a fixed one? Ideas are welcome...

Steve Bengi

unread,

Aug 24, 2015, 8:36:20 PM8/24/15

to Zizhao Zhang, Youssef Kashef, Caffe Users, Fatemeh Saleh

It's OK if the loss reaches 10K, because the softmax loss is unnormalized. Hope it helps.
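To make the scale concrete, here is a small NumPy sketch (not Caffe's implementation) of a per-pixel softmax loss that is either summed over all pixels or averaged over them. The function name `softmax_loss` and the toy 30x40 image size are assumptions for illustration. With uniform scores over 60 classes, the summed loss is H*W*ln(60), so for a full-size image with 187500 pixels a starting loss of several hundred thousand is plausible.

```python
import numpy as np

def softmax_loss(scores, labels, normalize):
    """Per-pixel softmax cross-entropy for scores (C, H, W) and labels (H, W)."""
    # Subtract the per-pixel maximum for numerical stability.
    shifted = scores - scores.max(axis=0, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    rows, cols = np.indices(labels.shape)
    per_pixel = -log_prob[labels, rows, cols]
    total = per_pixel.sum()
    return total / per_pixel.size if normalize else total

# Toy example: uniform scores over 60 classes on a 30x40 image.
scores = np.zeros((60, 30, 40))
labels = np.zeros((30, 40), dtype=int)
summed = softmax_loss(scores, labels, normalize=False)   # H * W * ln(60)
averaged = softmax_loss(scores, labels, normalize=True)  # ln(60)
```

The averaged value does not depend on the image size, which is why it is easier to compare across datasets than the summed one.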


Zizhao Zhang

unread,

Aug 24, 2015, 8:50:57 PM8/24/15

to Steve Bengi, Youssef Kashef, Caffe Users, Fatemeh Saleh

Hi Steve,

Thanks for this information.

But in my current situation, after about 90K iterations, the loss still oscillates a lot. I thought the loss could be large but should go down as training proceeds. Am I right?

--

Best Regards,

Zizhao

Fatemeh Saleh

unread,

Aug 24, 2015, 9:02:23 PM8/24/15

to Zizhao Zhang, Steve Bengi, Youssef Kashef, Caffe Users

Hi,

I have just plotted the training loss. Although it is high and oscillates a lot, the plot shows that it does decrease during training.

Untitled.png

Steve Bengi

unread,

Aug 24, 2015, 9:20:38 PM8/24/15

to Fatemeh Saleh, Zizhao Zhang, Youssef Kashef, Caffe Users

Hi Fatemeh, the loss seems to decrease a lot. What batch size are you using?


Etienne Perot

unread,

Sep 4, 2015, 11:39:20 AM9/4/15

to Caffe Users

Hi everyone!

Something that worked for me for the deconvolution layer: just set the group number equal to the number of classes.

layer {
  name: "fc8-conv"
  type: "Convolution"
  bottom: "fc7-conv"
  top: "fc8-conv"
  convolution_param {
    num_output: num_of_classes
    kernel_size: 1
    weight_filler {
      type: "gaussian"
      std: 1.0
    }
    bias_filler {
      type: "constant"
      value: 0.0
    }
  }
}

layer {
  type: "Deconvolution"
  name: "upscore"
  bottom: "fc8-conv"
  top: "upscore"
  param {
    lr_mult: 0
  }
  convolution_param {
    kernel_size: 64
    stride: 32
    pad: 16
    num_output: num_of_classes
    group: num_of_classes
    weight_filler {
      type: "constant"
      value: 1
    }
  }
}
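As a sanity check on the kernel_size, stride and pad values in the layer above, the spatial output size of a transposed convolution can be worked out by hand. The helper below is a hypothetical illustration (the name `deconv_output_size` is not part of Caffe); it applies the usual relation out = stride * (in - 1) + kernel_size - 2 * pad, assuming no dilation.

```python
def deconv_output_size(in_size, kernel_size, stride, pad):
    """Spatial output size of a transposed convolution (no dilation)."""
    return stride * (in_size - 1) + kernel_size - 2 * pad

# kernel_size=64, stride=32, pad=16, as in the layer above:
# 32 * (16 - 1) + 64 - 2 * 16 = 512, i.e. exactly 32x the input size.
out = deconv_output_size(16, kernel_size=64, stride=32, pad=16)
```

With these three values the output is always exactly 32 times the input size, which is what a 32x upsampling layer needs.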

Also, normalizing in SoftmaxWithLoss does not hurt at all, so you can keep it. In my case I used a slightly smaller learning rate than the one mentioned in the paper (1e-5 instead of 1e-3 for AlexNet)...

Also, I know it will probably sound obvious, but if, like me, you are using OpenCV to read images from disk, do not set raw_scale or channel swap on the transformer:

import numpy as np
import caffe

# Transformer for preprocessing images loaded with OpenCV's cv2.imread(...)
# imh, imw: input image height and width
shape = (1, 3, imh, imw)
transformer = caffe.io.Transformer({'data': shape})
transformer.set_transpose('data', (2, 0, 1))  # H x W x C -> C x H x W
transformer.set_mean('data', np.array([100, 109, 113]))
# transformer.set_raw_scale('data', 255)  # leave this out: it does weird things here
# transformer.set_channel_swap('data', (2, 1, 0))  # the reference model (CaffeNet) has channels in BGR order, and so does OpenCV, so no swap is needed

Finally, I combined conv4 and pool5 using a deconvolution by a factor of 2, added the eltwise sum operation, and did a final deconvolution by a factor of 16 to get slightly finer results. It works on Daimler but not on MS COCO so far (no idea why...).

I got the results below for the Daimler dataset. I trained it for 10K iterations; it could probably be much better with VGG.

yu Magic

unread,

Sep 8, 2015, 11:29:43 PM9/8/15

to Caffe Users

Hello Youssef ,

I am new to Caffe and there is something I do not understand. Since the author has provided a caffemodel at http://dl.caffe.berkeleyvision.org/fcn-32s-pascalcontext.caffemodel, why should we train the model ourselves? Recently I have been loading the model with Python; how can I display the final result?

On Tuesday, August 4, 2015 at 6:34:06 PM UTC+8, Youssef Kashef wrote:

vijay john

unread,

Sep 8, 2015, 11:59:40 PM9/8/15

to Caffe Users

Hi Etienne,

I am trying to get the FCN working on the Kitti dataset for the road segmentation and haven't been able to do so. I followed your suggestions and changed the train_val.prototxt, but I haven't managed to get it running. I am glad you managed to get the FCN working on the Daimler dataset. Could you kindly share the working train_val and solver prototxt, so I can try it on the kitti dataset.

Cheers,

Vijay

Youssef Kashef

unread,

Sep 9, 2015, 4:20:11 AM9/9/15

to Caffe Users

Hello Yu,

True, the model is shared for off-the-shelf use. In my case, I'm training the model to understand more about it, in case I want to train it on a different dataset, with a different method, etc.

Regarding displaying the final result:

The eval.py script shows how to load the input image, load the model, perform inference, and apply argmax on the network predictions to produce the final 2D output.

You can treat the output matrix as an image and display it with:

import matplotlib.pyplot as plt; plt.imshow(out)
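The argmax and display steps can be sketched as follows. This is a minimal illustration, not the eval.py script itself: the function names are hypothetical, and the toy `scores` array stands in for the network's class score output of shape (C, H, W).

```python
import numpy as np

def scores_to_label_map(scores):
    """Collapse class scores of shape (C, H, W) into an (H, W) map of class indices."""
    return scores.argmax(axis=0)

def show_label_map(label_map):
    """Display the label map; matplotlib is imported lazily so it stays optional."""
    import matplotlib.pyplot as plt
    plt.imshow(label_map)
    plt.colorbar()
    plt.show()

# Toy scores for 3 classes on a 2x2 image (stand-in for the network output).
scores = np.array([
    [[0.1, 0.7], [0.2, 0.1]],
    [[0.8, 0.2], [0.3, 0.1]],
    [[0.1, 0.1], [0.5, 0.8]],
])
out = scores_to_label_map(scores)  # each pixel holds the index of its best class
```

Calling `show_label_map(out)` then renders the class indices as colors.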

vijay john

unread,

Sep 9, 2015, 9:34:48 PM9/9/15

to Caffe Users

Hello everyone,

I managed to get the FCN working on the KITTI dataset. I changed lr_mult from 0 to 1 in the upsample layer and got it working. I also followed Etienne's suggestions and set num_output and group to the number of classes.

layer {
  name: "upsample-new"
  type: "Deconvolution"
  bottom: "score-fc7-new"
  top: "bigscore"
  param {
    lr_mult: 1
  }
  convolution_param {
    num_output: num_class
    kernel_size: 63
    stride: 32
    pad: num_class
    bias_term: false
  }
}

Cheers,

Vijay

yu Magic

unread,

Sep 10, 2015, 11:18:31 AM9/10/15

to Caffe Users

Hello Youssef,

I am very pleased to receive your reply. I followed your guide and got the final result.

On Wednesday, September 9, 2015 at 4:20:11 PM UTC+8, Youssef Kashef wrote:

yu Magic

unread,

Sep 10, 2015, 11:35:12 AM9/10/15

to Caffe Users

Hello Youssef,

I still have some problems. I looked at the train_val.prototxt file. When preparing the training dataset, how do I convert the ground truth into a label file? Also, I do not understand the purpose of solve.py. Is it for retraining a new caffemodel of my own?

I look forward to your reply! Thank you for helping me!

On Wednesday, September 9, 2015 at 4:20:11 PM UTC+8, Youssef Kashef wrote:

Hello Yu,

Youssef Kashef

unread,

Sep 10, 2015, 1:27:58 PM9/10/15

to Caffe Users

Hello Yu,

Generating an lmdb for the labels: your labels are matrices. You can treat them the same way you would treat your images and construct a separate lmdb for the labels.

Here are the Python functions I used for generating one lmdb for the image data and another for the label data. They are based on Evan Shelhamer's Python snippet in this comment in PR#1698.

The solver file describes the parameters of how the network is trained. In this case it uses stochastic gradient descent, and you define things like the file that defines your network, the number of iterations, the learning rate, and how often to save snapshots of your network. More on this Caffe page: Caffe | Solver / Model Optimization.

Hope this helps.

Youssef

changi...@gmail.com

unread,

Sep 11, 2015, 7:10:32 AM9/11/15

to Caffe Users

Hi Youssef:
I am reproducing FCN-8s by fine-tuning VGG-16 on PASCAL-Context with 59 labels; the loss is large but decreasing. When I use the default settings from the authors, GPU memory usage is large (Memory required for data: 1074286756) even with batch size 1. Did you need the same amount of GPU memory when you ran this experiment?

Youssef Kashef

unread,

Sep 11, 2015, 7:43:51 AM9/11/15

to Caffe Users

Hi changingivan,

I've only tried FCN-32s and get Memory required for data: 976695652, with batch size 1 as well.

Are you worried about having enough GPU RAM? If you're running out of GPU memory the authors recommend PR #2016 for reducing memory usage.

Do you get the same number when you load FCN-8s available on Model Zoo?

changi...@gmail.com

unread,

Sep 11, 2015, 8:26:20 AM9/11/15

to Caffe Users

Hi Youssef,
Sorry, this is my first time using Google Groups; next time I will reply here directly and not via Gmail:

My Titan X has 12 GB of memory, so it's enough to cover this training. Since your memory required for data is also large, I feel relieved. The difference in memory required for data between yours and mine may come from "FCN-32s" versus "FCN-8s". I have just begun learning FCN and will study its inner structure later. Besides, my fine-tuning is still running (30000 / 80000); I will tell you my test results as soon as the training stops.

On Friday, September 11, 2015 at 7:43:51 PM UTC+8, Youssef Kashef wrote:

Saeed Izadi

unread,

Sep 13, 2015, 2:06:05 PM9/13/15

to Caffe Users

Hi guys,

I'm trying to load the pre-trained FCN model and add a spatial pyramid on top of the last convolutional layer. In particular, I want to feed bounding boxes of different sizes to the network and get the last convolutional features of the pre-trained FCN model. Has anyone implemented this before in Caffe?

any help would be appreciated.

duchen...@gmail.com

unread,

Sep 16, 2015, 8:35:06 AM9/16/15

to Caffe Users, youssef...@gmail.com

hello youssef,

I use PASCAL VOC2007 to fine-tune the model (FCN-32s PASCAL-Context). However, it doesn't converge. The lmdb generation script is built around Evan Shelhamer's Python snippet in this comment in PR#1698, and I follow the instructions at https://gist.github.com/shelhamer/80667189b218ad570e82#file-train_val-prototxt.

Learning Rate Policy: fixed
I0916 16:19:31.477613 13724 solver.cpp:214] Iteration 0, loss = 97387.2
I0916 16:19:31.478613 13724 solver.cpp:229] Train net output #0: loss = 97387.2 (* 1 = 97387.2 loss)
I0916 16:19:31.478613 13724 solver.cpp:486] Iteration 0, lr = 1e-010
I0916 16:19:48.455585 13724 solver.cpp:214] Iteration 20, loss = 124957
I0916 16:19:48.455585 13724 solver.cpp:229] Train net output #0: loss = 129965 (* 1 = 129965 loss)
I0916 16:19:48.455585 13724 solver.cpp:486] Iteration 20, lr = 1e-010
I0916 16:20:05.811578 13724 solver.cpp:214] Iteration 40, loss = 128501
I0916 16:20:05.811578 13724 solver.cpp:229] Train net output #0: loss = 129965 (* 1 = 129965 loss)
I0916 16:20:05.812577 13724 solver.cpp:486] Iteration 40, lr = 1e-010
I0916 16:20:22.865552 13724 solver.cpp:214] Iteration 60, loss = 125512
I0916 16:20:22.866552 13724 solver.cpp:229] Train net output #0: loss = 129965 (* 1 = 129965 loss)
I0916 16:20:22.866552 13724 solver.cpp:486] Iteration 60, lr = 1e-010
I0916 16:20:39.818522 13724 solver.cpp:214] Iteration 80, loss = 124235
I0916 16:20:39.818522 13724 solver.cpp:229] Train net output #0: loss = 129965 (* 1 = 129965 loss)
I0916 16:20:39.819522 13724 solver.cpp:486] Iteration 80, lr = 1e-010
I0916 16:20:56.028450 13724 solver.cpp:214] Iteration 100, loss = 116807
I0916 16:20:56.028450 13724 solver.cpp:229] Train net output #0: loss = 129965 (* 1 = 129965 loss)

After 10K iterations, the loss is still 120K. Can you give me some advice?

Thanks.

Chenting Du

On Thursday, August 20, 2015 at 9:49:02 PM UTC+8, Youssef Kashef wrote:


Youssef Kashef

unread,

Sep 16, 2015, 8:59:18 AM9/16/15

to Caffe Users, youssef...@gmail.com

Hello Chenting,

Are you using a pre-trained FCN32s model to train on the PASCAL VOC2007? Did you change the num_output parameter of score59 and upscore layers from 60 to the number of classes in VOC 2007? Is it 20 classes?

Have you checked that the initial weights are non-zero?

What happens when you train for 10K more iterations?

If the network's output is all-zero, that's another indicator that something's wrong with the weights, and that the network is not learning anything.
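Checking the weights can be sketched as below. The helper names are hypothetical; the functions take a plain mapping from layer name to a list of arrays. With pycaffe, and assuming a loaded `caffe.Net` called `net`, such a mapping could be built with `{k: [b.data for b in v] for k, v in net.params.items()}`.

```python
import numpy as np

def summarize_weights(params):
    """Return {layer: (mean |w|, fraction of exact zeros)} for each layer's first blob."""
    summary = {}
    for name, blobs in params.items():
        w = np.asarray(blobs[0])
        summary[name] = (float(np.abs(w).mean()), float((w == 0).mean()))
    return summary

def all_zero_layers(params):
    """Names of layers whose weights are entirely zero."""
    return [name for name, (mean_abs, _) in summarize_weights(params).items()
            if mean_abs == 0.0]

# Toy stand-in for real network parameters (shapes are illustrative only).
params = {
    'conv1_1': [np.full((4, 3, 3, 3), 0.01)],
    'score59': [np.zeros((2, 8, 1, 1))],
}
suspicious = all_zero_layers(params)
```

Any layer reported here would only ever pass zeros forward, so it is a good first place to look when the output is all-zero.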

Youssef

duchen...@gmail.com

unread,

Sep 17, 2015, 12:52:35 AM9/17/15

to Caffe Users, youssef...@gmail.com

hello youssef,

I use the pre-trained FCN-32s model (FCN-32s PASCAL-Context) downloaded from the Model Zoo. I fine-tune the model without adding any layers. The number of classes is 2: class-agnostic object and background. I have tested successfully using the pre-trained model. Are the initial weights you mentioned the ones in the pre-trained model? I have trained for 150K iterations and the loss is still around 120K. I think the network is not learning anything. However, I can't find where I went wrong.

By the way, I am running Caffe on Windows with CUDA 6.5.

Chenting Du

On Wednesday, September 16, 2015 at 8:59:18 PM UTC+8, Youssef Kashef wrote:

Youssef Kashef

unread,

Sep 17, 2015, 9:22:28 AM9/17/15

to Caffe Users, youssef...@gmail.com

Hello Chenting,

Didn't know caffe runs on windows, not bad.

120K loss after 150K iterations means something is wrong.

The model you're using to initialize the weights produces a 60-way classification output. Yes, the initial weights are in the .caffemodel file you downloaded from model zoo.

How are you using this network for binary classification without modifying the network?

Youssef

duchen...@gmail.com

unread,

Sep 17, 2015, 9:55:59 AM9/17/15

to Caffe Users, youssef...@gmail.com

hello youssef,

I fine-tune the model following the instructions at https://gist.github.com/shelhamer/80667189b218ad570e82#file-train_val-prototxt. I only changed the number of outputs of the last convolutional layers. Do you mean this model can't be fine-tuned for binary classification?

I have checked the weights in the pre-trained model, and the weights of the last two convolutional layers are around 0.0001. Is that right?

Thank you very much!

Chenting Du

On Thursday, September 17, 2015 at 9:22:28 PM UTC+8, Youssef Kashef wrote:

Youssef Kashef

unread,

Sep 17, 2015, 10:14:22 AM9/17/15

to Caffe Users, youssef...@gmail.com

Hello Chenting,

That's what I meant earlier. The layers "score59" and "upscore" need to be changed by setting their num_output to 2 instead of 60. So it's good that you're already doing this.

That they're not zero is also a good sign. But I don't think it makes sense to use the weights from the pre-trained model to initialize those last two layers. It makes sense to do this for all the other layers but not those last two.

For "score59" I recommend using a random weight initialization, like so:

convolution_param {
  num_output: 2
  kernel_size: 1
  engine: CAFFE
  weight_filler {
    type: "xavier"
  }
  bias_filler {
    type: "constant"
  }
}

Actually not 100% sure about using constant for the bias_filler. It works for the simple LeNet model and I don't have an alternative at the moment.

For the "upscore" layers, I recommend using the authors' initialization via bilinear interpolation in solve.py. The script contains Python functions to perform the initialization, and their usage is shown in lines 43-44. You can ignore the steps before and after, since those have to do with training FCN-32s by fine-tuning VGG-16-fcn. You're already past that step, since you're working with the pretrained FCN-32s.
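For orientation, bilinear initialization of an upsampling layer can be sketched as follows. This is a common way to build such a kernel and is not confirmed to be identical to what solve.py does; the function names and the `(channels, channels or 1, k, k)` weight layout are assumptions for illustration.

```python
import numpy as np

def bilinear_kernel(size):
    """2D bilinear interpolation kernel of shape (size, size)."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    rows, cols = np.ogrid[:size, :size]
    return (1 - abs(rows - center) / factor) * (1 - abs(cols - center) / factor)

def fill_upsampling_weights(weights):
    """Fill a 4D weight array in place with bilinear kernels.

    If the second axis has length 1 (one filter per channel, as with group
    equal to the number of channels), every filter gets the kernel. Otherwise
    only matching channel pairs (i, i) get it and all other pairs stay zero.
    """
    n_a, n_b, k_h, k_w = weights.shape
    if k_h != k_w:
        raise ValueError("kernel must be square")
    kernel = bilinear_kernel(k_h)
    weights[...] = 0
    if n_b == 1:
        weights[:, 0] = kernel
    else:
        for i in range(min(n_a, n_b)):
            weights[i, i] = kernel
    return weights

# Two classes, 64x64 kernel, as for a 32x upsampling layer.
w = fill_upsampling_weights(np.zeros((2, 2, 64, 64), dtype=np.float32))
```

Keeping the off-diagonal channel pairs at zero means each class score map is upsampled independently of the others.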

Hope this helps,

Youssef

Tianxiang Pan

unread,

Sep 18, 2015, 10:56:12 PM9/18/15

to Caffe Users

Has this issue been solved? I also have 5105 images with 59-category annotations. How do I train the network with the 10000+ images with 400+ category labels?

On Wednesday, August 19, 2015 at 1:47:12 AM UTC+8, zzz wrote:

Hi Youssef,

I am kind of confused. You said you split train/val based on the 59-category segmentation results (5105 labeled images in total), and you also said you use the 5105 as validation data. I want to make sure you are not using the full PASCAL labeled training data with 400+ categories and 10000+ images, right? I think you split these 5105 images into train/val, right?
Thanks for your help !

Zizhao

On Tuesday, August 18, 2015 at 12:05:04 PM UTC-4, Youssef Kashef wrote:
Hello Zizhao,

I sort of guessed. My train/val split is based on the 59-category segmentation results reported on the PASCAL-Context webpage. When you scroll down to the "Project Specific Downloads" section, you can download segmentation results generated for Mottaghi et al.'s CVPR 2014 paper. They generated segmentations for 5105 images. Those are the ones I grouped into the validation set. I don't know what their splitting strategy was, but I'm curious to learn what it is.

Here's a link to a text file with those 5105 image names (excluding extension).

Youssef

On Monday, August 17, 2015 at 11:32:01 PM UTC+2, zzz wrote:
Hi Youssef,

Thanks for your details.
For the PASCAL-Context database, there are 5105 training images. May I ask how you split the train/val data?
Thanks in advance for helping!

Zizhao

Toru Hironaka

unread,

Sep 28, 2015, 5:03:32 PM9/28/15

to Caffe Users

Hi, Youssef

I downloaded your read_img.py and used the read_img_cv2 function to convert ground truth images into an lmdb database, because with PR#1698 I always got a "ValueError: Incorrect array shape." error due to the 2D image. After the conversion, I ran solve.py and got the error below.

"F0928 16:00:07.428268 19769 softmax_loss_layer.cpp:42] Check failed: outer_num_ * inner_num_ == bottom[1]->count() (187500 vs. 421500) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be N*H*W, with integer values in {0, 1, ..., C-1}."

Possible problems:

1. I downloaded the original image files from VOC2010 and converted them into lmdb using convert_imageset. I used your read_img_cv2 function to convert my ground truth images into lmdb.

2. I did not crop or resize the images to a single size.

I have a couple of questions for you.

1. Do I have to use the same train.txt and val.txt data lists for creating images and ground truth dataset?

2. Caffe's train.txt and val.txt files usually have file names with their label information, like this: <image_file_name.png> cat, lion, ... When I looked at your val_59.txt file, it did not have labels after each file name. The file contains only file names, so I assume that FCN does not need labeled train.txt and val.txt files. Or do I have to create my own labeled train.txt and val.txt?
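The two numbers in the failed check above can be read as the number of predictions (N*H*W, here 187500, which is consistent with, for example, one 375x500 image) and the number of values in the label blob (here 421500). The hypothetical helper below, not part of Caffe, sketches how to interpret such a mismatch: a label count that is an exact multiple of the prediction count suggests extra label channels, anything else suggests that the label and image sizes differ.

```python
def diagnose_label_count(num_predictions, num_labels):
    """Explain a mismatch between N*H*W predictions and the label blob count."""
    if num_labels == num_predictions:
        return "ok"
    if num_labels % num_predictions == 0:
        return "labels have %d channels instead of 1" % (num_labels // num_predictions)
    return "label and image sizes differ"

case_a = diagnose_label_count(187500, 421500)  # the numbers from the error above
case_b = diagnose_label_count(187500, 562500)  # exactly 3x the prediction count
```

As a side note, 421500 equals 3 x 281 x 500, which would fit a 3-channel label of a different image size, but that is only a guess from the arithmetic.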

Youssef Kashef

unread,

Sep 29, 2015, 7:29:53 AM9/29/15

to Caffe Users

Hello Toru,

I think the problem you're having with converting ground truth images into lmdb is that the VOC2010 ground truth images are RGB. If you pass them to the read_img_cv2 function, it will store them as multi-channel matrices. Caffe does not support multi-channel labels. This is different from multi-dim labels. You need to convert your ground truth images into 1xHxW images. One way to do this is to load them with OpenCV, convert them to grayscale, and cast the pixels to integers.
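As an alternative to the grayscale conversion mentioned above, a color-coded ground truth can be mapped to class indices with an explicit palette lookup. This is a different technique from the one described in the post; the function name and the three-color palette below are hypothetical and would need to be replaced with the actual color coding of the dataset.

```python
import numpy as np

def rgb_to_class_indices(rgb, palette):
    """Map an (H, W, 3) color-coded ground truth to a (1, H, W) uint8 index array.

    palette is a list of (r, g, b) tuples; the position in the list is the class index.
    Pixels whose color is not in the palette raise an error instead of being guessed.
    """
    h, w, _ = rgb.shape
    labels = np.full((h, w), -1, dtype=np.int64)
    for index, color in enumerate(palette):
        mask = np.all(rgb == np.array(color, dtype=rgb.dtype), axis=-1)
        labels[mask] = index
    if (labels < 0).any():
        raise ValueError("found colors that are not in the palette")
    return labels[np.newaxis].astype(np.uint8)

# Hypothetical palette: index 0 = background, 1 and 2 = two object classes.
palette = [(0, 0, 0), (128, 0, 0), (0, 128, 0)]
rgb = np.array([[[0, 0, 0], [128, 0, 0]],
                [[0, 128, 0], [0, 0, 0]]], dtype=np.uint8)
label = rgb_to_class_indices(rgb, palette)
```

The result has a single channel, so its element count matches the N*H*W that the loss layer expects.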

Regarding your last two questions:

1. Do I have to use the same train.txt and val.txt data lists for creating images and ground truth dataset?

I highly recommend it so you can compare with other work that use the same splitting.

2. Caffe's train.txt and val.txt files usually have file names with their label information, like this: <image_file_name.png> cat, lion, ... When I looked at your val_59.txt file, it did not have labels after each file name. The file contains only file names, so I assume that FCN does not need labeled train.txt and val.txt files. Or do I have to create my own labeled train.txt and val.txt?

I use the val_59.txt file to select which images go into the validation set. If an image is not in that list, it is assigned to the training set. This is specific to the PASCAL-Context dataset, where the ground truth images are saved as .mat files. Because this is for solving a pixel segmentation task, there's no single label per image but a whole label file (the .mat file) for each image.

Youssef

Toru Hironaka

unread,

Sep 29, 2015, 10:00:55 AM9/29/15

to Caffe Users

Hi, Youssef

Thanks for your answer!! I appreciate it. I have one more question. I am learning your codes from your caffe_sandbox. I could not find python module called "import fileSystemUtils as fs". Is this your own python module or can I download it somewhere? This module looks like a java module. Please give me the link for fileSystemUtils module.

Thanks,

Toru

Youssef Kashef

unread,

Sep 29, 2015, 10:10:18 AM9/29/15

to Caffe Users

Hi Toru,

Here's the link to fileSystem.py. It's a python module I wrote for utility functions I use now and then for traversing directories and such. I might add it into the sandbox repo directly but I assumed that people would use the functions and not the scripts.

Youssef

Toru Hironaka

unread,

Sep 30, 2015, 8:28:01 PM9/30/15

to Caffe Users

Hi, Youssef

Thanks! I ran your scripts and they successfully generated the training and validation datasets. I have started fine-tuning FCN with them. You told me earlier that "Caffe does not support multi-channel labels", so I understand that I cannot train with multi-channel labels, but that this restriction is about labels and not about multi-channel or multi-spectrum input images. Am I correct? I have 6 or 8 different spectrum images to train on, but I do not know how. Do you know anything about training on multi-spectrum images?

Thanks,

Toru

Youssef Kashef

unread,

Oct 1, 2015, 7:19:37 AM10/1/15

to Caffe Users

Hi Toru,

I'm afraid I don't know what multi-spectrum images are. Is it the same as multi-channel?

You have N images, each of height H, width W. Each of the H*W pixels is a vector of length K. K is the number of channels in the image. K=1 for a grayscale image or any 2d matrix, for RGB images K=3.

Eventually you store each data point as a 1xKxHxW matrix into your db. During optimization a batch of N such matrices are loaded into a blob. The blob is of size NxKxHxW, where N is the batch size.

Labels can be stored in a db as simple scalars into the label member of a Datum object or you can store them as matrices in a separate db. If you have non-scalar labels, like segmentation ground truth, you're better off storing them in a db separate from the images.

What I think causes confusion for some:

For classification the network will produce a vector of length L, where L is the number of classes. People who know that the loss involves an element-wise vector operation think that they need to provide the labels as a vector of size L. They think that this is what the multi-dim label support is for, but that's not the case. A scalar label (e.g. 0-9 for MNIST digits) will be converted by the loss layer to a vector of size L in which the scalar is treated as the index of the single non-zero element. It's the network definition that defines how long the vector is. So class 5 just means that the element at index 5 of the vector is non-zero.

In the case of semantic segmentation, each element in the ground truth matrix has a scalar value denoting a class. The per-element loss unfolds the ground truth value for that element into a vector of size L filled with zeros and inserts the non-zero value at the index given by the scalar ground truth for that pixel.
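The unfolding described above can be sketched in NumPy. This only illustrates the idea and is not how Caffe's loss layer is implemented; the function names `one_hot` and `per_pixel_loss` are made up for this example.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Unfold an (H, W) integer label map into a (num_classes, H, W) indicator array."""
    return (np.arange(num_classes)[:, None, None] == labels[None]).astype(np.float32)

def per_pixel_loss(probs, labels):
    """Cross-entropy per pixel: -sum over classes of indicator * log(probability)."""
    target = one_hot(labels, probs.shape[0])
    return -(target * np.log(probs)).sum(axis=0)

labels = np.array([[0, 2], [1, 1]])      # scalar class index per pixel
probs = np.full((3, 2, 2), 1.0 / 3.0)    # uniform prediction over L = 3 classes
target = one_hot(labels, 3)
loss = per_pixel_loss(probs, labels)     # every pixel gets ln(3)
```

So the label db only ever needs to hold one integer per pixel; the length L comes from the number of output channels of the network.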

Youssef

Toru Hironaka

unread,

Oct 1, 2015, 10:58:28 AM10/1/15

to Caffe Users

Hi, Youssef

Thanks for your help. I meant multi-channel. I have been training on medical images, which consist of CT images at different energy levels (i.e. different channels). All of these images are extracted from the same positions at different energy levels. I have been training on them one energy level at a time. I am now trying to train on them together, but I have not found a way to do it. According to your comments, I can train on multi-channel images. I will try it.

I think I need to learn more about Caffe datasets and Caffe layer structure. I need to fully understand how to design or build Caffe layers because I have various image sizes for training. I have been resizing the images for CaffeNet or AlexNet training or fine-tuning because of their fixed input size. Do you have any suggestions for learning material on Caffe? I need to get to the next level.

Toru

Youssef Kashef

unread,

Oct 1, 2015, 11:19:11 AM10/1/15

to Caffe Users

Hi Toru,

If you stack the energy levels into a multi-channel image and train from that, the filters of your first conv. layer are going to learn how to combine these channels to minimize the loss. Maybe this makes more sense than training a CNN on individual energy levels. Note that the number of channels can be as large as necessary. The inputs of the deeper conv. layers are multi-channel images as well, where each channel represents a feature map of the earlier conv. layer.
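Stacking can be sketched in NumPy as below, assuming each energy level is available as a separate 2D array of the same size. The function names and the toy 4x5 arrays are hypothetical; the resulting K x H x W sample and N x K x H x W batch follow the layout described earlier in the thread.

```python
import numpy as np

def stack_channels(slices):
    """Stack K single-channel (H, W) arrays into one (K, H, W) array."""
    shapes = {s.shape for s in slices}
    if len(shapes) != 1:
        raise ValueError("all channels must share the same height and width")
    return np.stack(slices, axis=0)

def make_batch(samples):
    """Stack N samples of shape (K, H, W) into an (N, K, H, W) batch."""
    return np.stack(samples, axis=0)

# Six toy "energy level" images of size 4x5, each filled with its level index.
energy_levels = [np.full((4, 5), level, dtype=np.float32) for level in range(6)]
sample = stack_channels(energy_levels)   # shape (6, 4, 5)
batch = make_batch([sample, sample])     # shape (2, 6, 4, 5)
```

Each `sample` is what would be stored as one entry in the image db, in the same way as a 3-channel RGB image.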

Do you have variable-sized input and fixed-sized output? Do you even want your output to be of a fixed size, or proportional to the input? If you need a fixed output, you can either warp your input image to a fixed size or try more flexible pooling like they do with SPP-Net. More info on global pooling with caffe here.

As for ramping up on Caffe: I've found that the tutorials give a good overview, but you find out about the details by trying things out, getting stuck, then getting help from places like here. Every now and then, people will contribute an example or add to the documentation so that others find the solution faster.

...

changi...@gmail.com

unread,

Oct 5, 2015, 12:42:03 AM10/5/15

to Caffe Users

Hi Youssef:

Finally, I got the results of fine-tuning VGG-16 to FCN-8s and FCN-32s. Here are the evaluation results:

for vgg16_2_fcn8:

for vgg16_2_fcn32:

On Friday, September 11, 2015 at 7:43:51 PM UTC+8, Youssef Kashef wrote:

Hi changingivan,

...

Auto Generated Inline Image 1

Auto Generated Inline Image 2


Jiri Tezky

unread,

Oct 5, 2015, 5:18:05 PM10/5/15

to Caffe Users

Hi,
Has anyone tried creating lmdb files in MATLAB? I tried something like this:

databaseRGB = lmdb.DB(lmdbRGBpath,'MAPSIZE', 5048576000, 'NOLOCK', 'true');
databaseGT = lmdb.DB(lmdbGTPath,'MAPSIZE', 5048576000, 'NOLOCK', 'true');

%RGB
imRGB = imread(fileRGBpath);

imRGB = imRGB(:, :, [3, 2, 1]);  % permute channels from RGB to BGR
imRGB = permute(imRGB, [2, 1, 3]);  % flip width and height

%datumRGB = caffe_proto_('toEncodedDatum', imRGB, 0);
databaseRGB.put(key, imRGB);

%GT
[imGT,  ~] = imread(fileGTpath);

imGT = permute(imGT, [2, 1]);  % flip width and height
imGT = reshape(imGT,[1 size(imGT)]); %add singleton dimension
%datumMASK = caffe_proto_('toEncodedDatum', imGT, 0);
databaseGT.put(keyGT, imGT);

But I'm not sure if this is correct.
I also tried this Python script:

import numpy as np
import lmdb
import caffe
from PIL import Image

def imgs_to_lmdb_GT(paths_src, path_dst):
    """Write single-channel ground truth images to an lmdb as 1 x H x W arrays."""
    in_db = lmdb.open(path_dst, map_size=int(1e12))
    with in_db.begin(write=True) as in_txn:
        for in_idx, in_ in enumerate(paths_src):
            # load label image:
            # - expected to be H x W (single channel)
            im = np.array(Image.open(in_))  # or load whatever ndarray you need
            print(im.shape)
            im = im[None, :]  # add a leading channel axis: H x W -> 1 x H x W
            im_dat = caffe.io.array_to_datum(im)
            # keys are zero-padded indices, encoded to bytes for lmdb
            in_txn.put('{:0>10d}'.format(in_idx).encode('ascii'), im_dat.SerializeToString())
    in_db.close()
    return 0

def imgs_to_lmdb(paths_src, path_dst):
    """Write RGB images to an lmdb as BGR, C x H x W arrays."""
    in_db = lmdb.open(path_dst, map_size=int(1e12))
    with in_db.begin(write=True) as in_txn:
        for in_idx, in_ in enumerate(paths_src):
            # load image:
            # - as np.uint8 {0, ..., 255}
            # - in BGR (switch from RGB)
            # - in Channel x Height x Width order (switch from H x W x C)
            im = np.array(Image.open(in_))  # or load whatever ndarray you need
            im = im[:, :, ::-1]
            im = im.transpose((2, 0, 1))
            im_dat = caffe.io.array_to_datum(im)
            in_txn.put('{:0>10d}'.format(in_idx).encode('ascii'), im_dat.SerializeToString())
    in_db.close()
    return 0

But during training the loss is huge and constant. I suspect that is because the lmdb is created incorrectly.
I have RGB images and single-channel ground truth images.

Toru Hironaka

unread,

Oct 8, 2015, 4:48:19 PM10/8/15

to Caffe Users

Hi, Youssef

Thanks for your advice. You said, "Note that the number of channels can be as large as necessary", so I can train on multi-channel images using the data conversion from PR#1698. However, PR#1698 does not seem to set label values during the image-to-lmdb conversion (I am not talking about FCN ground truth images here, just multi-channel images). For example, can I convert multi-channel images into lmdb with the code from PR#1698 and train the CaffeNet model by running the caffe train command, or do I have to write my own Python or C++ training code to train on my multi-channel images?

I think I have to add label information for each image class.

im_dat = caffe.io.array_to_datum(im)

im_dat.label=int(labelNum)

But I still don't know how to stack my multi-channel images. I am now looking at the code in https://github.com/BVLC/caffe/blob/master/src/caffe/util/io.cpp. There is a function called "CVMatToDatum" that uses cv_img.depth(). Should I set my channel number there? I think datum->set_channels(cv_img.channels()); is for RGB.

Toru

...

zhang wang

unread,

Oct 12, 2015, 8:15:44 AM10/12/15

to Caffe Users

I have the same result: all predictions are zero. My 'upsample' params are copied from fcn-****-pascal.caffemodel.

I'm having some trouble with the label matrix.
I've created an image with 0 as background and 1...K marking pixels belonging to each class.
I've created the lmdb according to PR#1698 for both images and labels.
when I run the net I get this error:
Check failed: outer_num_ * inner_num_ == bottom[1]->count() (187500 vs. 562500) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be N*H*W, with integer values in {0, 1, ..., C-1}.

As far as I can tell, the problem is that my labels are saved with 3 channels and not 1, but I couldn't figure out how to save them with 1 channel.

Any help would be appreciated

THX


zhang wang

unread,

Oct 13, 2015, 3:22:03 AM10/13/15

to Caffe Users, fateme...@gmail.com, youssef...@gmail.com

I think the normalized loss is more helpful. In the dataset the size of each sample is different, so the unnormalized loss will vary a lot.

I am training on the VOC2012 segmentation dataset.

iter      loss
0         3.044523
6,500     2.790600
15,000    2.470375
...       ...
37,000    1.728426
40,000    1.645899
45,000    1.508255
...

In the 'SoftmaxWithLoss' layer, loss = -log(x), where x is one predicted value (the softmax probability of the ground truth class) computed from the blob 'score'.
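As a quick check of loss = -log(x): if the network starts out predicting every class with equal probability, then x = 1/C for C classes. Assuming 21 classes for VOC2012 segmentation (20 object classes plus background), this gives a value that matches the loss at iteration 0 in the table above. The function name below is made up for this example.

```python
import math

def uniform_softmax_loss(num_classes):
    """Normalized softmax loss when every class gets probability 1/num_classes."""
    return -math.log(1.0 / num_classes)

# ln(21) is about 3.0445, which agrees with the 3.044523 reported at iteration 0.
initial = uniform_softmax_loss(21)
```

A normalized loss near ln(C) therefore means the network is still at chance level, and anything clearly below it indicates that learning has started.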

On Tuesday, August 25, 2015 at 2:58:11 AM UTC+8, Youssef Kashef wrote:

How high of a loss is too high? The paper doesn't seem to mention much on loss values for the different datasets.
Is it correct to assume that the Euclidean loss generated is normalized and independent of the number of classes and image dimensions?

On Monday, August 24, 2015 at 8:40:30 PM UTC+2, zzz wrote:
Hi,

Has anyone made progress training FCN? How can the high loss issue be solved?

On Friday, August 21, 2015 at 2:44:57 AM UTC-4, Ben wrote:
It seems that we're facing similar problems. I'll post an update if I make any progress.

On Fri, Aug 21, 2015 at 9:38 AM, Fatemeh Saleh <fateme...@gmail.com> wrote:
Hi,
I used the SBD training samples mentioned in the paper (8498 images) and the validation set of 736 images mentioned in the footnote of the paper. The loss is around 500K. It decreases after 10K iterations but is still high, with strong variations.


rm

Oct 30, 2015, 1:06:34 PM

to Caffe Users

My dataset has 2 classes, with 1000 training images of shape (5,256,256) and corresponding ground truth of shape (1,256,256), a binary image whose values 0 and 1 represent the 2 classes.

The training in solve.py uses the existing caffemodel, which I assume was trained on 3-channel input. Since I want to apply it to my 5-channel dataset, can I still use the provided model?

Toru Hironaka

Nov 2, 2015, 5:23:26 PM

to Caffe Users

Hi, Youssef

It's been a while. I have successfully run FCN, but the loss is too high, as you have already mentioned earlier in this thread. Did you find out what causes the high loss?

Toru

...

Zizhao Zhang

Nov 2, 2015, 7:28:34 PM

to Toru Hironaka, Caffe Users

That is because the original code does not normalize the loss. It is fine as long as the loss keeps decreasing :)


--

Best Regards,

Zizhao

Toru Hironaka

Nov 9, 2015, 12:48:02 PM

to Caffe Users

Hi, Youssef

I have a question about stacking the energy levels. Do I have to stack the images of the different energy levels while converting the image files into LMDB, or should I add the different energy level images in the train_val.prototxt file?

Toru

...

Youssef Kashef

Nov 10, 2015, 4:11:09 AM

to Caffe Users

Hello Toru,

Two options for this:

1) In Python, while generating the lmdb, stack all energy levels for a single sample as different channels of the blob. Of course, H x W has to be consistent across all channels.
2) In Python, generate an lmdb for each energy level. In your network definition, you'll need a data layer to load each energy level and a Concat layer to combine them into a single blob before proceeding deeper into the network. Example definition of a Concat layer:

layer {
  name: "concat"
  bottom: "in1"
  bottom: "in2"
  top: "out"
  type: "Concat"
  concat_param {
    axis: 1
  }
}

If the number of energy levels is very large, I'd recommend option 1. Option 2 is useful if you want to be selective about which energy levels to use, for example to experiment with excluding some.
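A minimal sketch of option 1, assuming each energy level of a sample is available as a 2-D NumPy array of the same size. The helper name and the toy data are hypothetical, and writing the result to lmdb is left out.

```python
import numpy as np

def stack_energy_levels(levels):
    """Stack per-energy-level 2-D arrays (H x W) into one C x H x W array."""
    arrays = [np.asarray(a) for a in levels]
    shapes = {a.shape for a in arrays}
    if len(shapes) != 1:
        raise ValueError("all energy levels must share the same H x W, got %s"
                         % sorted(shapes))
    # axis=0 makes the energy level the channel dimension
    return np.stack(arrays, axis=0)

# Hypothetical sample with 5 energy levels of size 4 x 6.
levels = [np.full((4, 6), i, dtype=np.uint8) for i in range(5)]
sample = stack_energy_levels(levels)
print(sample.shape)
```

The resulting 3-D array has the C x H x W layout that the lmdb snippets in this thread pass to caffe.io.array_to_datum.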

Youssef

...

Toru Hironaka

Nov 16, 2015, 12:20:48 PM

to Caffe Users

Hi, Youssef

I only know how to convert image files into lmdb by using caffe.io.array_to_datum. I based my image-to-lmdb Python script on the example at https://github.com/BVLC/caffe/issues/1698#issuecomment-70211045. Below is my code:

import lmdb
import numpy as np
from os.path import join
from PIL import Image
import caffe

# outputDir, dataDir and fileLists are defined elsewhere in the script;
# each entry of fileLists is "<file name> <class number>"
in_db = lmdb.open(outputDir, map_size=int(1e12))
with in_db.begin(write=True) as in_txn:
    for in_idx, in_ in enumerate(fileLists):
        # split the entry into file name and class number
        fileName = in_.split(" ", 1)
        filePath = join(dataDir, fileName[0])
        print(filePath)
        # load image:
        # - as np.uint8 {0, ..., 255}
        # - in BGR (switch from RGB)
        # - in Channel x Height x Width order (switch from H x W x C)
        im = np.array(Image.open(filePath))  # or load whatever ndarray you need
        im = im[:, :, ::-1]
        im = im.transpose((2, 0, 1))
        im_dat = caffe.io.array_to_datum(im)
        # set class number
        im_dat.label = int(fileName[1].rstrip())
        # the label is stored inside the datum, so put() only needs key and value
        in_txn.put('{:0>10d}'.format(in_idx).encode('ascii'), im_dat.SerializeToString())
in_db.close()

I successfully trained my Caffe model with the lmdb dataset produced by this code. By "Stack all energy levels for a single sample as different channels of the blob", do you mean increasing the number of channels and adding the images to a single lmdb? If so, should I use array_to_blobproto instead of array_to_datum to stack the channels? array_to_datum only works with 3D arrays, while array_to_blobproto works with N-D arrays. Am I correct?

Toru

...

Yuchen Yuan

Dec 6, 2015, 8:15:03 PM

to Caffe Users, youssef...@gmail.com

Hello Youssef,

I've been trying to fine-tune a custom binary-classification FCN-8s network, and I followed all of the steps you mentioned earlier, i.e. (1) generate the lmdbs for the train and val sets with your to_lmdb.py code (the ground truth is stored in .mat files); (2) modify train_val.prototxt so that the number of output classes is changed from 60 to 2 (binary), and change the corresponding layer names as well; (3) modify solver.prototxt, with base_lr=1e-10 and momentum=0.99; (4) use solve.py to initialize the deconv layers; (5) start fine-tuning.

However, after 60K iterations, the training loss still fluctuates wildly, ranging from ~40K to ~220K. I remember that you had a similar problem in your earlier post. Have you solved the issue and gotten your FCN successfully fine-tuned? If so, I'd like to know what the "correct" training loss pattern should look like. As far as I know, the training loss should decrease gradually and converge to a small number, instead of fluctuating without any clear pattern.

Thank you very much,

Yuchen


Yuchen Yuan

Dec 6, 2015, 8:28:24 PM

to Caffe Users, youssef...@gmail.com

BTW, I fine-tuned from fcn-8s-pascalcontext.caffemodel. Since the fc6 and fc7 layers in this model have already been changed to conv layers, I didn't perform the net surgery step from your earliest post.

...

Youssef Kashef

Dec 23, 2015, 10:33:03 AM

to Caffe Users

Hello Toru,

True, caffe.io.array_to_datum handles 3-dim arrays for storage in lmdb, each in C x H x W order. In your case, C could be the number of energy levels you have. An energy level would be analogous to a color channel of an image.

Or am I missing something?

Youssef

...

Youssef Kashef

Dec 23, 2015, 10:45:09 AM

to Caffe Users, youssef...@gmail.com

Hello Yuchen,

I continued seeing strong fluctuations from one iteration to the other but could still make out a gradual decrease. The strong fluctuations are a result of the very small batch size.
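One way to make a gradual decrease visible under strong iteration-to-iteration noise is to smooth the logged loss. The sketch below uses a trailing moving average; this is an illustrative choice and not something prescribed in this thread, and the loss values are made up.

```python
def moving_average(values, window):
    """Average each value with up to window - 1 preceding values."""
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            # drop the value that just left the window
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed

# Made-up noisy losses with a downward trend, for illustration only.
losses = [700.0, 420.0, 690.0, 400.0, 650.0, 380.0, 600.0, 350.0]
print(moving_average(losses, window=4))
```

Once the window is full, the smoothed values in this toy example fall steadily even though the raw values jump up and down.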

If your fluctuations are just too strong with no indication of loss declining, I suggest checking the conv weights for all-zero filters.
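A minimal sketch of such a check, written against a plain dictionary so that it runs without Caffe. It assumes the parameters are available as a mapping from layer name to a list of arrays (weights first); with pycaffe such a mapping could presumably be built from net.params, e.g. {name: [blob.data for blob in blobs] for name, blobs in net.params.items()}. The helper name, layer names and shapes below are hypothetical.

```python
import numpy as np

def all_zero_params(params):
    """Return the names of layers whose weight array contains only zeros.

    `params` maps a layer name to a list of arrays (weights first, then bias).
    """
    return [name for name, arrays in params.items() if not np.any(arrays[0])]

# Toy stand-in for a network's parameters (hypothetical names and shapes).
params = {
    "conv1_1": [np.ones((8, 3, 3, 3)), np.zeros(8)],
    "fc6": [np.zeros((16, 8, 7, 7)), np.zeros(16)],
}
print(all_zero_params(params))
```

Only the weights are tested here; a zero bias is ignored on purpose, since an all-zero bias is common and is not the problem described above.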

I did not have to change the learning rate to get a decline. If you're doing this on a different dataset than PASCAL-Context and for a different task, it may be worth running the training with a smaller learning rate. Is it a large dataset?

Youssef

...

Toru Hironaka

Dec 23, 2015, 11:19:53 AM

to Caffe Users

Hi, Youssef

Thanks for your reply. I am currently working on 3D convolution and pooling. I have been training with my 3D images (width, length, depth), but the results are not good yet. I will try the multi-channel approach soon.

Toru

...

Jianyu Lin

Feb 4, 2016, 1:45:13 PM

to Caffe Users

Hi all, I am trying to export the prediction from the FCN using C++ for my project, but I end up with some very strange images. Could anyone help me with this, please?

In my problem I built my own CNN, and there are only two classes to segment (0 and 1).

Here is my code to read the output of the forward pass:

////////////////////////////////////////////////////////////////
const vector<Blob<float>*>& result = caffe_net.Forward(bottom_vec, &iter_loss); // forward pass
const float* result_vec = result[0]->cpu_data();

// generate prediction from the output vector and store it in Mat
cv::Mat srcC = cv::Mat::zeros(cv::Size(512,384), CV_32FC1);
int nl = srcC.rows; // row number, height
int nc = srcC.cols; // col number, width

for (int j = 0; j < nl; j++) {
    float* data = srcC.ptr<float>(j);
    for (int i = 0; i < nc; i++) {
        // compare the scores of the two classes and generate the prediction:
        // mark the pixel when the class-1 score exceeds the class-0 score
        if (result_vec[i + j*nc + datum.height()*datum.width()] > result_vec[i + j*nc])
            data[i] = 255;
    }
}
//////////////////////////////////////////////////////////////////

The output of the CNN is a const float*, but I don't know whether the data is arranged in c*h*w or w*h*c order. Given the weird output Mat I got, I think I may have done it the wrong way. Can somebody help with this?

Thank you very much.

Jianyu
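On the layout question: Caffe blobs are stored contiguously in N x C x H x W order, with the width index varying fastest. The following is a minimal Python sketch of that index arithmetic, not Caffe code. The helper names and the toy scores are made up, and it assumes a single image (N = 1) with two class planes.

```python
def nchw_offset(n, c, h, w, C, H, W):
    """Flat index of element (n, c, h, w) in a row-major N x C x H x W blob."""
    return ((n * C + c) * H + h) * W + w

def argmax_two_classes(flat, H, W):
    """Per-pixel prediction (0 or 1) from a flat 1 x 2 x H x W score array."""
    pred = []
    for h in range(H):
        row = []
        for w in range(W):
            s0 = flat[nchw_offset(0, 0, h, w, 2, H, W)]
            s1 = flat[nchw_offset(0, 1, h, w, 2, H, W)]
            row.append(1 if s1 > s0 else 0)
        pred.append(row)
    return pred

# Made-up scores for a 2 x 3 image: class-0 plane first, then class-1 plane.
scores = [0.9, 0.2, 0.6, 0.1, 0.8, 0.5,   # class 0
          0.1, 0.8, 0.4, 0.9, 0.2, 0.5]   # class 1
print(argmax_two_classes(scores, H=2, W=3))
```

Note that the offset between the two class planes is H * W of the output blob itself, which equals the input height times width only when the network's output has the same spatial size as its input.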

Gonçalo Cruz

Feb 18, 2016, 9:25:12 AM

to Caffe Users

Hello everyone,

I have been trying to get an FCN working, and I have read Youssef's instructions in a previous post, but now I am stuck at building the lmdb (step 3). I would like to train the network with the PASCAL-Context dataset.

I have looked into PR#1698 and also into Toru's script.

I am probably missing something, but it is not clear to me which script to use and/or which paths to include in the scripts.

Can you please clarify the usage?

Thanks in advance.

Best regards,

Gonçalo Cruz


Youssef Kashef

Feb 18, 2016, 9:57:17 AM

to Caffe Users

Hello Gonçalo,

I wrote some python code that follows the instructions from the PR#1698 comment. It includes a wrapper function specifically for PASCAL Context.

You can find them here. To run them you need to download the repo and to follow some brief installation instructions.

You basically tell it where to find the PASCAL Context dataset files and a text file that lists which samples go into the validation set.

You end up with 4 lmdb files (1 x training images + 1x training ground truth + 1 x validation images + 1x validation ground truth)
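The split itself can be sketched as below, assuming sample ids are plain strings and the validation ids have already been read from the text file. The function name and the ids are hypothetical, and this is not the code from the repo mentioned above.

```python
def split_train_val(sample_ids, val_ids):
    """Split sample ids into (train, val), keeping the original order."""
    val_set = set(val_ids)
    train = [s for s in sample_ids if s not in val_set]
    val = [s for s in sample_ids if s in val_set]
    return train, val

# Hypothetical sample ids; in practice val_ids would come from the text file.
samples = ["2008_000002", "2008_000003", "2008_000007", "2008_000008"]
train, val = split_train_val(samples, ["2008_000003"])
print(train, val)
```

Writing the images and the ground truth in the same id order for each split keeps every image paired with its label across the four lmdbs.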

Hope this helps,

Youssef


Gonçalo Cruz

Feb 18, 2016, 7:29:38 PM

to Caffe Users

Hello Youssef,

Thank you for the response.

That repo is gold :) It sure did help.

Best regards,

Gonçalo Cruz


jade qian

Feb 24, 2016, 5:08:26 AM

to Caffe Users

On Tuesday, August 4, 2015 at 11:34:06 AM UTC+1, Youssef Kashef wrote:

Hello everyone,

I've been trying to train FCN-32s Fully Convolutional Semantic Segmentation on PASCAL-Context but keep getting very high loss values regardless of the number of iterations: "Train net output #0: loss = 767455 (* 1 = 767455 loss)". Sometimes it goes as low as 440K, but then it jumps back up to something higher and oscillates.
Ignoring the high loss and letting training run for 80K iterations, I still end up with a network that produces all-zero output.
I can't tell what's throwing it off like that.

My procedure in detail:

1) Follow the instructions in future.sh from longjon:future, except that I apply the PR merges to BVLC:master instead of longjon:master. Building off of longjon:future results in cuDNN build errors like here. Applying some of the PR merges to BVLC:master is redundant since they've already been merged into the master branch.
2) Build Caffe with CUDA 6.5 and cuDNN. I've tried a CPU-only build and got the same high-loss behavior, so I don't think it's related to the GPU or the driver (then again, I only let it run for 5K iterations).
3) Generate the LMDBs for the PASCAL-Context database. The lmdb generation script is built around Evan Shelhamer's Python snippet in this comment in PR#1698.
   - The images are stored as numpy.uint8 with shape C x H x W, with C=3.
   - The ground truth is stored as numpy.int64 with shape C x H x W, with C=1.
   - The order in which the images are stored is the same as for the ground truth. One lmdb for the images and one for the labels. I have two pairs of each to reflect the train/val split.
4) Use net surgery to turn VGG-16 into a fully convolutional model. For this I pretty much followed the net surgery example and used the layer definitions from the FCN-32s trainval.prototxt in the Model Zoo.
   - Not sure I did this one right, though. The output I get for the cat image is still a single-element 2D matrix.
   - I've tried using the VGG-16 fcn weights from HyeonwooNoh/DeconvNet/model/get_model.sh but still get the same behavior.
   - How can I verify my fully convolutional VGG-16 better?
5) Apply the solve.py step for initializing the deconv. parameters. According to Shelhamer's post here, not doing this could leave things stuck at zero.
   - What's a good way of verifying that the initialization is correct? I'm worried that the problem is there.
6) solver.prototxt and trainval.prototxt are identical to those shared on the Model Zoo. They only differ in the paths to the lmdbs.
7) I start the training and get "Train net output #0: loss = 767455 (* 1 = 767455 loss)". Sometimes it goes down by several 100K, but I never see the values < 10.0 that some have reported.

I could really use some help in figuring out what I'm doing wrong and understanding why the loss is so high. It seems that people have figured out how to train these fcn's without the need of a detailed guide, so it seems I'm missing a critical step somewhere.

Thank you

On Wednesday, July 8, 2015 at 5:53:34 AM UTC+2, Gavin Hackeling wrote:

Yes, the problem appears to be that your labels have three channels instead of one channel. Assuming that your image has the shape (C, H, W) and that the channel containing your integer class labels is c, you can index that channel using "img = img[c, :, :]".
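A minimal NumPy sketch of that indexing, assuming a label image that was saved with three identical channels in C x H x W order. The 375 x 500 size is chosen only because it reproduces the two counts in the error message (187500 vs. 562500); the array contents are made up.

```python
import numpy as np

# Hypothetical label image saved with 3 identical channels, C x H x W.
H, W = 375, 500
labels_3ch = np.zeros((3, H, W), dtype=np.uint8)
labels_3ch[:, 100:200, 150:300] = 1           # a made-up region of class 1

c = 0                                          # channel holding the class ids
labels_1ch = labels_3ch[c, :, :][np.newaxis]   # H x W, then back to 1 x H x W

print(labels_3ch.size, labels_1ch.size)
```

Indexing with a single channel drops the channel axis, so the leading axis is added back to keep the 1 x H x W shape that the ground truth in this thread is stored with.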

On Sunday, July 5, 2015 at 3:01:06 AM UTC-4, eran paz wrote:

Hi
Were you able to run the network?
I'm having some trouble with the label matrix.
I've created an image with 0 as background and 1...K marking pixels belonging to each class.

I've created the lmdb according to PR#1698 for both images and labels.
When I run the net I get this error:
Check failed: outer_num_ * inner_num_ == bottom[1]->count() (187500 vs. 562500) Number of labels must match number of predictions; e.g., if softmax axis == 1 and prediction shape is (N, C, H, W), label count (number of labels) must be N*H*W, with integer values in {0, 1, ..., C-1}.

As far as I can tell, the problem is that my labels are saved with 3 channels and not 1, but I couldn't figure out how to save them with 1 channel.

Any help would be appreciated

THX

jade qian

Feb 24, 2016, 11:10:37 AM

to Caffe Users

Hi, Youssef,

I followed your instructions to turn VGG-16 into a fully convolutional model using net surgery, then fine-tuned this model with the FCN-32s trainval.prototxt from the Model Zoo. The outputs of all layers are zero after 1 iteration. If I use the VGG-16 fcn weights from HyeonwooNoh/DeconvNet/model/get_model.sh instead, it works: all layer outputs are non-zero.

I downloaded the VGG-16 network with fully connected layers and VGG_ILSVRC_16_layers_deploy.prototxt from https://gist.github.com/ksimonyan/211839e770f7b538e2d8#file-readme-md

Based on VGG_ILSVRC_16_layers_deploy.prototxt, I wrote the fully convolutional prototxt by changing the last few layers as follows:

....
layers {
bottom: "pool5"
top: "fc6-conv"
name: "fc6-conv"
type: CONVOLUTION
convolution_param {
    num_output: 4096
    kernel_size: 7
}
}
layers {
bottom: "fc6-conv"
top: "fc6-conv"
name: "relu6"
type: RELU
}
layers {
bottom: "fc6-conv"
top: "fc6-conv"
name: "drop6"
type: DROPOUT
dropout_param {
    dropout_ratio: 0.5
}
}
layers {
bottom: "fc6-conv"
top: "fc7-conv"
name: "fc7-conv"
type: CONVOLUTION
convolution_param {
    num_output: 4096
    kernel_size: 1
}
}
layers {
bottom: "fc7-conv"
top: "fc7-conv"
name: "relu7"
type: RELU
}
layers {
bottom: "fc7-conv"
top: "fc7-conv"
name: "drop7"
type: DROPOUT
dropout_param {
    dropout_ratio: 0.5
}
}
layers {
bottom: "fc7-conv"
top: "fc8-conv"
name: "fc8-conv"
type: CONVOLUTION
convolution_param {
    num_output: 1000
    kernel_size: 1
}
}
layers {
bottom: "fc8-conv"
top: "prob"
name: "prob"
type: SOFTMAX
}

Following the net surgery instructions, I ran:

# Make sure that caffe is on the python path:
caffe_root = '../' # this file is expected to be in {caffe_root}/examples
import sys
sys.path.insert(0, caffe_root + 'python')

import caffe

# Load the original network and extract the fully connected layers' parameters.
net = caffe.Net('/home/jade/caffe-future/examples/net_surgery/VGG_ILSVRC_16_layers_deploy.prototxt',
                '/home/jade/caffe-future/examples/net_surgery/VGG_ILSVRC_16_layers.caffemodel',
                caffe.TEST)
params = ['fc6', 'fc7', 'fc8']
# fc_params = {name: (weights, biases)}
fc_params = {pr: (net.params[pr][0].data, net.params[pr][1].data) for pr in params}

for fc in params:
    print('{} weights are {} dimensional and biases are {} dimensional'.format(fc, fc_params[fc][0].shape, fc_params[fc][1].shape))

# Load the fully convolutional network to transplant the parameters.
net_full_conv = caffe.Net('/home/jade/caffe-future/examples/net_surgery/VGG_ILSVRC_16_layers_full_conv_deploy.prototxt',
                          '/home/jade/caffe-future/examples/net_surgery/VGG_ILSVRC_16_layers.caffemodel',
                           caffe.TEST)
params_full_conv = ['fc6-conv', 'fc7-conv', 'fc8-conv']
# conv_params = {name: (weights, biases)}
conv_params = {pr: (net_full_conv.params[pr][0].data, net_full_conv.params[pr][1].data) for pr in params_full_conv}

for conv in params_full_conv:
    print('{} weights are {} dimensional and biases are {} dimensional'.format(conv, conv_params[conv][0].shape, conv_params[conv][1].shape))

for pr, pr_conv in zip(params, params_full_conv):
    conv_params[pr_conv][0].flat = fc_params[pr][0].flat # flat unrolls the arrays
    conv_params[pr_conv][1][...] = fc_params[pr][1]

net_full_conv.save('../examples/net_surgery/VGG_ILSVRC_16_layers_fcn.caffemodel')

Is the method above the correct way to convert the VGG network with fully connected layers into a fully convolutional network? I am not sure.

Youssef Kashef

Feb 24, 2016, 11:26:30 AM

to Caffe Users

Hello Jade,

Your all-zero output seems to come from all-zero weights in the fc* layers of your fully convolutional VGG-16.

In the net surgery snippet you transform the fc6 weights into the weights of convolutional filters and save them under a new layer name, fc6-conv. The same applies to layers fc7 and fc8.

When you save the reshaped weights (the weights of your new fully conv. net) to VGG_ILSVRC_16_layers_fcn.caffemodel, you're saving them under the new names (fc6-conv, fc7-conv, ...).

If you load the weights from this caffemodel file together with the FCN-32s network definition, it will load all the network weights except those of layers fc6 and fc7. It can't find them in the .caffemodel because you saved them under different names (e.g. fc6-conv, fc7-conv).

You can avoid this by keeping the layer names fixed during net surgery.

This is not a bug in caffe. It is expected behavior and helps in consciously selecting which weights to load from a .caffemodel file and which to initialize independently from the pre-trained network.
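The effect can be illustrated with a toy model of name-based loading. This is plain Python rather than Caffe code; the function name is invented, and single numbers stand in for whole weight arrays.

```python
def load_by_name(net_layers, saved_weights):
    """Copy saved weights into layers with matching names; report the rest."""
    loaded, untouched = {}, []
    for name, init_value in net_layers.items():
        if name in saved_weights:
            loaded[name] = saved_weights[name]
        else:
            loaded[name] = init_value   # layer keeps its initial value
            untouched.append(name)
    return loaded, untouched

# Hypothetical layer names; placeholder numbers stand in for weight arrays.
fcn32s_layers = {"conv1_1": 0.0, "fc6": 0.0, "fc7": 0.0}
saved = {"conv1_1": 0.5, "fc6-conv": 0.3, "fc7-conv": 0.7}
weights, missing = load_by_name(fcn32s_layers, saved)
print(missing)
```

In this toy model, fc6 and fc7 stay at their initial zeros because nothing in the saved file carries those names, which mirrors the all-zero layers described above.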

Before starting to train the network, I recommend verifying that none of the layers have all-zero weights. All-zero weights produce all-zero responses. The weights of your Deconvolution layer can be initialized either in Python, as described in solve.py, or by adding the following to the Deconvolution layer definition:

layer {
  name: "upscore"
  type: "Deconvolution"
  bottom: "score59"
  top: "upscore"
  param {
    lr_mult: 0
  }
  convolution_param {
    num_output: 60
    bias_term: false
    kernel_size: 64
    stride: 32
    weight_filler: { type: "bilinear" }
  }
}
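For the Python route, a bilinear interpolation kernel can be built with NumPy as sketched below. This is the standard construction of a 2-D bilinear upsampling filter; the function name is an assumption, and it is not confirmed here that it matches solve.py or the "bilinear" filler value for value.

```python
import numpy as np

def bilinear_kernel(size):
    """Return a size x size bilinear interpolation kernel."""
    factor = (size + 1) // 2
    # center of the kernel, in pixel coordinates
    center = factor - 1.0 if size % 2 == 1 else factor - 0.5
    og = np.ogrid[:size, :size]
    return ((1 - abs(og[0] - center) / float(factor)) *
            (1 - abs(og[1] - center) / float(factor)))

kernel = bilinear_kernel(64)   # kernel_size used by the upscore layer above
print(kernel.shape, kernel.min() > 0)
```

Every entry of this kernel is strictly positive, so a Deconvolution layer initialized with it does not start from all-zero weights.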

Hope this helps,

Youssef

...