Caffe fine tune MNIST Error


Saman Sarraf

Dec 17, 2015, 2:18:53 PM
to Caffe Users
Dear Caffe experts, 

I am trying to fine-tune the MNIST model by training on just 500 new images (same number of classes = 10) and testing on 200 images.

I am getting an error saying "Cannot copy param 0 weights from layer 'conv1'; shape mismatch.  Source param shape is 20 1 5 5 (500); target param shape is 20 3 5 5 (1500). To learn this layer's parameters from scratch rather than  copying from a saved net, rename the layer."

The only things I changed were the data paths in the model file and the model/LMDB names in the solver. Apart from those changes, everything is the same.

And one more question: assuming I know what overfitting means, how can I tell whether my model is overfitted? (I took the MNIST model and trained on a subset of the data; the output is Test net output #0: accuracy = 0.695, Test net output #1: loss = 1.41684 (* 1 = 1.41684 loss).)
Is that overfitted?


Your prompt help is much appreciated,
Saman

Becky Bai

Dec 18, 2015, 3:16:15 AM
to Caffe Users
Did you solve the first problem? I ran into the same one.

About the second question: if you get high accuracy on your training set but low accuracy on your test set, that's probably overfitting.

On Friday, December 18, 2015 at 3:18:53 AM UTC+8, Saman Sarraf wrote:

Saman Sarraf

Dec 18, 2015, 1:25:05 PM
to Caffe Users
Actually, I am trying to fine-tune the network that was trained in the MNIST tutorial. It's kind of strange to get this error, because they say that for fine-tuning you only need to change the input and output and the rest of the network has to remain the same. I did exactly that and it didn't work. I attached the model (it's short, you can take a look at it), and I also attached my modified network, in which I basically changed only the input and output.
And I am using the Windows Caffe master downloaded from here.

Regarding my second question and your answer: if you look at the solver file, which is basically the one from the tutorial, there is a test phase. I'm wondering whether this "test phase" is actually test or validation. If it is "test", then the accuracy reported on screen is the test accuracy, and my question is how to get the training accuracy. If it is the training accuracy, how can I get the test accuracy?

Thanks a lot for your help,
Saman
lenet_train_test_SS.prototxt
lenet_train_test_SS - Copy.prototxt

YAO CHOU

Jan 27, 2016, 8:16:10 PM
to Caffe Users
Hey Saman, for your first question: I think it is about the number of image channels. You are probably using color images, which have 3 channels. You can change that in lenet.prototxt.

But I got a similar problem, an ip1 mismatch:
Cannot copy param 0 weights from layer 'ip1'; shape mismatch.  Source param shape is 500 1250 (625000); target param shape is 500 800 (400000). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.

Do you have the same problem? Any suggestions will be appreciated. Thanks.
Yao

Jan C Peters

Jan 28, 2016, 4:29:56 AM
to Caffe Users
Yao is completely right: the second dimension corresponds to the number of color channels, which is usually 3 for color images and 1 for grayscale images. However, grayscale images can also be saved with 3 channels, so you have to be a bit careful about that.
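The numbers in the original error message fall out of this directly: a convolution layer's weight blob has shape (num_output, channels, kernel_h, kernel_w), so a channel mismatch changes the parameter count. A quick sanity check in plain Python (no Caffe needed):

```python
# Weight blob of a convolution layer: (num_output, channels, kernel_h, kernel_w).
# The pretrained MNIST net expects 1-channel input; color data makes Caffe build
# a 3-channel conv1, so the parameter counts no longer match.

def conv_param_count(num_output, channels, kernel_h, kernel_w):
    """Number of weights in a conv layer (biases not counted)."""
    return num_output * channels * kernel_h * kernel_w

source = conv_param_count(20, 1, 5, 5)  # saved net: grayscale input
target = conv_param_count(20, 3, 5, 5)  # fine-tuning net: color input

print(source, target)  # 500 1500, the counts quoted in the error message
```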

Validation and test set are often used interchangeably; the point is that you only train on the training set and have another set (no matter what you call it) that you only feed forward through the network, comparing the outputs with the actual labels to see how well the network performs on "new" data. So if this performance on new data gets worse or stays constant over time although the training error keeps dropping, you can be pretty sure there is some overfitting going on.
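That criterion can be expressed as a tiny heuristic over the loss curves. This is just an illustrative sketch (the function name and the patience threshold are made up, not anything from Caffe):

```python
def overfitting_onset(train_loss, val_loss, patience=2):
    """Return the first epoch index at which the validation loss has risen for
    `patience` consecutive epochs while the training loss kept dropping, or
    None if that never happens. A crude heuristic, not a Caffe API."""
    bad = 0
    for t in range(1, len(val_loss)):
        if val_loss[t] > val_loss[t - 1] and train_loss[t] < train_loss[t - 1]:
            bad += 1
            if bad >= patience:
                return t
        else:
            bad = 0
    return None

train = [2.3, 1.2, 0.7, 0.4, 0.2, 0.1]
val   = [2.4, 1.4, 1.0, 1.1, 1.3, 1.5]
print(overfitting_onset(train, val))  # 4: val loss rising while train loss drops
```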

Jan


@Yao: It seems you indirectly changed the number of weights in the ip1 layer (e.g. by changing the number of outputs of the layer itself or of its predecessor), so loading pre-trained weights into it doesn't work (well, how could it?).

YAO CHOU

Jan 29, 2016, 11:13:06 PM
to Caffe Users
hey Jan,

Thanks for your reply. I checked again and again; actually I did not change anything. I can still resume training, but when I try to run classification.bin on a single image I get the error again and again. So weird. If the model still works in training and can report accuracy on the test dataset, why does it not work for classification? Any suggestions? Thanks.

Yao

Jan C Peters

Jan 30, 2016, 6:12:37 AM
to Caffe Users
When you use inputs of a different size for testing/fine-tuning than the network was originally trained with (size corresponds to _all_ of width, height _and_ number of channels here), the blobs' sizes will be adjusted automatically. Conv layers, nonlinearities and pooling layers will still work, because their parameters (a.k.a. weights and biases) work with differently sized inputs. Not so for the InnerProduct layer: the number of weights it needs to keep depends on the number of elements of its input blob, so it can only work if the input has exactly the same size.
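To make the 500 x 1250 vs. 500 x 800 mismatch concrete: in the tutorial LeNet, ip1 sees 50 feature maps after two rounds of 5x5 valid convolution plus 2x2/stride-2 pooling, so a 28x28 MNIST input flattens to 50*4*4 = 800 elements, while a 32x32 input would flatten to 50*5*5 = 1250. A sketch of that size bookkeeping (the layer parameters here are the assumed stock LeNet ones):

```python
def lenet_flat_size(h, w):
    """Trace the spatial size through LeNet's conv/pool stack and return the
    flattened element count that ip1 would see (50 output channels after conv2)."""
    for _ in range(2):          # conv1+pool1, then conv2+pool2
        h, w = h - 4, w - 4     # 5x5 convolution, no padding
        h, w = h // 2, w // 2   # 2x2 max pooling, stride 2
    return 50 * h * w

print(lenet_flat_size(28, 28))  # 800  -> matches the target shape 500 x 800
print(lenet_flat_size(32, 32))  # 1250 -> matches the source shape 500 x 1250
```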

In your situation, I'd look at the scaffolding output messages of both the training and the classification.bin runs; there you will find the sizes of the blobs as computed by Caffe. They probably disagree between the two runs, and where they disagree is the root of your problem.

Jan

Saman Sarraf

Feb 13, 2016, 12:26:40 PM
to Caffe Users
Hi Jan,

Is there any way in Caffe (in the input text files, for example) to force it to use 1 or 3 channels? As you know, it's kind of tricky to save images in single-channel format using OpenCV; do you have any suggestion for converting my 3-channel images to 1-channel ones?

Thanks

Jan C Peters

Feb 15, 2016, 3:23:05 AM
to Caffe Users
I don't think so. Caffe adapts the sizes of the blobs automatically to the input data, and there is no way to change that. But saving images in single-channel format should not be that hard with OpenCV; you just need to do the conversion to single-channel grayscale yourself before calling imwrite (using cvtColor, for example).
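For reference, cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) just applies the standard BT.601 luma weights to the BGR channels. The numpy stand-in below shows the same computation, so it is clear what the one-line OpenCV call does; in a real pipeline you would call cvtColor and then cv2.imwrite on the single-channel result:

```python
import numpy as np

def bgr_to_gray(img):
    """img: H x W x 3 uint8 array in BGR order -> H x W uint8 grayscale.
    Mirrors cv2.cvtColor(img, cv2.COLOR_BGR2GRAY): 0.299 R + 0.587 G + 0.114 B."""
    b = img[..., 0].astype(np.float64)
    g = img[..., 1].astype(np.float64)
    r = img[..., 2].astype(np.float64)
    return np.round(0.114 * b + 0.587 * g + 0.299 * r).astype(np.uint8)

img = np.zeros((2, 2, 3), dtype=np.uint8)
img[..., 2] = 255                 # a pure-red image (BGR channel order)
gray = bgr_to_gray(img)
print(gray.shape, gray[0, 0])     # single-channel: (2, 2), 0.299*255 ~ 76
```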

Jan

Hao Li

Apr 2, 2016, 10:16:10 PM
to Caffe Users
Hello,

I ran into a similar problem recently and fixed it. I don't know whether your problem is exactly the same as mine, but I'd be glad if this helps.

My problem was a little stupid... It was in creating the LMDB data files (creating LMDB from image files). Since my images are grayscale, I added --gray when creating the training data, but I forgot to add --gray when creating the test data. Then the error happens. So it was a problem with the data.
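So the fix is simply to pass --gray in both conversion calls. A sketch of what that looks like with Caffe's stock convert_imageset tool (all paths and list-file names here are placeholders):

```
# Both databases must be built with the same channel count; --gray forces 1 channel.
./build/tools/convert_imageset --gray --shuffle \
    /path/to/train_images/ train_list.txt mnist_train_lmdb
./build/tools/convert_imageset --gray --shuffle \
    /path/to/test_images/ test_list.txt mnist_test_lmdb
```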

