Mysterious error while running modified LeNet training

caffe, error, image_data, inputs, mnist, novice, training

Pastafarianist

Mar 22, 2015, 4:51:44 PM

to caffe...@googlegroups.com

I am exploring whether I can apply Caffe to real-life handwritten digit recognition. I'd like to recognize certain numbers in a series of scans, where each digit is about 12x12 px on average. I took the official MNIST example and decided to retrain the net on 12x12 inputs instead of 28x28. To do that, I changed the two input layers from LevelDB to IMAGE_DATA. For these layers, I cropped and downsampled the entire MNIST dataset myself: I cropped each image to [4:24, 4:24] (a 20x20 region) and resized it to 12x12 with OpenCV's cv2.resize, interpolation=INTER_LINEAR. For the resulting 60000+10000 images I generated index files so that I could plug them into the IMAGE_DATA layers.
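For reference, the index file read by an IMAGE_DATA layer is plain text with one "path label" pair per line. Below is a minimal sketch of generating those lines. The file names and the helper `index_lines` are hypothetical and are not taken from the original scripts.

```python
def index_lines(samples):
    """Format (image_path, label) pairs as 'path label' lines,
    the layout an IMAGE_DATA layer reads from its source file."""
    return ["{} {}".format(path, label) for path, label in samples]


# Hypothetical file names; the real ones depend on how the
# downsampled images were saved.
samples = [("train/00000.png", 5), ("train/00001.png", 0)]
print("\n".join(index_lines(samples)))
```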

Here are the resulting *.prototxt files:

https://gist.github.com/Pastafarianist/71a0bfac5a9f1ee15bf7

https://gist.github.com/Pastafarianist/243f00a28fcbec3cbee4

I installed Caffe on a Gentoo machine from this repo and tested whether it would work with the official MNIST example (using LevelDB and all the fancy stuff), which it did. Then I ran it with my modified files and got a mysterious error:

$ caffe.bin train --solver=lenet_solver_modified.prototxt
<...(skipped lots of initialization)...>
I0322 23:04:56.064790 32722 net.cpp:113] Setting up conv2 
F0322 23:04:56.067253 32722 blob.cpp:101] Check failed: data_ 
*** Check failure stack trace: *** 
 @ 0x7f616bea10ee (unknown) 
 @ 0x7f616bea2f23 (unknown) 
 @ 0x7f616bea0d0a (unknown) 
 @ 0x7f616bea378f (unknown) 
 @ 0x7f616c2ca50e (unknown) 
 @ 0x7f616c2920d5 (unknown) 
 @ 0x7f616c1e97d6 (unknown) 
 @ 0x7f616c1ebd03 (unknown) 
 @ 0x7f616c2080c8 (unknown) 
 @ 0x7f616c2085e9 (unknown) 
 @ 0x7f616c2089e8 (unknown) 
 @ 0x40e507 (unknown) 
 @ 0x40e7d9 (unknown) 
 @ 0x408182 (unknown) 
 @ 0x40615a (unknown) 
 @ 0x7f616b3d6dc6 (unknown) 
 @ 0x406535 (unknown)

complete error log

The line in question is here, but it is not informative at all. My best guess is that the convolution and pooling kernels are too large for this size, but I'm not sure how to test that (and if that's the case, how to modify them). Could someone help me with debugging this?

Xinyu Zhang

Jun 1, 2016, 10:45:38 AM

to Caffe Users

I ran into a similar problem and solved it. Because your input size is different, the old pooling parameters can produce a blob with a 0x0 dimension, and that is where the error comes from.
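One way to test this without running Caffe is to trace the spatial size through the layers by hand. The sketch below assumes the stock LeNet parameters from the MNIST example (5x5 convolutions with stride 1, 2x2 max pooling with stride 2, no padding); the modified prototxt files may use different values. It also assumes Caffe's usual rounding, where convolution rounds the output size down and pooling rounds it up. The names `conv_out`, `pool_out`, `trace` and `LENET` are made up for this illustration.

```python
import math


def conv_out(size, kernel, stride=1, pad=0):
    # Convolution output size, rounded down.
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride=1, pad=0):
    # Pooling output size, rounded up.
    return math.ceil((size + 2 * pad - kernel) / stride) + 1


# Assumed layer parameters: (name, size function, kernel, stride).
LENET = [
    ("conv1", conv_out, 5, 1),
    ("pool1", pool_out, 2, 2),
    ("conv2", conv_out, 5, 1),
    ("pool2", pool_out, 2, 2),
]


def trace(size, layers=LENET):
    """Return (layer name, output size) pairs, stopping at the
    first layer whose output is no longer positive."""
    shapes = []
    for name, fn, kernel, stride in layers:
        size = fn(size, kernel, stride)
        shapes.append((name, size))
        if size <= 0:
            break
    return shapes


for input_size in (28, 12):
    print(input_size, trace(input_size))
```

Under these assumptions a 28x28 input ends at 4x4 after pool2, while a 12x12 input is already 4x4 after pool1, so the 5x5 kernel of conv2 no longer fits and its output size drops to 0. That matches the log, which fails while setting up conv2. Passing a different layer list to `trace`, for example with 3x3 convolutions, is a quick way to check whether a candidate set of kernel sizes keeps every dimension positive.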