I am exploring whether I can apply Caffe to real-life handwritten digit recognition. I'd like to recognize certain numbers from a series of scans, where each digit is on average 12x12 px. I took the official MNIST example and decided to retrain the net on 12x12 inputs instead of 28x28. So I changed the two input layers (train and test) from LevelDB to IMAGE_DATA. For these layers, I cropped and downsampled the entire MNIST dataset myself: I cropped each image to [4:24, 4:24] (i.e. 20x20) and resized it to 12x12 with OpenCV's cv2.resize, interpolation=INTER_LINEAR. For the resulting 60,000 training and 10,000 test images I generated index files to feed into the IMAGE_DATA layers.
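For reference, the index files an IMAGE_DATA layer reads are plain text with one image path and an integer label per line, separated by a space. A minimal sketch of generating one, assuming the cropped 12x12 images are already saved to disk; write_index_file and the mnist12/... file names are hypothetical, not part of Caffe:

```python
import os
import tempfile


def write_index_file(samples, index_path):
    """Write an ImageData-style index: one "<path> <label>" line per image.

    `samples` is an iterable of (path, label) pairs. Paths are written
    exactly as given, so they must resolve from wherever Caffe reads them.
    Returns the number of lines written.
    """
    count = 0
    with open(index_path, "w") as f:
        for path, label in samples:
            # Some Caffe versions split each line on whitespace, so a space
            # inside a path would shift the label; reject it to be safe.
            if any(ch.isspace() for ch in path):
                raise ValueError(f"path contains whitespace: {path!r}")
            f.write(f"{path} {int(label)}\n")
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical layout mnist12/train/00000.png, ... with illustrative labels.
    labels = [5, 0, 4]
    samples = [(f"mnist12/train/{i:05d}.png", y) for i, y in enumerate(labels)]
    out_path = os.path.join(tempfile.gettempdir(), "train_index.txt")
    print(write_index_file(samples, out_path), "lines written to", out_path)
```

The same function would be called once for the training list and once for the test list, each referenced by the source parameter of its IMAGE_DATA layer.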
Here are the resulting *.prototxt files:
I installed Caffe on a Gentoo machine from this repo and tested whether it would work with the official MNIST example (using LevelDB and all the fancy stuff), which it did. Then I ran it with my modified files and got a mysterious error:
$ caffe.bin train --solver=lenet_solver_modified.prototxt
<...(skipped lots of initialization)...>
I0322 23:04:56.064790 32722 net.cpp:113] Setting up conv2
F0322 23:04:56.067253 32722 blob.cpp:101] Check failed: data_
*** Check failure stack trace: ***
@ 0x7f616bea10ee (unknown)
@ 0x7f616bea2f23 (unknown)
@ 0x7f616bea0d0a (unknown)
@ 0x7f616bea378f (unknown)
@ 0x7f616c2ca50e (unknown)
@ 0x7f616c2920d5 (unknown)
@ 0x7f616c1e97d6 (unknown)
@ 0x7f616c1ebd03 (unknown)
@ 0x7f616c2080c8 (unknown)
@ 0x7f616c2085e9 (unknown)
@ 0x7f616c2089e8 (unknown)
@ 0x40e507 (unknown)
@ 0x40e7d9 (unknown)
@ 0x408182 (unknown)
@ 0x40615a (unknown)
@ 0x7f616b3d6dc6 (unknown)
@ 0x406535 (unknown)
The line in question is here, but it is not informative at all. My best guess is that the convolution and pooling kernels are too large for a 12x12 input, but I'm not sure how to test that (or, if that's the case, how to modify them). Could someone help me debug this?
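One way to test that guess is to propagate the spatial size through the layers by hand: Caffe's convolution output side is floor((size + 2*pad - kernel) / stride) + 1, while pooling rounds up instead of down. The sketch below assumes the layer parameters of the stock LeNet from Caffe's MNIST example (conv1 and conv2 with 5x5 kernels, stride 1, no padding; pool1 and pool2 as 2x2 max pooling with stride 2), since the modified prototxt is not shown; conv_out, pool_out, trace_shapes and LENET are hypothetical helper names.

```python
import math


def conv_out(size, kernel, stride=1, pad=0):
    # Caffe convolution: floor((size + 2*pad - kernel) / stride) + 1
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride=1, pad=0):
    # Caffe pooling rounds up instead of down.
    out = math.ceil((size + 2 * pad - kernel) / stride) + 1
    # With padding, recent Caffe drops a last window that would start
    # in the padding rather than inside the image.
    if pad and (out - 1) * stride >= size + pad:
        out -= 1
    return out


# Assumed LeNet layers from Caffe's MNIST example: (name, kind, kernel, stride, pad).
LENET = [
    ("conv1", "conv", 5, 1, 0),
    ("pool1", "pool", 2, 2, 0),
    ("conv2", "conv", 5, 1, 0),
    ("pool2", "pool", 2, 2, 0),
]


def trace_shapes(size, layers=LENET):
    """Return (layer name, output side length) pairs for a square input,
    stopping at the first layer whose output is empty."""
    shapes = []
    for name, kind, kernel, stride, pad in layers:
        step = conv_out if kind == "conv" else pool_out
        size = step(size, kernel, stride, pad)
        shapes.append((name, size))
        if size <= 0:
            break
    return shapes


if __name__ == "__main__":
    print("28x28:", trace_shapes(28))
    print("12x12:", trace_shapes(12))
```

Under these assumptions a 28x28 input ends at 4x4 after pool2, but a 12x12 input reaches conv2 as a 4x4 map, and a 5x5 kernel then yields a 0x0 output, which would be consistent with the log stopping at "Setting up conv2". Adding pad: 2 to both convolution layers (12 -> 12 -> 6 -> 6 -> 3) or using smaller kernels keeps every size positive; whether either choice trains well is a separate question.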