Pycaffe issues: unable to replicate Tensorflow results + kernel constantly crashes using my model

TS


Apr 24, 2017, 1:16:50 PM

to Caffe Users

Hey everyone, thanks for taking a glance at my issue!

I'm currently trying to replicate in Caffe a CNN model I built with Keras/TensorFlow, so that I can better visualize the layers using the Deep Visualization Toolbox. With one particular set of parameters the TensorFlow CNN reaches 90-95% validation accuracy, and with other settings it generally reaches 85-95%.

Regarding Caffe, I currently have the toolbox working and have run the Caffe MNIST example both from the terminal and with Pycaffe. Both ran flawlessly with good results, so the installation seems to be working properly.

I am using the CPU version.

Two issues:

My first problem is that when using the exact same CNN configuration as my TensorFlow model, I can't get the Caffe model to break 30% accuracy; in fact, it bottoms out at 0%. The loss keeps diverging, which to me indicates a poor learning rate, but it happens no matter what rate I use. I suspect it's user error somewhere, but I'm not sure where I'm going wrong.

Second, and probably related, the Python kernel constantly crashes when running my model (the same happens when running from the terminal), especially if I step through iterations one at a time. Even when it doesn't crash, inspecting any variable it creates crashes the kernel, even though I can print that variable with no issue. I did the same thing with the MNIST example and had no issues. The error message is: *** Error in `/usr/bin/python': double free or corruption (out): 0x0000562ae845a4e0 ***

My model is structured like so:

3 classes

train set: 885 images

test set: 349 images

input size: 1 x 216 x 216 (grayscaled images)

4x:

Conv (5x5, pad 2, stride 1)
Relu

Max Pool (3x3, stride 3)

2x:

Inner Product (output 7)

Relu

Inner Product (output 3)

Softmax
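As a sanity check on the structure above, here is a rough sketch of the spatial sizes after each Conv + Max Pool block. It assumes the pooling layer is part of each of the 4 repeated blocks, that the Keras model used the same kernel and stride with 'valid' pooling, and that Caffe rounds the pooling output size up while Keras/TensorFlow rounds it down. The helper names are made up for illustration.

```python
import math


def conv_out(size, kernel, pad, stride):
    # Convolution output size: floor division in both frameworks.
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride, ceil_mode):
    # Unpadded pooling output size; ceil_mode=True mimics Caffe's rounding,
    # ceil_mode=False mimics Keras/TensorFlow 'valid' pooling.
    span = (size - kernel) / stride
    return (math.ceil(span) if ceil_mode else math.floor(span)) + 1


def trace(size, blocks, ceil_mode):
    # Record the spatial size at the input and after each Conv + Pool block.
    sizes = [size]
    for _ in range(blocks):
        size = conv_out(size, kernel=5, pad=2, stride=1)
        size = pool_out(size, kernel=3, stride=3, ceil_mode=ceil_mode)
        sizes.append(size)
    return sizes


caffe_sizes = trace(216, 4, ceil_mode=True)
keras_sizes = trace(216, 4, ceil_mode=False)
print("Caffe-style rounding:", caffe_sizes)
print("Keras-style rounding:", keras_sizes)
```

Under these assumptions the two traces agree until the last pooling layer, where an 8 x 8 map becomes 3 x 3 with Caffe-style rounding and 2 x 2 with Keras-style rounding. So the first Inner Product layer may see a different number of inputs in the two frameworks even though the configuration looks identical.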

I've attached a few files:

The python script that makes my prototxts.

error log

Any insight would be EXTREMELY helpful!! Thank you! =)

error_log

caffe_model_upload.py

TS


Apr 25, 2017, 10:49:50 AM

to Caffe Users

As an update:

Strangely, the kernel crashing only happens when including the test set; in fact, when I include it, the code doesn't even make it past the first iteration (but it seems to do the initial pass just fine?). Is there a shortage of memory or something? I thought maybe it was a dimensionality issue somewhere from a poor conversion to the lmdb format, but I've double checked the dimensions of every picture and the data set seems to be fine.

Using the above code, I generated my own prototxts and ran the MNIST example (I made it almost exactly like the 2conv+2Pool+2fc example model), and that worked just fine, so I don't think there's anything wrong with my code.

Also fairly strange: every model I make has a loss that oscillates between 1 and 10 repeatedly, with no significant overall divergence.
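For reference when reading those loss values, a minimal sketch of the chance-level loss for a 3-class problem. It assumes the loss is a softmax cross-entropy with natural log (as in Caffe's SoftmaxWithLoss), which the post does not state explicitly; the function name is made up for illustration.

```python
import math


def chance_level_loss(num_classes):
    # Cross-entropy when the network assigns equal probability to every class.
    return -math.log(1.0 / num_classes)


baseline = chance_level_loss(3)
print(round(baseline, 4))  # 1.0986
```

Under that assumption, a loss near 1.1 corresponds to the network guessing uniformly, so the low end of the observed range (about 1) would be roughly chance level, and values near 10 would mean confident wrong predictions.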

At this point, my only guess is that it's somehow an initialization issue. I've tried both Gaussian and Xavier initializations with no improvement.
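One thing that may be worth comparing is the actual range of the initial weights in each framework. The sketch below assumes Caffe's "xavier" filler with its default settings draws uniformly from ±sqrt(3 / fan_in), and that the Keras layers used the default glorot_uniform initializer, which draws from ±sqrt(6 / (fan_in + fan_out)). The filter count of 32 is a hypothetical value, since the post does not give the number of filters.

```python
import math


def caffe_xavier_limit(fan_in):
    # Assumed Caffe default: uniform in [-limit, limit], normalized by fan_in only.
    return math.sqrt(3.0 / fan_in)


def keras_glorot_uniform_limit(fan_in, fan_out):
    # Assumed Keras default: uniform in [-limit, limit], normalized by fan_in + fan_out.
    return math.sqrt(6.0 / (fan_in + fan_out))


kernel_area = 5 * 5
in_channels = 1        # grayscale input, from the post
num_filters = 32       # hypothetical, not given in the post

fan_in = in_channels * kernel_area
fan_out = num_filters * kernel_area

caffe_limit = caffe_xavier_limit(fan_in)
keras_limit = keras_glorot_uniform_limit(fan_in, fan_out)
print("Caffe xavier limit:", caffe_limit)
print("Keras glorot_uniform limit:", keras_limit)
```

If these assumptions hold, the first conv layer starts with noticeably larger weights in Caffe than in Keras, so "Xavier" in one framework is not automatically the same initialization as in the other.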

Anyone have any guesses as to what to try or look for next?
