Caffe starts with loss at 0


Alperen AYDIN

Jul 1, 2016, 3:23:36 AM
to Caffe Users
Hello, I am having some problems with a CNN I am working on.

The model is defined here: http://pastebin.com/QHSxwrsT
An image of the model: http://i.imgur.com/xRnuaUh.jpg

The solver: http://pastebin.com/gsurW6Ar

And the command I used was:
> caffe train -solver solver.prototxt

And lastly the log: http://pastebin.com/U4mr8i6x

As you can see, the loss starts off at 0 and stays there. Did I do something wrong?

Hieu Do Trung

Jul 1, 2016, 6:31:56 AM
to Caffe Users
Seems that your layers are not connected at all.

I'd change this:
# Followed by a layer of conv3x3x256
layer
{
    name:"Conv3x3x256"
    type:"Convolution"
    bottom: "conv5"
    top: "conv6"
    convolution_param
    {
        kernel_size: 3
        num_output: 256
        # The filters are 3x3x256
        pad: 1 
        # So the output is H/4xW/4x256
    }
}
# And lastly, a conv3x3x1
layer
{
    name:"Conv3x3x1"
    type:"Convolution"
    bottom: "conv6"
    top: "conv7"
    convolution_param
    {
        kernel_size: 3
        num_output: 1
        # The filters are 3x3x1
        pad: 1 
        # So the output is H/4xW/4x1
    }
}

into something like this:

# Followed by a layer of conv3x3x256
layer
{
    name:"Conv3x3x256"
    type:"Convolution"
    bottom: "Conv3x3x64_4"
    top: "Conv3x3x256"
    convolution_param
    {
        kernel_size: 3
        num_output: 256
        # The filters are 3x3x256
        pad: 1 
        # So the output is H/4xW/4x256
    }
}
# And lastly, a conv3x3x1
layer
{
    name:"Conv3x3x1"
    type:"Convolution"
    bottom: "Conv3x3x256"
    top: "Conv3x3x1"
    convolution_param
    {
        kernel_size: 3
        num_output: 1
        # The filters are 3x3x1
        pad: 1 
        # So the output is H/4xW/4x1
    }
}
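The wiring problem above can be checked mechanically: Caffe connects layers by matching each `bottom` name to a `top` produced by an earlier layer (or a net input). Here is a minimal Python sketch of that check, without using Caffe itself; the two-layer list is an abbreviated, hypothetical stand-in for the net in question:

```python
# Sketch: verify that every "bottom" blob is produced by an earlier "top"
# (or is a net input), which is how Caffe wires layers together by name.

def check_connectivity(layers, inputs):
    """Return (layer_name, bottom) pairs whose bottom blob is never produced."""
    available = set(inputs)
    dangling = []
    for layer in layers:
        for bottom in layer.get("bottom", []):
            if bottom not in available:
                dangling.append((layer["name"], bottom))
        available.update(layer.get("top", []))
    return dangling

# Hypothetical abbreviation of the net under discussion: "conv5" is
# referenced as a bottom but no layer here produces it as a top.
layers = [
    {"name": "Conv3x3x256", "bottom": ["conv5"], "top": ["conv6"]},
    {"name": "Conv3x3x1",   "bottom": ["conv6"], "top": ["conv7"]},
]
print(check_connectivity(layers, inputs=["data"]))
# prints [('Conv3x3x256', 'conv5')]: the first layer's input is dangling
```

With the renamed bottoms/tops from the corrected snippet, the same check returns an empty list.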


And the base learning rate seems too small: 0.000000001 (1e-9).
The stepsize is also too small relative to max_iter (stepsize: 10, max_iter: 450000).
After every 10 iterations the learning rate is multiplied by gamma again (0.000000001 / 10 after the first step, and so on), so within a few hundred iterations it decays below anything single precision can represent.

Alperen AYDIN

Jul 1, 2016, 8:23:48 AM
to Caffe Users
The layers are connected. If they weren't, draw_net.py would have drawn them as disconnected (or so I assume). But your naming convention is better than mine, so I adopted it.

Beyond that, what would be good values for the solver?

(Sorry, I am quite new to Caffe and neural nets in general.)