How does the backward pass work without a loss layer?


Matheus Abrantes Gadelha

Jan 28, 2016, 2:42:21 PM
to Caffe Users
Hi,

I've been using the bvlc_reference_caffenet model and noticed that it doesn't have a loss layer. When I am using a model that doesn't have a loss layer, how does the backward pass work? Is there a default loss function that gets applied? The interesting thing is that the gradients computed by the backward pass do make sense, but I would like to know how this works architecturally.

Regards,
Matheus.

Jan C Peters

Jan 29, 2016, 3:58:44 AM
to Caffe Users
Mine does have a loss layer. Mind the difference between the "deploy" spec of the network, used for manual classification, and the full one used for training. The main difference is that the "deploy" spec does not have input layers, since the input blobs are expected to be filled manually via an API call. Optionally one can also remove the loss layers there and keep only accuracy layers, depending on which quantity you care about. This is a common paradigm in caffe, and sadly it is not explained very well in the official docs, but there are some posts in this user group related to it.
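
For illustration, the training spec (train_val.prototxt) of CaffeNet-style models typically ends with accuracy and loss layers on top of fc8, roughly like this (a sketch from memory, check the actual train_val.prototxt in the model folder for the exact definitions):

layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8"
  bottom: "label"
  top: "accuracy"
  include { phase: TEST }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}

In the deploy spec these layers (and the data/label layers) are dropped; usually only a plain Softmax layer is put on top of fc8 to produce class probabilities.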

Jan

Matheus Abrantes Gadelha

Feb 4, 2016, 11:54:57 AM
to Caffe Users
Good to know the difference between a "deploy" spec and a training one. However, the network I am using does not specify a loss function, and I am still able to perform the backward computation. Does caffe create a default loss function if I don't define one? If so, what kind of loss function is it? If it doesn't, how is it possible that I am able to compute the backward pass correctly?

I am using the following net:

name: "CaffeNet"
input: "data"
force_backward: true
input_shape {
  dim: 1
  dim: 3
  dim: 227
  dim: 227
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  inner_product_param {
    num_output: 1000
  }
}


--
Matheus A. Gadelha

Alex Orloff

Feb 4, 2016, 4:24:15 PM
to Caffe Users
You don't need any backpropagation while you are "using" the network.
Can you also train it?

Matheus Abrantes Gadelha

Feb 4, 2016, 5:01:43 PM
to Caffe Users
I am aware that I don't need backpropagation to use the CNN. However, I need the gradient (diff) information produced by the backward pass (I am experimenting with the network to synthesize some images). I didn't try to train it, but I would probably be able to, since backprop is working. I just don't understand why, since I don't have a loss layer.


--
Matheus A. Gadelha

Alex Orloff

Feb 4, 2016, 5:12:27 PM
to Caffe Users
Matheus,

>  I need the gradient (diff) information produced by the backward pass
Can you give more details on this point?
By the way, the gradient is d(loss)/dx, so you cannot calculate a gradient without a loss function.

Matheus Abrantes Gadelha

Feb 4, 2016, 5:38:13 PM
to Caffe Users
Sure.

I've basically implemented this paper: http://arxiv.org/pdf/1312.6034.pdf. So I execute the backward pass in order to get the gradients with respect to the input nodes (the image). That's the reason the model description has the option force_backward: true.

In code:
import numpy as np

# net is the caffe.Net loaded from the model above (with force_backward: true)
target = 281  # class index whose gradient w.r.t. the input image I want
diff = np.zeros((1, 1000, 1, 1))
diff[0, target, 0, 0] = 1
# seed the diff of the output blob and propagate it back to the data blob
back = net.backward(**{net.outputs[0]: diff})

> By the way, the gradient is d(loss)/dx, so you cannot calculate a gradient without a loss function.
Exactly! That's why this should not be working!


--
Matheus A. Gadelha

Matheus Abrantes Gadelha

Feb 4, 2016, 5:42:04 PM
to Caffe Users
After giving the code some thought, I think I found the answer. The backward pass probably just propagates the diff values down from a given layer. This means the code doesn't actually need a loss function, since I am already passing in the differential, not the final classification. I will run some experiments tomorrow and report the results.

Thank you for the discussion.

Best,
Matheus.
--
Matheus A. Gadelha

Jan C Peters

Feb 5, 2016, 8:17:02 AM
to Caffe Users
That is exactly what I thought when I looked at your code snippet. Indeed, caffe only propagates diffs backwards. The loss layer is responsible for computing the topmost diff values; since you provided them yourself, there is no need for a loss layer.
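
Schematically (using Alex's notation, with data as the image and fc8 as the topmost blob of your net):

d(loss)/d(data) = (d(fc8)/d(data))^T * d(loss)/d(fc8)

A loss layer such as SoftmaxWithLoss would normally compute the topmost factor d(loss)/d(fc8) itself (for softmax cross-entropy that is, up to normalization, the predicted probabilities minus the one-hot target). You wrote that vector into the top diff yourself with your one-hot diff array, so every layer below can apply its local chain-rule step just as it would during training.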

By the way: I don't think caffe creates any layer automatically that you did not explicitly request. And regarding the loss: I have read (but not used myself) that in caffe every layer can be made to contribute to the loss by setting "loss_weight" to something other than 0 (https://groups.google.com/d/msg/caffe-users/4wRzMk5Lq0g/YrQpPeEAEQAJ). I am not sure how or whether this is useful, but it is nice to have the flexibility. loss_weight is 1 by default for layers of a loss type and 0 for all other layers.
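
If I understand it correctly, in the prototxt that would look something like this (a made-up sketch, not tested; caffe should then add loss_weight times the sum over the blob's entries to the total loss):

layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  inner_product_param {
    num_output: 1000
  }
  # a non-loss layer made to contribute to the loss
  loss_weight: 0.01
}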

Jan

Matheus Abrantes Gadelha

Feb 5, 2016, 11:33:30 AM
to Caffe Users
That's very useful information! Thank you very much!


--
Matheus A. Gadelha
