How does the backward pass work without a loss layer?


Matheus Abrantes Gadelha

Jan 28, 2016, 2:42:21 PM
to Caffe Users
Hi,

I've been using the bvlc_reference_caffenet model and noticed that it doesn't have a loss layer. When I am using a model that doesn't have a loss layer, how does the backward pass work? Is there a default loss function that gets applied? The interesting thing is that the gradients computed by the backward pass do make sense, but I would like to know how this works architecturally.

Regards,
Matheus.

Jan C Peters

Jan 29, 2016, 3:58:44 AM
to Caffe Users
Mine does have a loss layer. Mind the difference between the "deploy" spec of the network, used for manual classification, and the full one used for training. The main difference is that the "deploy" spec does not have input layers, since the input blobs are expected to be filled manually via an API call. Optionally one can also remove the loss layers there and keep only accuracy layers, depending on which quantity you care about. This is a common paradigm in caffe, and sadly it is not explained very well in the official docs, but there are some posts in this user group related to it.
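
For illustration, the training spec (train_val.prototxt) of CaffeNet-style models typically ends with accuracy and loss layers on top of fc8, roughly like this (a sketch from memory, check the actual train_val.prototxt in the model folder for the exact definitions):

layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8"
  bottom: "label"
  top: "accuracy"
  include { phase: TEST }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}

In the deploy spec these layers (and the data/label layers) are dropped; usually only a plain Softmax layer is put on top of fc8 to produce class probabilities.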

Jan

Matheus Abrantes Gadelha

Feb 4, 2016, 11:54:57 AM
to Caffe Users
Good to know the difference between a "deploy" spec and a training one. However, the network I am using does not specify a loss function, and I am still able to perform the backward computation. Does caffe create a default loss function if I don't define one? If so, what kind of loss function is it? If it doesn't, how is it possible that I am able to compute the backward pass correctly?

I am using the following net:

name: "CaffeNet"
input: "data"
force_backward: true
input_shape {
  dim: 1
  dim: 3
  dim: 227
  dim: 227
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  inner_product_param {
    num_output: 4096
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  inner_product_param {
    num_output: 1000
  }
}


--
Matheus A. Gadelha

Alex Orloff

Feb 4, 2016, 4:24:15 PM
to Caffe Users
You don't need any backpropagation while you are "using" the network.
Can you also train it?

Matheus Abrantes Gadelha

Feb 4, 2016, 5:01:43 PM
to Caffe Users
I am aware that I don't need backpropagation to use the CNN. However, I need the gradient (diff) information produced by the backward pass (I am experimenting with the network to synthesize some images). I didn't try to train it, but I would probably be able to, since backprop is working. I just don't understand why, since I don't have a loss layer.


--
Matheus A. Gadelha

Alex Orloff

Feb 4, 2016, 5:12:27 PM
to Caffe Users
Matheus,

>  I need the gradient (diff) information produced by the backward pass
Can you give more details on this point?
By the way, the gradient is d(loss)/dx, so you cannot calculate a gradient without a loss function.

Matheus Abrantes Gadelha

Feb 4, 2016, 5:38:13 PM
to Caffe Users
Sure.

I've basically implemented this paper: http://arxiv.org/pdf/1312.6034.pdf. So I execute the backward pass in order to get the gradients with respect to the input nodes (the image). That's the reason the model description has the option force_backward: true.

In code:
import numpy as np

# net is the caffe.Net loaded from the model above (with force_backward: true)
target = 281  # class index whose gradient w.r.t. the input image I want
diff = np.zeros((1, 1000, 1, 1))
diff[0, target, 0, 0] = 1
# seed the diff of the output blob and propagate it back to the data blob
back = net.backward(**{net.outputs[0]: diff})

> By the way, the gradient is d(loss)/dx, so you cannot calculate a gradient without a loss function.
Exactly! That's why this should not be working!


--
Matheus A. Gadelha

Matheus Abrantes Gadelha

Feb 4, 2016, 5:42:04 PM
to Caffe Users
After giving the code some thought, I think I found the answer. The backward pass probably just propagates the diff values down from a given layer. This means the code doesn't actually need a loss function, since I am already passing in the differential, not the final classification. I will run some experiments tomorrow and report the results.

Thank you for the discussion.

Best,
Matheus.
--
Matheus A. Gadelha

Jan C Peters

Feb 5, 2016, 8:17:02 AM
to Caffe Users
That is exactly what I thought when I looked at your code snippet. Indeed, caffe only propagates diffs backwards. The loss layer is responsible for computing the topmost diff values; since you provided them yourself, there is no need for a loss layer.
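
Schematically (using Alex's notation, with data as the image and fc8 as the topmost blob of your net):

d(loss)/d(data) = (d(fc8)/d(data))^T * d(loss)/d(fc8)

A loss layer such as SoftmaxWithLoss would normally compute the topmost factor d(loss)/d(fc8) itself (for softmax cross-entropy that is, up to normalization, the predicted probabilities minus the one-hot target). You wrote that vector into the top diff yourself with your one-hot diff array, so every layer below can apply its local chain-rule step just as it would during training.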

By the way: I don't think caffe creates any layer automatically that you did not explicitly request. And regarding the loss: I have read (but not used myself) that in caffe every layer can be made to contribute to the loss by setting "loss_weight" to something other than 0 (https://groups.google.com/d/msg/caffe-users/4wRzMk5Lq0g/YrQpPeEAEQAJ). I am not sure how or whether this is useful, but it is nice to have the flexibility. loss_weight is 1 by default for layers of a loss type and 0 for all other layers.
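
If I understand it correctly, in the prototxt that would look something like this (a made-up sketch, not tested; caffe should then add loss_weight times the sum over the blob's entries to the total loss):

layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  inner_product_param {
    num_output: 1000
  }
  # a non-loss layer made to contribute to the loss
  loss_weight: 0.01
}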

Jan

Matheus Abrantes Gadelha

Feb 5, 2016, 11:33:30 AM
to Caffe Users
That's very useful information! Thank you very much!


--
Matheus A. Gadelha
