// The learning rate decay policy. The currently implemented learning rate
// policies are as follows:
// - fixed: always return base_lr.
// - step: return base_lr * gamma ^ (floor(iter / step))
// - exp: return base_lr * gamma ^ iter
// - inv: return base_lr * (1 + gamma * iter) ^ (- power)
// - multistep: similar to step, but it allows non-uniform steps defined by
// stepvalue
// - poly: the effective learning rate follows a polynomial decay, to be
// zero by the max_iter. return base_lr * (1 - iter/max_iter) ^ (power)
// - sigmoid: the effective learning rate follows a sigmoid decay
// return base_lr * (1 / (1 + exp(-gamma * (iter - stepsize))))
//
// where base_lr, max_iter, gamma, step, stepvalue and power are defined
// in the solver parameter protocol buffer, and iter is the current iteration.
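To sanity-check these formulas, here is a minimal Python sketch (not Caffe's implementation; the function name and the assumption that iter starts at 0 are mine) that evaluates each policy at a given iteration, using the same parameter names as the solver protocol buffer:

import math
from bisect import bisect_right

def effective_lr(policy, iter, base_lr, gamma=None, stepsize=None,
                 power=None, max_iter=None, stepvalues=()):
    # Rough sketch of the formulas quoted above; parameter names mirror the
    # SolverParameter fields (base_lr, gamma, stepsize, power, max_iter, stepvalue).
    if policy == "fixed":
        return base_lr
    if policy == "step":
        return base_lr * gamma ** (iter // stepsize)
    if policy == "exp":
        return base_lr * gamma ** iter
    if policy == "inv":
        return base_lr * (1 + gamma * iter) ** (-power)
    if policy == "multistep":
        # like "step", but the drop points are the (sorted) stepvalue entries
        return base_lr * gamma ** bisect_right(sorted(stepvalues), iter)
    if policy == "poly":
        return base_lr * (1 - iter / max_iter) ** power
    if policy == "sigmoid":
        return base_lr * (1.0 / (1.0 + math.exp(-gamma * (iter - stepsize))))
    raise ValueError("unknown lr_policy: %s" % policy)

# e.g. "step" with base_lr=0.01, gamma=0.1, stepsize=10000 drops the rate
# by a factor of 10 every 10000 iterations:
print(effective_lr("step", 25000, 0.01, gamma=0.1, stepsize=10000))  # ~1e-4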
I think we need to figure it out by ourselves. The forum is very popular; Evan might not have time to explain it.
So... does that mean you can only use the loss_weight with loss layers?
[...] The scale parameter is stored in the diff()
of the top blob -- in the case of the loss layers that top blob is a singleton, so the loss layers had to be modified to multiply their gradients by a scale parameter specified by the singleton top blob diff, but all the other layers already knew how to backprop their diffs and could just be used as is. The only annoying thing was that to get top blobs to be both inputs to other layers and losses, I had to use split layers, as it's functionally the same thing as sending the output to two different layers [....]
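To make the quoted mechanism a bit more concrete, here is a small NumPy sketch (purely illustrative, not Caffe code; the toy layer and the variable names are made up) of how a scale stored in a top blob's diff ends up multiplying the gradient during the backward pass:

import numpy as np

# Toy "layer": top = 2 * bottom, so d(top)/d(bottom) = 2 elementwise.
bottom = np.array([1.0, -3.0, 0.5])
top = 2.0 * bottom

# Treat `top` as a loss output with loss_weight = 0.1: conceptually the
# contribution to the objective is 0.1 * sum(top), and that scale is
# seeded into the top blob's diff before backward runs.
loss_weight = 0.1
top_diff = np.full_like(top, loss_weight)

# Backward pass of the toy layer: the chain rule multiplies the local
# gradient by whatever is already in top_diff, so the scale propagates
# through without the layer needing any special handling.
bottom_diff = 2.0 * top_diff
print(bottom_diff)  # [0.2 0.2 0.2]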
Hi,
Thanks, Evan, for the link and the explanations. We all appreciate the core devs who spend time developing the code and answering questions.
Hi Jan, what the paragraph means is that Caffe will internally split the top blob of a layer that is meant to produce a loss into two top blobs: one goes into the loss computation (the "top-most" one) and the other goes as input to the later layers. Only then does it make sense. I think this also causes confusion for users in the documentation (section "Loss weights", http://caffe.berkeleyvision.org/tutorial/loss.html): "However, any layer can be used as a loss by adding a field loss_weight: <float> to a layer definition for each top blob produced by the layer."
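A rough NumPy sketch of that splitting behaviour (an illustration of the idea, not Caffe's SplitLayer code; the function and variable names are made up): the forward pass exposes the same data under two tops, and the backward pass sums the diffs arriving from the loss and from the later layers into the single bottom:

import numpy as np

def split_forward(x):
    # The split just exposes the same data under two (logical) tops.
    return x, x

def split_backward(diff_to_loss, diff_from_later_layers):
    # Gradients arriving on the two tops are summed into the single bottom,
    # so the blob receives both its loss gradient and its normal gradient.
    return diff_to_loss + diff_from_later_layers

x = np.array([0.5, -1.0])
top_loss, top_next = split_forward(x)

loss_weight = 0.1
diff_loss = np.full_like(top_loss, loss_weight)  # seeded by the loss weight
diff_next = np.array([0.3, -0.2])                # sent back by later layers
print(split_backward(diff_loss, diff_next))      # [0.4 -0.1]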
Hi Evan, I would like to know whether PR686 (https://github.com/BVLC/caffe/pull/686) was eventually merged into the Caffe master branch. From the link you provided, I see it is merged into the Caffe dev branch. I am keen to know why the core devs decided to follow the internal splitting design. Instead, we could let users define a loss layer on top of any intermediate layer that they want to produce a loss. The latter approach is more transparent and safer for users.
Best regards,
@An