I am writing a solver.prototxt that follows the training schedule described in a paper:
In the training phase, the learning rate was set as 0.001 initially and decreased by a factor of 10 when the loss stopped decreasing till 10^-7. The discount weight was set as 1 initially and decreased by a factor of 10 every ten thousand iterations until a marginal value 10^-3.
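Restated in the notation of Caffe's policy comments below (my own reading, not the paper's formulas), the two schedules are:

lr(iter) = max(0.001 * 0.1 ^ n(iter), 1e-7), where n(iter) counts how many times the loss has plateaued so far
w(iter) = max(1 * 0.1 ^ floor(iter / 10000), 1e-3), where w is the discount weight

So the learning rate decay is event-driven (triggered by plateaus), while the discount weight decay is purely iteration-driven.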
Note that the discount weight corresponds to loss_weight in Caffe. Based on the information above, I wrote my solver as follows:
train_net: "train.prototxt"
lr_policy: "step"
gamma: 0.1
stepsize: 10000
base_lr: 0.001 #0.002
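With these settings Caffe computes lr = 0.001 * 0.1 ^ floor(iter / 10000), i.e. 1e-3 for iterations 0-9999, 1e-4 for 10000-19999, and so on. As far as I can tell, that reproduces the iteration-driven discount-weight schedule rather than the plateau-driven learning-rate schedule the paper describes.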
In train.prototxt, I also set:
layer {
name: "loss"
type: "SoftmaxWithLoss"
bottom: "deconv"
bottom: "label"
top: "loss"
loss_weight: 1
}
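Since loss_weight is a constant in the prototxt, the only way I can see to decrease it every ten thousand iterations is to rewrite train.prototxt from a script and reload the net. A minimal sketch using pycaffe's protobuf bindings (set_loss_weight is my own hypothetical helper; it assumes the loss layer is named "loss" as above):

from caffe.proto import caffe_pb2
from google.protobuf import text_format

# Hypothetical helper: rewrite the loss_weight of the "loss" layer.
def set_loss_weight(in_path, out_path, w):
    net = caffe_pb2.NetParameter()
    with open(in_path) as f:
        text_format.Merge(f.read(), net)
    for layer in net.layer:
        if layer.name == 'loss':         # the SoftmaxWithLoss layer above
            del layer.loss_weight[:]     # loss_weight is a repeated field
            layer.loss_weight.append(w)
    with open(out_path, 'w') as f:
        f.write(text_format.MessageToString(net))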
However, I still don't know how to set up the solver to satisfy the rules "decreased by a factor of 10 when the loss stopped decreasing till 10^-7" and "decreased by a factor of 10 every ten thousand iterations until a marginal value 10^-3". I did not find any of Caffe's built-in learning rate policies that can do it (my attempted workarounds follow the list):
// The learning rate decay policy. The currently implemented learning rate
// policies are as follows:
// - fixed: always return base_lr.
// - step: return base_lr * gamma ^ (floor(iter / step))
// - exp: return base_lr * gamma ^ iter
// - inv: return base_lr * (1 + gamma * iter) ^ (- power)
// - multistep: similar to step but it allows non uniform steps defined by
// stepvalue
// - poly: the effective learning rate follows a polynomial decay, to be
// zero by the max_iter. return base_lr * (1 - iter/max_iter) ^ power
// - sigmoid: the effective learning rate follows a sigmoid decay
// return base_lr * (1 / (1 + exp(-gamma * (iter - stepsize))))
//
// where base_lr, max_iter, gamma, step, stepvalue and power are defined
// in the solver parameter protocol buffer, and iter is the current iteration.
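The closest built-in approximation I can think of is "multistep": run training once, note the iterations where the loss plateaus, and hard-code them as stepvalue entries. The values below are hypothetical placeholders, not measurements:

lr_policy: "multistep"
gamma: 0.1
stepvalue: 20000   # iterations where the loss was observed to plateau
stepvalue: 45000
stepvalue: 80000

But that needs a throwaway training run first and still isn't adaptive.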
If anyone knows how, please give me some guidance on writing the solver.prototxt to satisfy the above conditions.
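In the meantime, the workaround I am considering is to drive training from pycaffe and apply the plateau rule by hand. This is an untested sketch: the plateau test (patience checks without improvement) is my own heuristic, solver_template.prototxt is assumed to be my solver above without the lr_policy/gamma/stepsize lines, the 100000-iteration budget is arbitrary, and the snapshot filename assumes snapshot_prefix: "snap" and a pycaffe build that exposes solver.snapshot().

import caffe
from caffe.proto import caffe_pb2
from google.protobuf import text_format

# Write a solver prototxt with a fixed learning rate; this loop,
# not Caffe's lr_policy, decides when the rate drops.
def write_solver(path, base_lr):
    param = caffe_pb2.SolverParameter()
    with open('solver_template.prototxt') as f:
        text_format.Merge(f.read(), param)
    param.base_lr = base_lr
    param.lr_policy = 'fixed'
    with open(path, 'w') as f:
        f.write(text_format.MessageToString(param))

lr, min_lr, patience = 0.001, 1e-7, 5
write_solver('solver_run.prototxt', lr)
solver = caffe.SGDSolver('solver_run.prototxt')
best_loss, bad_checks = float('inf'), 0
while lr >= min_lr and solver.iter < 100000:
    solver.step(1000)                            # check the loss every 1000 iterations
    loss = float(solver.net.blobs['loss'].data)
    if loss < best_loss:
        best_loss, bad_checks = loss, 0
    else:
        bad_checks += 1
    if bad_checks >= patience:                   # "the loss stopped decreasing"
        solver.snapshot()                        # writes snap_iter_N.solverstate
        state = 'snap_iter_%d.solverstate' % solver.iter
        lr /= 10.0
        if lr < min_lr:
            break
        write_solver('solver_run.prototxt', lr)
        solver = caffe.SGDSolver('solver_run.prototxt')
        solver.restore(state)                    # keeps iter and momentum history
        bad_checks = 0

The discount-weight schedule could be folded into the same loop by calling set_loss_weight() from above every 10000 iterations and re-creating the solver with the same snapshot/restore trick.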