Hey all,
As there has been some confusion about the Caffe batch norm layer, I have tried to address it in
PR #4704, now merged to master. In summary:
1. Caffe's batch norm layer handles only the mean/variance standardization. For the learned scale and shift, a further `ScaleLayer` with `bias_term: true` is needed (see the sketch after the example below).
2. The layer parameters are not learnable parameters; they are the statistics estimated by batch norm. For this reason they should not be exposed to the solver: taking gradient descent steps on these statistics is an error and will thrash training. This is now handled automatically by Caffe.
3. Before (2) it was necessary to manually mark the batch norm parameters as fixed with `param { lr_mult: 0 }` declarations. Since this is now handled by (2), old definitions are automatically upgraded to strip these now-unnecessary declarations; an example of the old style follows this list.
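For reference, here is a sketch of the kind of old-style definition that the upgrade now strips. The three `param` blocks correspond to the layer's three internal blobs (the running mean, the running variance, and the moving average factor); the layer and blob names are illustrative.

layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  # old style: freeze the statistics so the solver does not update them
  param { lr_mult: 0 }
  param { lr_mult: 0 }
  param { lr_mult: 0 }
}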
Here is an example batch norm layer definition in the latest master.
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
}
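To complete the batch normalization of (1), pair the batch norm layer with a scale layer that learns the scale and shift. A minimal sketch, reusing the `conv1` blob from the example above (the layer name `scale1` is illustrative):

layer {
  name: "scale1"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  scale_param {
    bias_term: true  # learn the shift (bias) as well as the scale
  }
}

Unlike the batch norm statistics, the scale and bias here are ordinary learnable parameters updated by the solver.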
Happy brewing,