Hi all,
I am trying to set a constant learning rate for the batch-norm parameters (gamma, beta). I couldn't find a way to do this in the prototxt, so I was trying to hardcode it in SGDSolver.
I am a bit confused by this comment in common_layers.hpp. As far as I can tell there are only two learnable params (gamma, beta), so why does it say lr_mult: 0? And if the multiplier is zero, what learning rate is actually used to train the batch-norm parameters?
* By default, during training time, the network is computing global mean/
* variance statistics via a running average, which is then used at test
* time to allow deterministic outputs for each input. You can manually
* toggle whether the network is accumulating or using the statistics via the
* use_global_stats option. IMPORTANT: for this feature to work, you MUST
* set the learning rate to zero for all three parameter blobs, i.e.,
* param {lr_mult: 0} three times in the layer definition.
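For reference, here is the kind of layer definition the quoted comment is describing (a sketch; the layer and blob names are placeholders). Note that in Caffe the BatchNorm layer's three parameter blobs hold the running mean, running variance, and moving-average factor, while the gamma/beta scaling is handled by a separate Scale layer:

```protobuf
layer {
  name: "bn1"          # placeholder name
  type: "BatchNorm"
  bottom: "conv1"      # placeholder bottom blob
  top: "conv1"
  # One param block per internal blob (mean, variance, moving-average
  # factor). lr_mult: 0 freezes each blob so the solver never updates
  # the accumulated statistics; they are maintained by the layer itself.
  param { lr_mult: 0 }
  param { lr_mult: 0 }
  param { lr_mult: 0 }
}
```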