Is Batch Normalization supported by Caffe?


Hossein Hasan Pour

Feb 16, 2016, 8:54:27 AM
to Caffe Users
Is batch normalization, which was proposed a couple of months ago by Google, implemented in Caffe?
Thanks in advance

Etienne Perot

Feb 16, 2016, 12:33:44 PM
to Caffe Users
Yes, it is; see the cifar10 examples, where there is a prototxt which adds a BatchNorm layer. You need to use use_global_stats: false in training and use_global_stats: true in test, and write param { lr_mult: 0 } three times in the layer definition for some reason.
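
For reference, a minimal sketch of such a BatchNorm layer (the layer and blob names "bn1" and "conv1" are placeholders, and the per-phase include blocks are just one way of expressing the train/test difference in a single train_val prototxt; in current Caffe the flag also defaults per phase when left unset, if I recall correctly):

layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  # the three parameters hold the running mean, running variance, and the
  # moving-average factor; lr_mult: 0 keeps the solver from updating them
  param { lr_mult: 0 }
  param { lr_mult: 0 }
  param { lr_mult: 0 }
  # compute statistics from the current mini-batch while training
  batch_norm_param { use_global_stats: false }
  include { phase: TRAIN }
}
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  param { lr_mult: 0 }
  param { lr_mult: 0 }
  param { lr_mult: 0 }
  # use the accumulated running statistics at test time
  batch_norm_param { use_global_stats: true }
  include { phase: TEST }
}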

Hossein Hasan Pour

Feb 17, 2016, 12:58:01 AM
to Caffe Users
Thanks, I noticed that, but what are the parameters? I can't find any information on this specific layer!
What do those learning rates = 0 mean, anyway?

Evan Shelhamer

Feb 20, 2016, 4:42:17 PM
to Hossein Hasan Pour, Caffe Users
The parameters are the collected batch norm statistics. The parameter learning rates need to be set to zero or else the solver will think these are learnable parameters that need to have updates applied to them (with momentum, weight decay, and so forth).


Further note: the BatchNorm layer only does the normalization! For the scale and shift that are also in the batch norm paper, include a `ScaleLayer` with `scale_param { bias_term: true }`.
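
A minimal sketch of that scale/shift layer, assuming it follows a BatchNorm layer writing to a blob named "conv1" (the registered type string for the layer is "Scale"):

layer {
  name: "scale1"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  # learns a per-channel scale (gamma) and, with bias_term, a per-channel shift (beta)
  scale_param { bias_term: true }
}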



Evan Shelhamer






Hossein Hasan Pour

Feb 21, 2016, 1:22:18 AM
to Caffe Users, master....@gmail.com
Thanks a lot, that makes sense now.
By the way, about the second note on the scaling and shifting per channel that you just mentioned:
do you mean that there is no need for the DummyLayer stuff anymore, as documented here: https://github.com/BVLC/caffe/blob/master/include/caffe/layers/batch_norm_layer.hpp#L25-L27 ? Or are these two different things?
Thanks a lot again.

Evan Shelhamer

Feb 21, 2016, 1:35:50 AM
to Hossein Hasan Pour, Caffe Users
Right, #3591 added scale and bias layers, and ScaleLayer with `bias_term: true` handles both together more efficiently, so the DummyDataLayer workaround is no longer needed. That header comment needs to be updated to suggest ScaleLayer.
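
To illustrate (a sketch, not taken from #3591 itself, with placeholder names), the single layer with a bias term stands in for what would otherwise be two layers:

# one layer: per-channel scale (gamma) plus shift (beta)
layer {
  name: "scale1"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
  scale_param { bias_term: true }
}

# alternative: the same effect with separate Scale and Bias layers
layer {
  name: "scale1"
  type: "Scale"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "bias1"
  type: "Bias"
  bottom: "conv1"
  top: "conv1"
}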

Evan Shelhamer





Vimal Thilak

Feb 22, 2016, 3:12:33 PM
to Caffe Users, master....@gmail.com
@Evan S,

Thanks to you and your team for adding batch normalization to Caffe. What was the rationale for breaking up the Batch Normalization implementation into "BatchNorm" followed by a "Scale" layer with bias set to true?

By the way, I have successfully translated Inception-v3 into Caffe and obtained a top-1 accuracy of 0.7844 on the ILSVRC12 validation set, which is a significant jump from BVLC_GoogLeNet / Inception (v1) that Yangqing released last September.

Do you have any thoughts on what needs to be done to the BN layer as far as fine-tuning is concerned?

Thanks,

-Vimal

Xin Li

Feb 29, 2016, 9:22:04 AM
to Caffe Users, master....@gmail.com
Hi, it's exciting that you re-implemented Inception-v3!
Would you like to share your re-implementation?

On Tuesday, February 23, 2016 at 4:12:33 AM UTC+8, Vimal Thilak wrote:

Hieu Nguyen

Mar 3, 2016, 10:05:36 PM
to Caffe Users, master....@gmail.com
Hi, I'm also interested in the implementation. Can you share it in the model zoo?

Henggang Cui

Apr 3, 2016, 2:24:42 PM
to Caffe Users, master....@gmail.com
Hi Vimal,

Could you share the train_val and solver files for your Inception-v3 network with us? A test accuracy of 0.7844 is very impressive.

Thank you so much!

Cui

Jeremy Rutman

May 30, 2016, 10:40:08 AM
to Caffe Users, master....@gmail.com
If I understand right, to do BN you need to do:

layer {
  name: "bn_conv1_1"
  type: "BatchNorm"
  bottom: "conv1_1"
  top: "conv1_1"
  # freeze the collected statistics so the solver does not update them
  param { lr_mult: 0 }
  param { lr_mult: 0 }
  param { lr_mult: 0 }
  # true for testing; this should be false during training
  batch_norm_param { use_global_stats: true }
}
layer {
  name: "scale1"
  # the registered layer type string is "Scale" (the class is ScaleLayer)
  type: "Scale"
  bottom: "conv1_1"
  top: "conv1_1"
  scale_param { bias_term: true }
}

Does this check out?

Is it possible to use this with nets for pixel-level parsing, where the batch size is 1 image but the minibatch actually consists of all the pixels in the image (again, if I understand correctly)?

Jeremy Rutman

Aug 28, 2016, 1:48:27 PM
to Caffe Users, master....@gmail.com
Ping.
Can anyone weigh in here on the correct implementation of the BN layer? From the above it requires BatchNorm then Scale; is it the case that
batch_norm_param { use_global_stats: true } must be set during test and set to false during training?

Jeremy Rutman

Aug 28, 2016, 1:56:49 PM
to Caffe Users, master....@gmail.com
Never mind: a definitive answer can be found here.

Jeremy Rutman

Dec 21, 2016, 2:02:28 AM
to Caffe Users, master....@gmail.com
Do the BatchNorm learning rates still need to be set to zero?