Hi, George.
Recently I've encountered exactly the same strange behavior.
I was working on a simple regression task with a CNN. After I added several BatchNorm layers, training speed and the training loss improved drastically, but the catch was that inference results on the very dataset I had trained on looked almost random rather than like the output of a trained network (the loss was much higher).

According to the original paper by Ioffe & Szegedy, the TRAIN and TEST stages of the algorithm differ: at TRAIN time the mean and variance are computed independently for each batch, while at TEST time their accumulated unbiased averages are used instead (controlled by the use_global_stats parameter) - correct me if I'm wrong. So I kept only three images in the dataset and set batch_size: 3 accordingly, so that all images are processed in a single batch. I expected the TRAIN and TEST mean and variance to be identical and to produce the same result on the same dataset, but that is not the case.

If I print the test loss on the same dataset while training (as George does, according to his solver.prototxt), I see that the TEST loss does decrease along with the TRAIN loss, but it stays orders of magnitude higher (TRAIN loss around 10^-7, TEST loss around 0.1, for example) and its convergence slope is much shallower.
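To illustrate why the two phases can still disagree even when the whole dataset fits in one batch, here is a minimal NumPy sketch (an assumption-laden simplification, not Caffe's actual bookkeeping, which accumulates scaled sums controlled by moving_average_fraction): in TRAIN mode the layer normalizes with the current batch statistics, while in TEST mode (use_global_stats: true) it normalizes with running averages that only approach the batch statistics after many updates, so early in training the two outputs differ noticeably.

```python
import numpy as np

def batchnorm(x, running_mean, running_var, use_global_stats,
              eps=1e-5, momentum=0.999):
    """Normalize an (N, C) batch per channel; a rough sketch of the idea."""
    if use_global_stats:
        # TEST phase: normalize with the accumulated (global) statistics.
        mean, var = running_mean, running_var
    else:
        # TRAIN phase: normalize with this batch's own statistics,
        # then fold them into the moving averages.
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        running_mean[:] = momentum * running_mean + (1 - momentum) * mean
        running_var[:] = momentum * running_var + (1 - momentum) * var
    return (x - mean) / np.sqrt(var + eps)

# A single batch of 3 samples: after one update the running averages are
# still dominated by their initial values, so the TEST-phase output (and
# hence the test loss) differs from the TRAIN-phase output.
x = np.random.randn(3, 4).astype(np.float32)
rm, rv = np.zeros(4, np.float32), np.ones(4, np.float32)
y_train = batchnorm(x, rm, rv, use_global_stats=False)
y_test = batchnorm(x, rm, rv, use_global_stats=True)
print(np.abs(y_train - y_test).max())  # clearly non-zero early in training
```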
I'm wondering whether anyone has an explanation by now.