Model does not converge after adding batch norm


Soda Heng

Mar 1, 2016, 9:29:23 AM
to Caffe Users
My model, based on 3 convolutions, has no problem converging when using dropout and LRN. After I substituted both of those layers with a batch norm layer, it no longer converges. I believe I'm using it correctly, but it doesn't work no matter how small a learning rate I set. Can anyone here shed some light on why?

Original Model:



layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 48
    pad: 0
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "drop1"
  type: "Dropout"
  bottom: "conv1"
  top: "conv1"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "conv1"
  top: "norm1"
  lrn_param {
    local_size: 3
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "norm1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
.... Repeat x3

New Model:


layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  convolution_param {
    num_output: 48
    pad: 0
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "conv1_BN"
  type: "BatchNorm"
  include { phase: TRAIN }
  bottom: "conv1"
  top: "conv1_BN"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  batch_norm_param {
    use_global_stats: false
    moving_average_fraction: 0.95
  }
}
layer {
  name: "conv1_BN"
  type: "BatchNorm"
  include { phase: TEST }
  bottom: "conv1"
  top: "conv1_BN"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  batch_norm_param {
    use_global_stats: true
    moving_average_fraction: 0.95
  }
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1_BN"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
.... Repeat x3
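
My understanding of what these two phase-specific layers should be doing is roughly the following (a NumPy sketch for illustration only, not Caffe's actual implementation; the class and variable names are made up):

import numpy as np

class BatchNormSketch:
    """Illustration only: per-channel batch norm with a moving average."""
    def __init__(self, channels, moving_average_fraction=0.95, eps=1e-5):
        self.running_mean = np.zeros(channels)
        self.running_var = np.ones(channels)
        self.maf = moving_average_fraction
        self.eps = eps

    def forward(self, x, use_global_stats):
        # x has shape (N, C, H, W); statistics are computed per channel.
        if use_global_stats:
            # TEST phase: normalize with the accumulated global statistics.
            mean, var = self.running_mean, self.running_var
        else:
            # TRAIN phase: normalize with this batch's statistics and
            # fold them into the moving averages.
            mean = x.mean(axis=(0, 2, 3))
            var = x.var(axis=(0, 2, 3))
            self.running_mean = self.maf * self.running_mean + (1 - self.maf) * mean
            self.running_var = self.maf * self.running_var + (1 - self.maf) * var
        return (x - mean[None, :, None, None]) / np.sqrt(var[None, :, None, None] + self.eps)

# Example: a training batch updates the running stats, a test batch reuses them.
bn = BatchNormSketch(channels=48)
train_out = bn.forward(np.random.randn(16, 48, 24, 24), use_global_stats=False)
test_out = bn.forward(np.random.randn(1, 48, 24, 24), use_global_stats=True)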







Soda Heng

Mar 2, 2016, 10:29:21 AM
to Caffe Users
Can anyone confirm whether the data that gets fed in from the data layer can be normalized ahead of time or not? My current model already receives normalized data from an HDF5 input.

I'm wondering if that is affecting the batch norm.

Xi Yin

Mar 2, 2016, 10:53:01 AM
to Caffe Users
For me, I just subtract the mean from each image and save the result into an HDF5 file. I did not do any normalization and it works well for me.
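
Roughly, the preprocessing I mean is just this (a minimal sketch, assuming the images are already loaded as a float32 NumPy array of shape (N, C, H, W); the file name and the "data"/"label" dataset names here are only examples, not anything specific to my setup):

import h5py
import numpy as np

def save_mean_subtracted(images, labels, path="train.h5"):
    # Per-pixel mean computed over the whole training set.
    images = images.astype(np.float32)
    mean = images.mean(axis=0)
    # Mean subtraction only, no scaling or other normalization.
    images -= mean
    # Dataset names must match the tops of the HDF5Data layer;
    # "data" and "label" are just the common choice.
    with h5py.File(path, "w") as f:
        f.create_dataset("data", data=images)
        f.create_dataset("label", data=labels.astype(np.float32))
    # Keep the mean so test images can be preprocessed the same way.
    return mean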


On Wednesday, March 2, 2016 at 10:29:21 AM UTC-5, Soda Heng wrote:

Soda Heng

Mar 2, 2016, 4:54:10 PM
to Caffe Users
Thanks Xi,

I will give that a try and see if it works. Really appreciate the response.

Jan C Peters

Mar 3, 2016, 3:40:55 AM
to Caffe Users
Sure, you can normalize/preprocess the data yourself instead of using Caffe's transformation capabilities. I have always done it this way and everything worked as expected. The only thing I want to add is that in any case there should be some kind of preprocessing: if you just feed the raw RGB or grayscale bytes to the network, you will never get any useful training done (at least that is my experience). On the other hand, mean subtraction may already be enough. But in my experience (and it seems to also be the general consensus), you should always do at least that.

Jan

Soda Heng

Mar 5, 2016, 2:32:36 AM
to Caffe Users
Hi Xi,

I tried doing only mean subtraction on my data, and it's still having problems converging.

Can you think of anything else that might be keeping it from converging? I also attached my full prototxt file in case that would help. Any help would be really appreciated.

Thanks!
new3conv.prototxt

Amitoz Singh

Mar 7, 2016, 1:03:17 AM
to Caffe Users

I am having a similar issue to the OP, where the training curves do not converge. Anyone who has experience with batch_norm layers and can shed some light would be extremely helpful.

Also, should we not use batch normalization before the ReLU? Or do we want the input to the whole convolution layer to be normalized?

Thanks
Amitoz

Amitoz Singh

Mar 9, 2016, 6:05:31 AM
to Caffe Users
Hi Soda,

It turned out to be a Caffe build issue for me. I had been trying to integrate and build the latest layers with the NVIDIA DIGITS Caffe library and messed something up. With Caffe itself I don't see the mentioned issue. It might be a Caffe build issue for you too, where the learning is not happening for the batch norm layer.

Regards
Amitoz

Soda Heng

Mar 9, 2016, 10:32:23 AM
to Caffe Users
Hi Amitoz,

That could be the problem for me as well. I've been using the NVIDIA DIGITS version of Caffe for all of my tests. I'll build Caffe and try again.

Thanks for the update.

Soda

Soda Heng

Mar 10, 2016, 10:12:37 AM
to Caffe Users
I have verified that the issue was caused by NVIDIA's version of Caffe. After getting the latest version, batch norm works now. Thanks.

bingca...@gmail.com

Mar 14, 2016, 4:57:17 PM
to Caffe Users
Hi Soda,

You are not alone! I tried residual learning from Microsoft Research, which uses "BatchNorm", and it does not converge at all. I am asking around to see if anyone has had success running BatchNorm with residual learning in Caffe.

cheers,

Bingcai

Igor Barbosa

Apr 20, 2016, 4:44:01 PM
to Caffe Users
Hey all.

I have had some success with BN.

I did find some issues while doing classification on the CPU (classifying one image while training). Other than that it seems to be working fine.

You can find the BN proto sample and some more information at Issue 694 (https://github.com/NVIDIA/DIGITS/issues/694).


Igor.

Igor Barbosa

Apr 25, 2016, 6:23:50 AM
to Caffe Users
Another thing: RMSprop never seems to converge with BN for me.

Did you try with SGD? Do you have convergence issues with it?

Xujun Peng

Feb 2, 2017, 8:05:46 PM
to Caffe Users
Hi Amitoz,

May I know what the NVIDIA DIGITS Caffe library is? Is that a branch other than the official Caffe?

I encountered the same problem: the network does not converge with the BN layer. How should I fix it?

Thanks,
Xujun