BatchNormalization outputs NaN

Andrés Felipe Romero Vergara

Jan 26, 2017, 9:08:26 AM
to Caffe Users
Hello folks,

I've been trying to train with BatchNormalization, following several tutorials:
- Using three param { lr_mult: 0 } entries on the BatchNorm layer.
- Setting use_global_stats: false in order to train.
- Following BN with a Scale layer with bias_term: true (roughly the layout sketched below).
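
For reference, the pattern I'm following looks roughly like this (layer/blob names are only illustrative):

layer {
  name: "bn_conv1_1"
  type: "BatchNorm"
  bottom: "conv1_1"
  top: "conv1_1"
  param { lr_mult: 0 }
  param { lr_mult: 0 }
  param { lr_mult: 0 }
  batch_norm_param { use_global_stats: false }
}
layer {
  name: "scale_conv1_1"
  type: "Scale"
  bottom: "conv1_1"
  top: "conv1_1"
  scale_param { bias_term: true }
}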

But when I train with debug_info set to true, I get this in the very first lines:

I0126 08:55:26.362011 172388 caffe.cpp:248] Starting Optimization
I0126 08:55:26.362042 172388 solver.cpp:306] Solving
I0126 08:55:26.362046 172388 solver.cpp:307] Learning Rate Policy: step
I0126 08:55:26.370219 172388 net.cpp:593]     [Forward] Layer data, top blob data data: 126.89
I0126 08:55:26.370272 172388 net.cpp:593]     [Forward] Layer data, top blob label data: 11.3571
I0126 08:55:26.370312 172388 net.cpp:593]     [Forward] Layer label_data_1_split, top blob label_data_1_split_0 data: 11.3571
I0126 08:55:26.370362 172388 net.cpp:593]     [Forward] Layer label_data_1_split, top blob label_data_1_split_1 data: 11.3571
I0126 08:55:26.376642 172388 net.cpp:593]     [Forward] Layer conv1_1, top blob conv1_1 data: 48.5156
I0126 08:55:26.376684 172388 net.cpp:605]     [Forward] Layer conv1_1, param blob 0 data: 0.16864
I0126 08:55:26.376719 172388 net.cpp:605]     [Forward] Layer conv1_1, param blob 1 data: 0.501886
I0126 08:55:26.392660 172388 net.cpp:593]     [Forward] Layer bn_conv1_1, top blob conv1_1 data: nan
I0126 08:55:26.392700 172388 net.cpp:605]     [Forward] Layer bn_conv1_1, param blob 0 data: 38.2106
I0126 08:55:26.392734 172388 net.cpp:605]     [Forward] Layer bn_conv1_1, param blob 1 data: nan
I0126 08:55:26.392756 172388 net.cpp:605]     [Forward] Layer bn_conv1_1, param blob 2 data: 1
I0126 08:55:26.402289 172388 net.cpp:593]     [Forward] Layer scale_conv1_1, top blob conv1_1 data: nan
I0126 08:55:26.402333 172388 net.cpp:605]     [Forward] Layer scale_conv1_1, param blob 0 data: 1
I0126 08:55:26.402390 172388 net.cpp:605]     [Forward] Layer scale_conv1_1, param blob 1 data: 0.001
I0126 08:55:26.405901 172388 net.cpp:593]     [Forward] Layer relu1_1, top blob conv1_1 data: nan
I0126 08:55:26.418678 172388 net.cpp:593]     [Forward] Layer conv1_2, top blob conv1_2 data: nan
I0126 08:55:26.418715 172388 net.cpp:605]     [Forward] Layer conv1_2, param blob 0 data: 0.0292296
I0126 08:55:26.418777 172388 net.cpp:605]     [Forward] Layer conv1_2, param blob 1 data: 0.265496
I0126 08:55:26.434610 172388 net.cpp:593]     [Forward] Layer bn_conv1_2, top blob conv1_2 data: nan
I0126 08:55:26.434650 172388 net.cpp:605]     [Forward] Layer bn_conv1_2, param blob 0 data: nan
I0126 08:55:26.434682 172388 net.cpp:605]     [Forward] Layer bn_conv1_2, param blob 1 data: nan
I0126 08:55:26.434690 172388 net.cpp:605]     [Forward] Layer bn_conv1_2, param blob 2 data: 1

And everything is NaN from there on, so the loss never decreases. I understand the three BN param blobs hold the MEAN, VARIANCE and MOVING AVERAGE FACTOR, as https://github.com/BVLC/caffe/blob/master/include/caffe/layers/batch_norm_layer.hpp#L25-L27 suggests, so why would I get a NaN variance? Any advice?
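
For reference, my reading of how those blobs are combined when use_global_stats is true (just a sketch of what batch_norm_layer.cpp does; eps is the layer's batch_norm_param eps):

mean     = blob0 / blob2
variance = blob1 / blob2
output   = (x - mean) / sqrt(variance + eps)

So if a NaN ever lands in the stored variance blob, any pass that uses the global stats outputs NaN everywhere.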

Thanks!

Richard Burton

Feb 6, 2017, 9:33:55 AM
to Caffe Users
You will get NaN on the first test phase before training because there are no learned batch norm statistics yet (mean, std, etc.), so Caffe uses the std of the test image batch; with a batch of just one image the std is 0, which gives a divide by zero in the batch norms. Once you start training you will have some batch norm statistics and the NaN goes away.

As for the loss not decreasing, try reducing your learning rate.
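
For example, in the solver prototxt, something along these lines (numbers are only illustrative; scale them relative to whatever you use now):

base_lr: 0.001     # e.g. 10x lower than before
lr_policy: "step"
gamma: 0.1
stepsize: 20000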

Andrés Felipe Romero Vergara

Apr 6, 2017, 2:33:20 PM
to Caffe Users
I had abandoned this problem since I could not figure it out. I still haven't, but I need to.

Please look:


No matter what LR I use, the result is always the same.

After the first iteration, the BN parameters look like this, starting with the very first BN layer (the others are the same):


I'm fine-tuning a network that has no BN, so the architecture initializes these weights at:
[three inline images showing the initial values]

Andrés Felipe Romero Vergara

Apr 7, 2017, 4:58:10 PM
to Caffe Users
For those facing the same problem: I solved it by merging this PR into my Caffe copy.
https://github.com/BVLC/caffe/pull/5136
It turned out to be a numerical stability issue in BN.
Now it works beautifully.
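
In case it helps, the PR can be pulled into a local checkout with something like this (assuming origin points at github.com/BVLC/caffe; the branch name is arbitrary):

git fetch origin pull/5136/head:bn-stability
git merge bn-stability

and then rebuild Caffe.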