Hello folks,
I've been trying to train with BatchNormalization, following several tutorials:
- Using three of these: param { lr_mult: 0 } (one per internal BN blob).
- Using use_global_stats: false in order to train with per-batch statistics.
- Using BN followed by a Scale layer with bias_term: true (a prototxt sketch of the whole pattern follows this list).
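For concreteness, each BN + Scale block looks roughly like this (layer names taken from the log below, with in-place tops as the log suggests):

layer {
  name: "bn_conv1_1"
  type: "BatchNorm"
  bottom: "conv1_1"
  top: "conv1_1"
  # freeze the three internal blobs (mean, variance, moving average
  # factor) so the solver never applies gradient updates to them
  param { lr_mult: 0 }
  param { lr_mult: 0 }
  param { lr_mult: 0 }
  batch_norm_param {
    use_global_stats: false  # use per-batch statistics while training
  }
}
layer {
  name: "scale_conv1_1"
  type: "Scale"
  bottom: "conv1_1"
  top: "conv1_1"
  scale_param {
    bias_term: true  # learn a bias (beta) as well as a scale (gamma)
  }
}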
But when I train with debug_info set to true, I get this in the very first lines:
I0126 08:55:26.362011 172388 caffe.cpp:248] Starting Optimization
I0126 08:55:26.362042 172388 solver.cpp:306] Solving
I0126 08:55:26.362046 172388 solver.cpp:307] Learning Rate Policy: step
I0126 08:55:26.370219 172388 net.cpp:593] [Forward] Layer data, top blob data data: 126.89
I0126 08:55:26.370272 172388 net.cpp:593] [Forward] Layer data, top blob label data: 11.3571
I0126 08:55:26.370312 172388 net.cpp:593] [Forward] Layer label_data_1_split, top blob label_data_1_split_0 data: 11.3571
I0126 08:55:26.370362 172388 net.cpp:593] [Forward] Layer label_data_1_split, top blob label_data_1_split_1 data: 11.3571
I0126 08:55:26.376642 172388 net.cpp:593] [Forward] Layer conv1_1, top blob conv1_1 data: 48.5156
I0126 08:55:26.376684 172388 net.cpp:605] [Forward] Layer conv1_1, param blob 0 data: 0.16864
I0126 08:55:26.376719 172388 net.cpp:605] [Forward] Layer conv1_1, param blob 1 data: 0.501886
I0126 08:55:26.392660 172388 net.cpp:593] [Forward] Layer bn_conv1_1, top blob conv1_1 data: nan
I0126 08:55:26.392700 172388 net.cpp:605] [Forward] Layer bn_conv1_1, param blob 0 data: 38.2106
I0126 08:55:26.392734 172388 net.cpp:605] [Forward] Layer bn_conv1_1, param blob 1 data: nan
I0126 08:55:26.392756 172388 net.cpp:605] [Forward] Layer bn_conv1_1, param blob 2 data: 1
I0126 08:55:26.402289 172388 net.cpp:593] [Forward] Layer scale_conv1_1, top blob conv1_1 data: nan
I0126 08:55:26.402333 172388 net.cpp:605] [Forward] Layer scale_conv1_1, param blob 0 data: 1
I0126 08:55:26.402390 172388 net.cpp:605] [Forward] Layer scale_conv1_1, param blob 1 data: 0.001
I0126 08:55:26.405901 172388 net.cpp:593] [Forward] Layer relu1_1, top blob conv1_1 data: nan
I0126 08:55:26.418678 172388 net.cpp:593] [Forward] Layer conv1_2, top blob conv1_2 data: nan
I0126 08:55:26.418715 172388 net.cpp:605] [Forward] Layer conv1_2, param blob 0 data: 0.0292296
I0126 08:55:26.418777 172388 net.cpp:605] [Forward] Layer conv1_2, param blob 1 data: 0.265496
I0126 08:55:26.434610 172388 net.cpp:593] [Forward] Layer bn_conv1_2, top blob conv1_2 data: nan
I0126 08:55:26.434650 172388 net.cpp:605] [Forward] Layer bn_conv1_2, param blob 0 data: nan
I0126 08:55:26.434682 172388 net.cpp:605] [Forward] Layer bn_conv1_2, param blob 1 data: nan
I0126 08:55:26.434690 172388 net.cpp:605] [Forward] Layer bn_conv1_2, param blob 2 data: 1
Everything is NaN from there, and the loss never decreases. I understand the three BN blobs are the MEAN, VARIANCE, and MOVING AVERAGE FACTOR, as https://github.com/BVLC/caffe/blob/master/include/caffe/layers/batch_norm_layer.hpp#L25-L27 suggests, so why could I be getting a NaN variance? Any advice?
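For reference, my understanding is that with use_global_stats: false the layer normalizes each channel with the per-batch statistics (the standard batch-norm formula; eps comes from batch_norm_param):

y = \frac{x - \mu_{\text{batch}}}{\sqrt{\sigma^2_{\text{batch}} + \epsilon}}

so a single NaN in the variance turns the whole top blob into NaN, which matches what propagates through scale_conv1_1, relu1_1, and conv1_2 above.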
Thanks!