Getting NaN from the first layer of the network


解易

Jun 17, 2017, 5:51:17 AM
to Caffe Users


After one iteration, the output blob of the first layer (encode_conv1) becomes NaN. I switched on debug output and got the following log.

I0617 15:41:12.879797 25266 net.cpp:619]     [Backward] Layer encode_conv2, bottom blob encode_relu1 diff: 0.000423132
I0617 15:41:12.879850 25266 net.cpp:630]     [Backward] Layer encode_conv2, param blob 0 diff: 0.153993
I0617 15:41:12.879885 25266 net.cpp:630]     [Backward] Layer encode_conv2, param blob 1 diff: 1.27533
I0617 15:41:12.882629 25266 net.cpp:619]     [Backward] Layer encode_relu1, bottom blob encode_conv1 diff: 0.000211059
I0617 15:41:12.890466 25266 net.cpp:630]     [Backward] Layer encode_conv1, param blob 0 diff: 0.349801
I0617 15:41:12.890503 25266 net.cpp:630]     [Backward] Layer encode_conv1, param blob 1 diff: 0.918177
E0617 15:41:12.918594 25266 net.cpp:719]     [Backward] All net params (data, diff): L1 norm = (4.42585e+06, 2.0093e+07); L2 norm = (386.806, 13006.7)
I0617 15:41:12.933250 25266 solver.cpp:218] Iteration 0 (0 iter/s, 40.5004s/200 iters), loss = 3154.03
I0617 15:41:12.933265 25266 solver.cpp:237]     Train net output #0: klloss = 875.227
I0617 15:41:12.933274 25266 solver.cpp:237]     Train net output #1: likelihoodloss = 3154.03 (* 1 = 3154.03 loss)
I0617 15:41:12.933281 25266 sgd_solver.cpp:105] Iteration 0, lr = 0.001
I0617 15:50:01.457751 25266 net.cpp:591]     [Forward] Layer img, top blob img data: 0.30248
I0617 15:50:01.458441 25266 net.cpp:591]     [Forward] Layer img_img_0_split, top blob img_img_0_split_0 data: 0.30248
I0617 15:50:01.458510 25266 net.cpp:591]     [Forward] Layer img_img_0_split, top blob img_img_0_split_1 data: 0.30248
I0617 15:50:01.458559 25266 net.cpp:591]     [Forward] Layer annotation, top blob annotation data: 0.00763359
I0617 15:50:01.458681 25266 net.cpp:591]     [Forward] Layer datasplit, top blob id data: 0.00401606
I0617 15:50:01.458716 25266 net.cpp:591]     [Forward] Layer datasplit, top blob camera data: 0.0769231
I0617 15:50:01.464388 25266 net.cpp:591]     [Forward] Layer encode_conv1, top blob encode_conv1 data: nan
I0617 15:50:01.464462 25266 net.cpp:603]     [Forward] Layer encode_conv1, param blob encode_conv1_w data: nan
I0617 15:50:01.464519 25266 net.cpp:603]     [Forward] Layer encode_conv1, param blob encode_conv1_b data: nan
I0617 15:50:01.465979 25266 net.cpp:591]     [Forward] Layer encode_relu1, top blob encode_relu1 data: nan
I0617 15:50:01.520959 25266 net.cpp:591]     [Forward] Layer encode_conv2, top blob encode_conv2 data: nan
I0617 15:50:01.520999 25266 net.cpp:603]     [Forward] Layer encode_conv2, param blob encode_conv2_w data: nan

The back propagation seems OK. I don't know which part of the network screws up the parameters, since I don't see any NaN appearing during the backward pass. Based on this log, what could be causing the NaN? Thanks!
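For reference, the same check can also be done from pycaffe by stepping the solver manually and scanning the parameter blobs for non-finite values after each update. A minimal sketch, assuming the standard pycaffe API; the solver path is a placeholder:

import numpy as np
import caffe

caffe.set_mode_gpu()
solver = caffe.SGDSolver('solver.prototxt')  # placeholder path

for it in range(10):
    solver.step(1)  # one forward/backward/update pass
    # scan every learnable parameter blob for NaN/inf
    for name, blobs in solver.net.params.items():
        for i, blob in enumerate(blobs):
            if not np.isfinite(blob.data).all():
                print('iter %d: param blob %d of layer %s is non-finite' % (it, i, name))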

Cham Su

Jun 18, 2017, 10:55:23 PM
to Caffe Users
From my perspective, the learning rate is too large for this loss scale (your iteration 0 already reports loss = 3154.03 with an all-net diff L2 norm around 13000). You can try a smaller one.
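For example, in the solver prototxt you could lower base_lr by an order of magnitude and, as a safety net, cap the gradient norm with clip_gradients. A minimal sketch; both fields are standard Caffe solver options, but the exact values are guesses you would need to tune:

# solver.prototxt (only the relevant fields shown)
base_lr: 0.0001        # was 0.001 in your log
clip_gradients: 100    # rescale gradients whose L2 norm exceeds this

If the first few iterations still blow up, keep halving base_lr until they stay finite.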

On Saturday, June 17, 2017 at 5:51:17 PM UTC+8, 解易 wrote: