Get "loss=nan" info at the very beginning; even when setting base_lr=0


Xiao Yang

Feb 22, 2015, 11:49:16 PM
to caffe...@googlegroups.com
Any ideas about this problem? Thank you!

Here is part of the output:

Learning Rate Policy: step
Iteration 0, Testing net (#0)
Test net output #0: accuracy = 0.44782
Test net output #1: loss = 0.720437 (* 1 = 0.720437 loss)
Iteration 0, loss = nan
Train net output #0: loss = nan (* 1 = nan loss)
Iteration 0, lr = 0
Iteration 20, loss = nan
Train net output #0: loss = nan (* 1 = nan loss)
Iteration 20, lr = 0

It seems that the testing part works well but the training part fails. However, the only difference between the test and train phases is that they use different datasets:
layers {
  name: "data"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "examples/train_lmdb"   # val_lmdb in the TEST phase
    backend: LMDB
    batch_size: 50
  }
  transform_param {
    mean_file: "data/imagenet_mean.binaryproto"
    mirror: false
  }
  include: { phase: TRAIN }
}

I also tried swapping the datasets used in the test and training phases, but that didn't help.

shai harel

Feb 23, 2015, 4:11:40 AM
to caffe...@googlegroups.com
Could you attach the entire train_test.prototxt?
How do you initialize the weights?
Do you use ReLU, sigmoid, or tanh for the nonlinearity?
How deep is your network?

Try a shallow network first (3 layers max),
set the initialization to xavier+sigmoid or xavier+tanh,
and read more about initializing ReLU layers, or try Batch Normalization (google it).
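For example, a minimal sketch of an xavier weight filler in a layer definition (the layer name and sizes here are placeholders, not taken from your net):

layers {
  name: "conv1"              # placeholder layer
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 9
    weight_filler {
      type: "xavier"         # scales the init by fan-in instead of a fixed std
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}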

Please let us know if any of this works out for you :]

Xiao Yang

Feb 23, 2015, 6:47:28 PM
to caffe...@googlegroups.com

Thank you for the detailed reply!


For initialization, I use gaussian for the weights and set the biases to a constant 0. For the nonlinearity, I just use maxout. The entire prototxt file is attached below. Could improper initialization be the problem? I will also look into other possible reasons.


name: "tp"

layers {

  name: "data"

  type: DATA

  top: "data"

  top: "label"

  data_param {

    source: "examples/tp/tp_train_lmdb"

    backend: LMDB

    batch_size: 50

  }

  transform_param {

    mean_file: "data/tp/imagenet_mean.binaryproto"

    mirror: false

  }

  include: { phase: TRAIN }

}

layers {

  name: "data"

  type: DATA

  top: "data"

  top: "label"

  data_param {

    source: "examples/tp/tp_val_lmdb"

    backend: LMDB

    batch_size: 50

  }

  transform_param {

    mean_file: "data/tp/imagenet_mean.binaryproto"

    mirror: false

  }

  include: { phase: TEST }

}

layers {

  name: "conv1"

  type: CONVOLUTION

  bottom: "data"

  top: "conv1"

  blobs_lr: 1

  blobs_lr: 2

  weight_decay: 1

  weight_decay: 0

  convolution_param {

    num_output: 96

    kernel_size: 9

    stride: 1

    weight_filler {

      type: "gaussian"

      std: 0.01

    }

    bias_filler {

      type: "constant"

      value: 0

    }

  }

}

layers {

name: "slice1"

type: SLICE

bottom: "conv1"

top: "conv1_1"

top: "conv1_2"

slice_param {

slice_dim: 1

slice_point: 48

}

}

layers {

name: "maxout1"

type: ELTWISE

bottom: "conv1_1"

bottom: "conv1_2"

top: "maxout1"

eltwise_param {

operation: MAX

}

}

layers {

  name: "drop1"

  type: DROPOUT

  bottom: "maxout1"

  top: "maxout1"

  dropout_param {

    dropout_ratio: 1

  }

}

layers {

  name: "conv2"

  type: CONVOLUTION

  bottom: "maxout1"

  top: "conv2"

  blobs_lr: 1

  blobs_lr: 2

  weight_decay: 1

  weight_decay: 0

  convolution_param {

    num_output: 128

    kernel_size: 9

    stride: 1

    weight_filler {

      type: "gaussian"

      std: 0.01

    }

    bias_filler {

      type: "constant"

      value: 0

    }

  }

}

layers {

name: "slice2"

type: SLICE

bottom: "conv2"

top: "conv2_1"

top: "conv2_2"

slice_param {

slice_dim: 1

slice_point: 64

}

}

layers {

name: "maxout2"

type: ELTWISE

bottom: "conv2_1"

bottom: "conv2_2"

top: "maxout2"

eltwise_param {

operation: MAX

}

}

layers {

  name: "drop2"

  type: DROPOUT

  bottom: "maxout2"

  top: "maxout2"

  dropout_param {

    dropout_ratio: 0.5

  }

}

layers {

  name: "conv3"

  type: CONVOLUTION

  bottom: "maxout2"

  top: "conv3"

  blobs_lr: 1

  blobs_lr: 2

  weight_decay: 1

  weight_decay: 0

  convolution_param {

    num_output: 512

    kernel_size: 8

    stride: 1

    weight_filler {

      type: "gaussian"

      std: 0.01

    }

    bias_filler {

      type: "constant"

      value: 0

    }

  }

}

layers {

name: "slice3"

type: SLICE

bottom: "conv3"

top: "conv3_1"

top: "conv3_2"

top: "conv3_3"

top: "conv3_4"

slice_param {

slice_dim: 1

slice_point: 128

slice_point: 256

slice_point: 384

}

}

layers {

name: "maxout3_1"

type: ELTWISE

bottom: "conv3_1"

bottom: "conv3_2"

top: "maxout3_1"

eltwise_param {

operation: MAX

}

}

layers {

name: "maxout3_2"

type: ELTWISE

bottom: "conv3_3"

bottom: "conv3_4"

top: "maxout3_2"

eltwise_param {

operation: MAX

}

}

layers {

name: "maxout3"

type: ELTWISE

bottom: "maxout3_1"

bottom: "maxout3_2"

top: "maxout3"

eltwise_param {

operation: MAX

}

}

layers {

  name: "drop3"

  type: DROPOUT

  bottom: "maxout3"

  top: "maxout3"

  dropout_param {

    dropout_ratio: 0.5

  }

}

layers {

  name: "conv4"

  type: CONVOLUTION

  bottom: "maxout3"

  top: "conv4"

  blobs_lr: 1

  blobs_lr: 2

  weight_decay: 1

  weight_decay: 0

  convolution_param {

    num_output: 8

    kernel_size: 1

    stride: 1

    weight_filler {

      type: "gaussian"

      std: 0.01

    }

    bias_filler {

      type: "constant"

      value: 0

    }

  }

}

layers {

name: "slice4"

type: SLICE

bottom: "conv4"

top: "conv4_1"

top: "conv4_2"

top: "conv4_3"

top: "conv4_4"

slice_param {

slice_dim: 1

slice_point: 2

slice_point: 4

slice_point: 6

}

}

layers {

name: "maxout4_1"

type: ELTWISE

bottom: "conv4_1"

bottom: "conv4_2"

top: "maxout4_1"

eltwise_param {

operation: MAX

}

}

layers {

name: "maxout4_2"

type: ELTWISE

bottom: "conv4_3"

bottom: "conv4_4"

top: "maxout4_2"

eltwise_param {

operation: MAX

}

}

layers {

name: "maxout4"

type: ELTWISE

bottom: "maxout4_1"

bottom: "maxout4_2"

top: "maxout4"

eltwise_param {

operation: MAX

}

}

layers {

  name: "drop4"

  type: DROPOUT

  bottom: "maxout4"

  top: "maxout4"

  dropout_param {

    dropout_ratio: 0.5

  }

}

layers {

  name: "accuracy"

  type: ACCURACY

  bottom: "maxout4"

  bottom: "label"

  top: "accuracy"

  include: { phase: TEST }

}

layers {

  name: "loss"

  type: SOFTMAX_LOSS

  bottom: "maxout4"

  bottom: "label"

  top: "loss"

}


Best,

Xiao


On Monday, February 23, 2015 at 4:11:40 AM UTC-5, shai harel wrote:

Xiao Yang

Feb 23, 2015, 8:28:01 PM
to caffe...@googlegroups.com
BTW, I've tried using xavier for the weight init and/or setting the bias to 0.1, but I still get loss = nan at iteration 0... I'm really confused, since I set base_lr to 0 and the test part seems to work fine at iteration 0.


On Monday, February 23, 2015 at 4:11:40 AM UTC-5, shai harel wrote:
Could you attach the entire train_test.prototxt?

shai harel

Feb 24, 2015, 3:22:40 AM
to Xiao Yang, caffe...@googlegroups.com
What about a shallow network?
Use only 2 conv+maxout layers and a softmax,
and no dropout (dropout is regularization to avoid overfitting; right now you are underfitting),
and output the data from each layer.
See if the NaN occurs only in the output layer (loss/accuracy) or in the intermediate layers as well.
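For instance, a rough pycaffe sketch of that check (the prototxt filename is a placeholder; instantiating without a .caffemodel just runs the weight fillers):

import numpy as np
import caffe

# Placeholder path: point this at your own train/test prototxt.
net = caffe.Net('train_val.prototxt')

net.forward()  # one forward pass over a batch from the DATA layer

# net.blobs is ordered from input to output, so the first hit is
# roughly where the NaN first appears.
for name, blob in net.blobs.items():
    if np.isnan(blob.data).any():
        print('first NaN in blob: ' + name)
        break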



Xiao Yang

Feb 24, 2015, 11:37:56 AM
to caffe...@googlegroups.com, xyan...@gmail.com
Thanks a lot for the advice! I removed all the dropout layers and my net works well now :) (Looking back, drop1 had dropout_ratio: 1, which drops every activation in the train phase; that alone would explain the NaN even with base_lr = 0.)

BTW, how do I output the data from each layer during training? I think the tutorial only shows how to output each layer after you've already got a trained model. Correct me if I missed anything.

For a beginner like me, there are lots of tricks to playing with deep neural nets. I am reading some books and papers about DNNs now. I hope we get more theoretical material about DNNs!

Best,
Xiao

On Tuesday, February 24, 2015 at 3:22:40 AM UTC-5, shai harel wrote:

shai harel

Feb 24, 2015, 12:35:09 PM
to Xiao Yang, caffe...@googlegroups.com
# this line extracts the conv1 and conv2 data
# (im_batch is your preprocessed input batch, shaped N x C x H x W)
res = net.forward(blobs=['conv1', 'conv2'], data=im_batch)

As for relevant papers, I can recommend three:
http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf
http://arxiv.org/pdf/1502.01852.pdf
http://arxiv.org/pdf/1502.03167v2.pdf

And of course Hugo Larochelle's YouTube channel:
https://www.youtube.com/channel/UCiDouKcxRmAdc5OeZdiRwAg

shai harel

Feb 24, 2015, 12:36:58 PM
to Xiao Yang, caffe...@googlegroups.com
net.forward gets the output of the network with the current weights,
so if you initialize and then call net.forward,
you will get the output of an untrained network.

Xiao Yang

Feb 24, 2015, 1:33:17 PM
to caffe...@googlegroups.com, xyan...@gmail.com
Thanks again for the recommended papers! I definitely should read more.

Actually, my question is how to initialize a net instance without providing a *.caffemodel file, so that I can then call the net.forward function. Or maybe I can just take a snapshot after a few iterations and then use the saved *.caffemodel?

Best,
Xiao

On Tuesday, February 24, 2015 at 12:36:58 PM UTC-5, shai harel wrote:

Evan Shelhamer

Feb 24, 2015, 1:35:29 PM
to Xiao Yang, caffe...@googlegroups.com
Actually, my question is how to initialize a net instance without providing a *.caffemodel file, so that I can then call the net.forward function. Or maybe I can just take a snapshot after a few iterations and then use the saved *.caffemodel?

`net = caffe.Net('net.prototxt')` will instantiate the net (and run any defined weight fillers) without pre-trained weights. All the `.caffemodel` snapshots work too for the `net = caffe.Net('net.prototxt', 'net.caffemodel')` constructor.
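Put together, a minimal sketch (the filenames here are placeholders, not from this thread):

import caffe

# Fresh net: the weight fillers defined in the prototxt initialize
# the parameters, so this is an untrained network.
net = caffe.Net('net.prototxt')

# Alternatively, resume from a snapshot taken after a few iterations.
net = caffe.Net('net.prototxt', 'snapshot_iter_100.caffemodel')

res = net.forward()  # runs with whichever weights were loaded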

Evan Shelhamer


Xiao Yang

Feb 24, 2015, 1:43:43 PM
to caffe...@googlegroups.com, xyan...@gmail.com
Thanks! That helps a lot, since right now I am trying to have a look at the source code.

Also, could you tell me which file contains the code for that constructor? I couldn't find it in the API documentation on the Caffe website.

Best,
Xiao

On Tuesday, February 24, 2015 at 1:35:29 PM UTC-5, Evan Shelhamer wrote: