Get "loss=nan" info at the very beginning; even when setting base_lr=0


Xiao Yang

Feb 22, 2015, 11:49:16 PM
to caffe...@googlegroups.com
Any ideas about this problem? Thank you!

Here is part of the output:

Learning Rate Policy: step
Iteration 0, Testing net (#0)
Test net output #0: accuracy = 0.44782
Test net output #1: loss = 0.720437 (* 1 = 0.720437 loss)
Iteration 0, loss = nan
Train net output #0: loss = nan (* 1 = nan loss)
Iteration 0, lr = 0
Iteration 20, loss = nan
Train net output #0: loss = nan (* 1 = nan loss)
Iteration 20, lr = 0

It seems that the testing part works well but the training part fails. However, the only difference between the test and train phases is that they use different datasets:
layers {
  name: "data"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "examples/train_lmdb"   # val_lmdb in the TEST phase
    backend: LMDB
    batch_size: 50
  }
  transform_param {
    mean_file: "data/imagenet_mean.binaryproto"
    mirror: false
  }
  include: { phase: TRAIN }
}

I also tried swapping the datasets used in the test and training phases, but that didn't help.

shai harel

Feb 23, 2015, 4:11:40 AM
to caffe...@googlegroups.com
Could you attach the entire train_test.prototxt?
How do you initialize the weights?
Do you use ReLU, sigmoid, or tanh for the nonlinearity?
How deep is your network?

Try a shallow network first (3 layers max),
set the initialization to xavier+sigmoid or xavier+tanh,
and read more about initializing ReLU layers, or try Batch Normalization (google it).
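For example, a minimal sketch of an xavier weight filler in a layer definition (the layer name and sizes here are placeholders, not taken from your net):

layers {
  name: "conv1"              # placeholder layer
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 96
    kernel_size: 9
    weight_filler {
      type: "xavier"         # scales the init by fan-in instead of a fixed std
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}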

Please let us know if any of this works out for you :]

Xiao Yang

Feb 23, 2015, 6:47:28 PM
to caffe...@googlegroups.com

Thank you for the detailed reply!


For initialization, I use gaussian for the weights and set the biases to a constant 0. For the nonlinearity, I just use maxout. The entire prototxt file is attached below. Could improper initialization be the problem? I will also look into other possible reasons.


name: "tp"

layers {

  name: "data"

  type: DATA

  top: "data"

  top: "label"

  data_param {

    source: "examples/tp/tp_train_lmdb"

    backend: LMDB

    batch_size: 50

  }

  transform_param {

    mean_file: "data/tp/imagenet_mean.binaryproto"

    mirror: false

  }

  include: { phase: TRAIN }

}

layers {

  name: "data"

  type: DATA

  top: "data"

  top: "label"

  data_param {

    source: "examples/tp/tp_val_lmdb"

    backend: LMDB

    batch_size: 50

  }

  transform_param {

    mean_file: "data/tp/imagenet_mean.binaryproto"

    mirror: false

  }

  include: { phase: TEST }

}

layers {

  name: "conv1"

  type: CONVOLUTION

  bottom: "data"

  top: "conv1"

  blobs_lr: 1

  blobs_lr: 2

  weight_decay: 1

  weight_decay: 0

  convolution_param {

    num_output: 96

    kernel_size: 9

    stride: 1

    weight_filler {

      type: "gaussian"

      std: 0.01

    }

    bias_filler {

      type: "constant"

      value: 0

    }

  }

}

layers {

name: "slice1"

type: SLICE

bottom: "conv1"

top: "conv1_1"

top: "conv1_2"

slice_param {

slice_dim: 1

slice_point: 48

}

}

layers {

name: "maxout1"

type: ELTWISE

bottom: "conv1_1"

bottom: "conv1_2"

top: "maxout1"

eltwise_param {

operation: MAX

}

}

layers {

  name: "drop1"

  type: DROPOUT

  bottom: "maxout1"

  top: "maxout1"

  dropout_param {

    dropout_ratio: 1

  }

}

layers {

  name: "conv2"

  type: CONVOLUTION

  bottom: "maxout1"

  top: "conv2"

  blobs_lr: 1

  blobs_lr: 2

  weight_decay: 1

  weight_decay: 0

  convolution_param {

    num_output: 128

    kernel_size: 9

    stride: 1

    weight_filler {

      type: "gaussian"

      std: 0.01

    }

    bias_filler {

      type: "constant"

      value: 0

    }

  }

}

layers {

name: "slice2"

type: SLICE

bottom: "conv2"

top: "conv2_1"

top: "conv2_2"

slice_param {

slice_dim: 1

slice_point: 64

}

}

layers {

name: "maxout2"

type: ELTWISE

bottom: "conv2_1"

bottom: "conv2_2"

top: "maxout2"

eltwise_param {

operation: MAX

}

}

layers {

  name: "drop2"

  type: DROPOUT

  bottom: "maxout2"

  top: "maxout2"

  dropout_param {

    dropout_ratio: 0.5

  }

}

layers {

  name: "conv3"

  type: CONVOLUTION

  bottom: "maxout2"

  top: "conv3"

  blobs_lr: 1

  blobs_lr: 2

  weight_decay: 1

  weight_decay: 0

  convolution_param {

    num_output: 512

    kernel_size: 8

    stride: 1

    weight_filler {

      type: "gaussian"

      std: 0.01

    }

    bias_filler {

      type: "constant"

      value: 0

    }

  }

}

layers {

name: "slice3"

type: SLICE

bottom: "conv3"

top: "conv3_1"

top: "conv3_2"

top: "conv3_3"

top: "conv3_4"

slice_param {

slice_dim: 1

slice_point: 128

slice_point: 256

slice_point: 384

}

}

layers {

name: "maxout3_1"

type: ELTWISE

bottom: "conv3_1"

bottom: "conv3_2"

top: "maxout3_1"

eltwise_param {

operation: MAX

}

}

layers {

name: "maxout3_2"

type: ELTWISE

bottom: "conv3_3"

bottom: "conv3_4"

top: "maxout3_2"

eltwise_param {

operation: MAX

}

}

layers {

name: "maxout3"

type: ELTWISE

bottom: "maxout3_1"

bottom: "maxout3_2"

top: "maxout3"

eltwise_param {

operation: MAX

}

}

layers {

  name: "drop3"

  type: DROPOUT

  bottom: "maxout3"

  top: "maxout3"

  dropout_param {

    dropout_ratio: 0.5

  }

}

layers {

  name: "conv4"

  type: CONVOLUTION

  bottom: "maxout3"

  top: "conv4"

  blobs_lr: 1

  blobs_lr: 2

  weight_decay: 1

  weight_decay: 0

  convolution_param {

    num_output: 8

    kernel_size: 1

    stride: 1

    weight_filler {

      type: "gaussian"

      std: 0.01

    }

    bias_filler {

      type: "constant"

      value: 0

    }

  }

}

layers {

name: "slice4"

type: SLICE

bottom: "conv4"

top: "conv4_1"

top: "conv4_2"

top: "conv4_3"

top: "conv4_4"

slice_param {

slice_dim: 1

slice_point: 2

slice_point: 4

slice_point: 6

}

}

layers {

name: "maxout4_1"

type: ELTWISE

bottom: "conv4_1"

bottom: "conv4_2"

top: "maxout4_1"

eltwise_param {

operation: MAX

}

}

layers {

name: "maxout4_2"

type: ELTWISE

bottom: "conv4_3"

bottom: "conv4_4"

top: "maxout4_2"

eltwise_param {

operation: MAX

}

}

layers {

name: "maxout4"

type: ELTWISE

bottom: "maxout4_1"

bottom: "maxout4_2"

top: "maxout4"

eltwise_param {

operation: MAX

}

}

layers {

  name: "drop4"

  type: DROPOUT

  bottom: "maxout4"

  top: "maxout4"

  dropout_param {

    dropout_ratio: 0.5

  }

}

layers {

  name: "accuracy"

  type: ACCURACY

  bottom: "maxout4"

  bottom: "label"

  top: "accuracy"

  include: { phase: TEST }

}

layers {

  name: "loss"

  type: SOFTMAX_LOSS

  bottom: "maxout4"

  bottom: "label"

  top: "loss"

}


Best,

Xiao


On Monday, February 23, 2015 at 4:11:40 AM UTC-5, shai harel wrote:

Xiao Yang

Feb 23, 2015, 8:28:01 PM
to caffe...@googlegroups.com
BTW, I've tried using xavier for the weight init and/or setting the bias to 0.1, but I still get loss = nan at iteration 0... I'm really confused, since I set base_lr to 0 and the test part seems to work fine at iteration 0.


On Monday, February 23, 2015 at 4:11:40 AM UTC-5, shai harel wrote:
Could you attach the entire train_test.prototxt?

shai harel

Feb 24, 2015, 3:22:40 AM
to Xiao Yang, caffe...@googlegroups.com
What about a shallow network?
Use only 2 conv+maxout layers and a softmax,
and no dropout (dropout is regularization to avoid overfitting; right now you are underfitting),
and output the data from each layer.
See if the NaN occurs only in the output layer (loss/accuracy) or in the intermediate layers as well.
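For instance, a rough pycaffe sketch of that check (the prototxt filename is a placeholder; instantiating without a .caffemodel just runs the weight fillers):

import numpy as np
import caffe

# Placeholder path: point this at your own train/test prototxt.
net = caffe.Net('train_val.prototxt')

net.forward()  # one forward pass over a batch from the DATA layer

# net.blobs is ordered from input to output, so the first hit is
# roughly where the NaN first appears.
for name, blob in net.blobs.items():
    if np.isnan(blob.data).any():
        print('first NaN in blob: ' + name)
        break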



Xiao Yang

Feb 24, 2015, 11:37:56 AM
to caffe...@googlegroups.com, xyan...@gmail.com
Thanks a lot for the advice! I removed all the dropout layers and my net works well now :) (Looking back, drop1 had dropout_ratio: 1, which drops every activation in the train phase; that alone would explain the NaN even with base_lr = 0.)

BTW, how do I output the data from each layer during training? I think the tutorial only shows how to output each layer after you've already got a trained model. Correct me if I missed anything.

For a beginner like me, there are lots of tricks to playing with deep neural nets. I am reading some books and papers about DNNs now. I hope we get more theoretical material about DNNs!

Best,
Xiao

On Tuesday, February 24, 2015 at 3:22:40 AM UTC-5, shai harel wrote:

shai harel

Feb 24, 2015, 12:35:09 PM
to Xiao Yang, caffe...@googlegroups.com
# this line extracts the conv1 and conv2 data
# (im_batch is your preprocessed input batch, shaped N x C x H x W)
res = net.forward(blobs=['conv1', 'conv2'], data=im_batch)

As for relevant papers, I can recommend three:
http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf
http://arxiv.org/pdf/1502.01852.pdf
http://arxiv.org/pdf/1502.03167v2.pdf

And of course Hugo Larochelle's YouTube channel:
https://www.youtube.com/channel/UCiDouKcxRmAdc5OeZdiRwAg

shai harel

Feb 24, 2015, 12:36:58 PM
to Xiao Yang, caffe...@googlegroups.com
net.forward gets the output of the network with the current weights,
so if you initialize and then call net.forward,
you will get the output of an untrained network.

Xiao Yang

Feb 24, 2015, 1:33:17 PM
to caffe...@googlegroups.com, xyan...@gmail.com
Thanks again for the recommended papers! I definitely should read more.

Actually, my question is how to initialize a net instance without providing a *.caffemodel file, so that I can then call the net.forward function. Or maybe I can just take a snapshot after a few iterations and then use the saved *.caffemodel?

Best,
Xiao

On Tuesday, February 24, 2015 at 12:36:58 PM UTC-5, shai harel wrote:

Evan Shelhamer

Feb 24, 2015, 1:35:29 PM
to Xiao Yang, caffe...@googlegroups.com
Actually, my question is how to initialize a net instance without providing a *.caffemodel file, so that I can then call the net.forward function. Or maybe I can just take a snapshot after a few iterations and then use the saved *.caffemodel?

`net = caffe.Net('net.prototxt')` will instantiate the net (and run any defined weight fillers) without pre-trained weights. All the `.caffemodel` snapshots work too for the `net = caffe.Net('net.prototxt', 'net.caffemodel')` constructor.
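Put together, a minimal sketch (the filenames here are placeholders, not from this thread):

import caffe

# Fresh net: the weight fillers defined in the prototxt initialize
# the parameters, so this is an untrained network.
net = caffe.Net('net.prototxt')

# Alternatively, resume from a snapshot taken after a few iterations.
net = caffe.Net('net.prototxt', 'snapshot_iter_100.caffemodel')

res = net.forward()  # runs with whichever weights were loaded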

Evan Shelhamer


Xiao Yang

Feb 24, 2015, 1:43:43 PM
to caffe...@googlegroups.com, xyan...@gmail.com
Thanks! That helps a lot, since right now I am trying to have a look at the source code.

Also, could you tell me which file contains the code for that constructor? I couldn't find it in the API documentation on the Caffe website.

Best,
Xiao

On Tuesday, February 24, 2015 at 1:35:29 PM UTC-5, Evan Shelhamer wrote: