How to train a Caffe model with 3 layers?


elahe

Jan 12, 2016, 2:51:04 PM
to Caffe Users
Hi,
Here is my solver:
net: "models/testmodel/train_val.prototxt"
test_iter: 100
test_interval: 100
base_lr: 0.0001
lr_policy: "step"
gamma: 0.1
stepsize: 5000
display: 100
max_iter: 30000
momentum: 0.2
weight_decay: 0.0005
snapshot: 1000
snapshot_prefix: "models/testmodel/caffe5_train"
solver_mode: CPU
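
For reference, Caffe's "step" policy computes the rate as base_lr * gamma^floor(iter / stepsize), so this solver holds 0.0001 until iteration 5000 and then drops it tenfold every further 5000 iterations. A quick check in Python:

base_lr, gamma, stepsize = 0.0001, 0.1, 5000
for it in (0, 4999, 5000, 10000):
    # lr(iter) = base_lr * gamma ** (iter // stepsize), per the "step" policy
    print(it, base_lr * gamma ** (it // stepsize))
# prints 0.0001, 0.0001, 1e-05, 1e-06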



And my train_val.prototxt:

name: "CaffeNet"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 227
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
# mean pixel / channel-wise mean instead of mean image
#  transform_param {
#    crop_size: 227
#    mean_value: 104
#    mean_value: 117
#    mean_value: 123
#    mirror: true
#  }
  data_param {
    source: "/home/elahe/Downloads/caffe-master/examples/imagenet/5train_lmdb"
    batch_size: 256
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
    crop_size: 227
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
# mean pixel / channel-wise mean instead of mean image
#  transform_param {
#    crop_size: 227
#    mean_value: 104
#    mean_value: 117
#    mean_value: 123
#    mirror: false
#  }
  data_param {
    source: "/home/elahe/Downloads/caffe-master/examples/imagenet/5val_lmdb"
    batch_size: 96
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "fc7_t"
  type: "InnerProduct"
  bottom: "norm1"
  top: "fc7_t"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 1
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7_t"
  top: "fc7_t"
}
layer {
  name: "fc8_t"
  type: "InnerProduct"
  bottom: "fc7_t"
  top: "fc8_t"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 1
  }
  inner_product_param {
    num_output: 5
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8_t"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8_t"
  bottom: "label"
  top: "loss"
}


Output:
I0112 20:19:36.843014  3667 solver.cpp:408]     Test net output #0: accuracy = 0.166667
I0112 20:19:36.843159  3667 solver.cpp:408]     Test net output #1: loss = 23.2279 (* 1 = 23.2279 loss)
I0112 20:19:44.464568  3667 solver.cpp:236] Iteration 0, loss = 19.8601
I0112 20:19:44.464617  3667 solver.cpp:252]     Train net output #0: loss = 19.8601 (* 1 = 19.8601 loss)
I0112 20:19:44.464637  3667 sgd_solver.cpp:106] Iteration 0, lr = 0.0001
I0112 20:38:12.537981  3667 solver.cpp:340] Iteration 100, Testing net (#0)
I0112 20:41:32.721359  3667 solver.cpp:408]     Test net output #0: accuracy = 0.722083
I0112 20:41:32.721444  3667 solver.cpp:408]     Test net output #1: loss = 7.48023 (* 1 = 7.48023 loss)
I0112 20:41:42.956892  3667 solver.cpp:236] Iteration 100, loss = 12.3962
I0112 20:41:42.956938  3667 solver.cpp:252]     Train net output #0: loss = 12.3962 (* 1 = 12.3962 loss)
I0112 20:41:42.956950  3667 sgd_solver.cpp:106] Iteration 100, lr = 0.0001
I0112 21:02:22.146353  3667 solver.cpp:340] Iteration 200, Testing net (#0)
I0112 21:05:43.417537  3667 solver.cpp:408]     Test net output #0: accuracy = 0.388854
I0112 21:05:43.417616  3667 solver.cpp:408]     Test net output #1: loss = 50.6208 (* 1 = 50.6208 loss)
I0112 21:05:52.962345  3667 solver.cpp:236] Iteration 200, loss = 58.1768
I0112 21:05:52.962393  3667 solver.cpp:252]     Train net output #0: loss = 58.1768 (* 1 = 58.1768 loss)
I0112 21:05:52.962404  3667 sgd_solver.cpp:106] Iteration 200, lr = 0.0001
I0112 21:25:20.889557  3667 solver.cpp:340] Iteration 300, Testing net (#0)
I0112 21:28:47.969341  3667 solver.cpp:408]     Test net output #0: accuracy = 0.444479
I0112 21:28:47.969409  3667 solver.cpp:408]     Test net output #1: loss = 45.0209 (* 1 = 45.0209 loss)
I0112 21:28:58.119763  3667 solver.cpp:236] Iteration 300, loss = 48.0553
I0112 21:28:58.119810  3667 solver.cpp:252]     Train net output #0: loss = 48.0553 (* 1 = 48.0553 loss)
I0112 21:28:58.119823  3667 sgd_solver.cpp:106] Iteration 300, lr = 0.0001
I0112 21:48:42.978188  3667 solver.cpp:340] Iteration 400, Testing net (#0)
I0112 21:52:09.260004  3667 solver.cpp:408]     Test net output #0: accuracy = 0.277708
I0112 21:52:09.260082  3667 solver.cpp:408]     Test net output #1: loss = 60.1215 (* 1 = 60.1215 loss)
I0112 21:52:19.721530  3667 solver.cpp:236] Iteration 400, loss = 59.2108
I0112 21:52:19.721578  3667 solver.cpp:252]     Train net output #0: loss = 59.2108 (* 1 = 59.2108 loss)
I0112 21:52:19.721590  3667 sgd_solver.cpp:106] Iteration 400, lr = 0.0001
I0112 22:10:31.664820  3667 solver.cpp:340] Iteration 500, Testing net (#0)
I0112 22:13:14.781256  3667 solver.cpp:408]     Test net output #0: accuracy = 0.277917
I0112 22:13:14.781415  3667 solver.cpp:408]     Test net output #1: loss = 63.0641 (* 1 = 63.0641 loss)
I0112 22:13:22.490547  3667 solver.cpp:236] Iteration 500, loss = 66.8671
I0112 22:13:22.490595  3667 solver.cpp:252]     Train net output #0: loss = 66.8671 (* 1 = 66.8671 loss)
I0112 22:13:22.490607  3667 sgd_solver.cpp:106] Iteration 500, lr = 0.0001
I0112 22:28:21.003844  3667 solver.cpp:340] Iteration 600, Testing net (#0)
I0112 22:30:50.384950  3667 solver.cpp:408]     Test net output #0: accuracy = 0.500104
I0112 22:30:50.385020  3667 solver.cpp:408]     Test net output #1: loss = 43.6592 (* 1 = 43.6592 loss)
I0112 22:30:57.693332  3667 solver.cpp:236] Iteration 600, loss = 68.9141
I0112 22:30:57.693379  3667 solver.cpp:252]     Train net output #0: loss = 68.9141 (* 1 = 68.9141 loss)
I0112 22:30:57.693392  3667 sgd_solver.cpp:106] Iteration 600, lr = 0.0001
I0112 22:45:46.741204  3667 solver.cpp:340] Iteration 700, Testing net (#0)
I0112 22:48:22.395297  3667 solver.cpp:408]     Test net output #0: accuracy = 0.222292
I0112 22:48:22.395411  3667 solver.cpp:408]     Test net output #1: loss = 67.9223 (* 1 = 67.9223 loss)
I0112 22:48:29.634990  3667 solver.cpp:236] Iteration 700, loss = 69.2606
I0112 22:48:29.635037  3667 solver.cpp:252]     Train net output #0: loss = 69.2606 (* 1 = 69.2606 loss)
I0112 22:48:29.635051  3667 sgd_solver.cpp:106] Iteration 700, lr = 0.0001
I0112 23:03:28.678174  3667 solver.cpp:340] Iteration 800, Testing net (#0)
I0112 23:05:56.843686  3667 solver.cpp:408]     Test net output #0: accuracy = 0.166667
I0112 23:05:56.843754  3667 solver.cpp:408]     Test net output #1: loss = 72.7803 (* 1 = 72.7803 loss)
I0112 23:06:04.018985  3667 solver.cpp:236] Iteration 800, loss = 73.0079
I0112 23:06:04.019027  3667 solver.cpp:252]     Train net output #0: loss = 73.0079 (* 1 = 73.0079 loss)

I have tried many learning rates and momentum values, and also deleted the dropout layer; in all of these runs, after some iterations the loss = NaN...
Why?
:((
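
One way to narrow this down is to step the solver from pycaffe and stop at the first iteration where a parameter or gradient turns NaN. A minimal sketch (the solver filename is an assumption, not a file from this thread):

import numpy as np
import caffe

caffe.set_mode_cpu()
solver = caffe.SGDSolver("models/testmodel/solver.prototxt")  # hypothetical path
for it in range(1000):
    solver.step(1)
    # scan every parameter blob's values (data) and gradients (diff) for NaN
    for name, blobs in solver.net.params.items():
        if any(np.isnan(b.data).any() or np.isnan(b.diff).any() for b in blobs):
            print("NaN first appears in layer %s at iteration %d" % (name, it))
            raise SystemExit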

S.Majid Azimi

Jan 13, 2016, 9:19:14 AM
to elahe, Caffe Users
Hi,
What do you mean by deleting the dropout layer? There is no dropout layer in AlexNet.

Best,
Majid Azimi

Max-Planck-Institut für Informatik
Germany


elahe

Jan 13, 2016, 10:01:05 AM
to Caffe Users, elahe.sol...@gmail.com
I think this:
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}


is the dropout layer, yes?
Even with pretrained weights, the loss = NaN has happened!

S.Majid Azimi

Jan 13, 2016, 11:19:32 AM
to elahe, Caffe Users
Yes, that's right, and it is not only on the 7th layer but also on the 6th.

Did you check the labels? Or check the solver file again. I'm not sure what caused this problem; if I find out, I will let you know.
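
For instance, a minimal label check over the training LMDB (a sketch, assuming the py-lmdb package and Caffe's compiled Python protos; the path is the one from the train_val.prototxt above):

import lmdb
from collections import Counter
from caffe.proto import caffe_pb2

# For the 5-way net above, every label must lie in [0, 5);
# anything else breaks SoftmaxWithLoss and Accuracy.
counts = Counter()
env = lmdb.open("/home/elahe/Downloads/caffe-master/examples/imagenet/5train_lmdb",
                readonly=True)
with env.begin() as txn:
    for _, value in txn.cursor():
        datum = caffe_pb2.Datum()
        datum.ParseFromString(value)
        counts[datum.label] += 1
print(counts)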


elahe

Jan 13, 2016, 1:48:37 PM
to Caffe Users, elahe.sol...@gmail.com

Thank you very much for your answer.
My dataset is ImageNet, and I use the same labels and the same dataset for training and validation.

Could the problem be related to the Caffe version, or something else?
The goal of this work is to create a network with 3 layers, with the configuration below (a pycaffe sketch follows the list):

layer 1: conv1 with 50 filters (I want to select 50 of the conv1 filters from AlexNet for it)

layer 2: fully connected, with 400 neurons

layer 3: fully connected, with 200 neurons, for classification
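
For illustration, a pycaffe NetSpec sketch of such a net (only a sketch, not the attached Main.py; layer names and the LMDB path follow the prototxt earlier in the thread):

import caffe
from caffe import layers as L, params as P

def three_layer_net(lmdb_path, batch_size):
    n = caffe.NetSpec()
    n.data, n.label = L.Data(source=lmdb_path, backend=P.Data.LMDB,
                             batch_size=batch_size, ntop=2,
                             transform_param=dict(
                                 crop_size=227,
                                 mean_file="data/ilsvrc12/imagenet_mean.binaryproto"))
    # layer 1: convolution with 50 filters (to be seeded from AlexNet's conv1)
    n.conv1 = L.Convolution(n.data, num_output=50, kernel_size=11, stride=4,
                            weight_filler=dict(type="gaussian", std=0.01))
    n.relu1 = L.ReLU(n.conv1, in_place=True)
    # layer 2: fully connected, 400 neurons
    n.fc7_t = L.InnerProduct(n.relu1, num_output=400,
                             weight_filler=dict(type="gaussian", std=0.005))
    n.relu7 = L.ReLU(n.fc7_t, in_place=True)
    # layer 3: fully connected, 200 outputs for classification
    n.fc8_t = L.InnerProduct(n.relu7, num_output=200,
                             weight_filler=dict(type="gaussian", std=0.01))
    n.loss = L.SoftmaxWithLoss(n.fc8_t, n.label)
    return n.to_proto()

with open("models/testmodel/train_3layer.prototxt", "w") as f:
    f.write(str(three_layer_net(
        "/home/elahe/Downloads/caffe-master/examples/imagenet/5train_lmdb", 256)))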

I created a model with the attached code, which you can find at this link:

But when training, the network accuracy stays fixed or low, and the loss is almost constant or NaN.

Then I tried to create the 3-layer network above, and nothing changed....
(attachment: Main.py)

elahe

Jan 16, 2016, 12:58:11 AM
to Caffe Users
Any ideas?
Please help me.
I welcome any suggestions...

S.Majid Azimi

Jan 16, 2016, 3:39:13 AM
to elahe, Caffe Users
Please attach convmodel/deploy.prototxt.


elahe

Jan 16, 2016, 6:50:49 AM
to Caffe Users, elahe.sol...@gmail.com
name: "CaffeNet"
input
: "data"
input_shape
{
  dim
: 10
  dim
: 3
  dim
: 227
  dim
: 227

}
layer
{
  name
: "conv1"
  type
: "Convolution"
  bottom
: "data"
  top
: "conv1"

  convolution_param
{
    num_output
: 50
    kernel_size
: 11
    stride
: 4

  inner_product_param
{
    num_output
: 400

 
}
}
layer
{
  name
: "relu7"
  type
: "ReLU"
  bottom
: "fc7_t"
  top
: "fc7_t"
}
layer
{

  name
: "drop7"
  type
: "Dropout"
  bottom
: "fc7_t"
  top
: "fc7_t"
  dropout_param
{
    dropout_ratio
: 0.5

 
}
}
layer
{
  name
: "fc8_t"
  type
: "InnerProduct"
  bottom
: "fc7_t"
  top
: "fc8_t"

  inner_product_param
{
    num_output
: 1000
 
}
}
layer
{
  name
: "prob"
  type
: "Softmax"
  bottom
: "fc8_t"
  top
: "prob"
}
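
A quick sanity check that this deploy file parses and runs a forward pass (a sketch, assuming pycaffe is built and the file is saved as convmodel/deploy.prototxt as Majid asked):

import numpy as np
import caffe

caffe.set_mode_cpu()
net = caffe.Net("convmodel/deploy.prototxt", caffe.TEST)  # parse and build
net.blobs["data"].data[...] = np.random.rand(10, 3, 227, 227)  # dummy batch
out = net.forward()
print(out["prob"].shape)  # expect (10, 1000)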

elahe

Jan 17, 2016, 7:53:06 AM
to Caffe Users, elahe.sol...@gmail.com

Any ideas?
Any suggestions are appreciated...