Training error and validation error are the same - zero accuracy during training


Sharp Weapon

Nov 3, 2016, 1:18:50 PM
to Caffe Users


I am basically using CaffeNet to do image classification with 256 classes, feeding the network a list of HDF5 files. But the network doesn't seem to learn: accuracy stays at 0 the whole time, and the training error and validation error are identical. I would have thought that if the dataset were too small to learn from, the training error would be very small and the validation error would be large, right? I also tried different batch sizes and learning rates, with no success. Here are the solver.prototxt and the network prototxt. Any suggestion is appreciated.

I1103 12:01:41.822055 108615 solver.cpp:337] Iteration 0, Testing net (#0)
I1103 12:01:41.849742 108615 solver.cpp:404]     Test net output #0: accuracy = 0
I1103 12:01:41.849761 108615 solver.cpp:404]     Test net output #1: loss = 6.02617 (* 1 = 6.02617 loss)
I1103 12:01:41.869380 108615 solver.cpp:228] Iteration 0, loss = 6.05644
I1103 12:01:41.869398 108615 solver.cpp:244]     Train net output #0: loss = 6.05644 (* 1 = 6.05644 loss)
I1103 12:01:41.869413 108615 sgd_solver.cpp:106] Iteration 0, lr = 0.1
I1103 12:01:47.624855 108615 solver.cpp:228] Iteration 500, loss = 87.3365
I1103 12:01:47.624876 108615 solver.cpp:244]     Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:01:47.624882 108615 sgd_solver.cpp:106] Iteration 500, lr = 0.1
I1103 12:01:53.290213 108615 solver.cpp:337] Iteration 1000, Testing net (#0)
I1103 12:01:53.299310 108615 solver.cpp:404]     Test net output #0: accuracy = 0
I1103 12:01:53.299327 108615 solver.cpp:404]     Test net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:01:53.314584 108615 solver.cpp:228] Iteration 1000, loss = 87.3365
I1103 12:01:53.314615 108615 solver.cpp:244]     Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:01:53.314621 108615 sgd_solver.cpp:106] Iteration 1000, lr = 0.01
I1103 12:01:58.991268 108615 solver.cpp:228] Iteration 1500, loss = 87.3365
I1103 12:01:58.991315 108615 solver.cpp:244]     Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:01:58.991322 108615 sgd_solver.cpp:106] Iteration 1500, lr = 0.01
I1103 12:02:04.664419 108615 solver.cpp:337] Iteration 2000, Testing net (#0)
I1103 12:02:04.673518 108615 solver.cpp:404]     Test net output #0: accuracy = 0
I1103 12:02:04.673537 108615 solver.cpp:404]     Test net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:02:04.690434 108615 solver.cpp:228] Iteration 2000, loss = 87.3365
I1103 12:02:04.690469 108615 solver.cpp:244]     Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:02:04.690481 108615 sgd_solver.cpp:106] Iteration 2000, lr = 0.001
I1103 12:02:10.373788 108615 solver.cpp:228] Iteration 2500, loss = 87.3365
I1103 12:02:10.373852 108615 solver.cpp:244]     Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:02:10.373859 108615 sgd_solver.cpp:106] Iteration 2500, lr = 0.001
I1103 12:02:16.047372 108615 solver.cpp:337] Iteration 3000, Testing net (#0)
I1103 12:02:16.056390 108615 solver.cpp:404]     Test net output #0: accuracy = 0
I1103 12:02:16.056407 108615 solver.cpp:404]     Test net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:02:16.070235 108615 solver.cpp:228] Iteration 3000, loss = 87.3365
I1103 12:02:16.070261 108615 solver.cpp:244]     Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:02:16.070267 108615 sgd_solver.cpp:106] Iteration 3000, lr = 0.0001
I1103 12:02:21.755348 108615 solver.cpp:228] Iteration 3500, loss = 87.3365
I1103 12:02:21.755369 108615 solver.cpp:244]     Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
I1103 12:02:21.755375 108615 sgd_solver.cpp:106] Iteration 3500, lr = 0.0001
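
One more observation: the value the loss gets stuck at looks meaningful. If I read Caffe's SoftmaxWithLoss correctly, it clamps the predicted probability at FLT_MIN before taking the log, so 87.3365 would mean the probability of the true class has underflowed to zero, i.e. the loss has blown up rather than plateaued. A quick check:

import math

# Smallest positive normalized float32 value (FLT_MIN).
FLT_MIN = 1.17549435e-38

# If the softmax probability of the true class underflows to 0, Caffe
# (as far as I can tell) clamps it to FLT_MIN, so the reported loss is:
print(-math.log(FLT_MIN))  # -> 87.3365...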
----------------------------------

net: "/A/B/train.prototxt"
test_iter: 10
test_interval: 1000
base_lr: 0.1
lr_policy: "step"
gamma: 0.1
stepsize: 1000
display: 10
max_iter: 4000
momentum: 0.9
weight_decay: 0.0005
snapshot: 1000
snapshot_prefix: "/A/B/model_"
solver_mode: GPU
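
For reference, my understanding is that with lr_policy: "step" the learning rate is base_lr * gamma^floor(iter / stepsize), which matches the rates printed in the log above. A quick sketch:

# Reproduce the "step" learning-rate schedule from the solver above.
base_lr, gamma, stepsize = 0.1, 0.1, 1000

for it in (0, 500, 1000, 1500, 2000, 2500, 3000, 3500):
    lr = base_lr * gamma ** (it // stepsize)
    print('Iteration %d, lr = %g' % (it, lr))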
--------------------------------------------------

layer {
    name: "data"
    type: "HDF5Data"
    top: "X"
    top: "y"
    hdf5_data_param{
	source:"/Path/to/trainh5list.txt"
	batch_size: 1
    }
    include{phase: TRAIN}
}
layer {
    name: "data"
    type: "HDF5Data"
    top: "X"
    top: "y"
    hdf5_data_param{
	source:"/Path/to/testh5list.txt"
	batch_size: 1
    }
    include{phase: TEST}
}
layer {
    name: "conv1"
    type: "Convolution"
    bottom: "X"
    top: "conv1"
    param { lr_mult: 1 decay_mult: 1 }
    param { lr_mult: 2 decay_mult: 0 }
    convolution_param {
        num_output: 96
        kernel_size: 11
        stride: 4
        weight_filler { type: "gaussian" std: 0.01 }
        bias_filler { type: "constant" value: 0 }
    }
}
layer { name: "relu1" type: "ReLU" bottom: "conv1" top: "conv1" }
layer {
    name: "pool1"
    type: "Pooling"
    bottom: "conv1"
    top: "pool1"
    pooling_param { pool: MAX kernel_size: 3 stride: 2 }
}
layer {
    name: "norm1"
    type: "LRN"
    bottom: "pool1"
    top: "norm1"
    lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}
layer {
    name: "conv2"
    type: "Convolution"
    bottom: "norm1"
    top: "conv2"
    param { lr_mult: 1 decay_mult: 1 }
    param { lr_mult: 2 decay_mult: 0 }
    convolution_param {
        num_output: 256
        pad: 2
        kernel_size: 5
        group: 2
        weight_filler { type: "gaussian" std: 0.01 }
        bias_filler { type: "constant" value: 1 }
    }
}
layer { name: "relu2" type: "ReLU" bottom: "conv2" top: "conv2" }
layer {
    name: "pool2"
    type: "Pooling"
    bottom: "conv2"
    top: "pool2"
    pooling_param { pool: MAX kernel_size: 3 stride: 2 }
}
layer {
    name: "norm2"
    type: "LRN"
    bottom: "pool2"
    top: "norm2"
    lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 }
}
layer {
    name: "conv3"
    type: "Convolution"
    bottom: "norm2"
    top: "conv3"
    param { lr_mult: 1 decay_mult: 1 }
    param { lr_mult: 2 decay_mult: 0 }
    convolution_param {
        num_output: 384
        pad: 1
        kernel_size: 3
        weight_filler { type: "gaussian" std: 0.01 }
        bias_filler { type: "constant" value: 0 }
    }
}
layer { name: "relu3" type: "ReLU" bottom: "conv3" top: "conv3" }
layer {
    name: "conv4"
    type: "Convolution"
    bottom: "conv3"
    top: "conv4"
    param { lr_mult: 1 decay_mult: 1 }
    param { lr_mult: 2 decay_mult: 0 }
    convolution_param {
        num_output: 384
        pad: 1
        kernel_size: 3
        group: 2
        weight_filler { type: "gaussian" std: 0.01 }
        bias_filler { type: "constant" value: 1 }
    }
}
layer { name: "relu4" type: "ReLU" bottom: "conv4" top: "conv4" }
layer {
    name: "conv5"
    type: "Convolution"
    bottom: "conv4"
    top: "conv5"
    param { lr_mult: 1 decay_mult: 1 }
    param { lr_mult: 2 decay_mult: 0 }
    convolution_param {
        num_output: 256
        pad: 1
        kernel_size: 3
        group: 2
        weight_filler { type: "gaussian" std: 0.01 }
        bias_filler { type: "constant" value: 1 }
    }
}
layer { name: "relu5" type: "ReLU" bottom: "conv5" top: "conv5" }
layer {
    name: "pool5"
    type: "Pooling"
    bottom: "conv5"
    top: "pool5"
    pooling_param { pool: MAX kernel_size: 3 stride: 2 }
}
layer {
    name: "fc6"
    type: "InnerProduct"
    bottom: "pool5"
    top: "fc6"
    param { lr_mult: 1 decay_mult: 1 }
    param { lr_mult: 2 decay_mult: 0 }
    inner_product_param {
        num_output: 4096
        weight_filler { type: "gaussian" std: 0.005 }
        bias_filler { type: "constant" value: 1 }
    }
}
layer { name: "relu6" type: "ReLU" bottom: "fc6" top: "fc6" }
layer {
    name: "drop6"
    type: "Dropout"
    bottom: "fc6"
    top: "fc6"
    dropout_param { dropout_ratio: 0.5 }
}
layer {
    name: "fc7"
    type: "InnerProduct"
    bottom: "fc6"
    top: "fc7"
    param { lr_mult: 1 decay_mult: 1 }
    param { lr_mult: 2 decay_mult: 0 }
    inner_product_param {
        num_output: 4096
        weight_filler { type: "gaussian" std: 0.005 }
        bias_filler { type: "constant" value: 1 }
    }
}
layer { name: "relu7" type: "ReLU" bottom: "fc7" top: "fc7" }
layer {
    name: "drop7"
    type: "Dropout"
    bottom: "fc7"
    top: "fc7"
    dropout_param { dropout_ratio: 0.5 }
}
layer {
    name: "fc8"
    type: "InnerProduct"
    bottom: "fc7"
    top: "fc8"
    param { lr_mult: 1 decay_mult: 1 }
    param { lr_mult: 2 decay_mult: 0 }
    inner_product_param {
        num_output: 256
        weight_filler { type: "gaussian" std: 0.01 }
        bias_filler { type: "constant" value: 0 }
    }
}
layer {
    name: "accuracy"
    type: "Accuracy"
    bottom: "fc8"
    bottom: "y"
    top: "accuracy"
    include { phase: TEST }
}
layer {
    name: "loss"
    type: "SoftmaxWithLoss"
    bottom: "fc8"
    bottom: "y"
    top: "loss"
}

Lemma

Nov 6, 2016, 3:44:08 PM
to Caffe Users
Hi,

It seems you are using a high base_lr, and you have left all the layers' weights open to learning at that high rate.
This is a babysitting issue: I would suggest starting from an already trained model (i.e. fine-tuning), or freezing some layers and using lower learning rates.
A common rule of thumb is to try 1/100, 1/1000, etc. of the current values for base_lr and for the decay parameters (alpha, gamma).
Read up on it.
Also take care of zero-variance inputs and of input scaling; see the sketch below.
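
For example, here is a rough sanity-check sketch for your HDF5 data (just an illustration; it assumes datasets named "X" and "y" as in your prototxt, h5py installed, and your list file path):

import h5py
import numpy as np

# Check every HDF5 file listed in the source file of the HDF5Data layer.
with open('/Path/to/trainh5list.txt') as f:
    h5_files = [line.strip() for line in f if line.strip()]

for path in h5_files:
    with h5py.File(path, 'r') as h5:
        X = np.asarray(h5['X'])
        y = np.asarray(h5['y'])
        # Labels must lie in [0, num_classes) -- here 0..255.
        print(path, 'labels:', y.min(), '..', y.max())
        # Inputs should be centered/scaled; huge or constant values hurt.
        print(path, 'X mean: %.3f  std: %.3f' % (X.mean(), X.std()))
        if X.std() == 0:
            print('WARNING: zero-variance input in', path)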