Solving a network -> accuracy of 0

Toby Hijzen

Apr 29, 2015, 10:51:35 AM
to caffe...@googlegroups.com
Hi All,

I've modified the MNIST example to work on a different dataset of 20,000 images (size 1x18x36, CxWxH) with 2 classes. In short, I did the following:

1. Modified the create_imagenet.sh in /examples/imagenet to produce an lmdb train and test set (no resizing).

2. Modified the make_imagenet_mean.sh and calculated the mean.

3. Modified lenet_solver.prototxt and lenet_train_test.prototxt to make use of the new data (the adapted data layer is sketched after this list)

4. Started training on the new data
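
For reference, the adapted TRAIN data layer in my lenet_train_test.prototxt now looks roughly like this (the paths and batch size here are placeholders, not my exact values):

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    # mean file from step 2 (placeholder path)
    mean_file: "examples/mydata/mean.binaryproto"
  }
  data_param {
    # LMDB from step 1 (placeholder path)
    source: "examples/mydata/train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}

The TEST-phase data layer is analogous, pointing at the test LMDB.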

This then gives me a bunch of output. When it starts training, the accuracy is really low, almost 0, which is very strange: with only two classes, the minimum accuracy (from random guessing) should be about 0.5, as I understand from the documentation (see http://caffe.berkeleyvision.org/doxygen/classcaffe_1_1AccuracyLayer.html and scroll to forward_cpu).

I've run the train command several times and found that in some of the cases (about 1 in 8) the accuracy stagnates at around 0.5.

What could I be doing wrong? Any help is greatly appreciated.

PS: I also included part of the output from one of my training runs.

Solving LeNet
Learning Rate Policy: inv
Iteration 0, Testing net (#0)
    Test net output #0: accuracy = 0.0065625
    Test net output #1: loss = 70.2702 (* 1 = 70.2702 loss)
Iteration 0, loss = 73.7216
    Train net output #0: loss = 73.7216 (* 1 = 73.7216 loss)
Iteration 0, lr = 0.01
Iteration 100, loss = 87.3365
    Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
Iteration 100, lr = 0.00992565
Iteration 200, loss = 87.3365
    Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
Iteration 200, lr = 0.00985258
Iteration 300, loss = 87.3365
    Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
Iteration 300, lr = 0.00978075
Iteration 400, loss = 87.3365
    Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)
Iteration 400, lr = 0.00971013
Iteration 500, Testing net (#0)
    Test net output #0: accuracy = 0
    Test net output #1: loss = 87.3365 (* 1 = 87.3365 loss)
Iteration 500, loss = 87.3365
    Train net output #0: loss = 87.3365 (* 1 = 87.3365 loss)

Toby Hijzen

Apr 30, 2015, 4:09:35 AM
to caffe...@googlegroups.com
It turns out that if I decrease the learning rate dramatically (from 0.01 to 0.0001), the network starts learning. It goes up to an accuracy of 0.6, which is a little better than random guessing (0.5). Still not really the performance I was hoping for, but possibly what I should expect? FYI, I'm using part of the Daimler pedestrian classification benchmark found here: http://www.gavrila.net/Datasets/Daimler_Pedestrian_Benchmark_D/Daimler_Mono_Ped__Class__Bench/daimler_mono_ped__class__bench.html
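
Concretely, the only change for this was base_lr in lenet_solver.prototxt; the other values below are just the stock LeNet settings, and the paths are placeholders:

# lenet_solver.prototxt (sketch)
net: "examples/mydata/lenet_train_test.prototxt"
test_iter: 100
test_interval: 500
base_lr: 0.0001      # lowered from 0.01
momentum: 0.9
weight_decay: 0.0005
lr_policy: "inv"
gamma: 0.0001
power: 0.75
display: 100
max_iter: 10000
snapshot: 5000
snapshot_prefix: "examples/mydata/lenet"
solver_mode: GPU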

Still, my original problem has not been solved: I should never get an accuracy of zero or very close to zero, because that would mean that simply inverting the prediction (a logical NOT) in a two-class problem would give a perfect score.

An idea: could it be that, since the code is not meant for 2-class classification (basically a yes/no classifier), I'm getting unexpected behavior?

Toby Hijzen

Apr 30, 2015, 4:23:49 AM
to caffe...@googlegroups.com
It turns out I forgot to adapt the number of outputs from 10 to 2 in:

layer {
  type: "InnerProduct"
  ...
  inner_product_param {
    num_output: 2
    ...
  }
}
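
For completeness, with the layer names from the stock LeNet example, the corrected layer now looks roughly like this (the fillers are unchanged from the example):

layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    # must match the number of classes; the MNIST example uses 10
    num_output: 2
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}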


Chris Novick

May 7, 2015, 12:14:19 PM
to caffe...@googlegroups.com
Can I ask how you knew you had to change this? I've recently tried to run the ImageNet model on my own data, but resized to 512x512. After letting the network train overnight, the accuracy is always 0, and testing provides this output:

Running for 50 iteration
Batch 0, accuracy = 0
Batch 0, loss = 1.#QNAN
Batch 1, accuracy = 0
Batch 1, loss = 1.#QNAN
Batch 2, accuracy = 0
Batch 2, loss = 1.#QNAN
....
Loss: 1.#QNAN

I'm very new to Caffe and neural networks in general, so I'll post my very slightly modified model definition. Any help is greatly appreciated.

name: "derpNet2"
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
#    crop_size: 227
    mean_file: "D:/_RDATA/py_train_mean.binaryproto"
  }
# mean pixel / channel-wise mean instead of mean image
#  transform_param {
#    crop_size: 227
#    mean_value: 104
#    mean_value: 117
#    mean_value: 123
#    mirror: true
#  }
  data_param {
    source: "D:/_RDATA/train/py_train_lmdb"
    batch_size: 50
    backend: LMDB
  }
}
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    mirror: false
#    crop_size: 227
    mean_file: "D:/_RDATA/py_test_mean.binaryproto"
  }
# mean pixel / channel-wise mean instead of mean image
#  transform_param {
#    crop_size: 227
#    mean_value: 104
#    mean_value: 117
#    mean_value: 123
#    mirror: true
#  }
  data_param {
    source: "D:/_RDATA/train/py_test_lmdb"           
    batch_size: 50
    backend: LMDB
  }
}
layer {
  name: "conv1"
  type: "Convolution"
  bottom: "data"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
layer {
  name: "pool1"
  type: "Pooling"
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm1"
  type: "LRN"
  bottom: "pool1"
  top: "norm1"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv2"
  type: "Convolution"
  bottom: "norm1"
  top: "conv2"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.5
    }
  }
}
layer {
  name: "relu2"
  type: "ReLU"
  bottom: "conv2"
  top: "conv2"
}
layer {
  name: "pool2"
  type: "Pooling"
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "norm2"
  type: "LRN"
  bottom: "pool2"
  top: "norm2"
  lrn_param {
    local_size: 5
    alpha: 0.0001
    beta: 0.75
  }
}
layer {
  name: "conv3"
  type: "Convolution"
  bottom: "norm2"
  top: "conv3"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "relu3"
  type: "ReLU"
  bottom: "conv3"
  top: "conv3"
}
layer {
  name: "conv4"
  type: "Convolution"
  bottom: "conv3"
  top: "conv4"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.5
    }
  }
}
layer {
  name: "relu4"
  type: "ReLU"
  bottom: "conv4"
  top: "conv4"
}
layer {
  name: "conv5"
  type: "Convolution"
  bottom: "conv4"
  top: "conv5"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 256
    pad: 1
    kernel_size: 3
    group: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0.5
    }
  }
}
layer {
  name: "relu5"
  type: "ReLU"
  bottom: "conv5"
  top: "conv5"
}
layer {
  name: "pool5"
  type: "Pooling"
  bottom: "conv5"
  top: "pool5"
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layer {
  name: "fc6"
  type: "InnerProduct"
  bottom: "pool5"
  top: "fc6"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 1
    }
  }
}
layer {
  name: "relu6"
  type: "ReLU"
  bottom: "fc6"
  top: "fc6"
}
layer {
  name: "drop6"
  type: "Dropout"
  bottom: "fc6"
  top: "fc6"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc7"
  type: "InnerProduct"
  bottom: "fc6"
  top: "fc7"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 4096
    weight_filler {
      type: "gaussian"
      std: 0.005
    }
    bias_filler {
      type: "constant"
      value: 1
    }
  }
}
layer {
  name: "relu7"
  type: "ReLU"
  bottom: "fc7"
  top: "fc7"
}
layer {
  name: "drop7"
  type: "Dropout"
  bottom: "fc7"
  top: "fc7"
  dropout_param {
    dropout_ratio: 0.5
  }
}
layer {
  name: "fc8"
  type: "InnerProduct"
  bottom: "fc7"
  top: "fc8"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  inner_product_param {
    num_output: 1000
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "fc8"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "fc8"
  bottom: "label"
  top: "loss"
}

accuracy = 0
loss = 1.#QNAN (* 1 = 1.#QNAN loss)

npit

Jun 15, 2015, 9:48:26 AM
to caffe...@googlegroups.com
Did you solve your issue?

Chris Novick

Jun 15, 2015, 11:41:10 AM
to caffe...@googlegroups.com
I did not, no.  I reduced the size back down to 256x256 and gave up after spending a bit more time than I had trying to work out the issue.  My best guess was memory limitations or something similar.
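
If it really was memory, one thing I could have tried (just a sketch of the idea, I never verified it) is lowering batch_size in both data layers, e.g.:

data_param {
  source: "D:/_RDATA/train/py_train_lmdb"
  batch_size: 10    # down from 50; smaller batches need less GPU memory
  backend: LMDB
}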
