Why might my net be giving detection prediction scores that are all 0s?


Rose Perrone

Nov 27, 2014, 1:07:48 PM
to caffe...@googlegroups.com
When I detect using the BVLC reference net instead of the net I trained, I see non-zero detection prediction scores just fine. But when I train my own net on thousands of images, I get nothing but goose-eggs. My aim is to detect a single ImageNet noun in ImageNet images, so the prediction output has two classes (presence or absence of the noun).

Here are my train_val.prototxt, solver.prototxt, and deploy.prototxt:

train_val.prototxt:
```
name: "nin_imagenet"
layers {
  top: "data"
  top: "label"
  name: "data"
  type: DATA
  data_param {
    source: "/Users/rose/home/video-object-detection/data/imagenet/n07840804/ilsvrc12_train_lmdb"
    backend: LMDB
    batch_size: 64
  }
  transform_param {
    crop_size: 224
    mirror: true
    mean_file: "/Users/rose/home/video-object-detection/data/imagenet/n07840804/image_mean.binaryproto"
  }
  include: { phase: TRAIN }
}
layers {
  top: "data"
  top: "label"
  name: "data"
  type: DATA
  data_param {
    source: "/Users/rose/home/video-object-detection/data/imagenet/n07840804/ilsvrc12_test_lmdb"
    backend: LMDB
    batch_size: 89
  }
  transform_param {
    crop_size: 224
    mirror: false
    mean_file: "/Users/rose/home/video-object-detection/data/imagenet/n07840804/image_mean.binaryproto"
  }
  include: { phase: TEST }
}
layers {
  bottom: "data"
  top: "conv1"
  name: "conv1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv1"
  top: "conv1"
  name: "relu0"
  type: RELU
}
layers {
  bottom: "conv1"
  top: "cccp1"
  name: "cccp1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 96
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp1"
  top: "cccp1"
  name: "relu1"
  type: RELU
}
layers {
  bottom: "cccp1"
  top: "cccp2"
  name: "cccp2"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 96
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp2"
  top: "cccp2"
  name: "relu2"
  type: RELU
}
layers {
  bottom: "cccp2"
  top: "pool0"
  name: "pool0"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  bottom: "pool0"
  top: "conv2"
  name: "conv2"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv2"
  top: "conv2"
  name: "relu3"
  type: RELU
}
layers {
  bottom: "conv2"
  top: "cccp3"
  name: "cccp3"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 256
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp3"
  top: "cccp3"
  name: "relu5"
  type: RELU
}
layers {
  bottom: "cccp3"
  top: "cccp4"
  name: "cccp4"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 256
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp4"
  top: "cccp4"
  name: "relu6"
  type: RELU
}
layers {
  bottom: "cccp4"
  top: "pool2"
  name: "pool2"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  bottom: "pool2"
  top: "conv3"
  name: "conv3"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv3"
  top: "conv3"
  name: "relu7"
  type: RELU
}
layers {
  bottom: "conv3"
  top: "cccp5"
  name: "cccp5"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 384
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp5"
  top: "cccp5"
  name: "relu8"
  type: RELU
}
layers {
  bottom: "cccp5"
  top: "cccp6"
  name: "cccp6"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 384
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp6"
  top: "cccp6"
  name: "relu9"
  type: RELU
}
layers {
  bottom: "cccp6"
  top: "pool3"
  name: "pool3"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  bottom: "pool3"
  top: "pool3"
  name: "drop"
  type: DROPOUT
  dropout_param {
    dropout_ratio: 0.5
  }
}
layers {
  bottom: "pool3"
  top: "conv4"
  name: "conv4-1024"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 1024
    pad: 1
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "conv4"
  top: "conv4"
  name: "relu10"
  type: RELU
}
layers {
  bottom: "conv4"
  top: "cccp7"
  name: "cccp7-1024"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 1024
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.05
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp7"
  top: "cccp7"
  name: "relu11"
  type: RELU
}
layers {
  bottom: "cccp7"
  top: "cccp8"
  name: "cccp8-1024"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  convolution_param {
    num_output: 2
    kernel_size: 1
    stride: 1
    weight_filler {
      type: "gaussian"
      mean: 0
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}
layers {
  bottom: "cccp8"
  top: "cccp8"
  name: "relu12"
  type: RELU
}
layers {
  bottom: "cccp8"
  top: "pool4"
  name: "pool4"
  type: POOLING
  pooling_param {
    pool: AVE
    kernel_size: 6
    stride: 1
  }
}
layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "pool4"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}
layers {
  bottom: "pool4"
  bottom: "label"
  name: "loss"
  type: SOFTMAX_LOSS
  include: { phase: TRAIN }
}
```
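One detail worth double-checking in the train_val.prototxt above (an observation, not a confirmed diagnosis): relu12 is applied to cccp8, the two-channel class-score layer, before the pool4 average pooling. A ReLU at that point clamps any negative raw class score to exactly 0 before it is averaged, which can make one class's score come out identically 0. A minimal NumPy sketch of that effect, with made-up activation values:

```python
import numpy as np

# Hypothetical raw cccp8 activations (negative class, positive class)
# at four spatial positions, before relu12:
raw = np.array([[-2.1,  5.0],
                [-0.7,  8.2],
                [-3.3,  1.4],
                [-1.0, 12.6]])

after_relu = np.maximum(raw, 0.0)   # relu12: negatives clamped to 0
scores = after_relu.mean(axis=0)    # pool4: average pooling over positions

print(scores)  # [0.  6.8] -- the first class's score is exactly zero
```

If the trained net mostly drives one channel negative, every prediction for that class collapses to 0 after the ReLU, regardless of the input image.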

solver.prototxt:
```
net: "/Users/rose/home/video-object-detection/data/imagenet/n07840804/aux/train_val.prototxt"
test_iter: 1000
test_interval: 1000
base_lr: 0.01
lr_policy: "step"
gamma: 0.1
stepsize: 200000
display: 20
max_iter: 1000
momentum: 0.9
weight_decay: 0.0005
snapshot: 200
snapshot_prefix: "/Users/rose/home/video-object-detection/data/imagenet/n07840804/snapshots"
solver_mode: CPU
```
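A quick back-of-the-envelope check on the solver settings above (my own arithmetic, using the TEST-phase batch_size of 89 from the train_val.prototxt):

```python
# Each test pass runs test_iter batches at the TEST-phase batch size.
test_iter = 1000
test_batch_size = 89  # from the TEST data layer in train_val.prototxt

images_per_test_pass = test_iter * test_batch_size
print(images_per_test_pass)  # 89000
```

If the test LMDB holds far fewer images than this, the test pass cycles through the same images repeatedly, so a much smaller test_iter would give a similar accuracy estimate in a fraction of the time.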

deploy.prototxt
```
name: "nin_imagenet"
input: "data"
input_dim: 10
input_dim: 3
input_dim: 227
input_dim: 227
layers {
  bottom: "data"
  top: "conv1"
  name: "conv1"
  type: CONVOLUTION
  convolution_param {
    num_output: 96
    kernel_size: 11
    stride: 4
  }
}
layers {
  bottom: "conv1"
  top: "conv1"
  name: "relu0"
  type: RELU
}
layers {
  bottom: "conv1"
  top: "cccp1"
  name: "cccp1"
  type: CONVOLUTION
  convolution_param {
    num_output: 96
    kernel_size: 1
    stride: 1
  }
}
layers {
  bottom: "cccp1"
  top: "cccp1"
  name: "relu1"
  type: RELU
}
layers {
  bottom: "cccp1"
  top: "cccp2"
  name: "cccp2"
  type: CONVOLUTION
  convolution_param {
    num_output: 96
    kernel_size: 1
    stride: 1
  }
}
layers {
  bottom: "cccp2"
  top: "cccp2"
  name: "relu2"
  type: RELU
}
layers {
  bottom: "cccp2"
  top: "pool0"
  name: "pool0"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  bottom: "pool0"
  top: "conv2"
  name: "conv2"
  type: CONVOLUTION
  convolution_param {
    num_output: 256
    pad: 2
    kernel_size: 5
    stride: 1
  }
}
layers {
  bottom: "conv2"
  top: "conv2"
  name: "relu3"
  type: RELU
}
layers {
  bottom: "conv2"
  top: "cccp3"
  name: "cccp3"
  type: CONVOLUTION
  convolution_param {
    num_output: 256
    kernel_size: 1
    stride: 1
  }
}
layers {
  bottom: "cccp3"
  top: "cccp3"
  name: "relu5"
  type: RELU
}
layers {
  bottom: "cccp3"
  top: "cccp4"
  name: "cccp4"
  type: CONVOLUTION
  convolution_param {
    num_output: 256
    kernel_size: 1
    stride: 1
  }
}
layers {
  bottom: "cccp4"
  top: "cccp4"
  name: "relu6"
  type: RELU
}
layers {
  bottom: "cccp4"
  top: "pool2"
  name: "pool2"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  bottom: "pool2"
  top: "conv3"
  name: "conv3"
  type: CONVOLUTION
  convolution_param {
    num_output: 384
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layers {
  bottom: "conv3"
  top: "conv3"
  name: "relu7"
  type: RELU
}
layers {
  bottom: "conv3"
  top: "cccp5"
  name: "cccp5"
  type: CONVOLUTION
  convolution_param {
    num_output: 384
    kernel_size: 1
    stride: 1
  }
}
layers {
  bottom: "cccp5"
  top: "cccp5"
  name: "relu8"
  type: RELU
}
layers {
  bottom: "cccp5"
  top: "cccp6"
  name: "cccp6"
  type: CONVOLUTION
  convolution_param {
    num_output: 384
    kernel_size: 1
    stride: 1
  }
}
layers {
  bottom: "cccp6"
  top: "cccp6"
  name: "relu9"
  type: RELU
}
layers {
  bottom: "cccp6"
  top: "pool3"
  name: "pool3"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  bottom: "pool3"
  top: "pool3"
  name: "drop"
  type: DROPOUT
  dropout_param {
    dropout_ratio: 0.5
  }
}
layers {
  bottom: "pool3"
  top: "conv4"
  name: "conv4-1024"
  type: CONVOLUTION
  convolution_param {
    num_output: 1024
    pad: 1
    kernel_size: 3
    stride: 1
  }
}
layers {
  bottom: "conv4"
  top: "conv4"
  name: "relu10"
  type: RELU
}
layers {
  bottom: "conv4"
  top: "cccp7"
  name: "cccp7-1024"
  type: CONVOLUTION
  convolution_param {
    num_output: 1024
    kernel_size: 1
    stride: 1
  }
}
layers {
  bottom: "cccp7"
  top: "cccp7"
  name: "relu11"
  type: RELU
}
layers {
  bottom: "cccp7"
  top: "cccp8"
  name: "cccp8-1024"
  type: CONVOLUTION
  convolution_param {
    num_output: 2
    kernel_size: 1
    stride: 1
  }
}
layers {
  bottom: "cccp8"
  top: "cccp8"
  name: "relu12"
  type: RELU
}
layers {
  bottom: "cccp8"
  top: "pool4"
  name: "pool4"
  type: POOLING
  pooling_param {
    pool: AVE
    kernel_size: 6
    stride: 1
  }
}
# R-CNN classification layer made from R-CNN ILSVRC13 SVMs.
layers {
  name: "fc-rcnn"
  type: INNER_PRODUCT
  bottom: "cccp8"
  top: "fc-rcnn"
  inner_product_param {
    num_output: 2
  }
}
```
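As a sanity check on the geometry of the deploy net above (a sketch using Caffe's output-size rules: convolution rounds down, pooling rounds up; the 1x1 cccp convolutions leave the spatial size unchanged), the 227x227 input should reach pool4 as a 6x6 map, which the 6x6 average pooling reduces to 1x1 per class:

```python
from math import floor, ceil

def conv(h, k, s=1, p=0):
    # Caffe convolution output size: rounds down.
    return floor((h + 2 * p - k) / s) + 1

def pool(h, k, s):
    # Caffe pooling output size: rounds up.
    return ceil((h - k) / s) + 1

h = 227
h = conv(h, 11, s=4)  # conv1 -> 55
h = pool(h, 3, 2)     # pool0 -> 27
h = conv(h, 5, p=2)   # conv2 -> 27
h = pool(h, 3, 2)     # pool2 -> 13
h = conv(h, 3, p=1)   # conv3 -> 13
h = pool(h, 3, 2)     # pool3 -> 6
h = conv(h, 3, p=1)   # conv4 -> 6
h = pool(h, 6, 1)     # pool4 -> 1
print(h)  # 1
```

By the same formulas, the 224 training crop in train_val.prototxt also arrives at pool3 as a 6x6 map, so the 6x6 average pool is consistent for both phases.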

Rose Perrone

Nov 29, 2014, 11:43:22 PM
to caffe...@googlegroups.com
Maybe I wrote these prototxt files incorrectly. Here are the instructions I wrote for creating them. I based my prototxt files on those of nin_imagenet (see the Caffe Model Zoo).

After completing these instructions, you will have made these three files:
train_val.prototxt, solver.prototxt, and deploy.prototxt.
Use absolute paths for simplicity.

1. Copy the model's train_val.prototxt to
data/imagenet/<wnid_dir>/images/<dataset>/aux.
2. Change each "source:" and "mean_file:" line to the lmdb files and
image mean created by prepare_data.py.
3. Find the last layer that contains a "num_output" field and change it to
the number of categories (in my case, two: one positive, one negative).
I also change the batch_size from 64 to 32, in case I run out of memory
on my MacBook.
4. Copy the model's solver.prototxt to
data/imagenet/<wnid_dir>/images/<dataset>/aux.
5. Change the "net:" path to the path of the train_val.prototxt you just
created.
6. Change the "snapshot_prefix:" to
data/imagenet/<wnid_dir>/snapshots/images/<dataset>/snapshots.
7. Change the "solver_mode:" to CPU.
8. Change the "snapshot:" to 200. On my MacBook, it takes 3 minutes per 20
iterations, so this produces one snapshot per half hour. On a K40 machine,
training the bvlc_reference_caffenet on 1000 ImageNet categories takes 26
seconds per 20 iterations, which is ~5.2 ms per image, so I think they're
training on about 5100 images. I'm training on 19500 images (there are 7
times as many negative images as positive images).
9. Change the "max_iter" to 1000. (This is 45000 for NIN, but that would
take me 112 hours, which is 4.7 days.) This is the same as test_iter. It's
annoying that the test is also performed on the untrained network, because
that takes 70 minutes and only tells us the ratio of positive to negative
images.
10. Copy this to a new file called deploy.prototxt:
```
name: "<the name in the train_val.prototxt>"

input: "data"
input_dim: 10
input_dim: 3
input_dim: 227
input_dim: 227
```
11. Append to that all the layers in train_val.prototxt.
12. Delete the first few layers that don't have a "bottom" field.
13. Delete all parameters that have to do exclusively with learning, e.g.:
  - blobs_lr
  - weight_decay
  - weight_filler
  - bias_filler
14. Delete the "accuracy" layer and any layer after it (probably the
softmax loss).
15. Append this final layer to the file:
```
# R-CNN classification layer made from R-CNN ILSVRC13 SVMs.
layers {
  name: "fc-rcnn"
  type: INNER_PRODUCT
  bottom: <name of layer right above this one>
  top: "fc-rcnn"
  inner_product_param {
    num_output: <change this to the number of categories>
  }
}
```
16. In the last layer before the R-CNN layer that contains a "num_output"
field, change that value to the number of categories (in my case, 2).

References for generating deploy.prototxt:
  https://github.com/BVLC/caffe/issues/1245
  https://github.com/BVLC/caffe/issues/261

Rose Perrone

Nov 29, 2014, 11:46:38 PM
to caffe...@googlegroups.com

Rose Perrone

Dec 2, 2014, 9:56:11 PM
to caffe...@googlegroups.com
When I finetune the NIN model rather than train it from scratch, I still get 0s for the negative class, but for the positive class, I get high prediction scores. Here is a sample of the detection results, where the value on the left is the prediction score for the negative class, and the value on the right is the prediction score for the positive class:

0.0 10.1965
0.0 89.606
0.0 15.007
0.0 8.72097
0.0 27.801
0.0 55.6223
0.0 20.7561
0.0 11.9071
0.0 26.5453
0.0 69.9159
0.0 7.99755
0.0 53.7852
0.0 64.1716
0.0 56.2549
0.0 34.6965
0.0 21.247
0.0 31.1896
0.0 115.908
0.0 45.844
0.0 71.3556
0.0 17.0009
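For what it's worth, the values above look like raw net outputs rather than probabilities; the deploy.prototxt posted earlier has no softmax layer. Passing the first row through a softmax (a sketch, not the thread's actual code) shows how saturated these magnitudes are:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(z - np.max(z))
    return e / e.sum()

# First row of the detection results above: (negative, positive).
probs = softmax(np.array([0.0, 10.1965]))
print(probs)  # the positive class gets essentially all the probability mass
```

With raw scores this far apart, the implied probability for the positive class is effectively 1.0 on every row, so the net is not discriminating between classes at all.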

nod

Jan 29, 2015, 10:33:35 AM
to caffe...@googlegroups.com
Experiencing the same issue... did you find a solution for it?