Training very deep networks for CIFAR 10

Gil Levi

Oct 6, 2014, 11:01:35 AM
to caffe...@googlegroups.com
Hi, 

I'm using Caffe for research on the CIFAR benchmark. My Caffe version is a bit old - the last time I "git pulled" was about a month ago.

First, I followed the example on Caffe's site and got about 82% accuracy. 

Following the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition" by K. Simonyan and A. Zisserman, I wanted to train deeper networks by duplicating each convolution layer. 


However, during training the loss does not decrease and the validation accuracy stays at 0.1 (meaning the net is effectively guessing at random).

I've tried two versions of the leveldb - one created by Caffe's ready-to-go script and one I created myself (with the data shuffled). I tried various learning rates. I tried adding and dropping norm layers, but nothing seems to work. 

What could be the problem?

Thanks in advance !!!


Implementation details:

My prototxt file looks like this:

name: "CIFAR10_full"
layers {
  name: "cifar"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "/home/michael/CIFAR10/data_for_caffe_training/leveldb/train_leveldb"
    mean_file: "/home/michael/CIFAR10/data_for_caffe_training/mean_image/mean.binaryproto"
    batch_size: 100
  }
  include: { phase: TRAIN }
}
layers {
  name: "cifar"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "/home/michael/CIFAR10/data_for_caffe_training/leveldb/val_leveldb"
    mean_file: "/home/michael/CIFAR10/data_for_caffe_training/mean_image/mean.binaryproto"
    batch_size: 100
  }
  include: { phase: TEST }
}
layers {
  bottom: "data"
  top: "conv1_1"
  name: "conv1_1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.0001
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  bottom: "conv1_1"
  top: "conv1_1"
  name: "relu1_1"
  type: RELU
}
layers {
  bottom: "conv1_1"
  top: "conv1_2"
  name: "conv1_2"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.0001
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  bottom: "conv1_2"
  top: "conv1_2"
  name: "relu1_2"
  type: RELU
}
layers {
  bottom: "conv1_2"
  top: "pool1"
  name: "pool1"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  bottom: "pool1"
  top: "conv2_1"
  name: "conv2_1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  bottom: "conv2_1"
  top: "conv2_1"
  name: "relu2_1"
  type: RELU
}
layers {
  bottom: "conv2_1"
  top: "conv2_2"
  name: "conv2_2"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  bottom: "conv2_2"
  top: "conv2_2"
  name: "relu2_2"
  type: RELU
}
layers {
  bottom: "conv2_2"
  top: "pool2"
  name: "pool2"
  type: POOLING
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}

layers {
  bottom: "pool2"
  top: "conv3_1"
  name: "conv3_1"
  type: CONVOLUTION
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  bottom: "conv3_1"
  top: "conv3_1"
  name: "relu3_1"
  type: RELU
}
layers {
  bottom: "conv3_1"
  top: "conv3_2"
  name: "conv3_2"
  type: CONVOLUTION
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  bottom: "conv3_2"
  top: "conv3_2"
  name: "relu3"
  type: RELU
}
layers {
  bottom: "conv3_2"
  top: "pool3"
  name: "pool3"
  type: POOLING
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layers {
  name: "ip1"
  type: INNER_PRODUCT
  bottom: "pool3"
  top: "ip1"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 250
  weight_decay: 0
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "ip1"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}
layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "ip1"
  bottom: "label"
  top: "loss"
}



And here is my solver:

# reduce learning rate after 120 epochs (60000 iters) by a factor of 10
# then another factor of 10 after 10 more epochs (5000 iters)

# The train/test net protocol buffer definition
net: "cifar10_full_train_test_gil4.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of CIFAR10, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 1000 training iterations.
test_interval: 1000
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.001
momentum: 0.9
weight_decay: 0.004
# The learning rate policy
lr_policy: "fixed"
# Display every 200 iterations
display: 200
# The maximum number of iterations
max_iter: 60000
# snapshot intermediate results
snapshot: 10000
snapshot_prefix: "cifar10_full_d2"
# solver mode: CPU or GPU
# Note: there seems to be a bug with CPU computation in the pooling layers,
# and changing to solver_mode: CPU may result in NaNs on this example.
# If you want to train a variant of this architecture on the
# CPU, try changing the pooling regions from WITHIN_CHANNEL to ACROSS_CHANNELS
# in both cifar_full_train.prototxt and cifar_full_test.prototxt.
solver_mode: CPU
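
For reference, a solver like this is run with the standard caffe tool, e.g. (the solver filename here is just an example of whatever the file above is saved as):

./build/tools/caffe train --solver=cifar10_full_solver_gil.prototxt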


Nanne van Noord

Oct 7, 2014, 3:46:05 AM
to caffe...@googlegroups.com
I'd recommend first rereading the paper you're referencing, specifically section 2.3. Keep in mind that the contribution of that paper isn't that simply duplicating layers gives better performance.
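
For example, each conv layer in that paper uses a small 3x3 kernel and is immediately followed by a rectification (ReLU). A single such conv+ReLU block would look roughly like this in the old layer syntax (layer and blob names are only illustrative):

layers {
  bottom: "pool1"
  top: "conv2_1"
  name: "conv2_1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 32
    pad: 1            # 3x3 kernel with pad 1 preserves the spatial size
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  bottom: "conv2_1"
  top: "conv2_1"
  name: "relu2_1"
  type: RELU
}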

Gil Levi

Oct 7, 2014, 8:44:45 AM
to caffe...@googlegroups.com
Hi,

Thanks for your comment. 

I took a second look at the paper, specifically section 2.3, and noticed two important details: the filters are smaller (3x3), and the authors incorporate a non-linear rectification layer after each convolutional layer. 

Following the paper, I did the same - I reduced the filter size to 3x3 and added a SIGMOID layer after each convolutional layer (I also tried RELU instead of SIGMOID).

However, the accuracy still remains constant. 


Is there any other problem with the training? 

Thanks in advance,

Gil. 

Harsha Prabhakar

Mar 27, 2015, 12:54:22 AM
to caffe...@googlegroups.com
Hi Gil,

You mentioned that you got ~81% accuracy initially. Just wanted to know: was it using the default prototxt shipped with Caffe (train_quick.prototxt)? Because I'm stuck at 77% accuracy and would like to improve it. 

Gil Levi

Mar 27, 2015, 9:02:34 AM
to caffe...@googlegroups.com
Hi,

I'm pretty sure that I got 81% using the default prototxt. It was a few months ago, so I'm not 100% sure.

Gil. 

Yingyu Liang

May 29, 2015, 11:43:12 AM
to caffe...@googlegroups.com
Hi Gil,

I'm training a deepnet and facing the same problem: the accuracy stays 0.1 forever. Did you figure out a way to solve the problem? Thank you!

Best,
Yingyu

Gil Levi

May 29, 2015, 12:44:26 PM
to caffe...@googlegroups.com
Hi,

I didn't solve it, but keep in mind that I used a very old version of Caffe. 

Gil 

Andy Wong

Jun 12, 2015, 11:53:18 AM
to caffe...@googlegroups.com
Try reducing the learning rate and the initialization magnitude.
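
For example (the exact values are just a starting point, not tuned):

# in the solver: lower the learning rate by a factor of 10
base_lr: 0.0001

# in each weight_filler: use a smaller Gaussian std
weight_filler {
  type: "gaussian"
  std: 0.001
}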

Harsh Wardhan

Nov 2, 2015, 12:59:09 AM
to Caffe Users
Keep your base_lr: 0.0001.

Ευάγγελος Μαυρόπουλος

Feb 19, 2016, 12:15:13 PM
to Caffe Users
I had the same problem when I changed the number of output classes from 10 to 2. As Harsh Wardhan and Andy Wong advised, I decreased the learning rate by a factor of 10 and everything worked fine. Final accuracy: 80.75%.
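
Concretely, the changes were roughly the following (a sketch; the 0.001 -> 0.0001 step matches the solver posted earlier in this thread):

# last inner product layer: two output classes instead of ten
inner_product_param {
  num_output: 2
}

# solver: learning rate decreased by a factor of 10
base_lr: 0.0001   # was 0.001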