Training very deep networks for CIFAR 10

Gil Levi

Oct 6, 2014, 11:01:35 AM
to caffe...@googlegroups.com
Hi, 

I'm using Caffe for research on the CIFAR benchmark. My Caffe version is a bit old - the last time I "git pulled" was about a month ago.

First, I followed the example on Caffe's site and got about 82% accuracy. 

Following the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition" by K. Simonyan and A. Zisserman, I wanted to train deeper networks by duplicating each convolution layer. 


However, during training the loss does not decrease and the validation accuracy stays at 0.1 (meaning the net is effectively guessing at random).

I've tried two versions of the leveldb - one created by Caffe's ready-to-go script and one I created myself (with the data shuffled). I tried various learning rates. I tried adding and dropping norm layers, but nothing seems to work. 

What could be the problem?

Thanks in advance !!!


Implementation details:

My prototxt file looks like this:

name: "CIFAR10_full"
layers {
  name: "cifar"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "/home/michael/CIFAR10/data_for_caffe_training/leveldb/train_leveldb"
    mean_file: "/home/michael/CIFAR10/data_for_caffe_training/mean_image/mean.binaryproto"
    batch_size: 100
  }
  include: { phase: TRAIN }
}
layers {
  name: "cifar"
  type: DATA
  top: "data"
  top: "label"
  data_param {
    source: "/home/michael/CIFAR10/data_for_caffe_training/leveldb/val_leveldb"
    mean_file: "/home/michael/CIFAR10/data_for_caffe_training/mean_image/mean.binaryproto"
    batch_size: 100
  }
  include: { phase: TEST }
}
layers {
  bottom: "data"
  top: "conv1_1"
  name: "conv1_1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.0001
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  bottom: "conv1_1"
  top: "conv1_1"
  name: "relu1_1"
  type: RELU
}
layers {
  bottom: "conv1_1"
  top: "conv1_2"
  name: "conv1_2"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.0001
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  bottom: "conv1_2"
  top: "conv1_2"
  name: "relu1_2"
  type: RELU
}
layers {
  bottom: "conv1_2"
  top: "pool1"
  name: "pool1"
  type: POOLING
  pooling_param {
    pool: MAX
    kernel_size: 3
    stride: 2
  }
}
layers {
  bottom: "pool1"
  top: "conv2_1"
  name: "conv2_1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  bottom: "conv2_1"
  top: "conv2_1"
  name: "relu2_1"
  type: RELU
}
layers {
  bottom: "conv2_1"
  top: "conv2_2"
  name: "conv2_2"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 32
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  bottom: "conv2_2"
  top: "conv2_2"
  name: "relu2_2"
  type: RELU
}
layers {
  bottom: "conv2_2"
  top: "pool2"
  name: "pool2"
  type: POOLING
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}

layers {
  bottom: "pool2"
  top: "conv3_1"
  name: "conv3_1"
  type: CONVOLUTION
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  bottom: "conv3_1"
  top: "conv3_1"
  name: "relu3_1"
  type: RELU
}
layers {
  bottom: "conv3_1"
  top: "conv3_2"
  name: "conv3_2"
  type: CONVOLUTION
  convolution_param {
    num_output: 64
    pad: 2
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  bottom: "conv3_2"
  top: "conv3_2"
  name: "relu3"
  type: RELU
}
layers {
  bottom: "conv3_2"
  top: "pool3"
  name: "pool3"
  type: POOLING
  pooling_param {
    pool: AVE
    kernel_size: 3
    stride: 2
  }
}
layers {
  name: "ip1"
  type: INNER_PRODUCT
  bottom: "pool3"
  top: "ip1"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 250
  weight_decay: 0
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "ip1"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}
layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "ip1"
  bottom: "label"
  top: "loss"
}



And here is my solver:

# reduce learning rate after 120 epochs (60000 iters) by a factor of 10
# then another factor of 10 after 10 more epochs (5000 iters)

# The train/test net protocol buffer definition
net: "cifar10_full_train_test_gil4.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# In the case of CIFAR10, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 1000 training iterations.
test_interval: 1000
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.001
momentum: 0.9
weight_decay: 0.004
# The learning rate policy
lr_policy: "fixed"
# Display every 200 iterations
display: 200
# The maximum number of iterations
max_iter: 60000
# snapshot intermediate results
snapshot: 10000
snapshot_prefix: "cifar10_full_d2"
# solver mode: CPU or GPU
# Note: there seems to be a bug with CPU computation in the pooling layers,
# and changing to solver_mode: CPU may result in NaNs on this example.
# If you want to train a variant of this architecture on the
# CPU, try changing the pooling regions from WITHIN_CHANNEL to ACROSS_CHANNELS
# in both cifar_full_train.prototxt and cifar_full_test.prototxt.
solver_mode: CPU
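
For reference, a solver like this is run with the standard caffe tool, e.g. (the solver filename here is just an example of whatever the file above is saved as):

./build/tools/caffe train --solver=cifar10_full_solver_gil.prototxt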


Nanne van Noord

Oct 7, 2014, 3:46:05 AM
to caffe...@googlegroups.com
I'd recommend first rereading the paper you're referencing, specifically section 2.3. Keep in mind that the contribution of that paper isn't that simply duplicating layers gives better performance.
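
For example, each conv layer in that paper uses a small 3x3 kernel and is immediately followed by a rectification (ReLU). A single such conv+ReLU block would look roughly like this in the old layer syntax (layer and blob names are only illustrative):

layers {
  bottom: "pool1"
  top: "conv2_1"
  name: "conv2_1"
  type: CONVOLUTION
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 32
    pad: 1            # 3x3 kernel with pad 1 preserves the spatial size
    kernel_size: 3
    stride: 1
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  bottom: "conv2_1"
  top: "conv2_1"
  name: "relu2_1"
  type: RELU
}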

Gil Levi

Oct 7, 2014, 8:44:45 AM
to caffe...@googlegroups.com
Hi,

Thanks for your comment. 

I took a second look at the paper, specifically section 2.3, and noticed two important details: the filters are smaller (3x3), and the authors incorporate a non-linear rectification layer after each convolutional layer. 

Following the paper, I did the same - I reduced the filter size to 3x3 and added a SIGMOID layer after each convolutional layer (I also tried RELU instead of SIGMOID).

However, the accuracy still remains constant. 


Is there any other problem with the training? 

Thanks in advance,

Gil. 

Harsha Prabhakar

Mar 27, 2015, 12:54:22 AM
to caffe...@googlegroups.com
Hi Gil,

You mentioned that you got ~81% accuracy initially. Just wanted to know: was it using the default prototxt shipped with Caffe (train_quick.prototxt)? Because I'm stuck at 77% accuracy and would like to improve it. 

Gil Levi

Mar 27, 2015, 9:02:34 AM
to caffe...@googlegroups.com
Hi,

I'm pretty sure that I got 81% using the default prototxt. It was a few months ago, so I'm not 100% sure.

Gil. 

Yingyu Liang

May 29, 2015, 11:43:12 AM
to caffe...@googlegroups.com
Hi Gil,

I'm training a deepnet and facing the same problem: the accuracy stays 0.1 forever. Did you figure out a way to solve the problem? Thank you!

Best,
Yingyu

Gil Levi

May 29, 2015, 12:44:26 PM
to caffe...@googlegroups.com
Hi,

I didn't solve it, but keep in mind that I used a very old version of Caffe. 

Gil 

Andy Wong

Jun 12, 2015, 11:53:18 AM
to caffe...@googlegroups.com
Try reducing the learning rate and the initialization magnitude.
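
For example (the exact values are just a starting point, not tuned):

# in the solver: lower the learning rate by a factor of 10
base_lr: 0.0001

# in each weight_filler: use a smaller Gaussian std
weight_filler {
  type: "gaussian"
  std: 0.001
}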

Harsh Wardhan

Nov 2, 2015, 12:59:09 AM
to Caffe Users
Keep your base_lr: 0.0001.

Ευάγγελος Μαυρόπουλος

Feb 19, 2016, 12:15:13 PM
to Caffe Users
I had the same problem when I changed the number of output classes from 10 to 2. As Harsh Wardhan and Andy Wong advised, I decreased the learning rate by a factor of 10 and everything worked fine. Final accuracy: 80.75%.
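
Concretely, the changes were roughly the following (a sketch; the 0.001 -> 0.0001 step matches the solver posted earlier in this thread):

# last inner product layer: two output classes instead of ten
inner_product_param {
  num_output: 2
}

# solver: learning rate decreased by a factor of 10
base_lr: 0.0001   # was 0.001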