Getting low accuracy with network trained on Imagenet 2011 (ILSVRC 2011)


Gil Levi

Sep 25, 2014, 2:23:44 PM9/25/14
to caffe...@googlegroups.com
Hi,

I've downloaded the ImageNet 2011 (ILSVRC 2011) dataset and tried to train the Caffe ImageNet network on it using the instructions here. I used about 500K images for training and 70K images for validation.

I've also downloaded the ILSVRC 2012 validation set for comparison.

Here are the results:

On ILSVRC 2012 official validation set:
My network: 46% accuracy, Original Caffe Imagenet network: 56% accuracy.

On MY validation set:
My network: 53% accuracy, Original Caffe Imagenet network: 80% accuracy.

On MY training data (checking for overfitting):
My network: 99% accuracy, Original Caffe Imagenet network: 70% accuracy.


I would like to ask for some help and tips regarding training:


1. I have the sense that my new network is overfitting. Do you agree? Should I use more images, use data augmentation to artificially increase the size of the training set, or perhaps change the dropout parameter?

2. The paper "ImageNet classification with deep convolutional neural networks" by Krizhevsky et al. [1] says that the authors used 1.2M images for training, with data augmentation [2] that increases the size of the training set by a factor of 2048. However, no data augmentation (aside from random flips) is done in the code provided for ImageNet training. Is data augmentation not really required?

3. I'm only interested in about 100 classes from the ImageNet dataset. Say I were to train the ImageNet network (from scratch) on only those 100 classes, with a total of 100K images. That's far fewer than the original 1.2M images, but it's also far fewer classes. Will the network overfit, or will it give good performance on those 100 categories? Should I use a different net architecture for only 100 classes? Or would it be better to train the full ImageNet network and then fine-tune it on the 100 classes?



Thanks in advance!

Gil


[1] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems 25 (2012).

[2] "The first form of data augmentation consists of generating image translations and horizontal reflections. We do this by extracting random 224 x224 patches (and their horizontal reflections) from the 256x 256 images and training our network on these extracted patches4. This increases the size of our training set by a factor of 2048, though the resulting training examples are, of course, highly interdependent."

Amir Alush

Sep 27, 2014, 4:43:56 PM9/27/14
to caffe...@googlegroups.com
Gil hi,
1. Yes, you're definitely overfitting; the gap between train and test accuracy is very big. You ought to add data, either more images or, as you suggested, data augmentation.
   Also, I'd try decreasing the number of your network's parameters, by making the network shallower or by other means.
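   On the dropout option: raising the dropout ratio is a one-line change in the train prototxt. A minimal sketch, assuming the reference model's layer/blob names; the 0.7 value is just an example, not a recommendation:

   layers {
     name: "drop6"
     type: DROPOUT
     bottom: "fc6"
     top: "fc6"
     dropout_param {
       dropout_ratio: 0.7   # reference model uses 0.5; higher means stronger regularization
     }
   }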

2. Data augmentation is nice to have. Look into the data layer; it already picks a random crop.
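   For example, the reference model's train prototxt already does random 227x227 crops plus random mirroring in the data layer, along these lines (the source and mean_file paths here are placeholders):

   layers {
     name: "data"
     type: DATA
     top: "data"
     top: "label"
     data_param {
       source: "ilsvrc12_train_leveldb"        # placeholder path
       mean_file: "imagenet_mean.binaryproto"  # placeholder path
       batch_size: 256
       crop_size: 227    # random crop from the 256x256 input at train time
       mirror: true      # random horizontal flip
     }
   }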

3. I would just try it. Try fine-tuning from the ImageNet reference model; it's a fast process, ~50K iterations should be enough.
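   The usual recipe, roughly: copy the reference train_val prototxt, then rename the last inner-product layer and set its num_output to your 100 classes. Renaming makes Caffe re-initialize that layer instead of copying the pretrained 1000-way weights. A sketch (the name "fc8_100" is just illustrative):

   layers {
     name: "fc8_100"    # renamed so the pretrained 1000-way weights are not copied
     type: INNER_PRODUCT
     bottom: "fc7"
     top: "fc8_100"
     inner_product_param {
       num_output: 100  # your 100 classes instead of 1000
     }
   }

   Then start training from the pretrained weights, e.g. with the caffe tool (exact tool and flags depend on your Caffe version; older builds use finetune_net.bin solver.prototxt pretrained_model instead):

   ./build/tools/caffe train --solver=solver.prototxt --weights=caffe_reference_imagenet_model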

4. The discrepancy in the results you're referring to has several causes. You're using a smaller training set (500K images instead of 1.2M). Another issue is that you're not averaging multiple trained models, which seems to be common practice in all the current state-of-the-art ILSVRC submissions.
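   On the averaging point: the common trick is simply to average the softmax outputs of several independently trained nets and take the argmax. A quick numpy sketch (ensemble_predict and the shapes are my own illustration, not part of Caffe):

   import numpy as np

   def ensemble_predict(prob_list):
       # prob_list: one (num_images, num_classes) softmax array per trained model
       avg = np.mean(prob_list, axis=0)  # elementwise average across models
       return np.argmax(avg, axis=1)     # predicted class index per image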

Gil Levi

Oct 1, 2014, 9:38:35 AM10/1/14
to caffe...@googlegroups.com
Thanks for your help Amir!