1. With a 20-layer ResNet, my test accuracy is around 89%, which is still ~2% behind the result reported in the paper. I wonder what I can do to further improve this result.
The training set is split into 45,000 training examples and 5,000 validation examples.
I used 'he_normal' initialization with no bias for every convolution layer.
Data augmentation is performed using Keras's ImageDataGenerator.
I used a batch size of 100. Changing it to 200 does not change the performance much.
I used the 'adam' optimizer.
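One change worth trying to close the gap: the paper does not use Adam but plain SGD with momentum 0.9, weight decay 1e-4, and a step learning-rate schedule (start at 0.1, divide by 10 at 32k and 48k iterations in the CIFAR-10 runs). A minimal sketch of that schedule (the function name is my own; the boundaries are my reading of the paper):

```python
def paper_lr_schedule(iteration, base_lr=0.1):
    # Step schedule from the ResNet paper's CIFAR-10 experiments
    # (assumption: divide by 10 at 32k and 48k mini-batch iterations).
    if iteration < 32000:
        return base_lr
    elif iteration < 48000:
        return base_lr / 10
    else:
        return base_lr / 100
```

A function like this can be hooked into training via Keras's LearningRateScheduler callback, converting iterations to epochs for your batch size.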
I implemented my own ResNet. It is a straightforward implementation that follows the structure described in the paper. The following code constructs the residual basic block:
def create_res_basicblock(input_shape, ksize, n_feature_maps, reduce_first):
    x = Input(shape=input_shape)

    # identity path
    ss = (1, 1)
    xx = x
    if reduce_first:
        ss = (2, 2)
        xx = AveragePooling2D(pool_size=(2, 2), dim_ordering='th')(x)

    # pad with zero channels when the number of feature maps grows
    if n_feature_maps > input_shape[0]:
        pad = Convolution2D(n_feature_maps - input_shape[0], 1, 1,
                            border_mode='same', bias=False, init='zero')
        # trainable must be set on the layer object before it is called;
        # setting it on the output tensor has no effect
        pad.trainable = False  # frozen zero weights: no training, just zero padding
        xx = merge([xx, pad(xx)], mode='concat', concat_axis=1)

    # residual path
    # note: mode=0 uses running averages at test time; mode=2 normalizes with
    # per-batch statistics even at test time, which hurts test accuracy
    residual = Convolution2D(n_feature_maps, ksize, ksize, border_mode='same',
                             init='he_normal', bias=False, subsample=ss)(x)
    residual = BatchNormalization(axis=1, mode=0)(residual)
    residual = Activation('relu')(residual)
    residual = Convolution2D(n_feature_maps, ksize, ksize, border_mode='same',
                             init='he_normal', bias=False)(residual)
    residual = BatchNormalization(axis=1, mode=0)(residual)

    y = merge([xx, residual], mode='sum')
    z = Activation('relu')(y)
    return Model(input=x, output=z)
2. I googled and found several ResNet implementations in Torch. I wonder if the two-percent difference is due to differences between Keras and Torch?
3. I used a machine with a K20 card, and it took about 6-8 hours to train for 200 epochs (I am not alone on the machine). In the paper, the authors trained ResNet for more than 30,000 "iterations". I wonder if an "iteration" in the paper is the same as an epoch in Keras/Theano. Do they use a really powerful computer, or is Torch much faster than Keras/Theano?
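For what it's worth, an "iteration" in the paper is one mini-batch update, not a full pass over the data, so with the split and batch size described above the conversion is simple arithmetic:

```python
# One "iteration" = one mini-batch update, not an epoch.
train_size = 45000        # training examples after the validation split above
batch_size = 100
iters_per_epoch = train_size // batch_size   # mini-batch updates per epoch

# 30,000+ iterations then corresponds to roughly this many epochs:
epochs = 30000 / iters_per_epoch
print(iters_per_epoch, epochs)
```

So 30,000 iterations is on the order of 67 epochs here, i.e. far fewer passes over the data than 30,000 epochs would be.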