CIFAR-10 and 100 with AlexNet


yongsk

Nov 9, 2016, 8:30:50 PM
to torch7
I am given a task to solve CIFAR-10 and 100 problem based on AlexNet.

As I understand it, AlexNet takes inputs of size 224x224x3 and has 5 conv layers.

However, the images in CIFAR are 32x32x3, which is very small compared to 224x224x3!

Even if I use smaller kernels than the ones in the original AlexNet, I still don't think I can fully make use of all 5 conv layers for feature extraction.
(After two conv layers with smaller kernels, the feature map is already 6x6x128, which seems too small to extract anything further.)

In this stage, should I start building a fully-connected layer for classification?

If so, can I still call this kind of network AlexNet?

Thank you.








Vislab

Nov 10, 2016, 6:00:42 AM
to torch7
You can either resize the images to 224x224, or downsize AlexNet to accept 32x32 images.
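
The resizing route can be sketched with torch's image package (image.scale defaults to bilinear interpolation; the random tensor here is just a stand-in for a real CIFAR sample):

```lua
require 'image'

-- stand-in for a single CIFAR sample (3 channels, 32x32)
local img = torch.rand(3, 32, 32)

-- upscale to AlexNet's expected input size; image.scale(src, width, height)
local big = image.scale(img, 224, 224)
print(big:size())  -- 3x224x224
```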

Also, take a look at this repo for other alternatives to alexnet: https://github.com/szagoruyko/wide-residual-networks

yongsk

Nov 16, 2016, 8:33:23 AM
to torch7
Thank you for the comments.

Does downsizing AlexNet mean reducing the number of layers, the kernel sizes, etc.?

If so, is the downsized network still AlexNet even though its architecture is different?

In other words, can I still call the downsized network AlexNet?

I think it is a silly question, but I just want to make things clear.

Thank you!!

Vislab

Nov 16, 2016, 9:23:44 AM
to torch7
Question 1: Does downsizing AlexNet mean reducing the number of layers, the kernel sizes, etc.?

AlexNet is generically composed of two parts: a feature sub-network made of convolutions, max-pools and ReLUs, and a classifier made of several fully-connected layers. If you pass a 224x224 input through the feature sub-network, the final layer (a maxpool) outputs a 256x7x7 tensor (meaning the image was converted from a 3x224x224 matrix to 256x7x7). The fully-connected sub-network simply flattens that tensor into 1D and feeds it to a sequence of linear layers.

Now, for this to work with a 3x32x32 input, you need to remove the last maxpool layer from the feature block, and you'll end up with a 256x1x1 tensor (this number checks out; I've tested it with an AlexNet network). Then all you need to do is redefine the number of inputs that the first fully-connected layer accepts, and you should have a modified version of AlexNet for inputs of size 32x32.
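
To make the two-part structure concrete, here is a minimal sketch in torch nn of what a downsized, AlexNet-style network for 3x32x32 inputs could look like. The channel counts loosely follow AlexNet, but the kernel sizes, strides and padding are illustrative choices made for the small input from the start (so the resulting shapes differ from the 256x1x1 figure quoted above for the unmodified feature stack):

```lua
require 'nn'

local net = nn.Sequential()

-- feature sub-network: convolutions + ReLUs + maxpools
-- SpatialConvolution(nIn, nOut, kW, kH, dW, dH, padW, padH)
net:add(nn.SpatialConvolution(3, 64, 3, 3, 1, 1, 1, 1))    -- 64x32x32
net:add(nn.ReLU(true))
net:add(nn.SpatialMaxPooling(2, 2, 2, 2))                  -- 64x16x16
net:add(nn.SpatialConvolution(64, 192, 3, 3, 1, 1, 1, 1))  -- 192x16x16
net:add(nn.ReLU(true))
net:add(nn.SpatialMaxPooling(2, 2, 2, 2))                  -- 192x8x8
net:add(nn.SpatialConvolution(192, 384, 3, 3, 1, 1, 1, 1)) -- 384x8x8
net:add(nn.ReLU(true))
net:add(nn.SpatialConvolution(384, 256, 3, 3, 1, 1, 1, 1)) -- 256x8x8
net:add(nn.ReLU(true))
net:add(nn.SpatialConvolution(256, 256, 3, 3, 1, 1, 1, 1)) -- 256x8x8
net:add(nn.ReLU(true))
net:add(nn.SpatialMaxPooling(2, 2, 2, 2))                  -- 256x4x4

-- classifier sub-network: flatten to 1D, then linear layers
net:add(nn.View(256 * 4 * 4))
net:add(nn.Linear(256 * 4 * 4, 4096))
net:add(nn.ReLU(true))
net:add(nn.Linear(4096, 4096))
net:add(nn.ReLU(true))
net:add(nn.Linear(4096, 10))  -- 10 classes for CIFAR-10 (100 for CIFAR-100)
```

The key point is the same as in the explanation above: whatever the feature block outputs, the first nn.Linear must be declared with exactly that flattened size as its input count.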

PS: the best thing to do would be to resize the input images to 224x224 pixels instead; then you can use a pre-trained AlexNet and fine-tune it on whatever dataset you want.

Question 2: Can I still call the downsized network AlexNet?

Sure, just call it a modified version of AlexNet.

yongsk

Nov 17, 2016, 12:33:39 AM
to torch7
Thank you so much for the clear answer!! :-)