AlexNet: resizing down to 256x256 while preserving the aspect ratio [Beginner question]

00-classification, AlexNet, ImageNet, caffe, pycaffe

bruno...@gmail.com

Mar 5, 2017, 9:18:12 AM

to Caffe Users

Hi,

In the original AlexNet paper, Section 2 ("The Dataset") describes the preprocessing as follows:
"ImageNet consists of variable-resolution images, while our system requires a constant input dimensionality. Therefore, we down-sampled the images to a fixed resolution of 256 x 256. Given a rectangular image, we first rescaled the image such that the shorter side was of length 256, and then cropped out the central 256 x 256 patch from the resulting image. [...]"
I have left out the subsequent steps: mean subtraction, cropping to 227x227 (not 224x224, which as far as I understand is an error in the paper), etc.
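The geometry of the quoted step (rescale so the shorter side is 256, then take the central 256x256 patch) can be written as a few lines of arithmetic. This is only a sketch: the helper name is hypothetical, and rounding to the nearest pixel is my assumption, since the paper does not say how fractional sizes are handled.

```python
def alexnet_resize_and_crop_box(height, width, short_side=256, crop=256):
    """Geometry of 'rescale the shorter side to short_side, then center crop'.

    Hypothetical helper. Rounding to the nearest pixel is an assumption
    (the paper does not specify it). Returns the resized (h, w) and the
    (top, left) offset of the central crop x crop patch.
    """
    scale = short_side / min(height, width)
    new_h = int(round(height * scale))
    new_w = int(round(width * scale))
    # Center the crop window inside the resized image.
    top = (new_h - crop) // 2
    left = (new_w - crop) // 2
    return (new_h, new_w), (top, left)


# Example: a 360x480 (H x W) landscape image.
print(alexnet_resize_and_crop_box(360, 480))  # ((256, 341), (0, 42))
```

In pycaffe the actual resampling could then be done with caffe.io.resize_image(image, (new_h, new_w)) followed by slicing, although I have not checked that this matches the interpolation used when the model was trained.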

In the classification example that comes with Caffe (https://github.com/BVLC/caffe/blob/master/examples/00-classification.ipynb), the image is resized directly to 227x227 (rather than 256x256, because the 10-crop step is skipped in the example) without preserving its aspect ratio. The cat image in the notebook therefore gets squeezed horizontally, and intuitively I would expect this to reduce the quality of the classification.
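To put a number on the squeezing: a plain resize to 227x227 scales the two axes by different factors. The sketch assumes the notebook's cat image is 360x480 (H x W), which is what the centered crop indices 60:420 used below imply; the helper name is hypothetical.

```python
def resize_scale_factors(src_h, src_w, dst_h, dst_w):
    """Per-axis scale factors of a plain (aspect-ratio-ignoring) resize."""
    sy = dst_h / src_h  # vertical scale
    sx = dst_w / src_w  # horizontal scale
    # sx / sy < 1 means content is compressed horizontally relative to vertically.
    return sy, sx, sx / sy


sy, sx, distortion = resize_scale_factors(360, 480, 227, 227)
print(f"vertical x{sy:.3f}, horizontal x{sx:.3f}, relative width x{distortion:.2f}")
# vertical x0.631, horizontal x0.473, relative width x0.75
```

In other words, every object in the image ends up with 75% of its original width-to-height ratio, i.e. noticeably "skinnier".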

I checked this on the cat image, and following the original procedure described in the AlexNet article does improve the classification:

With "naive" rescaling (as in the notebook):

31.2% 281 n02123045 tabby, tabby cat
23.8% 282 n02123159 tiger cat
12.4% 285 n02124075 Egyptian cat
10.1% 277 n02119022 red fox, Vulpes vulpes
7.1% 287 n02127052 lynx, catamount

With the rescaling from the AlexNet article (rescale so that the shorter side fits, then take the center crop), obtained by executing image = image[:, 60:420, :] before passing the image to the Transformer:

40.5% 281 n02123045 tabby, tabby cat
30.5% 282 n02123159 tiger cat
16.0% 285 n02124075 Egyptian cat
5.6% 287 n02127052 lynx, catamount
0.8% 278 n02119789 kit fox, Vulpes macrotis
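The slice image[:, 60:420, :] hard-codes the offsets for this particular 360x480 image. A general version for an H x W x C array (the layout caffe.io.load_image returns) might look like the sketch below; the function name is hypothetical. Cropping the central square first and letting the Transformer resize it is, up to rounding, equivalent to "rescale the shorter side, then take the central square", because a uniform rescale and a centered square crop commute.

```python
import numpy as np


def center_square_crop(image):
    """Crop the largest centered square from an H x W x C array.

    Hypothetical helper generalizing image[:, 60:420, :] for a 360x480 image.
    """
    h, w = image.shape[:2]
    side = min(h, w)
    top = (h - side) // 2
    left = (w - side) // 2
    return image[top:top + side, left:left + side, :]


# Example: a dummy 360x480 RGB image with values in [0, 1].
dummy = np.random.rand(360, 480, 3).astype(np.float32)
crop = center_square_crop(dummy)
print(crop.shape)  # (360, 360, 3)
assert np.array_equal(crop, dummy[:, 60:420, :])
```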

Tabby cat is now recognized better, the separation between the cat and non-cat categories is larger (10% instead of 2%), and, interestingly, the fox category is now very low (<1% instead of 10%). Foxes are much skinnier than cats, so in my opinion this shows that not preserving the aspect ratio of the image does reduce the quality of the classification.
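The "10% instead of 2%" figures can be reproduced from the two top-5 lists above. The sketch assumes "cat" means the ImageNet domestic-cat classes 281 to 285 (so lynx counts as non-cat), which is the reading that gives those numbers; the helper name is hypothetical.

```python
# Top-5 outputs copied from the two runs above: (probability in %, class index).
naive = [(31.2, 281), (23.8, 282), (12.4, 285), (10.1, 277), (7.1, 287)]
cropped = [(40.5, 281), (30.5, 282), (16.0, 285), (5.6, 287), (0.8, 278)]

# Assumption: "cat" = domestic-cat classes 281..285 (tabby .. Egyptian cat).
DOMESTIC_CAT = set(range(281, 286))


def cat_margin(top5):
    """Lowest domestic-cat probability minus highest non-cat probability (in %)."""
    cat = min(p for p, c in top5 if c in DOMESTIC_CAT)
    other = max(p for p, c in top5 if c not in DOMESTIC_CAT)
    return cat - other


print(f"naive: {cat_margin(naive):.1f}%, cropped: {cat_margin(cropped):.1f}%")
# naive: 2.3%, cropped: 10.4%
```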

I have the following questions:
- Is my analysis correct? Did I miss something? (I am new to DNNs.)
Assuming my analysis is correct:
- How far can I generalize it? Generally speaking, are CNNs robust to aspect ratio changes, or should I expect the same effect on other images as well?
- If yes, given that the object of interest is not necessarily centered even when it is "alone" in the picture, should the center crop to 256x256 in fact be replaced by a series of different crops, keeping the 256x256 square that gives the best classification (similar to how localisation is done, as far as I understand)?
- If the Caffe version of AlexNet is indeed trained on images rescaled with their aspect ratio preserved, is it safe to assume that the other Caffe models on the BVLC page are trained the same way? If not, how can I find out the exact preprocessing applied to the training set of each model?
- Using the original procedure does assume that the object of interest is well centered in the original image (which seems to be the case in the ImageNet training set). Even so, wouldn't it be better for the tutorial to stick to that procedure? When reading the original paper and trying to understand all the details, it would be very useful to be able to follow exactly the same steps in the tutorial notebook. Likewise, reproducing the exact test procedure of the article (taking a 256x256 image, cropping and mirroring it to get ten 227x227 images, and averaging the outputs) would make the original article easier to follow. It would also make clear why the mean files are 256x256 rather than 227x227, instead of leaving readers scratching their heads over the hack of using the per-channel mean with the Classifier class (as in Google's deepdream notebook).
Would it be a good idea to submit a PR that changes the notebook?
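For reference, the paper's test-time procedure (four corner crops plus the center crop, each with its horizontal mirror, with the softmax outputs averaged) can be sketched in NumPy as below. predict_fn is a hypothetical stand-in for the network's forward pass. pycaffe already has caffe.io.oversample(images, crop_dims) and caffe.Classifier.predict(inputs, oversample=True), which as far as I know produce the same ten crops; the sketch only makes the steps explicit. square_crops is a hypothetical take on the "series of crops" question above: it slides a square window along the longer side.

```python
import numpy as np


def ten_crop(image, crop=227):
    """Four corner crops + center crop of an H x W x C image, plus their mirrors."""
    h, w = image.shape[:2]
    offsets = [
        (0, 0), (0, w - crop), (h - crop, 0), (h - crop, w - crop),  # corners
        ((h - crop) // 2, (w - crop) // 2),                          # center
    ]
    crops = [image[y:y + crop, x:x + crop] for y, x in offsets]
    crops += [c[:, ::-1] for c in crops]  # horizontal reflections
    return np.stack(crops)  # shape (10, crop, crop, C)


def predict_ten_crop(image, predict_fn, crop=227):
    """Average class probabilities over the ten crops.

    predict_fn is hypothetical: it maps an (N, crop, crop, C) batch to an
    (N, num_classes) array of softmax probabilities.
    """
    return predict_fn(ten_crop(image, crop)).mean(axis=0)


def square_crops(image, num=3):
    """Hypothetical: `num` squares of side min(H, W) slid along the longer side."""
    h, w = image.shape[:2]
    side = min(h, w)
    slack = max(h, w) - side
    if num == 1:
        starts = [slack // 2]  # a single window is simply the center crop
    else:
        starts = np.linspace(0, slack, num).round().astype(int)
    if h >= w:
        return [image[s:s + side, :side] for s in starts]
    return [image[:side, s:s + side] for s in starts]


def uniform_net(batch):
    """Dummy 'network' returning uniform scores over 1000 classes."""
    return np.full((len(batch), 1000), 1.0 / 1000)


img = np.random.rand(256, 256, 3)
print(ten_crop(img).shape)                       # (10, 227, 227, 3)
print(predict_ten_crop(img, uniform_net).shape)  # (1000,)
print([c.shape for c in square_crops(np.zeros((360, 480, 3)))])
```

With square_crops, one could run the network on each window and average the outputs or keep the most confident one; which works better is an open question rather than something the paper prescribes.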

Thank you,
Bruno
