Fine-tune VGG or AlexNet for non-square inputs


Evan Weiner

Nov 20, 2015, 5:41:04 AM
to Caffe Users
VGG and AlexNet, amongst others, require a fixed image input of square dimensions. How can one fine-tune or otherwise perform net surgery such that non-square inputs can be provided?

el

Nov 20, 2015, 5:48:12 AM
to Caffe Users
VGG's test procedure requires images with the smallest dimension fixed to some value, not square inputs. You can preprocess your inputs by resizing them so that one dimension is fixed, e.g. 384, while keeping the aspect ratio. Check the testing details in the paper.
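A minimal sketch of that rescaling step in plain Python (the 384 smallest side follows the paper's single-scale test setting; the function name is mine, and this is generic arithmetic, not Caffe code):

```python
def resize_dims(width, height, smallest=384):
    """Return a target (width, height) such that the smaller side equals
    `smallest` while the original aspect ratio is preserved
    (rounded to whole pixels)."""
    scale = smallest / min(width, height)
    return (round(width * scale), round(height * scale))

# e.g. a 1024x768 landscape image becomes 512x384
print(resize_dims(1024, 768))  # (512, 384)
```

The resulting dimensions can then be fed to whatever image library you use for the actual resize.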

Evan Weiner

Nov 21, 2015, 2:56:03 AM
to Caffe Users
Thank you El. I have reviewed the testing section (3.2) of the paper, and indeed it matches what you wrote. But I don't understand how to actually test it with Caffe. From the code I see here (https://github.com/BVLC/caffe/blob/master/python/caffe/classifier.py), it seems any input is resized and stretched to the dimensions of the net, and the VGG prototxt here (https://gist.github.com/ksimonyan/211839e770f7b538e2d8#file-readme-md) shows 224x224.

I would greatly appreciate if you could provide a Python example on how to achieve non-square input testing with VGG in Caffe.

el

Nov 23, 2015, 3:20:45 AM
to Caffe Users
224x224 is the crop size. When I tried testing images of arbitrary sizes, I got an error relating datum.width to the desired width. So I made an LMDB with the images resized so their smallest dimension is 384 and tested again. The input dimensions in the prototxt are the crop size the network requires; images whose smallest dimension is 384 are compatible, because the network can crop out the size it wants. As the paper says: "Then, the network is applied densely over the rescaled test image in a way similar..."
Please rescale your data, build an LMDB from it, and define that as the input in the prototxt.
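As a sketch of what that prototxt input could look like (the layer names and values here are illustrative, based on the common Caffe Data-layer pattern, not taken from an actual file in this thread):

```
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  data_param {
    source: "test_lmdb"  # LMDB of images rescaled so the smallest side is 384
    backend: LMDB
    batch_size: 10
  }
  transform_param {
    crop_size: 224  # the fixed input the net sees, cropped from the 384-side images
  }
}
```

The crop_size must match the input dimensions the network was trained with; the LMDB images only need to be at least that large on both sides.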

Evan Weiner

Nov 23, 2015, 9:58:35 AM
to Caffe Users
Thanks El. If I want to perform non-square image testing on the pre-trained VGG model, without changing its default smallest dimension or retraining/tuning, do you mean that I can just rescale a given test image so that its smallest side is, say, 224, while the other side may be larger depending on the original aspect ratio, and the network will be able to forward-pass? And may successive images in the test set be of various sizes (i.e. all images share a fixed minimum side, but the other side differs across the set)?

el

Nov 24, 2015, 5:48:37 AM
to Caffe Users
Yep. As I understand it, the VGG-19 model hosted on the project's original site was trained for this kind of image. So change your testing data by rescaling each image so that its smallest dimension (height or width) is 384 while keeping the original aspect ratio. The network can then make a forward pass by cropping these inputs and produce the prediction you want.
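As an aside on the "applied densely" part quoted from the paper earlier: once the fc layers are recast as convolutions, the size of the output score map follows from simple arithmetic. A sketch of that arithmetic (assuming VGG's pad-1 3x3 convolutions, which preserve spatial size, five stride-2 pools, and fc6 recast as a 7x7 stride-1 convolution; this is illustration, not Caffe code):

```python
def vgg_dense_output_size(side):
    """Spatial size of the class-score map for a square input of `side`
    pixels, assuming size-preserving 3x3 pad-1 convs, five 2x2 stride-2
    max pools, and fc6 recast as a valid 7x7 stride-1 convolution."""
    for _ in range(5):
        side //= 2       # each max pool halves the spatial size
    return side - 7 + 1  # valid 7x7 convolution

print(vgg_dense_output_size(224))  # 1 -- the standard crop yields a single score
print(vgg_dense_output_size(384))  # 6 -- a 384-side input yields a 6x6 score map
```

This is why a fixed 224x224 crop gives exactly one prediction, while a larger rescaled image evaluated densely gives a grid of predictions that the paper then averages.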

Evan Weiner

Dec 1, 2015, 9:22:12 PM
to Caffe Users
Hi El,

I'm having a problem with using VGG for testing non-square images. Although the VGG paper does mention in section 3.2 that it can handle the entire image without requiring crops, it mentions in section 3.3 that the authors made "significant modifications" to Caffe to enable that behavior. I also noticed their website says: "Please note that the aforementioned ConvNet toolboxes might not have a readily available implementation of our dense multi-scale evaluation procedure, so the image classification results can be different from ours." And when I checked the prototxt files, I couldn't find the fully convolutional final layers.

So I'm currently at a loss here on how to use VGG for testing non-square images. If you could provide a working example or other insights, it would be greatly appreciated. Thank you.
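For what it's worth, Caffe's net surgery example (examples/net_surgery.ipynb in the BVLC repo) shows how fully-connected layers can be cast to convolutions by reshaping their weights. A quick arithmetic check (the 512x7x7 pool5 / 4096-output fc6 dimensions are VGG's; the rest is my illustration) confirms the parameter counts match exactly, so the cast is a pure reshape with no retraining needed:

```python
# VGG fc6: 4096 outputs over the flattened 512x7x7 pool5 output.
fc6_params = 4096 * (512 * 7 * 7)

# The same weights viewed as a 7x7 convolution: 4096 output channels,
# 512 input channels, 7x7 kernel -- an identical parameter count.
conv6_params = 4096 * 512 * 7 * 7

print(fc6_params, conv6_params, fc6_params == conv6_params)
```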