If I understand the "finetune_flickr_style" network definition (train_val.prototxt) correctly, all images are first squashed to 256x256 and then cropped to 227x227, since that is the data layer's input size.
layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 227
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
  image_data_param {
    source: "data/flickr_style/train.txt"
    batch_size: 50
    new_height: 256
    new_width: 256
  }
}
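To make the "squashed to 256x256" step concrete, here is a minimal NumPy sketch of what `new_height`/`new_width` plus `mirror` do to an image, assuming an HxWxC array. The function names are hypothetical, and nearest-neighbour sampling is used only for simplicity; it is not necessarily the interpolation Caffe's image loader uses.

```python
import numpy as np

def squash_resize(img, new_height, new_width):
    """Nearest-neighbour resize to new_height x new_width, ignoring aspect ratio."""
    h, w = img.shape[:2]
    # Map every output row/column back to a source row/column.
    rows = np.arange(new_height) * h // new_height
    cols = np.arange(new_width) * w // new_width
    # Broadcasting (N,1) against (1,M) picks an N x M grid of source pixels.
    return img[rows[:, None], cols[None, :]]

def mirror(img):
    """Horizontal flip, i.e. what happens when mirroring is applied."""
    return img[:, ::-1]

# A 500x375 (landscape) photo becomes 256x256, so its aspect ratio is lost.
photo = np.zeros((375, 500, 3), dtype=np.uint8)
print(squash_resize(photo, 256, 256).shape)  # (256, 256, 3)
```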
But looking at the Caffe code that does the actual cropping, it seems that the crop area is always a centered square. This means that a border of about 15 pixels ((256 - 227) / 2 = 14.5) is discarded every time. I would understand it if the crop always picked a randomly positioned square inside the image area, but since the result is the same every time, why not scale the image to 227x227 from the start?
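The difference between the two cropping strategies comes down to how the top-left offset of the 227x227 window is chosen. The sketch below computes those offsets for a 256x256 input; `crop_offsets` and `discarded_border` are hypothetical helpers written for illustration, not Caffe functions.

```python
import random

def crop_offsets(height, width, crop_size, random_crop, rng=random):
    """Return (h_off, w_off), the top-left corner of a crop_size x crop_size window."""
    if random_crop:
        # Any corner that keeps the window inside the image:
        # randint is inclusive, giving height - crop_size + 1 possible rows.
        h_off = rng.randint(0, height - crop_size)
        w_off = rng.randint(0, width - crop_size)
    else:
        # Centered window; integer division rounds the offset down.
        h_off = (height - crop_size) // 2
        w_off = (width - crop_size) // 2
    return h_off, w_off

def discarded_border(size, crop_size, offset):
    """Pixels lost before and after the window along one axis."""
    return offset, size - crop_size - offset

print(crop_offsets(256, 256, 227, random_crop=False))  # (14, 14)
print(discarded_border(256, 227, 14))                  # (14, 15)
```

With a centered crop the same 14/15-pixel border is lost on every pass, which is the case the question describes; with a random crop the window can land at any of 30 positions per axis, so different parts of the border are seen over time.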