What's the reasoning behind first scaling images to 256x256 and then cropping them to 227x227?

Tags: data-augmentation, finetune, images, layers, prototxt, training

Mario Klingemann

Jul 15, 2015, 1:43:03 PM

to caffe...@googlegroups.com

If I understand the "finetune_flickr_style" solver.prototxt correctly, all the images are first squeezed to 256x256 and then cropped to 227x227, since that is the data layer's input size.

layer {
  name: "data"
  type: "ImageData"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    mirror: true
    crop_size: 227
    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
  }
  image_data_param {
    source: "data/flickr_style/train.txt"
    batch_size: 50
    new_height: 256
    new_width: 256
  }
}

But looking at the Caffe code that does the actual cropping, it seems that the crop area is always a centered square. This means that a border of about 15 pixels gets discarded every time. I would understand it if the crop always picked a randomly positioned square inside the image area, but since the result is the same every time, why not scale the image to 227x227 from the beginning?
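
To make the arithmetic in the question concrete, here is a minimal Python sketch of the two cropping strategies being contrasted: a fixed centered crop and a randomly positioned crop. The function names `center_crop_offsets` and `random_crop_offsets` are hypothetical and chosen only for illustration; this is not Caffe's code, just the offset calculation for a 227x227 window inside a 256x256 image.

```python
import random


def center_crop_offsets(height, width, crop_size):
    """Top-left corner of a crop_size x crop_size window centered in the image."""
    return (height - crop_size) // 2, (width - crop_size) // 2


def random_crop_offsets(height, width, crop_size, rng=random):
    """Top-left corner of a randomly positioned crop_size x crop_size window.

    Every offset from 0 to (dimension - crop_size), inclusive, is a valid position.
    """
    h_off = rng.randint(0, height - crop_size)
    w_off = rng.randint(0, width - crop_size)
    return h_off, w_off


# Values taken from the data layer above: new_height/new_width 256, crop_size 227.
h_off, w_off = center_crop_offsets(256, 256, 227)
print("center crop offsets:", h_off, w_off)
print("border left on the opposite side:", 256 - 227 - h_off)

# A random crop can start anywhere in a 30 x 30 grid of positions (offsets 0..29).
print("one random crop:", random_crop_offsets(256, 256, 227, random.Random(0)))
```

With these numbers the centered crop discards 14 pixels on one side and 15 on the other, which matches the "about 15 pixels" estimate, while a random crop would yield a different 227x227 window on each draw.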
