Image mean = the "average image" of the training dataset. It is calculated with the compute_image_mean C++ binary. Think of it as a DC offset present in every image, which you can strip away so that only the meaningful "AC" variations around that offset remain.
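To illustrate the idea (a plain NumPy sketch, not the Caffe binary itself — the toy 2x2 images are made up):

```python
import numpy as np

# Hypothetical stack of four grayscale 2x2 "training images".
images = np.array([
    [[10, 20], [30, 40]],
    [[12, 22], [32, 42]],
    [[ 8, 18], [28, 38]],
    [[10, 20], [30, 40]],
], dtype=np.float32)

# The "average image": the per-pixel mean over the dataset (the DC offset).
mean_image = images.mean(axis=0)

# Mean subtraction keeps only each image's "AC" variation from that offset.
centered = images - mean_image

print(mean_image)
# After centering, every pixel position averages to zero across the dataset.
print(centered.mean(axis=0))
```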
Image scale = the input size of an image for a network. If a network has been trained on MxN images, you should provide images of that size for testing.
To my understanding, that is a user-selected parameter; you can try setting a different size.
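As a toy illustration of bringing an arbitrary image down to a network's fixed input size, here is a nearest-neighbour resize in plain NumPy (the 64x48 input and 28x28 target are just example numbers; real pipelines would use a proper image library):

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2-D array to (out_h, out_w)."""
    in_h, in_w = img.shape
    # Map each output pixel back to the nearest source pixel.
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows[:, None], cols]

# A dummy 64x48 "image" rescaled to a hypothetical 28x28 network input.
img = np.arange(64 * 48, dtype=np.float32).reshape(64, 48)
small = resize_nearest(img, 28, 28)
print(small.shape)  # (28, 28)
```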
Since the ILSVRC challenge contains a variety of object classes (dog, cat, chair, bike), many nets for that task use a "medium" image size, 200-something x 200-something.
I have no experience with MNIST, but I am guessing that since it's a digit-recognition challenge, you can afford to use small image sizes. Hence the 28x28 scale.
Of course, it would be interesting to see the performance-accuracy trade-off of smaller scales on ImageNet.