# input preprocessing: 'data' is the name of the input blob == net.inputs[0]
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
Informing the transformer of the necessary input shape
transformer.set_transpose('data', (2,0,1))
Defining the order of the array dimensions: (2,0,1) turns an H x W x K image (as caffe.io.load_image returns it) into a K x H x W array, the channels-first layout Caffe uses
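A minimal NumPy sketch of what the (2,0,1) transpose does to the array shape. The random 4 x 5 x 3 array is a hypothetical stand-in for a loaded image, not real data:

```python
import numpy as np

# Hypothetical stand-in for an image as caffe.io.load_image returns it:
# height x width x channels (H x W x K).
H, W, K = 4, 5, 3
img = np.random.rand(H, W, K).astype(np.float32)

# transpose((2, 0, 1)): new axis 0 = old axis 2 (channels),
# new axis 1 = old axis 0 (rows), new axis 2 = old axis 1 (columns).
chw = img.transpose((2, 0, 1))

print(img.shape)  # (4, 5, 3)
print(chw.shape)  # (3, 4, 5)

# Only the axes move; the pixel values are untouched.
assert chw[1, 2, 3] == img[2, 3, 1]
```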
transformer.set_channel_swap('data', (2,1,0)) # the reference model has channels in BGR order instead of RGB
Defining how to reorder the channels, here from RGB to BGR (doesn't matter for grayscale images)
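A small sketch of the channel swap, assuming the image is already channels-first (K x H x W, the state after the transpose step). The constant-valued planes are made up so the reordering is easy to see:

```python
import numpy as np

# Hypothetical 3 x 2 x 2 image in K x H x W layout, one constant per channel:
# channel 0 (R) = 0.1, channel 1 (G) = 0.5, channel 2 (B) = 0.9.
rgb = np.stack([np.full((2, 2), v, dtype=np.float32) for v in (0.1, 0.5, 0.9)])

# Indexing the first axis with (2, 1, 0) picks the channel planes in that order:
# new channel 0 = old channel 2, new channel 1 = old channel 1,
# new channel 2 = old channel 0.
bgr = rgb[(2, 1, 0), :, :]

print(bgr[:, 0, 0])  # blue value first, red value last
```

The shape does not change; only the order of the channel planes does. A grayscale image has a single channel, so there is nothing to reorder.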
transformer.set_mean('data', np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy').mean(1).mean(1)) # mean pixel
We set the mean so that it is subtracted from the data, but should we use the mean of the images used to train the network or the mean of the dataset I want to test?
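To see what .mean(1).mean(1) computes, here is a NumPy sketch. It assumes, as I understand it, that ilsvrc_2012_mean.npy holds a 3 x 256 x 256 mean image in channels-first layout; random numbers stand in for the real file. Reducing it to one value per channel also sidesteps the size mismatch between a 256 x 256 mean image and the 227 x 227 network input. On the question itself: the reference example subtracts the ILSVRC 2012 mean, i.e. the mean of the training data, and reusing the training mean is the usual practice, since the network's weights were learned on inputs centered that way.

```python
import numpy as np

# Hypothetical stand-in for np.load(caffe_root + 'python/caffe/imagenet/ilsvrc_2012_mean.npy'):
# a K x H x W mean image (random values here, not the real ImageNet mean).
mean_image = (np.random.rand(3, 256, 256) * 255).astype(np.float32)

# First .mean(1): average over rows    -> shape (3, 256)
# Second .mean(1): average over columns -> shape (3,), one mean value per channel
mean_pixel = mean_image.mean(1).mean(1)
print(mean_pixel.shape)  # (3,)

# A preprocessed input image in K x H x W layout, values in [0, 255].
img = (np.random.rand(3, 227, 227) * 255).astype(np.float32)

# Give the per-channel mean shape (3, 1, 1) so it broadcasts over every pixel
# of its channel, then subtract.
centered = img - mean_pixel[:, np.newaxis, np.newaxis]
```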
transformer.set_raw_scale('data', 255) # the reference model operates on images in [0,255] range instead of [0,1]
The code comment explains it: caffe.io.load_image returns values in [0, 1], so they are multiplied by 255
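A tiny sketch of the scaling step (the three pixel values are made up). One consequence worth noting: the mean pixel loaded above is in [0, 255] units, so scaling has to happen before the mean is subtracted, which as far as I can tell is the order Transformer.preprocess uses.

```python
import numpy as np

# Hypothetical 1 x 1 x 3 image as caffe.io.load_image returns it: floats in [0, 1].
img01 = np.array([[[0.0, 0.5, 1.0]]], dtype=np.float32)

# set_raw_scale('data', 255) multiplies by 255, mapping [0, 1] onto [0, 255].
raw_scale = 255
img255 = img01 * raw_scale
```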
net.blobs['data'].reshape(1,3,227,227)
This I do not really understand: why is it explicitly setting the shape? Can we do it only once, or do we need to do it for every image?
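As a later reply in this thread notes, net.blobs['data'].data.shape starts out as (10, 3, 227, 227) in this example, i.e. a batch of 10; reshape(1,3,227,227) turns it into a batch of 1. Because blob.data is a NumPy array, data[...] = image follows NumPy broadcasting, which this sketch illustrates with plain arrays standing in for the blob. My understanding is that the blob keeps its new shape until it is reshaped again, so reshaping once before looping over images should be enough as long as every image is preprocessed to the same 227 x 227 size.

```python
import numpy as np

# Plain arrays standing in for net.blobs['data'].data before and after
# net.blobs['data'].reshape(1, 3, 227, 227).
blob_batch10 = np.zeros((10, 3, 227, 227), dtype=np.float32)
blob_batch1 = np.zeros((1, 3, 227, 227), dtype=np.float32)

# One preprocessed image in K x H x W layout.
img = np.random.rand(3, 227, 227).astype(np.float32)

# Broadcasting: with a batch of 10, the same image is copied into all 10 slots,
# so a forward pass would compute the same result 10 times.
blob_batch10[...] = img

# With a batch of 1, exactly one copy is stored.
blob_batch1[...] = img
```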
net.blobs['data'].data[...] = transformer.preprocess('data', caffe.io.load_image(caffe_root + 'examples/images/cat.jpg'))
Executing the transformation: the image is loaded, preprocessed, and copied into the input blob
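As I read Caffe's python/caffe/io.py, Transformer.preprocess applies the configured steps in a fixed order: resize to the input size, transpose, channel swap, raw scale, mean subtraction, and finally input scale. The function below is a hypothetical NumPy re-implementation of that order for illustration only (resizing and input scale are omitted, and the name preprocess_sketch is made up); in practice the real transformer should be used.

```python
import numpy as np


def preprocess_sketch(img_hwk, transpose=(2, 0, 1), channel_swap=(2, 1, 0),
                      raw_scale=255.0, mean_pixel=None):
    """Hypothetical mimic of the step order in caffe.io.Transformer.preprocess.

    Assumes img_hwk is an H x W x K float image in [0, 1] that is already at the
    network's input size, and mean_pixel is a length-K vector (or None).
    """
    x = img_hwk.astype(np.float32)        # float copy; the input is not modified
    x = x.transpose(transpose)            # H x W x K -> K x H x W
    x = x[list(channel_swap), :, :]       # reorder channel planes, e.g. RGB -> BGR
    x = x * raw_scale                     # [0, 1] -> [0, 255]
    if mean_pixel is not None:
        mean = np.asarray(mean_pixel, dtype=np.float32)
        x = x - mean[:, np.newaxis, np.newaxis]   # per-channel mean subtraction
    return x


# Usage with a random image and made-up mean values (not the ILSVRC 2012 mean).
rgb = np.random.rand(227, 227, 3).astype(np.float32)
out = preprocess_sketch(rgb, mean_pixel=[100.0, 110.0, 120.0])
print(out.shape)  # (3, 227, 227)
```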
net.blobs['data'].data[...] = transformer.preprocess('data', caffe.io.load_image(os.getcwd() + "/" + img_filename))
Hi thecro...@gmail.com, your post cleared up most of my doubts. Thanks for that. However, I was wondering whether there is good documentation of the Transformer class. I am encountering some other confusion too, listed below.
1. What is the role of the delimiter : in the code transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})? Is it some sort of slicing delimiter? -- I figured out that the argument is a dictionary and : separates the key from the value. Specifically, 'data' is the key and the tuple returned by net.blobs['data'].data.shape (which for this example is (10, 3, 227, 227)) is the value. So the 'inputs' field of the transformer object is initialized with the key-value pair 'data': (10, 3, 227, 227).
2. I'm also not comfortable with transformer.set_transpose('data', (2,0,1)). I looked at the comments in the code on GitHub. It just says:
Set the input channel order for e.g. RGB to BGR conversion as needed for the reference ImageNet model.
What does (2,0,1) signify here? -- This call also sets a field inside the transformer object with a dictionary. The field is 'transpose' and the key-value pair is 'data': (2,0,1). I will see what these fields do and update the post accordingly.
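A plain-Python illustration of the dictionary argument described in point 1 (no Caffe needed; the shape tuple is the (10, 3, 227, 227) value reported above). As far as I can tell from Caffe's io.py, the transformer later reads the channel count and the spatial size from this tuple, for example to resize images in preprocess.

```python
# The Transformer constructor receives a dict: input blob name -> blob shape.
# Inside the braces, ':' separates the key from its value.
inputs = {'data': (10, 3, 227, 227)}

shape = inputs['data']                   # look up the value by its key
batch, channels, height, width = shape   # N x K x H x W
spatial = shape[2:]                      # (227, 227): the input height and width

print(batch, channels, spatial)  # 10 3 (227, 227)
```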
I got some answers by debugging and Googling. I'm modifying the post below accordingly. The modified text is in Green font.
On Monday, January 18, 2016 at 2:46:25 PM UTC-5, ada...@ucr.edu wrote:
What does (2,0,1) signify here? -- This call also sets a field inside the transformer object with a dictionary. The field is 'transpose' and the key-value pair is 'data': (2,0,1). I will see what these fields do and update the post accordingly. -- Say the image is an array of size H x W x K. The transpose operation (performed inside transformer.preprocess, i.e. in the line net.blobs['data'].data[...] = transformer.preprocess('data', caffe.io.load_image(caffe_root + 'examples/images/cat.jpg'))) generates a K x H x W array simply by swapping the array axes.
A couple of lines later there is another channel-swapping call: transformer.set_channel_swap('data', (2,1,0)). The description of set_channel_swap on GitHub is the same as that of set_transpose. Why is the channel swap needed twice? -- It is not the same operation twice: set_channel_swap is used in the transformer.preprocess method to reorder the channels from RGB to BGR, whereas set_transpose reorders the axes.
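To make the distinction concrete, here is a NumPy sketch with random data and a hypothetical 4 x 5 x 3 image: set_transpose permutes the axes, so the shape changes, while set_channel_swap permutes entries along the channel axis only, so the shape stays the same. Both are permutations, so each can be undone with its inverse, np.argsort(p), which is handy when converting a preprocessed array back into a displayable image (Caffe's Transformer also has a deprocess method for that).

```python
import numpy as np

transpose = (2, 0, 1)
channel_swap = (2, 1, 0)

img = np.random.rand(4, 5, 3).astype(np.float32)   # H x W x K, RGB order

chw = img.transpose(transpose)          # axes permuted: shape (4, 5, 3) -> (3, 4, 5)
bgr = chw[list(channel_swap), :, :]     # channels permuted: shape stays (3, 4, 5)

# For a permutation p, np.argsort(p) is its inverse permutation.
inv_transpose = tuple(int(i) for i in np.argsort(transpose))   # (1, 2, 0)
inv_swap = [int(i) for i in np.argsort(channel_swap)]          # [2, 1, 0]

rgb_chw = bgr[inv_swap, :, :]                  # undo the channel swap
restored = rgb_chw.transpose(inv_transpose)    # undo the transpose: back to H x W x K

assert restored.shape == img.shape
assert np.array_equal(restored, img)
```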