Extracting features produces no error even though the model and the inputs have different dimensions

Pramita Winata

Feb 2, 2016, 11:08:03 AM
to Caffe Users
Hi, 

I am using bvlc_reference_caffenet.caffemodel to extract features. As far as I know, this model was trained on RGB images.
I would like to extract features from grayscale images with this model. I was expecting to get an error, but I did not.

How does the model handle the input images? Does it replicate the intensity into 3 channels?

Thanks!!

Jan C Peters

Feb 3, 2016, 4:52:19 AM
to Caffe Users
The number of color channels in the input images is handled in the same way as the number of computed feature maps after each conv layer.

As for your question: Caffe itself does not do any replication or other manipulation of the input (other than the explicitly specified transformations, such as mean subtraction from a mean file). But of course it could be that your grayscale images actually have three channels (with equal R, G and B components at each pixel).

Apart from the technicalities: how an RGB-trained model performs on grayscale images in general is of course a completely different question, but it would be interesting to investigate. From the technical point of view I'd probably try it with 3-channel grayscale images, since then you do not have to modify anything. If you just use 1-channel images, Caffe should actually complain about a shape mismatch in the first conv layer.
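If you want to quickly check what your image files actually contain, something like this should do (a rough sketch using OpenCV's Python bindings; the file name is just a placeholder):

import cv2

# Load the file without any conversion, so we see its true layout.
img = cv2.imread('example_gray.png', cv2.IMREAD_UNCHANGED)

# A true single-channel image prints (H, W); a gray-looking image
# that is stored with color channels prints (H, W, 3).
print(img.shape)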

Jan

Pramita Winata

Feb 3, 2016, 9:25:31 AM
to Caffe Users
Yes, indeed.
"If you just use 1-channel images, Caffe should actually complain about a shape mismatch in the first conv layer"
-> This is the part that confuses me, because it does not throw any error for me. 1-channel images still work with the 3-channel pre-trained model, and I can extract features from every layer without any error.
My guesses would be:
1. The model, bvlc_reference_caffenet.caffemodel, can handle both types of images. But how does it handle them?
2. The input is wrong. However, I have verified that the images really are 1-channel.

Any insights?

Jan C Peters

Feb 3, 2016, 9:48:33 AM
to Caffe Users
That is indeed puzzling... The code that loads the trained weights (found at https://github.com/BVLC/caffe/blob/master/src/caffe/net.cpp#L805) actually checks the number and size of the parameter blobs and raises a fatal error if they do not match. Maybe you can put some debug output in there and recompile your Caffe to see what is actually happening.

bvlc_reference_caffenet uses 3 input channels. My guess would be that you actually fill just one channel of the input blob, while the rest stays empty/zero but is still forwarded through the network. How exactly do you fill the input blobs / feed your images through the network?
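In case you feed the images yourself through pycaffe rather than through a data layer, the usual pattern looks roughly like this (just a sketch; the prototxt/weights file names and the 'fc7' blob are placeholders, and the dummy array stands in for your preprocessed image). What ends up in the three channels of the data blob depends entirely on the shape of the array you assign:

import numpy as np
import caffe

# Placeholder file names for the deploy prototxt and pretrained weights.
net = caffe.Net('deploy.prototxt',
                'bvlc_reference_caffenet.caffemodel',
                caffe.TEST)

# Dummy preprocessed image. A (3, 227, 227) array fills all three
# channels explicitly; a (227, 227) or (1, 227, 227) array would be
# broadcast by numpy into every channel of the blob.
image = np.zeros((3, 227, 227), dtype=np.float32)

net.blobs['data'].reshape(1, 3, 227, 227)
net.blobs['data'].data[0, ...] = image

net.forward()
features = net.blobs['fc7'].data.copy()  # features from one layer
print(features.shape)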

Jan

Pramita Winata

Feb 4, 2016, 8:59:17 AM
to Caffe Users
As an update:

I did debug it line by line, which was not efficient. Apparently, the ImageDataLayer uses CV_LOAD_IMAGE_COLOR to read the input images.
With CV_LOAD_IMAGE_COLOR, a 1-channel input is loaded with its intensity replicated into 3 channels, so the resulting data has 3 channels of identical values.
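You can see the same behaviour directly with OpenCV's Python bindings (a quick sketch; cv2.IMREAD_COLOR is the Python-side counterpart of CV_LOAD_IMAGE_COLOR, and the file name is a placeholder):

import cv2
import numpy as np

# A file that really contains a single-channel grayscale image.
gray = cv2.imread('example_gray.png', cv2.IMREAD_GRAYSCALE)
color = cv2.imread('example_gray.png', cv2.IMREAD_COLOR)

print(gray.shape)   # (H, W): one channel
print(color.shape)  # (H, W, 3): three channels

# All three channels carry the same replicated intensity values.
print(np.array_equal(color[:, :, 0], color[:, :, 1]))
print(np.array_equal(color[:, :, 0], gray))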

Hope it helps. 

Jan C Peters

Feb 5, 2016, 7:31:59 AM
to Caffe Users
OK, I did not know that you use the ImageDataLayer; that is what is responsible for the channel replication. The DataLayer and HDF5DataLayer will not touch your input data in that way. So your initial question should be answered. How a network trained on color images performs on grayscale images is more of a research question, so you will just have to try it and find out.

Jan