How do I add a slice? layer to crop gray images?

Paul Krush

May 10, 2017, 11:22:36 AM

to Caffe Users

For example, instead of inferring a batch of 64 28x28 images and adding the 64 results together, why can't I add a layer to the network that crops these 64 images out of a single 224x224 input image? It seems this would be more elegant and faster.

How do you do this? I find it odd that I can't find slice examples like this, so I am guessing I must be using the wrong terms or asking the question wrong.

I tried the Slice layer, but it keeps wanting to slice the 8-bit gray channel, for example to create four 224x224 2-bit images.

Any Ideas?

By the way my application is really cool! I am doing unsupervised grouping of 3D objects using many different lighting angles. This eliminates manual labeling of classes!

https://github.com/GemHunt/lighting-augmentation

Thanks Much! Paul Krush

Przemek D

May 11, 2017, 2:00:56 AM

to Caffe Users

Slice is not what you're looking for, as it only allows you to, well, slice through a blob in a given dimension at a given point. An example use of slicing is separating RGB channels of an image, where an Nx3xHxW blob is sliced into 3 blobs of shape Nx1xHxW.
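To make the Slice behaviour above concrete, here is a minimal NumPy sketch (not Caffe code) of cutting an Nx3xHxW blob along the channel axis into three Nx1xHxW blobs. The blob contents and sizes are made up for illustration.

```python
import numpy as np

# Hypothetical batch: N=2 RGB images of size 4x5, laid out as N x C x H x W.
blob = np.arange(2 * 3 * 4 * 5, dtype=np.float32).reshape(2, 3, 4, 5)

# Cut along the channel axis (axis=1) at slice points 1 and 2.
red, green, blue = np.split(blob, [1, 2], axis=1)

print(red.shape, green.shape, blue.shape)
```

Each output keeps the full height and width, which is why slicing alone cannot produce spatial subimages in one step.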
What you want to do is to crop the image. It is possible in Caffe, but a bit tedious, since you need to specify a separate Crop layer for each subimage you want to get from the large one. It is also a bit uncomfortable because you do not pass crop dimensions explicitly: instead, you pass a second bottom blob that has the dimensions of your desired crop (DummyData might be of help here), plus an offset. The best approach would be to look for usage examples; I know FCN uses Crop, so you might want to check out their prototxts. Once you get your cropped images, it's entirely possible to merge them into a batch using Concat.
Having said that, I do not guarantee it will be either faster or (especially) more elegant. I prefer to keep my nets as simple as possible and do all preprocessing like that before loading data. It is easier to debug a Python script that crops and merges images than a network you want to force into doing that. Just my opinion though.
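As a rough idea of what such a preprocessing script could look like, the sketch below crops a 224x224 gray image into 64 non-overlapping 28x28 tiles and merges them into one 64x1x28x28 batch. The function name `crop_to_batch` and the random test image are assumptions for illustration, not part of any Caffe API.

```python
import numpy as np

def crop_to_batch(image, tile=28):
    """Crop an H x W gray image into tile x tile pieces, row by row,
    and merge them into a batch of shape (num_tiles, 1, tile, tile)."""
    height, width = image.shape
    crops = []
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            crops.append(image[top:top + tile, left:left + tile])
    # Stack the crops and add a singleton channel axis: N x 1 x tile x tile.
    return np.stack(crops)[:, np.newaxis, :, :]

# Hypothetical input: one 224x224 8-bit gray image.
image = np.random.randint(0, 256, size=(224, 224)).astype(np.uint8)
batch = crop_to_batch(image)
print(batch.shape)  # (64, 1, 28, 28)
```

The resulting array has the N x C x H x W layout that a LeNet-style network with a 1x28x28 input expects.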

Paul Krush

May 11, 2017, 10:44:35 AM

to Caffe Users

Wow, that was a great answer to get me thinking. So it's clear the Slice and Crop layers are not what I want. I know some implementations of FCN use a sliding window, and of course RCNN does. I want to use the sliding window to create a new blob, if that's possible.

Getting a 64x64 heat map would be really cool as well to display what angles perform the best.

Paul Krush

May 11, 2017, 10:56:43 AM

to Caffe Users

Corrections:

"Sliding Windows" might use the Crop layer?

"64x64 heat map" makes no sense. It would be cool to tint the image with the 64 crops in it to visualize lighting angle performance.

Nathan Ing

May 11, 2017, 8:50:52 PM

to Caffe Users

Just throwing my 2c in to agree with Przemek: this sounds like pre/post-processing that could simply be done with a standalone Python script or, if you really want, with a Python layer. Unless I'm wrong, you don't actually want to consider all the angles at once; you want to compare the performance of different lighting angles for separating your inputs?

Interesting application :)

Paul Krush

May 11, 2017, 10:10:06 PM

to Caffe Users

Thanks Nathan!

I would like to try to consider all the results at once. It's small enough to do it. Right now I add them together at the end.

I am pretty pumped about this. I got 100% accuracy on 1000 coins (57000 images) on heads vs tails. This was totally unsupervised, no manual training. Just a tiny LeNet network, and no tuning or fancy layers. It's not a hard problem, but still you would expect a few dirty or worn coins to throw this off.

It's not hard with pre- and post-processing, but why do it if I don't have to? The reason I am asking is that I might want to do 1000 files at once. This gets into millions of files, so you have to build your own LMDBs and write custom C++ code to infer, because pyCaffe gets slow as well. All of this I have done and it works great, but I would like to avoid it if I can. It's hard to show this to someone in a quick demo.

I need to spend more time learning about how RCNN is coded with the sliding window.

Przemek D

May 12, 2017, 3:47:55 AM

to Caffe Users

The FCN sliding window approach is something different from what you (I think) want to accomplish. It only frees the network from the input shape dependency, i.e. you can feed images of any size to the network (provided they fit in memory etc.). Then, instead of having one prediction vector for a fixed image size, you get an image of predictions for the custom (assuming: larger) image. This can be envisioned as sliding a fixed-input network over an image, producing a prediction vector for each spatial location on it.
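The "sliding a fixed-input network over an image" picture can be sketched with an explicit loop. This is only an illustration of the idea: a real FCN gets the same kind of output map through convolutions rather than by looping, and `toy_predict` is a made-up stand-in for a network, not anything from Caffe.

```python
import numpy as np

def slide(image, predict, window=28, stride=28):
    """Apply `predict` to every window x window patch and collect the
    per-location prediction vectors into a map of shape (out_h, out_w, k)."""
    height, width = image.shape
    out_h = (height - window) // stride + 1
    out_w = (width - window) // stride + 1
    first = np.asarray(predict(image[:window, :window]))
    scores = np.empty((out_h, out_w) + first.shape, dtype=np.float64)
    for y in range(out_h):
        for x in range(out_w):
            top, left = y * stride, x * stride
            scores[y, x] = predict(image[top:top + window, left:left + window])
    return scores

def toy_predict(patch):
    # Hypothetical two-class "network": scores derived from mean brightness.
    mean = patch.mean() / 255.0
    return np.array([mean, 1.0 - mean])

image = np.random.randint(0, 256, size=(224, 224)).astype(np.uint8)
score_map = slide(image, toy_predict)
print(score_map.shape)  # (8, 8, 2)
```

Each location in the map depends only on its own patch, which is the property the next paragraph relies on.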

This works because, for each spatial location, the rest of the image does not matter (only the local information within the current position is used). I suppose you're trying to simultaneously extract information from all the subimages, since there is a strong correlation between one pixel and another one 28 pixels away in any given direction (the subimage size). For this approach, I still think you're better off with preprocessing done outside the network rather than trying to create an elaborate layer combination just to reshape the data. If you deployed this net in some practical application, you could simply have the acquisition system stack images in the channel dimension instead of concatenating them horizontally/vertically, causing no slowdown to the entire thing.
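The difference between the two layouts can be sketched in NumPy as follows. The 64 constant-valued captures are placeholders for real images from the acquisition system, and the 8x8 mosaic arrangement is an assumption based on the 224x224 size mentioned earlier in the thread.

```python
import numpy as np

# Hypothetical acquisition output: 64 gray captures of 28x28, one per lighting angle.
captures = [np.full((28, 28), angle, dtype=np.uint8) for angle in range(64)]

# Mosaic layout: 8 rows of 8 captures side by side -> 224 x 224.
mosaic = np.vstack([np.hstack(captures[row * 8:(row + 1) * 8]) for row in range(8)])

# Channel layout: one channel per capture -> 1 x 64 x 28 x 28.
stacked = np.stack(captures, axis=0)[np.newaxis]

print(mosaic.shape, stacked.shape)
```

With the channel layout, a single 1x64x28x28 input already exposes all 64 views at every pixel position, so no cropping step is needed inside the network.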

PS: Speaking of reshape... since your problem is simply a matter of aligning pixels in memory a little bit differently, maybe there would be a way to just use the existing Reshape layer? It's just an idea though, because I don't know exactly how it moves pixels around: a quick attempt at directly reshaping (1,3,448,448)->(1,192,56,56) did not work well.
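A likely explanation, sketched in NumPy under the assumption that the array is stored in row-major order like a Caffe blob: a pure reshape only relabels dimensions and leaves the memory order alone, so each new "channel" ends up holding 7 full image rows rather than a 56x56 tile. Getting real tiles also needs a permutation of axes between two reshapes.

```python
import numpy as np

tile, grid = 56, 8
image = np.arange(3 * 448 * 448, dtype=np.float32).reshape(1, 3, 448, 448)

# Plain reshape: same memory order, new dimensions. "Channel" 0 is the first
# 56*56 values, i.e. the first 7 full rows of the image, not a 56x56 tile.
naive = image.reshape(1, 192, 56, 56)

# Reshape plus axis permutation: split H and W into (grid, tile) pairs,
# then move both grid indices next to the channel axis before merging.
split = image.reshape(1, 3, grid, tile, grid, tile)
tiles = split.transpose(0, 1, 2, 4, 3, 5).reshape(1, 192, tile, tile)

top_left = image[0, 0, :tile, :tile]
print(np.array_equal(naive[0, 0], top_left))  # False
print(np.array_equal(tiles[0, 0], top_left))  # True
```

Whether a stock Caffe layer can perform the axis permutation is not something this sketch settles; it only shows why reshaping alone rearranges nothing.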

Paul Krush

May 12, 2017, 4:03:29 PM

to Caffe Users

I will have to play with reshape as well. This sounds like what I am looking for. I am going to call this done for now.
