Hello everybody,
I need help understanding convolutional networks.
I built Caffe and ran the Python sample from
http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/00-classification.ipynb
The batch size is changed to 1 for simplicity.
After analyzing the cat picture, here is the content of the net:
>>> net.blobs['data'].data.shape
(1, 3, 227, 227)
>>> net.params['conv1'][0].data.shape
(96, 3, 11, 11)
>>> net.blobs['conv1'].data.shape
(1, 96, 55, 55)
That is, 96 kernels of size (3, 11, 11) are applied
to a picture of size (3, 227, 227).
The layer 'conv1' is defined as
num_output: 96
kernel_size: 11
stride: 4
Therefore, (4 * 55 + (11 - 4)) = 227.
That is, loop over the 96 given kernels;
for each (3, 11, 11) kernel, convolve the (3, 227, 227) picture
with stride 4, i.e. move the window by 4 pixels on each step.
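To double-check that arithmetic, the standard output-size formula (my own sanity check, not taken from the notebook) gives the same 55:

```python
# Standard convolution output-size formula:
#   out = (in + 2*pad - kernel) // stride + 1
def conv_output_size(in_size, kernel, stride=1, pad=0):
    return (in_size + 2 * pad - kernel) // stride + 1

print(conv_output_size(227, kernel=11, stride=4))  # -> 55
```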
Further, if I run this loop, I get roughly the same numbers as in net.blobs['conv1'].data:
import numpy as np

def convolve(kernels, pictures, stride):
    result = []
    for p in pictures:
        res1 = []
        for k in kernels:
            res2 = []
            for i in range(0, pictures.shape[2] - (kernels.shape[2] - stride), stride):
                res3 = []
                for j in range(0, pictures.shape[3] - (kernels.shape[3] - stride), stride):
                    m = p[:, i : i + kernels.shape[2], j : j + kernels.shape[3]] * k
                    res3 += [np.sum(m)]
                res2 += [res3]
            res1 += [res2]
        result += [res1]
    return np.array(result)

c1 = convolve(net.params['conv1'][0].data, net.blobs['data'].data, 4)
c1[c1 < 0] = 0  # apply ReLU, i.e. zero out negative values
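As an extra check (my own vectorized rewrite, not part of the notebook), the same convolution can be expressed with NumPy's sliding_window_view plus einsum; on small random arrays it matches a direct window-by-window sum:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

def convolve_fast(kernels, pictures, stride):
    # windows over (C, H, W): shape (N, 1, H-kh+1, W-kw+1, C, kh, kw)
    win = sliding_window_view(pictures, kernels.shape[1:], axis=(1, 2, 3))
    win = win[:, 0, ::stride, ::stride]  # drop the size-1 channel axis, apply stride
    # multiply-and-sum each window against each kernel: result (N, K, H_out, W_out)
    return np.einsum('nhwcij,kcij->nkhw', win, kernels)

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 3, 9, 9))   # one 3-channel 9x9 "picture"
k = rng.standard_normal((2, 3, 3, 3))   # two 3x3 kernels
out = convolve_fast(k, x, 2)
print(out.shape)  # (1, 2, 4, 4)
```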
However, on the next convolution layer 'conv2', I don't understand these numbers:
>>> net.blobs['norm1'].data.shape
(1, 96, 27, 27)
>>> net.blobs['conv2'].data.shape
(1, 256, 27, 27)
>>> net.params['conv2'][0].data.shape
(256, 48, 5, 5)
The layer 'conv2' is defined as
num_output: 256
pad: 2
kernel_size: 5
group: 2
In this case the kernel depth (48) is only half the depth of the "picture" (96):
a kernel of size (48, 5, 5)
is convolved over a "picture" of size (96, 27, 27).
Stride is not defined, so I assume it's 1.
Since 23 + (5 - 1) = 27, I would expect only 23 steps in each direction when convolving a 27x27 plane with a 5x5 kernel at stride 1.
Yet there are, evidently, 27.
I also don't understand where 256 comes from.
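My current guess, which I'd be glad to have confirmed or corrected: pad: 2 adds 2 pixels of zeros on each side before convolving, and group: 2 splits both the input channels (96 -> 2 x 48) and the output filters (256 -> 2 x 128). The arithmetic at least works out:

```python
# Guesses only -- not verified against Caffe's source.
pad, kernel, stride, in_size = 2, 5, 1, 27
out_size = (in_size + 2 * pad - kernel) // stride + 1
print(out_size)  # -> 27, matching net.blobs['conv2'].data.shape

groups, in_channels, num_output = 2, 96, 256
print(in_channels // groups)  # -> 48, the kernel depth seen in net.params['conv2']
print(num_output // groups)   # -> 128 filters per group, 256 in total
```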
Any corrections, suggestions or useful links are greatly appreciated.