Hello everybody,
I need help understanding convolutional networks.
I built Caffe and ran the Python sample from
http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/00-classification.ipynb
The batch size is changed to 1 for simplicity.
After analyzing the cat picture, here is the content of the net:
>>> net.blobs['data'].data.shape
(1, 3, 227, 227)
>>> net.params['conv1'][0].data.shape
(96, 3, 11, 11)
>>> net.blobs['conv1'].data.shape
(1, 96, 55, 55)
That is, 96 kernels of size (3, 11, 11) are applied
to a picture of size (3, 227, 227).
The layer 'conv1' is defined as
num_output: 96
kernel_size: 11
stride: 4
Therefore, (4 * 55 + (11 - 4)) = 227.
That is, loop over the 96 given kernels;
for each (3, 11, 11) kernel, convolve the (3, 227, 227) picture
with stride 4, i.e. move the window by 4 pixels on each step.
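To double-check that arithmetic, the standard output-size formula (my own sanity check, not taken from the notebook) gives the same 55:

```python
# Standard convolution output-size formula:
#   out = (in + 2*pad - kernel) // stride + 1
def conv_output_size(in_size, kernel, stride=1, pad=0):
    return (in_size + 2 * pad - kernel) // stride + 1

print(conv_output_size(227, kernel=11, stride=4))  # -> 55
```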
Further, if I run this loop, I get roughly the same numbers as in net.blobs['conv1'].data:
import numpy as np

def convolve(kernels, pictures, stride):
    result = []
    for p in pictures:
        res1 = []
        for k in kernels:
            res2 = []
            for i in range(0, pictures.shape[2] - (kernels.shape[2] - stride), stride):
                res3 = []
                for j in range(0, pictures.shape[3] - (kernels.shape[3] - stride), stride):
                    m = p[:, i : i + kernels.shape[2], j : j + kernels.shape[3]] * k
                    res3 += [np.sum(m)]
                res2 += [res3]
            res1 += [res2]
        result += [res1]
    return np.array(result)

c1 = convolve(net.params['conv1'][0].data, net.blobs['data'].data, 4)
c1[c1 < 0] = 0  # apply ReLU, i.e. zero out negative values
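As an extra check (my own vectorized rewrite, not part of the notebook), the same convolution can be expressed with NumPy's sliding_window_view plus einsum; on small random arrays it matches a direct window-by-window sum:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20

def convolve_fast(kernels, pictures, stride):
    # windows over (C, H, W): shape (N, 1, H-kh+1, W-kw+1, C, kh, kw)
    win = sliding_window_view(pictures, kernels.shape[1:], axis=(1, 2, 3))
    win = win[:, 0, ::stride, ::stride]  # drop the size-1 channel axis, apply stride
    # multiply-and-sum each window against each kernel: result (N, K, H_out, W_out)
    return np.einsum('nhwcij,kcij->nkhw', win, kernels)

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 3, 9, 9))   # one 3-channel 9x9 "picture"
k = rng.standard_normal((2, 3, 3, 3))   # two 3x3 kernels
out = convolve_fast(k, x, 2)
print(out.shape)  # (1, 2, 4, 4)
```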
However, on the next convolution layer 'conv2', I don't understand these numbers:
>>> net.blobs['norm1'].data.shape
(1, 96, 27, 27)
>>> net.blobs['conv2'].data.shape
(1, 256, 27, 27)
>>> net.params['conv2'][0].data.shape
(256, 48, 5, 5)
The layer 'conv2' is defined as
num_output: 256
pad: 2
kernel_size: 5
group: 2
In this case the kernel depth (48) is only half the depth of the "picture" (96):
a kernel of size (48, 5, 5)
is convolved over a "picture" of size (96, 27, 27).
Stride is not defined, so I assume it's 1.
Since 23 + (5 - 1) = 27, I would expect only 23 steps in each direction when convolving a 27x27 plane with a 5x5 kernel at stride 1.
Yet there are, evidently, 27.
I also don't understand where 256 comes from.
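My current guess, which I'd be glad to have confirmed or corrected: pad: 2 adds 2 pixels of zeros on each side before convolving, and group: 2 splits both the input channels (96 -> 2 x 48) and the output filters (256 -> 2 x 128). The arithmetic at least works out:

```python
# Guesses only -- not verified against Caffe's source.
pad, kernel, stride, in_size = 2, 5, 1, 27
out_size = (in_size + 2 * pad - kernel) // stride + 1
print(out_size)  # -> 27, matching net.blobs['conv2'].data.shape

groups, in_channels, num_output = 2, 96, 256
print(in_channels // groups)  # -> 48, the kernel depth seen in net.params['conv2']
print(num_output // groups)   # -> 128 filters per group, 256 in total
```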
Any corrections, suggestions or useful links are greatly appreciated.