Hi all,
I am looking into the NetParameter dumped for LeNet, as well as the lenet.prototxt available in the examples, and I have questions about how the proto maps to the actual construction (or connections) of the network at the second convolutional layer. Specifically,
1. I understand the first convolution and pooling layer. Starting with a 28*28 input image, we first build 20 feature maps, each of size 24*24, using 20 kernels of 5*5. In other words, for each of the 20 feature maps, all the units in that feature map share a kernel of 5*5 weights and 1 bias. Then we do pooling of 2*2 (stride 2), so we get 20 feature maps, each of size 12*12. This matches what I got from the dumped parameters: the first blob of conv1 has shape 20*1*5*5.
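To make sure I have the arithmetic right, here is a small numpy sketch of my reading of conv1 + pool1 (the variable names are mine, not Caffe's):

```python
import numpy as np

# My understanding of the conv1 + pool1 shape arithmetic (names are my own):
input_size = 28
kernel_size = 5
num_maps = 20

conv1_size = input_size - kernel_size + 1   # 24: "valid" convolution, stride 1
pool1_size = conv1_size // 2                # 12: 2x2 pooling with stride 2

# Parameter blobs as I read them from the dump:
conv1_weights = np.zeros((num_maps, 1, kernel_size, kernel_size))  # 20*1*5*5
conv1_bias = np.zeros(num_maps)             # one bias per feature map

print(conv1_size, pool1_size, conv1_weights.shape)  # 24 12 (20, 1, 5, 5)
```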
2. My question comes when looking at the second conv layer, where the dimension becomes 50*20*5*5 (this indicates that there should be 50*20 kernels, each of size 5*5). Conv2 says that the output number is 50, which I understand to mean there should be 50 feature maps. But how do these 50 feature maps connect to the 20 input feature maps of size 12*12? One way I guess is that for each i of the 50 output feature maps and each j of the input feature maps, we have a shared kernel of size 5*5. If this is the case, it means that, for example, the first neuron of the first output feature map has 20 kernels connected to it (this neuron actually connects with *every* 5*5 neighborhood through a shared kernel). This does give the parameters shown in the dumped proto: there are 50 * 20 * 5 * 5 kernel weight parameters. However, it does not explain why there are only 50 bias parameters.
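To be concrete, here is a pure-numpy sketch of the connection scheme I am guessing at (this is only my guess, not Caffe's actual code; `conv2_guess` and `valid_corr` are my own names). In this scheme the 20 per-input contributions are summed into each output map before a single bias is added, which would at least be consistent with the 50 biases in the dump:

```python
import numpy as np

def valid_corr(img, k):
    """Plain 'valid' cross-correlation of a 2-D image with a 2-D kernel."""
    H, W = img.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(img[r:r + kh, c:c + kw] * k)
    return out

def conv2_guess(inputs, weights, bias):
    """My guessed conv2: inputs (20,12,12), weights (50,20,5,5), bias (50,)."""
    out = []
    for i in range(weights.shape[0]):
        # Each output map i sums contributions from all 20 input maps j ...
        acc = sum(valid_corr(inputs[j], weights[i, j])
                  for j in range(inputs.shape[0]))
        # ... and only then adds one bias, hence 50 biases in total.
        out.append(acc + bias[i])
    return np.stack(out)                     # shape (50, 8, 8) since 12-5+1=8

inputs = np.random.rand(20, 12, 12)
weights = np.random.rand(50, 20, 5, 5)
bias = np.random.rand(50)
print(conv2_guess(inputs, weights, bias).shape)  # (50, 8, 8)
```

Is this summing-over-input-maps scheme what Caffe actually does?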
3. From the above two points, it seems that Caffe assumes some implicit way to construct, or connect, the neurons. What is this implicit way? Is it explicitly stated somewhere?
Best.