Processing of activation map passing to next conv layer...


Brett

Mar 24, 2021, 8:31:30 AM
to deep-learning-illustrated
Hello,

The figures on pages 164-166 illustrate very well how a colored image (a 6x6x3 volume) gets convolved with a volume kernel K (3x3x3) to produce a single convolution map (figure 10.5). I noticed that the slices of the filter K are different matrices, one for each of the channels.

Since there are 16 kernels (F1 to F16) in the 1st conv layer, the same process is repeated on the same image 15 more times (a total of 16 convolutions). The final result is an activation map AM1 (32x32x16). Each slice of that volume is the convolution associated with one of the 16 filters. So far so good.

Ignoring pooling for now, suppose there is a 2nd conv layer with 10 filters (F17 to F26). First of all, I believe these filters are not 3D arrays but only 2D arrays. Is that correct?

How does each of those 10 filters in the 2nd convolutional layer act on the activation map AM1 previously generated? For example, does filter F17 act on all 16 slices of volume AM1 to produce 16 convolutions? Same goes for F18, F19, etc.
We would end up with 160 slices (16 convolutions times 10 filters) that get stored in the new activation map AM2.

Is that what happens?

To pass AM2 to the fully connected ANN, we must convert the last activation map AM2 into a row vector by concatenating all the rows of all the slices in the volume AM2. Is that what happens?

Thank you in advance,
Brett

Grant Beyleveld

May 5, 2021, 11:18:14 AM
to deep-learning-illustrated
Hi Brett,

Thanks for reading our book and for your question. I hope this response addresses it effectively.

First, the numerical examples given in figures 10.3-10.5 are small and simple enough to fit on a page, whereas the graphical example in figure 10.6 is distinct and better represents the kinds of dimensions one might actually work with in a computer vision problem. Figure 10.6 is not the direct logical continuation of the examples in 10.3-10.5.

We like to think of the output from a convolutional layer with any number of filters (this output is called an activation map) as something of a latent image itself. So moving forward with the example in figure 10.6 (a 32x32x3 image convolved with 16 filters), the activation map produced is 32x32x16. Here we're ignoring kernel size, stride length, and padding, and so assuming the output is the same spatial size as the input; this is explained in detail on pages 168-169.

When this output is fed into the next layer (in your example, with 10 filters), each filter there is indeed a 3D array: much like in the first layer, each filter has as much depth as the input (here, an activation map instead of an image) which is 16. The resulting output from this layer would be another activation map with the dimensions 32x32x10.
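To make the depth bookkeeping concrete, here is a minimal NumPy sketch of that second layer (the filter values are random placeholders, not anything from the book, and the loops are deliberately naive to show the arithmetic rather than to be fast):

```python
import numpy as np

H, W, C_in, C_out = 32, 32, 16, 10           # AM1 spatial size, its depth, and the filter count
am1 = np.random.rand(H, W, C_in)             # activation map from the 1st conv layer
filters = np.random.rand(C_out, 3, 3, C_in)  # each filter is 3x3x16, i.e. a 3D array

# zero-pad by 1 on each spatial edge so the output stays 32x32 ('same' padding)
padded = np.pad(am1, ((1, 1), (1, 1), (0, 0)))

am2 = np.zeros((H, W, C_out))
for f in range(C_out):                       # one output slice per filter, so 10 slices, not 160
    for i in range(H):
        for j in range(W):
            # each filter spans ALL 16 input channels at once, so every
            # position yields a single number, not one number per channel
            am2[i, j, f] = np.sum(padded[i:i+3, j:j+3, :] * filters[f])

print(am2.shape)  # (32, 32, 10)
```

Note that the sum over the 16 input channels happens inside each filter's dot product, which is why the 10 filters produce a 32x32x10 map rather than the 160 slices in your question.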

Extending your example, a third layer with 8 filters would yield a 32x32x8 activation map, with each of those 8 filters having a depth of 10 to match the input.
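On your final question about handing off to the fully connected ANN: yes, the last activation map is simply unrolled into one long vector. A one-line sketch of that bookkeeping (array values are placeholders):

```python
import numpy as np

am_final = np.random.rand(32, 32, 8)  # last activation map in the example above
flat = am_final.reshape(-1)           # unroll every row of every slice into one vector
print(flat.shape)  # (8192,) = 32 * 32 * 8
```

In Keras this is exactly what the Flatten layer does between the last convolutional layer and the first dense layer.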

Hopefully that's clear now — let us know if not!

Grant and Jon