A general question about CNN and convolutional layers

Toposkovich

Jul 31, 2022, 4:58:54 PM
to Keras-users
Hello to everybody,

I have one question that is maybe stupid, but I couldn't find any answer to it.
My question is about the number of kernels in a convolutional layer.
After implementing many LeNet, VGG16, VGG19, etc. networks, I realized that the number of kernels in the convolutional layers is always > 3.

Let's say I would like to create a completely new CNN from scratch. Then I would think "OK... since I'm going to process BGR images (no grayscale pictures), I have for sure 3 different channels." Then I would think "Since I have 3 channels (one for green, one for blue, and the last for red), I need just 3 kernels. One kernel per channel."
Why would I think that? Because to me, using e.g. 32 kernels in the same convolutional layer would be redundant, since the 4th kernel would have the same output as the 1st, 2nd, or 3rd kernel (one for each channel).

So I don't understand what the benefit or goal of using more than 3 kernels on RGB images is. I read that increasing the number of kernels increases the depth of the activation map. But why? Mathematically speaking it is clear... we add another matrix to the tensor. But the reason behind that operation is not clear to me.

I hope I could express my thought clearly and without misunderstandings.
Thanks for the help.

arpit gupta

Aug 1, 2022, 2:11:53 AM
to Keras-users
Hey,

when you use a kernel size of 3x3, you are actually using a tensor of size 3x3x3; the last 3 matches the 3 channels of your BGR image.
3x3x3 means a 3x3 kernel with three channels.
If you start with a kernel size of 5x5, it is processed as a 5x5x3 kernel. In later layers, the channel dimension of the kernels keeps increasing to match the number of feature maps coming in.
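You can check this directly in Keras. A small sketch (assumes TensorFlow/Keras is installed; the layer and shapes are just illustrative):

```python
import numpy as np
import tensorflow as tf

# One 32x32 image with 3 channels: (batch, height, width, channels).
x = np.zeros((1, 32, 32, 3), dtype="float32")

layer = tf.keras.layers.Conv2D(filters=8, kernel_size=3)
y = layer(x)  # calling the layer builds its weights

kernel, bias = layer.weights
# Kernel shape is (kernel_h, kernel_w, in_channels, out_channels):
print(kernel.shape)  # (3, 3, 3, 8) -- each of the 8 kernels spans all 3 input channels
print(y.shape)       # (1, 30, 30, 8)
```

So each individual kernel is already a 3x3x3 tensor that sees all three color channels at once; the `filters` argument is how many such kernels the layer has.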

Matias Valdenegro

Aug 1, 2022, 4:08:10 AM
to keras...@googlegroups.com

Have you read the AlexNet paper?


https://proceedings.neurips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf


Figure 3 in that paper shows what the filter kernels actually learn. What you describe does not happen in practice; a diverse set of kernels is needed to extract different features.


Zoran Sevarac

Aug 1, 2022, 6:17:56 AM
to Toposkovich, Keras-users
Hi,

To put it simply, each kernel learns to detect a specific feature in a small (e.g. 3x3) pixel window, so the number of kernels corresponds to the number of different features you want to learn.
The number of kernels is the same as the number of output channels of the convolutional layer.
The number of layers is a different thing: it is how you combine the features recognized by the small 3x3 kernels into the big picture.
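To illustrate (a minimal sketch, assuming TensorFlow/Keras; the sizes are arbitrary), the `filters` argument is the number of kernels, and it directly becomes the channel depth of the layer's output:

```python
import numpy as np
import tensorflow as tf

x = np.zeros((1, 28, 28, 3), dtype="float32")  # one RGB image

out32 = tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding="same")(x)
out64 = tf.keras.layers.Conv2D(filters=64, kernel_size=3, padding="same")(x)

print(out32.shape)  # (1, 28, 28, 32) -- 32 kernels -> 32 activation maps
print(out64.shape)  # (1, 28, 28, 64) -- 64 kernels -> 64 activation maps
```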

I hope this gives you general intuition about how this works.

Best,
Zoran




--
Zoran Sevarac, PhD, Associate Professor
University of Belgrade, Faculty of Organisational Sciences, Department for Software Engineering
Java Champion | Deep Netts Co-founder & CEO

Lance Norskog

Aug 2, 2022, 10:27:17 PM
to Zoran Sevarac, Toposkovich, Keras-users
Every tutorial on CNNs that I have found stops short of explaining what happens with RGB data.
Is there one that addresses this head-on?

Cheers,

Lance Norskog



--
Lance Norskog
lance....@gmail.com
Redwood City, CA

Davide P.

Aug 3, 2022, 2:21:02 AM
to Lance Norskog, Zoran Sevarac, Keras-users
@Lance: I'm not sure whether you understood my question. But, to make you happy, my question arose after reading this tutorial: Link.
Despite what was written above ("when you are using kernel size 3x3 that means you are actually using a tensor of size 3x3x3; this last 3 is for your BGR image's 3 different layers"), the tutorial states:
"Each kernel produces a 2D output, called an activation map." And a 2D output is a matrix, not a tensor.
But again... the original question was and is (let me try another approach): say I have defined a convolutional layer with 32 kernels (just as an example). What is the difference between, say, the 5th and the 6th kernel? Or the 14th and the 15th? The only difference, to me, is that they are initialized with different random values and nothing else.
Is this correct? Or am I missing something?

 @all: thanks for your help and comments. I'm going to read the AlexNet paper in the next few days.

Lance Norskog

Aug 3, 2022, 10:16:51 PM
to Davide P., Zoran Sevarac, Keras-users
Yes, this is exactly it. Each kernel is initialized with a random set of numbers drawn from a common initialization scheme (glorot_uniform and he_normal are examples).
The idea is that minimizing the loss will drive each set of numbers toward a different kernel that happens to detect a particular aspect of the image.
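A quick sketch of that starting point (assuming TensorFlow/Keras; the layer sizes are arbitrary): even before training, the 5th and 6th kernels of the same layer already differ, purely because of the random initialization.

```python
import numpy as np
import tensorflow as tf

layer = tf.keras.layers.Conv2D(filters=32, kernel_size=3,
                               kernel_initializer="glorot_uniform")
layer.build((None, 28, 28, 3))  # build the weights for 3-channel input

kernel = np.array(layer.kernel)          # shape (3, 3, 3, 32)
k5, k6 = kernel[..., 4], kernel[..., 5]  # the 5th and 6th kernels
print(np.allclose(k5, k6))  # False -- they differ only by their random init
```

Training then amplifies those small initial differences: gradient descent pushes each kernel toward whichever feature reduces the loss most from where it started.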

I have used this library. Scroll down to see a visualization of what each feature map at a layer finds in a picture of a cat:


Lance Norskog

Aug 3, 2022, 10:20:19 PM
to Davide P., Zoran Sevarac, Keras-users
Oh! "Tensor" in deep learning does not mean what it means in "real math". It just means "generalized n-dimensional hypercube of numbers".
I didn't get that far in real math, but I think tensor has a more specific meaning and mathematical toolkit.

In deep learning, a vector is a tensor. A batch of samples is a tensor with a variable dimension on the 0th axis. Text input sentences can be "ragged tensors," which have a different length on the 1st axis for every row.
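A tiny sketch of those three cases (assuming TensorFlow; the values are made up):

```python
import tensorflow as tf

vector = tf.constant([1.0, 2.0, 3.0])   # rank-1 tensor
batch = tf.zeros((8, 28, 28, 3))        # batch of 8 RGB images
ragged = tf.ragged.constant([[1, 2, 3], [4], [5, 6]])  # rows of unequal length

print(vector.shape)                  # (3,)
print(batch.shape)                   # (8, 28, 28, 3)
print(ragged.row_lengths().numpy())  # [3 1 2]
```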

Cheers,

Lance Norskog