What is the significance of the number of convolution filters in a convolutional network?


Hossein Hasanpour
Feb 15, 2016, 6:34:01 AM
to Caffe Users
Hello all, I'm a newbie in this field and I would be grateful if anyone could help me.
What does the number of filters in a convolution layer convey?
How does this number affect the performance or quality of the architecture? I mean, should we always opt for a higher number of filters? What is the benefit of them?
And how do people assign different numbers of filters to different layers? Looking at this question: How to determine the number of convolutional operators in CNN?
The answer specified 3 convolution layers with different numbers of filters and sizes. Again, in this question: number of feature maps in convolutional neural networks, you can see from the picture that we have 28*28*6 feature maps for the first layer and 10*10*16 for the second conv layer.

I tried to stack 5 convolution layers like this (input -> C1-relu-pool1-norm1 -> C2-relu-pool2-norm2 -> C3-relu-pool3-norm3 -> C4-relu-pool4-norm4 -> C5-relu-pool5-norm5 -> fullyconnected -> softmaxwithloss) to get better accuracy on CIFAR10, yet it is stuck at 0.1! (In case you want to have a look at it, I posted the net configuration here: https://groups.google.com/forum/#!topic/caffe-users/93y7Vpvhk3Q )

How do they come up with these numbers? Is it through trial and error?

And by the way, shouldn't we always apply the ReLU after pooling? Since pooling reduces the number of elements, it would reduce the amount of computation needed for the ReLU; imho it makes no sense to first compute the nonlinearity on all elements and then subsample them with pooling.
Am I right or wrong?



 Thanks in advance

Jan C Peters
Feb 15, 2016, 8:17:44 AM
to Caffe Users
Hello,

See my comments interleaved below.


On Monday, February 15, 2016 at 12:34:01 PM UTC+1, Hossein Hasanpour wrote:
> Hello all, I'm a newbie in this field and I would be grateful if anyone could help me.
> What does the number of filters in a convolution layer convey?
> How does this number affect the performance or quality of the architecture? I mean, should we always opt for a higher number of filters? What is the benefit of them?

Well, more filters means more trainable parameters. This can be both a good and a bad thing: it is good because there are more degrees of freedom to fit the data better, and it can be bad because it paves the way for overfitting. How many filters are actually a good choice depends heavily on your problem, and I have no better idea than to find out by trial and error.
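To make that concrete, here is a minimal sketch (plain Python; the helper name is mine) of how the filter count drives the number of trainable parameters in a conv layer:

```python
def conv_params(num_filters, in_channels, kernel_size):
    """Trainable parameters of a standard conv layer:
    one (in_channels x k x k) kernel plus one bias per filter."""
    return num_filters * (in_channels * kernel_size ** 2 + 1)

# The two conv layers from the LeNet-style example linked above,
# assuming full connectivity between input and output maps:
print(conv_params(6, 1, 5))    # C1 on a grayscale image -> 156
print(conv_params(16, 6, 5))   # C2 on C1's 6 maps       -> 2416
```

Note that adding filters to one layer also widens the input of the next layer, so the totals grow quickly.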
 
> And how do people assign different numbers of filters to different layers? Looking at this question: How to determine the number of convolutional operators in CNN?

Your question is valid, but the linked question does not provide any useful info on that topic. And neither do I have a good answer to it, other than "look at similar cases, do a lot of trial and error".
 

> The answer specified 3 convolution layers with different numbers of filters and sizes. Again, in this question: number of feature maps in convolutional neural networks, you can see from the picture that we have 28*28*6 feature maps for the first layer and 10*10*16 for the second conv layer.


Yeah. And?
 
> I tried to stack 5 convolution layers like this (input -> C1-relu-pool1-norm1 -> C2-relu-pool2-norm2 -> C3-relu-pool3-norm3 -> C4-relu-pool4-norm4 -> C5-relu-pool5-norm5 -> fullyconnected -> softmaxwithloss) to get better accuracy on CIFAR10, yet it is stuck at 0.1! (In case you want to have a look at it, I posted the net configuration here: https://groups.google.com/forum/#!topic/caffe-users/93y7Vpvhk3Q )

> How do they come up with these numbers? Is it through trial and error?

Probably. That and experience.
 

> And by the way, shouldn't we always apply the ReLU after pooling? Since pooling reduces the number of elements, it would reduce the amount of computation needed for the ReLU; imho it makes no sense to first compute the nonlinearity on all elements and then subsample them with pooling.
> Am I right or wrong?

When you do MAX-pooling, that is already a non-linear operation. Having another one (a ReLU or sigmoid) after it does not seem to improve performance much, if at all. At least in my experience it didn't (on the other hand, it didn't really hurt performance either). Maybe the authors of these networks made similar observations. In the original LeNet-5 there are also no nonlinearities after the pooling layers.
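In fact, because ReLU is monotonic, it commutes with max-pooling, so both orderings produce identical outputs. A quick numpy check of that point (my own sketch):

```python
import numpy as np

def relu(a):
    return np.maximum(a, 0)

def max_pool_2x2(a):
    h, w = a.shape
    return a.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.random.randn(8, 8)  # toy feature map
# ReLU is monotonic, so pooling-then-ReLU equals ReLU-then-pooling:
assert np.allclose(relu(max_pool_2x2(x)), max_pool_2x2(relu(x)))
```

So with max-pooling you are indeed free to apply the ReLU after the pool and run it on a quarter as many values; this equivalence does not hold for average pooling.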



> Thanks in advance


Jan

Hossein Hasanpour
Feb 15, 2016, 8:57:51 AM
to Caffe Users
Thank you very much, I really appreciate it.
So basically there are no rules of thumb.
By the way, I noticed people use transformations such as cropping and the like on their images, and this actually improves the network's performance. Where can I find these transformation options for Caffe? I couldn't find any of them in the documentation! (where the different types of layers and solver options are explained.)
If you don't mind, I have another question: in Caffe we have something called groups for a convolution layer, right? What is the use of this? I read the documentation but I don't get it.
It's for weight sharing (parameter sharing), right? How does it work? If g=1, does it mean we have weight sharing among all parameters in that layer? And if g=3, does it mean we have 3 distinct sets of weights?

Jan C Peters
Feb 15, 2016, 9:47:05 AM
to Caffe Users
Things like cropping are not implemented in caffe itself afaik; you will have to do it on your own. And cropping is not used in training: only later, with the trained network, several crops are predicted and the most frequently predicted label is selected as the prediction for the whole image. That is not a property/functionality/layer of the network itself, it is just a meta-processing "trick" to make your network look like a better classifier than it actually is, so to speak. It is somewhat similar to boosting. It basically yields translational invariance to some degree (which conv layers already have, but the innerproduct layers do not by default).
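For illustration, a minimal sketch of that test-time trick (plain numpy; `predict` is a hypothetical stand-in for a forward pass through the trained net):

```python
import numpy as np
from collections import Counter

def multi_crop_label(image, predict, crop=24):
    """Predict several crops and return the most frequent label.
    `predict` is assumed to map one crop to a vector of class scores."""
    h, w = image.shape[:2]
    # Four corner crops plus the center crop:
    offsets = [(0, 0), (0, w - crop), (h - crop, 0),
               (h - crop, w - crop), ((h - crop) // 2, (w - crop) // 2)]
    labels = [int(np.argmax(predict(image[y:y + crop, x:x + crop])))
              for y, x in offsets]
    return Counter(labels).most_common(1)[0][0]
```

Averaging the class scores over the crops instead of voting is an equally common variant.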

Groups in conv-layers: this basically divides the filters in one conv-layer into, well, groups, which operate only on a fraction of the incoming data channels/feature maps. Take for example a conv-layer that takes 6 feature maps as input and outputs 8 feature maps. If you set group=2, the first half of the filters will operate on the first half of the input feature maps/image channels, i.e. the first four output channels are computed only from the first three input channels, and the second four output feature maps are computed from the second three input channels only.
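A naive sketch of that grouping (1x1 "filters" for brevity, so each filter is just a weighted sum of its group's input maps; the function name is mine):

```python
import numpy as np

def grouped_conv1x1(x, weights, groups):
    """x: (in_ch, H, W); weights: (out_ch, in_ch // groups).
    Filter f in group g sees only that group's slice of the input maps."""
    out_ch, in_per_g = weights.shape
    out_per_g = out_ch // groups
    out = np.empty((out_ch,) + x.shape[1:])
    for f in range(out_ch):
        g = f // out_per_g                       # which group this filter is in
        xs = x[g * in_per_g:(g + 1) * in_per_g]  # only that group's input maps
        out[f] = np.tensordot(weights[f], xs, axes=1)
    return out

# Jan's example: 6 input maps, 8 output maps, group=2 ->
# outputs 0-3 are computed from inputs 0-2 only, outputs 4-7 from inputs 3-5.
x = np.random.randn(6, 5, 5)
w = np.random.randn(8, 3)
print(grouped_conv1x1(x, w, groups=2).shape)  # (8, 5, 5)
```

So group=1 (the default) simply means every filter sees all input channels; the parameter is about restricting connectivity, not about sharing weights between filters.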

Jan

Hossein Hasanpour
Feb 15, 2016, 11:06:43 AM
to Caffe Users
Thank you very much again ;)