convolutional filters learnt are random


PC

Jun 23, 2016, 10:57:13 AM
to Caffe Users

I've designed a simple network with 1 convolutional layer, 1 pooling layer, 2 fully connected layers, and a softmax output layer. The network is a binary classifier and it performs with an accuracy of around 80%. I wanted to know what the filters were learning, so I followed this IPython notebook: https://github.com/BVLC/caffe/blob/rc2/examples/filter_visualization.ipynb. Here are the layer features and their shapes:

[('data', (1, 3, 64, 64)),
 ('conv1', (1, 48, 55, 55)),
 ('pool1', (1, 48, 27, 27)),
 ('ip1', (1, 500)),
 ('ip2', (1, 2)),
 ('prob', (1, 2))]

And here is the filter visualisation: I can't work out why they haven't learnt anything, even though the convnet is successful in its classification. Any advice/reasoning would be greatly appreciated!
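For reference, the tiling step from that notebook can be sketched in pure NumPy. The `vis_square` helper below is my approximation of the notebook's version, not the exact code; the filter array would come from something like `net.params['conv1'][0].data` (an assumption about your net's layer names).

```python
import numpy as np

def vis_square(data):
    """Tile an array of shape (n, h, w) or (n, h, w, 3) into a roughly
    square grid of images, normalized to [0, 1], with 1-pixel borders.
    A sketch of the tiling helper from the filter_visualization notebook."""
    # Normalize to [0, 1] so all filters share one display scale.
    data = (data - data.min()) / (data.max() - data.min() + 1e-8)
    # Pad the number of tiles up to the next square, plus a border per tile.
    n = int(np.ceil(np.sqrt(data.shape[0])))
    padding = ((0, n ** 2 - data.shape[0]), (0, 1), (0, 1)) + ((0, 0),) * (data.ndim - 3)
    data = np.pad(data, padding, mode='constant', constant_values=1)
    # Interleave the tile grid axes with the spatial axes, then flatten
    # each pair into one big image axis.
    data = data.reshape((n, n) + data.shape[1:]).transpose(
        (0, 2, 1, 3) + tuple(range(4, data.ndim + 1)))
    data = data.reshape((n * data.shape[1], n * data.shape[3]) + data.shape[4:])
    return data

# e.g. for 48 filters of 10x10: a 7x7 grid of 11x11 bordered tiles.
tile = vis_square(np.random.rand(48, 10, 10))
print(tile.shape)  # (77, 77)
```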


mprl

Jun 23, 2016, 11:35:59 AM
I don't understand what you expected to see. There is no reason to see shapes or the like in these filters, as it is the combination of these filters (by the two fully connected layers) that classifies the images.
To me, everything is normal here!

PC

Jun 23, 2016, 12:27:30 PM
I was under the impression that the network would learn filters that activate when they see some type of visual feature, such as an edge of some orientation or a blotch, etc. In the above case the filters seem to have learned nothing?

oeb

Jun 23, 2016, 4:30:43 PM
First of all, I think this might be explained by your lack of nonlinearities. Nonlinearities allow the network to (theoretically) approximate any function; without them, your network is basically just averaging all those samples. Add TanH between all linear operations; I've found it works very well for small networks.
Secondly, even if you had nonlinearities, the large size of the IP layer might cause your conv filters to look more like noise -- the major processing is happening within those 500 neurons. I would reduce it substantially to move some computation to the conv layers... and thirdly:
It looks like your convolution has a large kernel size. You should probably replace it with a chain of several 3x3 convolutions -- such a chain would have the same receptive field but more "wiggle room" to learn, due to the nonlinearities between them, and it would have fewer weights to learn.
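A quick sanity check on that last point (my arithmetic, not from the thread): stride-1 convolutions compose, so a chain of n 3x3 convs sees a (2n+1)x(2n+1) window, with fewer weights than a single conv of that size.

```python
# Receptive field and weight count: a chain of 3x3 convs vs one KxK conv.
# Assumes stride-1 convolutions with C input and C output channels.

def chain_rf(n):
    """Receptive field of n stacked stride-1 3x3 convolutions."""
    return 2 * n + 1

def weights(k, c_in, c_out):
    """Weight count of a single kxk conv layer (ignoring biases)."""
    return k * k * c_in * c_out

C = 48  # channel count from the original net, for illustration
# Two stacked 3x3 convs cover the same 5x5 receptive field...
assert chain_rf(2) == 5
# ...with fewer weights than one 5x5 conv:
print(2 * weights(3, C, C))  # 41472
print(weights(5, C, C))      # 57600
```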

PC

Jul 1, 2016, 6:07:09 AM
Hi oeb,

I redesigned the net as follows:

[('data', (1, 3, 64, 64)),
 ('conv1', (1, 64, 60, 60)),
 ('pool1', (1, 64, 30, 30)),
 ('conv2', (1, 96, 30, 30)),
 ('conv3', (1, 128, 30, 30)),
 ('conv4', (1, 256, 30, 30)),
 ('pool2', (1, 256, 15, 15)),
 ('ip1', (1, 500)),
 ('ip2', (1, 2)),
 ('prob', (1, 2))]

After re-training, accuracies obviously went up, but the filters remained random. I realise I kept the large fully connected layer at the bottom -- could this still be the reason? Classification is binary; could it be that there is little difference between the two classes, and therefore only a vague classification boundary between them? These are examples of each class, class 0 on the left and class 1 on the right:


PC

Jul 1, 2016, 6:12:24 AM
Parameter shapes: (relu nonlinearities after each conv layer)

[('conv1', (64, 3, 5, 5)),
 ('conv2', (96, 64, 3, 3)),
 ('conv3', (128, 96, 3, 3)),
 ('conv4', (256, 128, 3, 3)),
 ('ip1', (500, 57600)),
 ('ip2', (2, 500))]
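Counting weights from those shapes (my own quick check, not from the thread) shows how dominant that ip1 layer still is -- roughly 98% of all the weights in the net:

```python
import numpy as np

# Weight counts per layer, from the parameter shapes listed above.
shapes = {
    'conv1': (64, 3, 5, 5),
    'conv2': (96, 64, 3, 3),
    'conv3': (128, 96, 3, 3),
    'conv4': (256, 128, 3, 3),
    'ip1':   (500, 57600),
    'ip2':   (2, 500),
}
counts = {name: int(np.prod(s)) for name, s in shapes.items()}
total = sum(counts.values())
print(counts['ip1'])          # 28800000 weights in ip1 alone
print(counts['ip1'] / total)  # ~0.98 of the whole network
```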


 

PC

Jul 15, 2016, 4:42:53 AM
Bumping: does anyone with more experience in deep learning have a reasoned explanation for this?

oeb

Jul 15, 2016, 9:48:48 AM
I would try the following:
- Remove the 500-unit IP layer (keep only the 2-output one).
- Force the last conv layer to have only 2 output channels, and split it into 2 separate layers along the channel axis so that the IP weights are decoupled (this can also be accomplished by setting the IP axis = 2, although splitting it explicitly using a Slice layer and 2 IP layers is more legible).
- Display the final conv layer after training.

Due to the channel splitting and the two separate IP layers, the final conv layer activations will be forced (in my experience) to have some intuitive interpretation. From that point, change the architecture in small increments to improve the loss, while ensuring that the intuitive interpretation you're seeing doesn't go away.
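A hypothetical prototxt sketch of that Slice + two-IP wiring (all layer and blob names here are made up; "conv_final" stands in for the 2-channel final conv output):

```
layer {
  name: "slice_conv"
  type: "Slice"
  bottom: "conv_final"   # assumed 2-channel final conv output
  top: "conv_c0"
  top: "conv_c1"
  slice_param { axis: 1 slice_point: 1 }
}
layer {
  name: "ip_c0"
  type: "InnerProduct"
  bottom: "conv_c0"
  top: "score_c0"
  inner_product_param { num_output: 1 }
}
layer {
  name: "ip_c1"
  type: "InnerProduct"
  bottom: "conv_c1"
  top: "score_c1"
  inner_product_param { num_output: 1 }
}
layer {
  name: "scores"
  type: "Concat"
  bottom: "score_c0"
  bottom: "score_c1"
  top: "scores"
  concat_param { axis: 1 }
}
```

Each 1-output InnerProduct then only ever sees its own conv channel, which is the decoupling oeb describes.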

Hossein Hasanpour

Jul 15, 2016, 11:13:52 AM
I think you misunderstood something.
Visualizing the filters doesn't get you anywhere useful, imho. If you want to see what the filters are learning -- detecting edges in the first layers and other abstract concepts in higher layers -- you should visualize the feature maps, not the filters!
This toolbox can easily provide what you need: https://github.com/yosinski/deep-visualization-toolbox
Using feature maps you can see the effect of a filter and thus understand what it's doing.