Why are convolution layers not followed by activation layers in the LeNet MNIST example?


Alykhan Tejani

Jan 23, 2015, 10:00:35 AM
to caffe...@googlegroups.com
Hi All,

I am new to Caffe and have been looking at the MNIST example with the LeNet classifier. The layers are defined below. My question is: why is there no activation layer (sigmoid or ReLU) after the convolution layers? According to LeCun's paper (page 8, bottom left):
As in classical neural networks, units in layers up to F6 compute a dot product between their input vector and their weight vector, to which bias is added. This weighted sum is then passed through a sigmoid squashing function to produce the state of unit i

So should the convolutional layers be followed by an activation, as below?

layers {
  name: "relu1"
  type: RELU
  bottom: "conv1"
  top: "conv1"
}


Thanks guys


MNIST example:
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "pool1"
  type: POOLING
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  name: "conv2"
  type: CONVOLUTION
  bottom: "pool1"
  top: "conv2"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "pool2"
  type: POOLING
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  name: "ip1"
  type: INNER_PRODUCT
  bottom: "pool2"
  top: "ip1"
  blobs_lr: 1
  blobs_lr: 2
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "relu1"
  type: RELU
  bottom: "ip1"
  top: "ip1"
}
etc

Alykhan Tejani

Jan 26, 2015, 5:03:42 AM
to caffe...@googlegroups.com
Any ideas? Or have I just misunderstood LeNet? As far as I can see in bvlc_reference_caffenet, this is the pattern that is used.

Thanks

Eric Schmucker

Mar 17, 2015, 11:07:25 AM
to caffe...@googlegroups.com
Did you ever get an answer to this? I've also been unable to understand exactly what is going on with the sample network definitions. I assumed there was a default activation, but that might not be true.

Using the Python code in the "Neural Networks and Deep Learning" online book as a guide, I created a C# program that duplicates the author's reported results of reaching approximately 98% accuracy on the MNIST data after about 20 epochs of training. I've tried to create a similar network (28x28-100-10) with Caffe for comparison, but the "documentation" isn't very clear. What I did try only reached accuracy in the high 80s and then plateaued. I can keep experimenting and/or reading the code, but I thought asking here would be easier.
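In case it's useful, this is roughly what I've been trying, written in the same old-style layer syntax as the LeNet example above. The layer names, blobs_lr values, and filler choices are my own guesses rather than anything from an official example, and I've put a standard softmax loss on the output, which may not match the book's cost function exactly. The data layer is omitted; it's the usual MNIST data layer producing "data" and "label".

layers {
  name: "ip1"
  type: INNER_PRODUCT
  bottom: "data"
  top: "ip1"
  blobs_lr: 1
  blobs_lr: 2
  inner_product_param {
    num_output: 100                  # hidden layer of 100 units
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layers {
  name: "sig1"                       # explicit sigmoid activation on the hidden layer
  type: SIGMOID
  bottom: "ip1"
  top: "ip1"                         # applied in place, like relu1 on ip1 in the LeNet example
}
layers {
  name: "ip2"
  type: INNER_PRODUCT
  bottom: "ip1"
  top: "ip2"
  blobs_lr: 1
  blobs_lr: 2
  inner_product_param {
    num_output: 10                   # one output per digit class
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}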

Alykhan Tejani

Jun 28, 2015, 9:23:48 AM
to caffe...@googlegroups.com
Still searching for an answer to this...

Amogh Gudi

Aug 17, 2015, 12:32:19 PM
to Caffe Users
I had the same confusion. However, after looking at the models\bvlc_reference_caffenet\train_val.prototxt architecture (which is for ImageNet), it strongly appears that a ReLU layer needs to be applied explicitly after each convolutional layer, and that it is not included in the convolutional layer implicitly.

Regardless, I completely agree that Caffe's documentation is really incomplete.

Cheers

minesh mathew

Mar 6, 2016, 12:05:07 AM
to Caffe Users
The convolution layers in the LeNet model in Caffe do not use any activation function, or in other words they use the identity activation.
For any computational layer, if no activation layer is specified explicitly, the identity activation is applied by default.

Jan

Mar 6, 2016, 1:51:33 PM
to Caffe Users
Simply put, a convolutional layer does _only_ convolution, just as an inner product layer only computes the inner product of the weight matrix with the inputs. If you want an activation function, you need to add a layer for it. Caffe does (almost) nothing implicitly; you have to specify every operation you want performed as a layer. Which is probably for the best, from both the "intuitive understanding" and "software design" points of view.
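For the LeNet definition quoted above, that would mean adding a RELU layer between each convolution and its pooling layer, applied in place exactly like the relu1 layer on ip1. A minimal sketch (the layer name here is arbitrary):

layers {
  name: "relu_conv1"     # explicit activation after conv1
  type: RELU
  bottom: "conv1"
  top: "conv1"           # in-place, so pool1 still reads from "conv1"
}

and a second RELU layer with bottom/top "conv2" in front of pool2, if you want activations after both convolutions.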

Jan