Why are convolution layers not followed by activation layers in the LeNet MNIST example?


Alykhan Tejani

Jan 23, 2015, 10:00:35 AM
to caffe...@googlegroups.com
Hi All,

I am new to Caffe and have been looking at the MNIST example with the LeNet classifier. The layers are defined below. My question is: why is there no activation layer (sigmoid or ReLU) after the convolution layers? According to LeCun's paper (page 8, bottom left):
As in classical neural networks, units in layers up to F6 compute a dot product between their input vector and their weight vector, to which bias is added. This weighted sum is then passed through a sigmoid squashing function to produce the state of unit i

So should the convolutional layers be followed by an activation, as below?

layers {
  name: "relu1"
  type: RELU
  bottom: "conv1"
  top: "conv1"
}


Thanks guys


MNIST example:
layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 20
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "pool1"
  type: POOLING
  bottom: "conv1"
  top: "pool1"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  name: "conv2"
  type: CONVOLUTION
  bottom: "pool1"
  top: "conv2"
  blobs_lr: 1
  blobs_lr: 2
  convolution_param {
    num_output: 50
    kernel_size: 5
    stride: 1
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "pool2"
  type: POOLING
  bottom: "conv2"
  top: "pool2"
  pooling_param {
    pool: MAX
    kernel_size: 2
    stride: 2
  }
}
layers {
  name: "ip1"
  type: INNER_PRODUCT
  bottom: "pool2"
  top: "ip1"
  blobs_lr: 1
  blobs_lr: 2
  inner_product_param {
    num_output: 500
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layers {
  name: "relu1"
  type: RELU
  bottom: "ip1"
  top: "ip1"
}
etc

Alykhan Tejani

Jan 26, 2015, 5:03:42 AM
to caffe...@googlegroups.com
Any ideas? Or have I just misunderstood LeNet? As far as I can see in bvlc_reference_caffenet, this is the pattern that is used.

Thanks

Eric Schmucker

Mar 17, 2015, 11:07:25 AM
to caffe...@googlegroups.com
Did you ever get an answer to this? I've also been unable to understand exactly what is going on with the sample network definitions. I assumed there was a default activation, but that might not be true.

Using the Python code in the "Neural Networks and Deep Learning" online book as a guide, I created a C# program that duplicates the author's reported results of reaching approximately 98% accuracy on the MNIST data after about 20 epochs of training. I've tried to create a similar network (28x28-100-10) with Caffe for comparison, but the "documentation" isn't very clear. What I did try only reached accuracy in the high 80s and then plateaued. I can keep experimenting and/or reading the code, but I thought asking here would be easier.
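In case it's useful, this is roughly what I've been trying, written in the same old-style layer syntax as the LeNet example above. The layer names, blobs_lr values, and filler choices are my own guesses rather than anything from an official example, and I've put a standard softmax loss on the output, which may not match the book's cost function exactly. The data layer is omitted; it's the usual MNIST data layer producing "data" and "label".

layers {
  name: "ip1"
  type: INNER_PRODUCT
  bottom: "data"
  top: "ip1"
  blobs_lr: 1
  blobs_lr: 2
  inner_product_param {
    num_output: 100                  # hidden layer of 100 units
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layers {
  name: "sig1"                       # explicit sigmoid activation on the hidden layer
  type: SIGMOID
  bottom: "ip1"
  top: "ip1"                         # applied in place, like relu1 on ip1 in the LeNet example
}
layers {
  name: "ip2"
  type: INNER_PRODUCT
  bottom: "ip1"
  top: "ip2"
  blobs_lr: 1
  blobs_lr: 2
  inner_product_param {
    num_output: 10                   # one output per digit class
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layers {
  name: "loss"
  type: SOFTMAX_LOSS
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}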

Alykhan Tejani

Jun 28, 2015, 9:23:48 AM
to caffe...@googlegroups.com
Still searching for an answer to this...

Amogh Gudi

Aug 17, 2015, 12:32:19 PM
to Caffe Users
I had the same confusion. However, after looking at the models\bvlc_reference_caffenet\train_val.prototxt architecture (which is for ImageNet), it strongly appears that a ReLU layer needs to be applied explicitly after each convolutional layer, and that it is not included in the convolutional layer implicitly.

Regardless, I completely agree that Caffe's documentation is really incomplete.

Cheers

minesh mathew

Mar 6, 2016, 12:05:07 AM
to Caffe Users
The convolution layers in the LeNet model in Caffe do not use any activation function, or in other words they use the identity activation.
For any computational layer, if no activation layer is specified explicitly, the identity activation is applied by default.

Jan

Mar 6, 2016, 1:51:33 PM
to Caffe Users
Simply put, a convolutional layer does _only_ convolution, just as an inner product layer only computes the inner product of the weight matrix with the inputs. If you want an activation function, you need to add a layer for it. Caffe does (almost) nothing implicitly; you have to specify every operation you want performed as a layer. Which is probably for the best, from both the "intuitive understanding" and "software design" points of view.
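For the LeNet definition quoted above, that would mean adding a RELU layer between each convolution and its pooling layer, applied in place exactly like the relu1 layer on ip1. A minimal sketch (the layer name here is arbitrary):

layers {
  name: "relu_conv1"     # explicit activation after conv1
  type: RELU
  bottom: "conv1"
  top: "conv1"           # in-place, so pool1 still reads from "conv1"
}

and a second RELU layer with bottom/top "conv2" in front of pool2, if you want activations after both convolutions.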

Jan