Dropout during testing

Rester Hall

Sep 1, 2014, 12:35:02 PM

to caffe...@googlegroups.com

Hi,

Why is the dropout layer used in the imagenet_deploy.prototxt of the imagenet-reference model?
In general, what sense does it make to use dropout during testing? As far as I understand dropout, it is used as a regularizer during training to prevent overfitting, but during actual testing I don't want any connection to be artificially set to 0 (the same goes for validation). What am I missing?

Many thanks,
Rester Hall

Yangqing Jia

Sep 1, 2014, 12:36:41 PM

to Rester Hall, caffe...@googlegroups.com

The layer does almost nothing during testing.

Yangqing

Yangqing Jia

Sep 1, 2014, 12:59:12 PM

to Rester Hall, caffe...@googlegroups.com

Apologies for being brief - just for the record, see the following if-statement in the dropout layer that handles different training and testing time behaviors:

https://github.com/BVLC/caffe/blob/dev/src/caffe/layers/dropout_layer.cpp#L34

Yangqing

Rester Hall

Sep 1, 2014, 1:03:58 PM

to caffe...@googlegroups.com, reste...@gmail.com

Thank you so much Yangqing!

Peiyun Hu

Feb 3, 2015, 6:38:51 PM

to caffe...@googlegroups.com, reste...@gmail.com

Hi Yangqing,

Why are the outputs of the neurons not multiplied by 0.5 at test time? I'm asking because I remember Alex Krizhevsky describing this in his ImageNet paper (NIPS 2012).

Thanks,

Peiyun

Urko Sanchez

Feb 4, 2015, 4:43:19 AM

to caffe...@googlegroups.com, reste...@gmail.com

He is probably multiplying by the inverse at training time (top_data[i] = bottom_data[i] * mask[i] * scale_), so no rescaling is needed at test time. This saves time when testing.
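
To make the point above concrete, here is a minimal sketch of the "scale during training" variant of dropout. It is not Caffe's actual code: the function name dropout_forward, its parameters, and the use of std::mt19937 are assumptions made for illustration. It assumes that scale_ in the quoted line corresponds to 1 / (1 - ratio), where ratio is the probability of dropping a unit.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper (not part of Caffe): dropout on a flat vector.
// Training: each unit is dropped with probability `ratio`, and the
// survivors are multiplied by scale = 1 / (1 - ratio).
// Testing: the input is copied through unchanged.
std::vector<float> dropout_forward(const std::vector<float>& bottom,
                                   float ratio, bool train,
                                   std::mt19937& rng) {
  assert(ratio >= 0.0f && ratio < 1.0f);
  if (!train) {
    return bottom;  // test time: identity, no mask and no rescaling
  }
  const float scale = 1.0f / (1.0f - ratio);      // assumed meaning of scale_
  std::bernoulli_distribution keep(1.0 - ratio);  // true = keep the unit
  std::vector<float> top(bottom.size());
  for (std::size_t i = 0; i < bottom.size(); ++i) {
    const float mask = keep(rng) ? 1.0f : 0.0f;
    top[i] = bottom[i] * mask * scale;
  }
  return top;
}
```

With ratio = 0.5, every surviving activation is doubled during training, so the expected value of each output equals its input. That is why this variant can skip the multiplication by 0.5 at test time that the paper describes: the two schemes differ only in where the factor is applied.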
