Dropout during testing

Rester Hall

Sep 1, 2014, 12:35:02 PM
to caffe...@googlegroups.com
Hi,

Why is the dropout layer used in the imagenet_deploy.prototxt of the imagenet-reference model?
In general, what sense does it make to use dropout during testing? As far as I understand dropout, it is used as a regularizer during training to prevent overfitting, but during real testing (and likewise validation) I don't want any connection to be artificially set to 0. What am I missing?

Many thanks,
Rester Hall

Yangqing Jia

Sep 1, 2014, 12:36:41 PM
to Rester Hall, caffe...@googlegroups.com
The layer does almost nothing during testing.

Yangqing



Yangqing Jia

Sep 1, 2014, 12:59:12 PM
to Rester Hall, caffe...@googlegroups.com
Apologies for being brief - just for the record, see the following if-statement in the dropout layer that handles different training and testing time behaviors:


Yangqing

Rester Hall

Sep 1, 2014, 1:03:58 PM
to caffe...@googlegroups.com, reste...@gmail.com
Thank you so much Yangqing!

Peiyun Hu

Feb 3, 2015, 6:38:51 PM
to caffe...@googlegroups.com, reste...@gmail.com
Hi Yangqing, 

Why aren't the outputs of the neurons multiplied by 0.5 at test time? I'm asking because I remember Alex Krizhevsky describing this in his ImageNet paper (NIPS 2012).

Thanks, 
Peiyun

Urko Sanchez

Feb 4, 2015, 4:43:19 AM
to caffe...@googlegroups.com, reste...@gmail.com
He is probably multiplying by the inverse of the keep probability at training time (top_data[i] = bottom_data[i] * mask[i] * scale_), so the outputs need no rescaling at test time. This saves time when testing.
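
To make the two conventions concrete, here is a minimal sketch in plain Python. It is not Caffe's code: the function name `dropout_forward` and the parameters `drop_ratio`, `train`, and `rng` are made up for illustration. It assumes `scale_` equals 1 / (1 - drop_ratio), which is what "multiplying by the inverse" suggests. With that assumption each kept activation is scaled up during training, its expected value matches the input, and the test-time path can simply pass the input through instead of multiplying by 0.5.

```python
import random


def dropout_forward(bottom, drop_ratio=0.5, train=True, rng=None):
    """Inverted-dropout sketch (hypothetical helper, not the Caffe API).

    bottom:     list of input activations
    drop_ratio: probability of zeroing an activation (assumed name)
    train:      True for the training-time path, False for testing
    rng:        optional random.Random instance for reproducibility
    """
    if not 0.0 <= drop_ratio < 1.0:
        raise ValueError("drop_ratio must be in [0, 1)")

    if not train:
        # Test time: no mask and no rescaling, just copy the input.
        return list(bottom)

    if rng is None:
        rng = random.Random()

    keep_prob = 1.0 - drop_ratio
    scale = 1.0 / keep_prob  # assumed meaning of scale_ in the post above

    # Bernoulli mask: 1 keeps the activation, 0 drops it.
    mask = [1 if rng.random() < keep_prob else 0 for _ in bottom]

    # Same shape as top_data[i] = bottom_data[i] * mask[i] * scale_
    return [x * m * scale for x, m in zip(bottom, mask)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print("train:", dropout_forward(x, 0.5, True, random.Random(0)))
    print("test: ", dropout_forward(x, 0.5, False))
```

The scheme described in the NIPS 2012 paper does the opposite: it leaves training outputs unscaled and multiplies by 0.5 at test time. Both keep the expected activation consistent between training and testing; scaling during training just moves the extra multiply out of the deployed network.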