Inception-v3 RMSProp training

Linchao Zhu

Mar 28, 2016, 12:12:37 AM

to Discuss

Hi there,

I noticed that epsilon=1 is used for RMSProp optimization here. To my knowledge, people usually use 1e-10 or some other small value, and TensorFlow sets the default to 1e-10 as well.

Why do we need such a large epsilon (1) here?

Best,

Linchao

Yuxin Wu

Mar 28, 2016, 12:30:27 AM

to Discuss

I've seen cases where an epsilon that is too small leads to overly large parameter updates and breaks the model.

This is also mentioned in the TF documentation of Adam:

"The default value of 1e-8 for epsilon might not be a good default in general. For example, when training an Inception network on ImageNet a current good choice is 1.0 or 0.1."
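
To make the effect concrete, here is a minimal sketch in plain Python. It assumes the textbook RMSProp update for a single scalar parameter, with epsilon added outside the square root; real implementations differ on where epsilon is placed, and the names `rmsprop_step` and `first_update` and the values lr=0.1 and decay=0.9 are only illustrative, not taken from the Inception training code.

```python
import math


def rmsprop_step(grad, mean_square, lr=0.1, decay=0.9, epsilon=1e-10):
    """One textbook RMSProp step for a scalar parameter (illustrative only)."""
    # Exponential moving average of the squared gradient.
    mean_square = decay * mean_square + (1.0 - decay) * grad * grad
    # The gradient is divided by its running RMS; epsilon guards the division.
    update = lr * grad / (math.sqrt(mean_square) + epsilon)
    return update, mean_square


def first_update(grad, epsilon):
    """Size of the very first update, starting from mean_square = 0."""
    update, _ = rmsprop_step(grad, mean_square=0.0, epsilon=epsilon)
    return update


if __name__ == "__main__":
    tiny_grad = 1e-4
    for eps in (1e-10, 0.1, 1.0):
        print(f"epsilon={eps:g}: update={first_update(tiny_grad, eps):.3g}")
```

Under these assumptions, a tiny gradient with a tiny epsilon still produces a step of roughly lr / sqrt(1 - decay), because the normalization cancels the gradient's magnitude. With epsilon=1 the step shrinks in proportion to the gradient instead.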

Linchao Zhu

Mar 28, 2016, 1:26:19 AM

to Discuss

Cool! That makes sense. Thanks!

Vincent Vanhoucke

Mar 28, 2016, 10:16:52 AM

to Linchao Zhu, Discuss

Setting eps=1 was essential for training to work well using async SGD, and was the only way to get gains from RMSProp over simple momentum. I agree with you though, it's very unsatisfying that we have to use this high a number for a parameter that should in theory merely be a way to not divide by zero. It might be that you don't need such a high epsilon when using sync SGD. Someone should try :)
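
One possible way to read this, sketched under the assumption of the textbook RMSProp update (the symbols $\eta$ for the learning rate, $\rho$ for the decay, $g_t$ for the gradient, and $v_t$ for the running mean square are notation introduced here, not from the thread, and implementations differ on whether epsilon sits inside or outside the square root):

```latex
v_t = \rho\, v_{t-1} + (1-\rho)\, g_t^2,
\qquad
\Delta\theta_t = -\,\frac{\eta\, g_t}{\sqrt{v_t} + \epsilon}

% Two limiting regimes:
\sqrt{v_t} \ll \epsilon:\quad \Delta\theta_t \approx -\frac{\eta}{\epsilon}\, g_t
\quad\text{(plain gradient step with rate } \eta/\epsilon\text{)}

\sqrt{v_t} \gg \epsilon:\quad \Delta\theta_t \approx -\,\eta\, \frac{g_t}{\sqrt{v_t}}
\quad\text{(fully normalized step)}
```

In this form a large epsilon does more than prevent division by zero: it damps the normalization for parameters whose gradients are small. Whether that is the actual reason it helps with async SGD is not established in this thread.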

Linchao Zhu

Mar 29, 2016, 4:59:56 AM

to Vincent Vanhoucke, Discuss

Thank you for the clarification! I am trying both epsilon values with sync SGD and will post an update soon.
