ex. 22 a) Stepsize


pylypen...@gmail.com

Jan 28, 2019, 9:29:57 AM
to Machine Learning WS18/19
Hi,

Should we expect good results with stepsize = 1? I only obtain high accuracy when I set the stepsize to a much smaller value, e.g. 0.02; otherwise the loss oscillates too much and the model does not really learn.

Best,
Daria

Maksym Andriushchenko

Jan 28, 2019, 11:56:04 AM
to Daria Pylypenko, Machine Learning WS18/19
Hi,

I've just tested stepsize = 1 and it worked perfectly, resulting in around 4% test error.

This kind of discrepancy in the proper learning rate may be caused by:
1. A missing division by the batch size in your objective. If you don't divide it by batchsize = 20, you can expect the right learning rate to be about 20 times smaller as well (see the sketch after the code below).
2. Alternatively, the problem may be an unsuitable random initialization. For example, I used He initialization (https://arxiv.org/abs/1502.01852):

ucur = randn(m1, d) * sqrt(2 / d);    % first-layer weights: m1 hidden units, d inputs, scaled by sqrt(2/fan_in)
wcur = randn(K, m1) * sqrt(2 / m1);   % output-layer weights: K classes, m1 inputs

and training was successful.
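
For reference, a minimal sketch of the kind of minibatch step point 1 refers to; gradloss is a hypothetical helper (not from the exercise) that returns the gradients of the loss summed over the minibatch, and the data points are assumed to be the columns of X:

B = 20;                                 % batch size from the exercise
idx = randperm(n, B);                   % sample a minibatch of B of the n training points
[gu, gw] = gradloss(ucur, wcur, X(:, idx), Y(idx));  % summed gradients over the batch
ucur = ucur - stepsize * gu / B;        % divide by B exactly once
wcur = wcur - stepsize * gw / B;

With this single 1/B factor the gradient magnitude is independent of the batch size, which is why stepsize = 1 stays reasonable.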

Hope that helps,
Maksym


pylypen...@gmail.com

Jan 29, 2019, 4:59:37 AM
to Machine Learning WS18/19
But we should divide by the batch size only once, right? We don't average the gradient over the batch and then additionally scale it by 1/B?


Maksym Andriushchenko

Jan 29, 2019, 5:23:50 AM
to Daria Pylypenko, Machine Learning WS18/19
Hi,

On Tue, Jan 29, 2019 at 10:59 AM <pylypen...@gmail.com> wrote:
But we should divide by batchsize only once, right? We don't average the gradient over the batch and then additionally scale it by 1/B?
Yes, we should divide the gradients by the batch size only once. 

If this suggestion doesn't help, I would recommend simply sticking to the smaller learning rate with your current implementation.
If you can achieve around 4% test error, then you can be sure that your implementation is fine.
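
To make "only once" concrete, a hypothetical two-line illustration (g_sum is an assumed name for the summed batch gradient): it doesn't matter whether the 1/B factor lives in the objective or in the update, but it must appear exactly once.

g_avg = g_sum / B;                      % the single 1/B factor
wcur = wcur - stepsize * g_avg;         % correct update
% wcur = wcur - stepsize * g_avg / B;   % wrong: this scales by 1/B twice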

Best wishes,
Maksym
 

pylypen...@gmail.com

Jan 29, 2019, 5:32:58 AM
to Machine Learning WS18/19
Okay, thank you!
