ex. 22 a) Stepsize


pylypen...@gmail.com

Jan 28, 2019, 9:29:57 AM
to Machine Learning WS18/19
Hi,

Should we expect good results with stepsize = 1? I only obtain high accuracy when I set the stepsize to a much smaller value, e.g. 0.02; otherwise the loss oscillates too much and the model does not really learn.

Best,
Daria

Maksym Andriushchenko

Jan 28, 2019, 11:56:04 AM
to Daria Pylypenko, Machine Learning WS18/19
Hi,

I've just tested stepsize = 1 and it worked perfectly, resulting in around 4% test error.

This kind of discrepancy in the proper learning rate may be caused by:
1. A missing division by the batch size in your objective. If you don't divide it by batchsize = 20, you can expect the right learning rate to be about 20 times smaller as well (see the sketch after the code below).
2. Alternatively, the problem may be an unsuitable random initialization. For example, I used He initialization (https://arxiv.org/abs/1502.01852):

ucur = randn(m1, d) * sqrt(2 / d);    % first-layer weights: m1 hidden units, d inputs, scaled by sqrt(2/fan_in)
wcur = randn(K, m1) * sqrt(2 / m1);   % output-layer weights: K classes, m1 inputs

and training was successful.
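
For reference, a minimal sketch of the kind of minibatch step point 1 refers to; gradloss is a hypothetical helper (not from the exercise) that returns the gradients of the loss summed over the minibatch, and the data points are assumed to be the columns of X:

B = 20;                                 % batch size from the exercise
idx = randperm(n, B);                   % sample a minibatch of B of the n training points
[gu, gw] = gradloss(ucur, wcur, X(:, idx), Y(idx));  % summed gradients over the batch
ucur = ucur - stepsize * gu / B;        % divide by B exactly once
wcur = wcur - stepsize * gw / B;

With this single 1/B factor the gradient magnitude is independent of the batch size, which is why stepsize = 1 stays reasonable.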

Hope that helps,
Maksym


pylypen...@gmail.com

Jan 29, 2019, 4:59:37 AM
to Machine Learning WS18/19
But we should divide by the batch size only once, right? We don't average the gradient over the batch and then additionally scale it by 1/B?


Maksym Andriushchenko

Jan 29, 2019, 5:23:50 AM
to Daria Pylypenko, Machine Learning WS18/19
Hi,

On Tue, Jan 29, 2019 at 10:59 AM <pylypen...@gmail.com> wrote:
But we should divide by batchsize only once, right? We don't average the gradient over the batch and then additionally scale it by 1/B?
Yes, we should divide the gradients by the batch size only once. 

If this suggestion doesn't help, I would recommend simply sticking to the smaller learning rate with your current implementation.
If you can achieve around 4% test error, then you can be sure that your implementation is fine.
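
To make "only once" concrete, a hypothetical two-line illustration (g_sum is an assumed name for the summed batch gradient): it doesn't matter whether the 1/B factor lives in the objective or in the update, but it must appear exactly once.

g_avg = g_sum / B;                      % the single 1/B factor
wcur = wcur - stepsize * g_avg;         % correct update
% wcur = wcur - stepsize * g_avg / B;   % wrong: this scales by 1/B twice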

Best wishes,
Maksym
 

pylypen...@gmail.com

Jan 29, 2019, 5:32:58 AM
to Machine Learning WS18/19
Okay, thank you!
