Scaling up batch size for training.


Yohsuke Fukai

Oct 13, 2022, 10:28:36 PM
to Orca Users

Hi all, 

I am now considering scaling up the batch size to shorten the training time (by rewriting the script with DistributedDataParallel). I tried increasing the batch size for the train_a script from 16 to 32, but this caused the test-loss metrics (Average Corr., etc.) to oscillate without any score improvement.
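For reference, here is a minimal sketch of the DistributedDataParallel rewrite I have in mind, assuming a generic PyTorch setup; the model, data, and hyperparameters below are placeholders, not the actual train_a script:

import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

def main():
    # torchrun sets RANK/LOCAL_RANK/WORLD_SIZE in the environment,
    # e.g. `torchrun --nproc_per_node=2 train_ddp.py`.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Placeholder model and data standing in for the real ones.
    model = torch.nn.Linear(128, 1).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    dataset = TensorDataset(torch.randn(1024, 128), torch.randn(1024, 1))
    sampler = DistributedSampler(dataset)
    # Per-process batch of 16 gives an effective batch of 16 * world_size.
    loader = DataLoader(dataset, batch_size=16, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    loss_fn = torch.nn.MSELoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle shards across processes each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()  # DDP all-reduces gradients here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()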

Has anyone considered or tried increasing the batch size? It would be great to know any recommendations on how to tune the optimization parameters accordingly.

Best,
Yohsuke

Yohsuke Fukai

Oct 13, 2022, 10:31:33 PM
to Orca Users
Hi, 

I'm sorry for splitting the emails, but let me attach the plot of the Average Correlation (for the h1esc data).
[Attachment: Screenshot 2022-10-14 11.28.30.png]

Best,
Yohsuke

On Friday, October 14, 2022 at 11:28:36 UTC+9, Yohsuke Fukai wrote:

Jian Zhou

Oct 14, 2022, 10:31:01 PM
to Yohsuke Fukai, Orca Users
Thanks for reporting this. It is hard to predict what will address the fluctuation, but it may not be hard to solve by adjusting the learning rate/momentum, and doing so may not negatively affect performance. In general, though, increasing the batch size won't necessarily improve training speed unless you also increase the learning rate (which may make the fluctuation worse). A small batch size, especially when used with BatchNorm, also has a regularization effect that is sometimes desirable. So overall I can't offer much of a recommendation other than testing it out, but I hope this helps.
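For illustration, one common heuristic is the linear scaling rule: scale the learning rate in proportion to the batch size, often with a short warmup to tame early fluctuation. A minimal sketch, with illustrative numbers only (not values we used):

def scaled_lr(base_lr, base_batch, new_batch):
    # Linear scaling rule: lr grows in proportion to the effective batch size.
    return base_lr * new_batch / base_batch

def warmup_lr(target_lr, step, warmup_steps):
    # Ramp the lr linearly from near 0 to target_lr over warmup_steps steps.
    if step >= warmup_steps:
        return target_lr
    return target_lr * (step + 1) / warmup_steps

# Going from batch 16 to batch 32 would double the learning rate.
lr = scaled_lr(base_lr=0.002, base_batch=16, new_batch=32)  # -> 0.004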

Jian


Yohsuke Fukai

Oct 16, 2022, 8:44:54 PM
to Orca Users
Dear Jian,

Thanks for your kind reply. I forgot to add this to the plot, but when I tried doubling the learning rate for the batch-32 case (lr_scale), the oscillation looked suppressed up to a point (though the run then stopped after running out of memory). I will try further with varied learning rates and momentums and may report back here. (It is a bit discouraging to hear that small batch sizes are sometimes preferable.)

[Attachment: Screenshot 2022-10-17 9.41.44.png]
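To get around the memory limit, I may also try gradient accumulation, which reaches a larger effective batch while keeping per-step memory at batch 16. A rough sketch with generic placeholders, not the actual train_a loop:

def train_epoch(model, loader, optimizer, loss_fn, accum_steps=2):
    # Accumulate gradients over accum_steps mini-batches before each update,
    # so batches of 16 behave like one effective batch of 16 * accum_steps.
    model.train()
    optimizer.zero_grad()
    for i, (x, y) in enumerate(loader):
        loss = loss_fn(model(x), y) / accum_steps  # average over the window
        loss.backward()  # gradients add up across the window
        if (i + 1) % accum_steps == 0:
            optimizer.step()  # one update per effective batch
            optimizer.zero_grad()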

If possible, would you mind letting me know how you chose the optimization parameters? Did you vary them over a range and choose the one with the fastest convergence of the loss?
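On my side, I am planning a small sweep along these lines, assuming a hypothetical train_and_eval(lr, momentum) helper that returns the validation metric (it is not part of the Orca codebase):

import itertools

def sweep(train_and_eval):
    # Try each (lr, momentum) pair and keep the best-scoring one.
    best = None
    for lr, momentum in itertools.product([1e-3, 2e-3, 4e-3], [0.9, 0.98]):
        score = train_and_eval(lr=lr, momentum=momentum)
        print(f"lr={lr:g} momentum={momentum} -> score={score:.4f}")
        if best is None or score > best[0]:
            best = (score, lr, momentum)
    return best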

Best,
Yohsuke

On Saturday, October 15, 2022 at 11:31:01 UTC+9, jzh...@gmail.com wrote: