[Computer-go] 30% faster with a batch size of 63 instead of 64!

Rémi Coulom

May 9, 2020, 5:19:31 PM
to computer-go

I am probably not the only one who made this mistake: it is usually very bad to use a power of 2 for the batch size!

Relevant documentation by NVIDIA:

The documentation is not extremely clear, so I figured out the formula:

SM is the number of multiprocessors (80 for V100 or Titan V, 68 for RTX 2080 Ti).
n is an integer (usually n=1 is slightly worse than n>1).

So the efficient batch size is 63 for 9x9 Go on a V100 with 256-channel layers. 53 on the RTX 2080 Ti.
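The formula itself did not survive the copy. The sketch below is a reconstruction, not the original post's code: it assumes 128x128 tensor-core output tiles and 256 channels (two column tiles) over the 81 points of a 9x9 board, and picks the largest batch whose tile count still fits in n waves across the SMs. It reproduces both quoted values (63 for 80 SMs, 53 for 68 SMs).

```python
import math

# Reconstructed sketch (assumptions stated above, not confirmed by the post):
# choose the largest batch whose GEMM tile count fits in n waves of SMs.
def efficient_batch(sm_count, n=1, board_points=81,
                    channels=256, tile_rows=128, tile_cols=128):
    col_tiles = math.ceil(channels / tile_cols)  # 2 column tiles for 256 channels
    # Largest B with col_tiles * ceil(board_points * B / tile_rows) <= n * SM
    b = 1
    while col_tiles * math.ceil(board_points * (b + 1) / tile_rows) <= n * sm_count:
        b += 1
    return b

print(efficient_batch(80))  # V100 / Titan V -> 63
print(efficient_batch(68))  # RTX 2080 Ti   -> 53
```

Note how a batch of 64 on the V100 would need 82 tiles, spilling a nearly empty second wave onto the 80 SMs, which is where the ~30% slowdown comes from.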

Here is my tweet with an empirical plot:

I created a new CGOS account to play with this improvement. Probably not a huge difference in strength, but it is good to get such an improvement so easily.


Rémi Coulom

May 9, 2020, 7:17:49 PM
to computer-go
Yeaaaah! first win against Kata!

In addition to the optimized batch size, I did two other things:
 - I use two batches of 63 instead of one, with double buffering, so that the GPU is kept 100% busy. About 14k nodes per second now.
 - I make the search less selective, by using a bigger exploration constant in the MCTS formula.
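For reference, the exploration bonus in the PUCT formula used by AlphaZero-style programs looks like the following; whether Crazy Zero uses exactly this variant is an assumption, since the post only mentions "a bigger exploration constant".

```python
import math

# PUCT child score: value estimate plus an exploration bonus.
# Raising c_puct keeps low-visit moves in contention longer,
# i.e. makes the search less selective.
def puct_score(q, prior, parent_visits, child_visits, c_puct):
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u
```

The bonus shrinks as a child accumulates visits, so the constant directly controls how quickly the search commits to its current best move.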
I should download Katago and CLOP my search parameters against it.
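The double-buffered batching described above could be sketched like this; the queue sizes, names, and dummy "evaluation" are all illustrative, not Crazy Zero's actual implementation.

```python
import queue
import threading

BATCH_SIZE = 63  # the efficient batch size on a V100, per the earlier post

def gpu_worker(in_q, out_q):
    """Stand-in for the neural net: consumes filled batches, emits evaluations."""
    while True:
        batch = in_q.get()
        if batch is None:  # shutdown signal
            break
        out_q.put([hash(pos) % 2 for pos in batch])  # dummy evaluation

def search(num_batches):
    # maxsize=2 keeps two batches in flight: the search thread fills
    # batch k+1 while the GPU thread evaluates batch k.
    in_q, out_q = queue.Queue(maxsize=2), queue.Queue()
    worker = threading.Thread(target=gpu_worker, args=(in_q, out_q))
    worker.start()
    results = []
    in_q.put([f"leaf0_{i}" for i in range(BATCH_SIZE)])
    for k in range(1, num_batches):
        in_q.put([f"leaf{k}_{i}" for i in range(BATCH_SIZE)])
        results.extend(out_q.get())
    results.extend(out_q.get())
    in_q.put(None)
    worker.join()
    return results

evals = search(4)
print(len(evals))  # 4 * 63 = 252 evaluations, with no GPU idle gaps
```

The point of the second in-flight batch is simply that batch assembly on the CPU overlaps with inference on the GPU, instead of alternating with it.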

So far I have tried to keep the "Zero" philosophy of using self-play only, but playing against other opponents is very likely to be a better approach to making progress.


uurtamo .

May 9, 2020, 9:23:51 PM
to computer-go
Nice job! And the graph makes it super clear how the edge effects work.



David Wu

May 9, 2020, 11:36:41 PM
to compu...@computer-go.org
Very nice. :)
And thanks for the note about batch sizing. Specifically tuning parameters for this level of strength on 9x9 seems like it could be quite valuable, Kata definitely hasn't done that either. 

But it feels like bots are very, very close to optimal on 9x9. With some dedicated work and more months or years of training, it might be possible to become unbeatable for all practical purposes, and as you mentioned in the other thread, adaptively building out an opening book could be a part of getting there - I'd love to see an "unbeatable 9x9 crazystone" a year down the line.

One fundamental issue that I've been noticing in a variety of domains is precisely that self-play under AlphaZero, and reinforcement learning in environments like these more generally, doesn't explore enough, and it's very, very difficult to get it to do so in a way that's still robust and efficient. And unless you plan to do something like AlphaStar's internal self-play training league, which would seem to nontrivially multiply the cost, playing other opponents instead of just self-play can't entirely be the solution... because once you become sufficiently stronger than the best other opponent, it's hard to usefully continue doing that. And the league *still* didn't entirely fix the problem for AlphaStar: humans were sometimes able to find exploitative strategies that it had never learned to handle via self-play, and it reacted very poorly to them. It feels like there's something unsolved and "missing" from current algorithms.
