[Computer-go] 30% faster with a batch size of 63 instead of 64!

Rémi Coulom

May 9, 2020, 5:19:31 PM
to computer-go

I am probably not the only one who made this mistake: it is usually very bad to use a power of 2 for the batch size!

Relevant documentation by NVIDIA:

The documentation is not extremely clear, so I figured out the formula:

SM is the number of multiprocessors (80 for V100 or Titan V, 68 for RTX 2080 Ti).
n is an integer (usually n=1 is slightly worse than n>1).

So the efficient batch size is 63 for 9x9 Go on a V100 with 256-channel layers. 53 on the RTX 2080 Ti.
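The formula itself did not survive the copy. The sketch below is a reconstruction, not the original post's code: it assumes 128x128 tensor-core output tiles and 256 channels (two column tiles) over the 81 points of a 9x9 board, and picks the largest batch whose tile count still fits in n waves across the SMs. It reproduces both quoted values (63 for 80 SMs, 53 for 68 SMs).

```python
import math

# Reconstructed sketch (assumptions stated above, not confirmed by the post):
# choose the largest batch whose GEMM tile count fits in n waves of SMs.
def efficient_batch(sm_count, n=1, board_points=81,
                    channels=256, tile_rows=128, tile_cols=128):
    col_tiles = math.ceil(channels / tile_cols)  # 2 column tiles for 256 channels
    # Largest B with col_tiles * ceil(board_points * B / tile_rows) <= n * SM
    b = 1
    while col_tiles * math.ceil(board_points * (b + 1) / tile_rows) <= n * sm_count:
        b += 1
    return b

print(efficient_batch(80))  # V100 / Titan V -> 63
print(efficient_batch(68))  # RTX 2080 Ti   -> 53
```

Note how a batch of 64 on the V100 would need 82 tiles, spilling a nearly empty second wave onto the 80 SMs, which is where the ~30% slowdown comes from.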

Here is my tweet with an empirical plot:

I created a new CGOS account to play with this improvement. Probably not a huge difference in strength, but it is good to get such an improvement so easily.


Rémi Coulom

May 9, 2020, 7:17:49 PM
to computer-go
Yeaaaah! first win against Kata!

In addition to the optimized batch size, I did two other things:
 - I use two batches of 63 instead of one, with double buffering, so that the GPU is kept 100% busy. About 14k nodes per second now.
 - I make the search less selective, by using a bigger exploration constant in the MCTS formula.
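For reference, the exploration bonus in the PUCT formula used by AlphaZero-style programs looks like the following; whether Crazy Zero uses exactly this variant is an assumption, since the post only mentions "a bigger exploration constant".

```python
import math

# PUCT child score: value estimate plus an exploration bonus.
# Raising c_puct keeps low-visit moves in contention longer,
# i.e. makes the search less selective.
def puct_score(q, prior, parent_visits, child_visits, c_puct):
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + u
```

The bonus shrinks as a child accumulates visits, so the constant directly controls how quickly the search commits to its current best move.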
I should download Katago and CLOP my search parameters against it.
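The double-buffered batching described above could be sketched like this; the queue sizes, names, and dummy "evaluation" are all illustrative, not Crazy Zero's actual implementation.

```python
import queue
import threading

BATCH_SIZE = 63  # the efficient batch size on a V100, per the earlier post

def gpu_worker(in_q, out_q):
    """Stand-in for the neural net: consumes filled batches, emits evaluations."""
    while True:
        batch = in_q.get()
        if batch is None:  # shutdown signal
            break
        out_q.put([hash(pos) % 2 for pos in batch])  # dummy evaluation

def search(num_batches):
    # maxsize=2 keeps two batches in flight: the search thread fills
    # batch k+1 while the GPU thread evaluates batch k.
    in_q, out_q = queue.Queue(maxsize=2), queue.Queue()
    worker = threading.Thread(target=gpu_worker, args=(in_q, out_q))
    worker.start()
    results = []
    in_q.put([f"leaf0_{i}" for i in range(BATCH_SIZE)])
    for k in range(1, num_batches):
        in_q.put([f"leaf{k}_{i}" for i in range(BATCH_SIZE)])
        results.extend(out_q.get())
    results.extend(out_q.get())
    in_q.put(None)
    worker.join()
    return results

evals = search(4)
print(len(evals))  # 4 * 63 = 252 evaluations, with no GPU idle gaps
```

The point of the second in-flight batch is simply that batch assembly on the CPU overlaps with inference on the GPU, instead of alternating with it.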

So far I have tried to keep the "Zero" philosophy of using self-play only, but playing against other opponents is very likely to be a better approach to making progress.


uurtamo .

May 9, 2020, 9:23:51 PM
to computer-go
Nice job! And the graph makes it super clear how the edge effects work.



David Wu

May 9, 2020, 11:36:41 PM
to compu...@computer-go.org
Very nice. :)
And thanks for the note about batch sizing. Specifically tuning parameters for this level of strength on 9x9 seems like it could be quite valuable, Kata definitely hasn't done that either. 

But it feels like bots are very, very close to optimal on 9x9. With some dedicated work and more months or years of training, it might be possible to become unbeatable for all practical purposes, and as you mentioned in the other thread, adaptively building out an opening book could be a part of getting there - I'd love to see an "unbeatable 9x9 crazystone" a year down the line.

One fundamental issue that I've been noticing in a variety of domains is precisely that self-play under AlphaZero, and reinforcement learning in environments like these more generally, doesn't explore enough, and it's very, very difficult to get it to do so in a way that's still robust and efficient. And unless you plan to do something like AlphaStar's internal self-play training league, which would seem to nontrivially multiply the cost, playing other opponents instead of just self-play can't entirely be the solution... because once you become sufficiently stronger than the best other opponent, it's hard to usefully continue doing that. And the league *still* didn't entirely fix the problem for AlphaStar: humans were sometimes able to find exploitative strategies that it had never learned to handle via self-play, and it reacted very poorly to them. It feels like there's something unsolved and "missing" from current algorithms.
