[Computer-go] Training an AlphaGo Zero-like algorithm with limited hardware on 7x7 boards

cody2007 via Computer-go

Jan 25, 2020, 5:01:02 PM
to cody2007 via Computer-go, cody2007
Hi All,

I wanted to share an update to a post I wrote last year about using the AlphaGo Zero algorithm on small boards (7x7). I trained for approximately two months on a single desktop PC with two GPU cards.

In the article I was getting mediocre performance from the networks. I've since found a bug in the way I was evaluating the networks; with it fixed, what I've been training seems to match GNU Go's level of performance.

Anyway, I'm aware I'm not exactly pushing the bounds of what's been done before, but I thought some might be interested to see that one can still get decent performance (at least in my opinion) on an extremely limited hardware setup -- orders of magnitude less than what DeepMind (and Leela) have used.

The post where I talk about the model's performance, training, and setup:

A video where I play the network and show some of its move probabilities during self-play games:

The model weights and TensorFlow code:

-Cody

Rémi Coulom

Jan 25, 2020, 5:58:40 PM
to cody2007, computer-go
Hi,

Thanks for sharing your experiments.

Your match results are strange. Did you use a komi? You should use a komi of 9:

The final strength of your network looks surprisingly weak. When I started to develop the Zero version of Crazy Stone, I spent a lot of time optimizing my method on a single (V100) GPU. I could train a strong network from scratch in a single day. Using the wrong komi might have hurt you. Also, on such a small board, it is not so easy to make sure that the self-play games have enough variety. You'd have to find many balanced random initial positions in order to avoid replicating the same game again and again.

Rémi

cody2007 via Computer-go

Jan 25, 2020, 6:39:08 PM
to Rémi Coulom, cody2007, computer-go
Hi Rémi,

Thanks for your comments! I am not using any komi and hadn't given it much thought. I suppose that by having Black win most games, I'm depriving the network of its only learning signal. I will have to try an appropriately set komi next...
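
For reference, the fix on my end should be small -- something like the following, where area_score() is a placeholder for my actual scoring routine (returning Black's points minus White's under area scoring):

    KOMI = 9.0  # the komi Remi suggests for 7x7

    def game_result(board):
        # area_score() is a hypothetical helper: black points minus white points.
        margin = area_score(board) - KOMI
        return 1.0 if margin > 0 else -1.0  # +1 = Black win, -1 = White win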

>When I started to develop the Zero version of Crazy Stone, I spent a lot of time optimizing my method on a single (V100) GPU
Any chance you've written about it somewhere? I'd be interested to learn more but wasn't able to find anything on the Crazy Stone website.

Thanks,
Cody

Rémi Coulom

Jan 26, 2020, 11:17:42 AM
to cody2007, computer-go
Yes, using komi would help a lot. Still, I feel that something else must be wrong, because winning 100% of the games as Black without komi should be very easy on 7x7.

I have not written anything about what I did with Crazy Stone. But my experiments and ideas were really very similar to what David Wu did:

To clarify what I wrote in my previous message: "strong from scratch in a single day" was for 7x7. I like testing new ideas with small networks on small boards, because training is very fast, and what works on small boards with small networks usually also works on large boards with big networks.

Rémi

Igor Polyakov

Jan 26, 2020, 11:35:12 AM
to compu...@computer-go.org
I trained a network using David Wu's code, on 9x9 only, and after a few months it became superhuman.

I'm not sure if anyone's interested, but I can release my network to the world. It's around the strength of KataGo, but only on 9x9. I could do a final test before releasing it into the wild.

cody2007 via Computer-go

Jan 26, 2020, 5:39:14 PM
to compu...@computer-go.org, cody2007
Thanks again for your thoughts and experiences, Rémi and Igor.

I'm still puzzled by what is making training slower for me than for Rémi (although I wouldn't be surprised if Igor's results were also faster when matched for hardware, model size, strength, etc. -- see below). Certainly komi sounds like it might help a lot. I'm going to have to check out David Wu's code.

It takes more than a day for "training" to actually start with my code, because I first generate 128*2*32*35 ≈ 285k training samples before running the first round of backprop. After the first day, therefore, my model is still entirely random. So, possibly:

(1) your and David Wu's implementations are simply faster in wall-clock time,
(2) backprop is started before the initial training buffer is filled (the Wu paper used 250k samples, but it's not 100% clear to me whether training waited until that initial buffer was full), or
(3) "training" time is counted from when backprop starts, regardless of how long the initial buffer took to create.
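
For concreteness, my loop is structured roughly like this (self_play_batch(), sample_minibatch(), and train_step() are stand-ins for my actual functions, not real API names):

    from collections import deque

    MIN_SAMPLES = 128 * 2 * 32 * 35    # ~285k samples before the first backprop
    buffer = deque(maxlen=500_000)     # sliding window of recent samples (size illustrative)

    while True:
        buffer.extend(self_play_batch())          # yields (state, pi, z) samples
        if len(buffer) >= MIN_SAMPLES:
            train_step(sample_minibatch(buffer))  # backprop only once the buffer is warm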

Another thing is that I'm not using any of the techniques beyond AlphaGo Zero that David Wu used. So, depending on whether you are using some or all of those additional features and/or loss functions, it'd be expected that you're getting much faster training than me. I had actually started testing some of the ideas from his paper in my code a while back, but then coincidentally discovered that the models I was training weren't as bad as I had first thought.

Have either of you ever benchmarked your 7x7 (or 9x9) models against GNU Go?

By the way, all the benchmarking against GNU Go that I've reported was in single-pass mode only (i.e., I was not running the tree search on top of the net outputs).
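
That is, moves were chosen with a single forward pass, roughly like this (policy_net() and legal_moves() are stand-ins for my actual code):

    import numpy as np

    def single_pass_move(board):
        probs = policy_net(board)                             # (49,) move probabilities on 7x7
        probs = np.where(legal_moves(board), probs, -np.inf)  # mask illegal moves
        return int(np.argmax(probs))                          # greedy pick, no tree search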

Thanks,
Cody

Igor Polyakov

Jan 26, 2020, 11:18:01 PM
to cody2007, compu...@computer-go.org
I would be surprised if my model ever lost to GNU Go on 9x9. It's a lot stronger than Fuego, which already stomps GNU Go. It would be a waste of time to test it against GNU Go or even MCTS bots. I only plan on running tests against the current best models to see how it does against the state-of-the-art 9x9 nets.

Rémi Coulom

Jan 27, 2020, 6:04:34 AM
to cody2007, computer-go
This is a report after my first day of training my Ataxx network:
Ataxx is played on a 7x7 board. The rules are different, but I expect 7x7 Go would produce similar results. 2k self-play games are more than enough to produce a huge strength improvement at the beginning.

It would take my system less than one day to generate 285k games on a single GPU. But speed optimizations are probably not your biggest problem at the moment.

As I wrote in my previous message, it is important to control the variety of your self-play games. In my program, I have a function that counts the number of distinct board configurations at each move number of the self-play games. This way, I can ensure that the same opening is not replicated too many times.
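
In sketch form (not my actual implementation), where games is the list of self-play move sequences:

    def opening_diversity(games, max_ply=20):
        # Counting distinct move prefixes approximates distinct board
        # configurations (it ignores transpositions and symmetries).
        for ply in range(1, max_ply + 1):
            distinct = len({tuple(g[:ply]) for g in games if len(g) >= ply})
            print(f"move {ply}: {distinct} distinct openings")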

Álvaro Begué

Jan 27, 2020, 10:11:58 AM
to computer-go
For checkers, I used a naive implementation of UCT as my opening book (the "playout" being the actual game where the engine is thinking). So towards the end of the opening book there is always a position where it will try a random move, but in the long run good opening moves will be explored more often. I think this method might work well for other games.
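
In outline (a sketch of the idea, not the code I actually used):

    import math, random

    class BookNode:
        def __init__(self):
            self.children = {}  # move -> BookNode
            self.visits = 0
            self.wins = 0.0     # wins for the player who moved into this node

    def pick_book_move(node, legal_moves, c=1.4):
        # Try every legal move once, then follow the UCB1 rule.
        untried = [m for m in legal_moves if m not in node.children]
        if untried:
            move = random.choice(untried)
            node.children[move] = BookNode()
            return move
        def ucb(m):
            ch = node.children[m]
            return ch.wins / ch.visits + c * math.sqrt(math.log(node.visits) / ch.visits)
        return max(legal_moves, key=ucb)

    def backup(path, result):
        # path: book nodes visited from the root; result: +1 if the player
        # to move at the root won the finished game, -1 otherwise.
        for depth, node in enumerate(path):
            node.visits += 1
            if depth > 0:  # the root has no incoming move
                node.wins += result if (depth - 1) % 2 == 0 else -result

Once the book runs out of moves, the engine plays the rest of the game normally, and the final result is backed up through the book nodes that were visited.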

Álvaro.

Rémi Coulom

Jan 27, 2020, 3:36:23 PM
to computer-go
Building an opening book is a good idea. I do it too.

By the way, if anybody is interested, I have put a small 9x9 opening book online:
Evaluation is +1 for a win and -1 for a loss, with a komi of 7. It may not be very good, because the evaluations were done by my 19x19 network. I started training a specialized 9x9 network last week, and it is already stronger.

Rémi

Álvaro Begué

Jan 27, 2020, 9:11:45 PM
to computer-go
To be clear, what I was talking about was building an opening book as part of the game-generation process that produces training data for the neural network. This makes sure you don't generate the same game over and over again.

A few more things about my Spanish checkers experiment from a few years ago:
* I used a neural network as an evaluation function, and alpha-beta as the search algorithm. The networks I tried were fully connected and quite small compared to anything people are trying these days. The only game-specific knowledge I provided was not stopping the search if a capture is available -- a primitive quiescence search that works well for checkers (see the sketch after this list).
* I couldn't get very far until I provided access to endgame tablebases. An important purpose of the evaluation function is to establish whether there is enough advantage for one side to convert the game into a win, and the shallow searches I was performing in the generated games weren't strong enough in the endgame to determine this. Once I generated 6-men tablebases (pretty easy to do for checkers), it became very strong very quickly (about 1 week of computation, if I remember correctly).
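
In outline, the search looked something like this (placeholder names, not my original code; evaluate() returns the network's score from the side to move's point of view):

    def alphabeta(pos, depth, alpha, beta):
        moves = captures(pos)          # mandatory captures come first
        if not moves:
            if depth == 0:
                return evaluate(pos)   # quiet position: use the network
            moves = legal(pos)
            if not moves:
                return loss_score(pos) # no moves: the side to move loses
        best = alpha
        for m in moves:
            # Captures keep the search alive even at depth 0 (quiescence).
            score = -alphabeta(make(pos, m), max(depth - 1, 0), -beta, -best)
            best = max(best, score)
            if best >= beta:
                break                  # beta cutoff
        return best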

If I find some time in the next few weeks, I'll try to repeat the process for Ataxx.

Álvaro.
