AlphaZero 2.0 Papers (Discussion)

Cscuile

Nov 28, 2018, 8:10:45 AM

to LCZero

In case people here do not know, the AlphaZero 2.0 papers will be released in a few weeks. I thought I'd share some links to the AZ discussions so far, as well as some thoughts.

AZ 2.0 will likely be a 40-block SE, FiLM, or similar network, which will likely be stronger than SFDev. Demis also mentioned that AZ 2.0 should be around 3600 Elo.

I am fine with all other engines being surpassed, as long as DeepMind does it scientifically and provides the information required for the results to be reproduced.

What are your thoughts on the matter? How strong do you think AZ 2.0 will be? Please share any thoughts you have below.

I'll leave this discussion with a quote I keep coming back to regarding the AZ 2.0 situation:

"Scientists should be free to pursue fringe ideas, as long as they do it scientifically." - Matthew O'Dowd, astrophysicist and host of PBS Space Time

Cscuile

Links:

WCC Demis Interview

DeepMind's AlphaZero on Carlsen-Caruana Games 1, 3, 5 & 8 (Sicilian Defence)

DeepMind's AlphaZero on Carlsen-Caruana Games 4 & 6 (English & Petroff)

DeepMind's AlphaZero on Carlsen-Caruana Games 2 & 7 (the QGD)

ovi...@gmail.com

Nov 28, 2018, 9:18:05 AM

to LCZero

If, after one year, with all of Google's resources, a 40-block net, etc., they only get to 3600 Elo (SF8 is considered 3400), what a deception!!!
In addition, this time it will not be enough to show a few games from a match tailored for A0.

Cscuile

Nov 28, 2018, 9:21:49 AM

to LCZero

I'd be very shocked if a 40-block net's max Elo is only 3600. If SF8's true Elo is, say, 3450 (a 50 Elo handicap, to be generous), then SF 10 Dev would be 3562 Elo.
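
To put those numbers in perspective, here is a minimal Python sketch using the standard logistic Elo model (400-point scale). The function name `expected_score` is just for illustration, and the ratings are the figures quoted in this thread, not measurements.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (win = 1, draw = 0.5) for A against B under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Ratings as assumed in this thread (not measured values).
sf8 = 3450.0       # assumed "true" SF8 rating
sf10_dev = 3562.0  # SF 10 Dev rating derived from it
az2 = 3600.0       # figure attributed to Demis above

gap_sf = sf10_dev - sf8   # rating gap between SF 10 Dev and SF8
gap_az = az2 - sf10_dev   # rating gap between a 3600 Elo AZ 2.0 and SF 10 Dev

print(gap_sf, round(expected_score(sf10_dev, sf8), 3))
print(gap_az, round(expected_score(az2, sf10_dev), 3))
```

Under these assumptions, a 3600 Elo AZ 2.0 would be only 38 Elo ahead of SF 10 Dev, which corresponds to an expected score of roughly 55%.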

Deep Blender

Nov 28, 2018, 10:21:30 AM

to LCZero

Why would that be a deception? They are most likely not focusing on chess explicitly. Usually, they are trying to find general principles which can be applied to more and more problems. If they have a new publication and AlphaZero's Elo hasn't improved remarkably, that's a sign that they made other kinds of progress. Maybe they have added other kinds of games to the mix (which can require significant architectural changes), maybe they found more efficient ways of training, or maybe they stabilized the training. Other possibilities are MCTS improvements, like better or less fragile exploration. Maybe they were able to scale their MCTSnets and make them usable for challenging tasks like chess, shogi and Go.

All of those would be worth another publication and none of them has anything to do with deception.

Jeff Wads

Nov 28, 2018, 12:14:17 PM

to LCZero

Why do you think it will be better than SF Dev?? It seems to me that it is fine as long as the A/B searchers like SF do not have 80M nps, etc. We shall see, though.

Cscuile

Nov 28, 2018, 12:32:12 PM

to LCZero

Jeff, as of right now Leela ID 11258 can fairly frequently outplay Stockfish 10 Dev during the middlegame. However, she still fails to convert during the endgame. We have seen from our past tests that increases in network size dramatically increase Elo. So you can assume that if Leela or AlphaZero uses a 40-block net, the Elo gain will continue.

There is a good chance a 40-block net can gain 100-200 Elo over a 20-block net, and if that is the case, SF will be fully surpassed at all time controls. However, there is still evidence to suggest that a 40b struggles during the endgame: to quote the WCC game analysis video, it was "Failing to find a mate in 36" at one point. So maybe a 40b isn't as strong as we initially thought.

With that said, the newer SE and FiLM networks are more accurate than CNN networks, and with the combination of this, cyclical learning, and other training parameters, Leela/A0 could see hundreds of Elo points in growth.

So that is why I want SF 10 to be as strong as possible when going up against AZ 2.0.

Deep Blender

Nov 28, 2018, 12:45:45 PM

to LCZero

Have there been tests which show that Leela performs better with FiLM? To me, that appears to be counter-intuitive, but that would make it even more exciting.

Cscuile

Nov 28, 2018, 1:03:05 PM

to LCZero

I know there have been SE tests, and the MSE loss from those tests suggests that SE is more accurate than CNN. You can find these results in the pins on the Devs channel. As for FiLM, I don't know. Perhaps someone else does?

Edward Panek

Nov 28, 2018, 1:30:01 PM

to LCZero

Agreed. I think in most games we would see AZ 2.0 gain a strong middlegame advantage and then watch SF sometimes escape in the endgame.

Cscuile

Nov 28, 2018, 2:34:01 PM

to LCZero

There's one thing that worries me. Assuming AZ is a 40b with SE or FiLM, it still seems to struggle during the endgame. This doesn't bode well for Leela. Perhaps we will end up needing to switch between regular Leela and an endgame Leela?

Edward Panek

Nov 28, 2018, 2:40:43 PM

to LCZero

It appears that as the number of pieces approaches zero, the difference between winning and losing moves becomes more ambiguous. That is, with 20 pieces on the board there may be only 3 moves that don't lose. When there are 5 pieces on the board there may be 20 moves that don't lose.

Daniel Rocha

Nov 28, 2018, 2:46:25 PM

to epan...@gmail.com, lcz...@googlegroups.com

What are SE and FiLM networks?

--
You received this message because you are subscribed to the Google Groups "LCZero" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lczero+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lczero/aff0dd4c-c608-4243-ba8b-bc44574e003b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--

Daniel Rocha - RJ

danie...@gmail.com

Joseph Ellis

Nov 28, 2018, 3:27:38 PM

to LCZero

https://arxiv.org/pdf/1709.01507.pdf

https://arxiv.org/pdf/1709.07871.pdf
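
In short: the first link is the Squeeze-and-Excitation (SE) paper and the second is the FiLM paper. Below is a rough pure-Python sketch of the core operation of each, with tiny hand-picked weights. All names, shapes and numbers are illustrative assumptions, and this is not how Leela or AlphaZero implement either one.

```python
import math
from typing import List

Channels = List[List[float]]  # one flat list of activations per channel


def relu(x: float) -> float:
    return max(0.0, x)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dense(weights: List[List[float]], inputs: List[float]) -> List[float]:
    """Bias-free fully connected layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, inputs)) for row in weights]


def se_block(x: Channels, w_reduce: List[List[float]], w_expand: List[List[float]]) -> Channels:
    # Squeeze: global average pool reduces each channel to a single number.
    squeezed = [sum(ch) / len(ch) for ch in x]
    # Excitation: bottleneck layer with ReLU, then a sigmoid gate per channel.
    hidden = [relu(h) for h in dense(w_reduce, squeezed)]
    gates = [sigmoid(g) for g in dense(w_expand, hidden)]
    # Scale: multiply every activation in a channel by that channel's gate.
    return [[g * v for v in ch] for g, ch in zip(gates, x)]


def film(x: Channels, gamma: List[float], beta: List[float]) -> Channels:
    # FiLM: per-channel affine transform gamma * x + beta. In the paper gamma
    # and beta are predicted from a separate conditioning input; here they are
    # simply passed in.
    return [[g * v + b for v in ch] for g, b, ch in zip(gamma, beta, x)]


features = [[1.0, 3.0], [2.0, 2.0]]  # 2 channels, 2 positions each
w_reduce = [[0.5, 0.5]]              # 2 channels -> 1 hidden unit
w_expand = [[1.0], [-1.0]]           # 1 hidden unit -> 2 channel gates

se_out = se_block(features, w_reduce, w_expand)
film_out = film(features, gamma=[2.0, 0.0], beta=[0.0, 1.0])
print(se_out)
print(film_out)
```

The difference to notice: SE computes its per-channel scaling from the feature maps themselves, while FiLM takes its scale and shift from an outside conditioning signal.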

Trevor G

Nov 28, 2018, 4:06:37 PM

to Deep Blender, LCZero

I’ll be very interested to see this. Hopefully it includes some innovation in MCTS/UCT, either during training or match play, or both. I’m not sure how low-hanging the fruit is, but it’s quite apparent that there are inherent issues with MCTS. Specifically, a lot of the issues regarding endgames (and misevaluation of draws) feel more symptomatic of the way that MCTS is used than of the idea of using a heavy evaluation function (i.e., a NN) trained from “zero.”

All of the comments in this thread so far have been about the network topology, but it will be disappointing to me if that’s the primary improvement in AZ 2.0.
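
For anyone unfamiliar with the MCTS/UCT part, here is a minimal sketch of the PUCT-style selection rule described in the AlphaGo Zero paper: each move is scored as Q + U, where U grows with the policy prior and shrinks as the move is visited. The `Edge` class, the `c_puct` value of 1.5 and the example numbers are assumptions for illustration, not Leela's or DeepMind's actual code or settings.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Edge:
    prior: float             # P(s, a): move probability from the policy head
    visits: int = 0          # N(s, a): how often this move has been searched
    value_sum: float = 0.0   # sum of values backed up through this move

    @property
    def q(self) -> float:
        # Mean backed-up value; 0.0 for an unvisited move (an assumption).
        return self.value_sum / self.visits if self.visits else 0.0


def puct_score(edge: Edge, total_visits: int, c_puct: float) -> float:
    # Exploration bonus: large for high-prior, rarely visited moves.
    u = c_puct * edge.prior * math.sqrt(total_visits) / (1 + edge.visits)
    return edge.q + u


def select_move(edges: Dict[str, Edge], c_puct: float = 1.5) -> str:
    total_visits = sum(e.visits for e in edges.values())
    return max(edges, key=lambda m: puct_score(edges[m], total_visits, c_puct))


# Hypothetical root position after 100 visits.
edges = {
    "e2e4": Edge(prior=0.6, visits=90, value_sum=45.0),
    "g1f3": Edge(prior=0.3, visits=9, value_sum=3.6),
    "a2a3": Edge(prior=0.1, visits=1, value_sum=0.0),
}
choice = select_move(edges)
print(choice)
```

In this example the search picks the less-visited g1f3 over the heavily visited e2e4, because the exploration term outweighs the small difference in Q. Any change to how MCTS is used would show up in rules like this one, rather than in the network itself.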

Cscuile

Nov 28, 2018, 4:14:05 PM

to LCZero

Indeed! Leela's search heuristic is, in my opinion, her biggest weakness. Due to the inherent issues with MCTS, Leela scales extremely poorly. Her gains level off at around 200k nodes per move.

If DeepMind decides to release any detailed data about AZ 2.0 that can be replicated, I hope it's their enhanced MCTS/UCT.

Deep Blender

Nov 28, 2018, 9:29:56 PM

to LCZero

I am aware of the SE results, and it is intuitively understandable that it can work better. I can't see that at all for FiLM; that's why I am asking.
