In this dual post to both the Leela and Fishcooking fora,
I would like to begin exploring the question of how best to create a
HYBRID of the neural net GPU/TPU MCTS ("Leela") new-style engines with old-style
CPU alpha-beta ("Stockfish") engines, to create something new ("LeeStock"?)
hopefully stronger than either.
First of all, is this a GOOD IDEA? Probably.
The very fact one mainly runs on GPU, other on CPU, alone seems enough to justify this,
i.e. they each could work largely in parallel without using up the other's compute-juice.
Also, the large stylistic differences, and somewhat disjoint weaknesses, and
1000X greater node rates for Stockfish, all look interesting.
Also from a human point of view, as humans who want to understand chess, Leela
is bothering since its understanding seems very hard to translate into human understanding.
In contrast, Stockfish's chess knowledge is pretty human-understandable.
And the great effort that went into creating Stockfish, which is really the result
of about 50 years of development on top of 130 years of human chess-knowledge buildup, seems
surely to be worth something?
CURRENT STRENGTHS: Right now, TCEC 14 premier division is being led by
Stockfish #1 with 18.0 and Leela #2 with 16.0, which two are the only undefeated engines.
The rest of the field, with current results from 9.0 to 14.5, is almost certainly
not going to be able to catch up. (And further, the leader of the rest, Komodo, only
got its 14.5 by being the beneficiary of 2 crashes by KomodoMCTS, and hence its
14.5 is somewhat suspicious. It may "really deserve" only about 13.7.)
Although Stockfish probably will be able to hold off Leela in the present
TCEC 14 superfinal, at the rate Leela is improving it probably
will surpass Stockfish within 1 more year.
CURRENT WEAKNESS CONTRAST:
Leela seems to have superhuman hugely-incredible chess intuition, capable of playing about
1700 chess without any lookahead at all (?!?!). But Stockfish suffers in that department,
having a too-crass and materialistic view, and an inadequate understanding of many
positional nuances both great and small, including "thorn pawns" and bad pieces that reside
"in jail." Leela however can get outcalculated and lose material or get mated;
and it also seems weak in the endgame where the power of fast alphabeta with a large hash
table can outperform it by essentially creating (to some approximation) a custom perfect
play tablebase specialized to that endgame situation.
Plus Leela has had more trouble learning endgame understanding at all (albeit
the Leela developers now think they are on the way to overcoming that).
Also, Monte Carlo players have historically had difficulty playing positions which are
highly likely ot be lost (or won) because their training was only about winning,
and in 99% win situations they only get 2% training info, versus 50% positions.
So then they do dumb-looking random stuff like promoting to a rook rather than a queen,
because, what the hell, I'm probably going to win anyway. Or, if they think they are lost,
then, what the hell, I'll just give up material and make my opponent's job easy,
as opposed to trying to make it maximally tough for him and hoping he'll blunder.
And both Stockfish and Leela in my opinion are pretty stupid about time management.
The question then is HOW BEST TO CREATE A HYBRID?
I would propose something like the "lazy SMP" framework.
I'll assume the existence of known transformation functions which
(i) cause Stockfish root-values to approximately agree with Leela root valuations.
(The Leela team has one, based on arctangent.)
(ii) cause both valuations to approximate "expected game-outcome value."
Start with the search tree created (and stored) by Leela. Call it the "inner tree."
Stockfish with 1000x Leela's node rate has the ability in the same amount of
time to search Leela's tree plus about 5 ply deeper added starting from each Leela-leaf.
In fact let us do that. This is easy to do.
At present (to oversimplify) Stockfish simply searches to "depth D"
for D=1,2,3... in sequence. (Less simply, there are all kinds of retractions & extensions,
and quiescence off the end, and it uses alpha beta with move ordering to cut down on the work...
but all that is not going to matter for my suggestion to follow.)
Define "pseudo depth" to be the same thing as depth, except that whenever you follow
a Leela tree-edge, that does not count as +1 depth, but only as +0.01 depth.
Then ask Stockfish to search out to PSEUDO-DEPTH D=1,2,3...
OK, in that way, Stockfish will benefit continually from Leela.
But we want the benefits to be bidirectional.
Each Leela tree-node (at present) stores a move-suggestion S (or even a full list of
moves, with probabilities assigned to each by neural net for the moves not yet tried)
and a value V (it ultimately plays the root-child with the most value).
This could be supplemented with info from Stockfish:
each node will now also know a lower bound L, upper bound U, or "exact" E, value
for that position as reckoned by Stockfish searching to depth about 5
(or more, if that node happens to be interior to the Leela tree).
Also, each node will know Stockfish's recommended hopefully-best move M.
Leela can use this info.
Let T>0 be some threshold. If V>U+T, or V<L-T, or |V-E|>T
then Stockfish and Leela DISAGREE about that node's value.
And if M and S disagree, they disagree about the best move.
First of all, at the tree-root, if Leela & Stockfish agree on the move choice,
that can be used to encourage search-termination. If they disagree, and
by more than some value-threshold, that can be used to encourage
searching some more. The higher the threshold the more encouragement to search more.
So that is a benefit for smarter time-management.
Second, at any tree node (not root), nodes with greater disagreement "T" value,
should get extra priority for deeper exploration by Leela, and the Stockfish
suggested move M, should get boosted "probability" score if it disagreed with S
and had been granted, a priori by Leela, low probability.
So in that way Leela also is continually benefitting from Stockfish.
It is bidirectional.
At some point, either we reach agreement about the best move and just do it, OR
we still have disagreement and have run out of time, hence must move anyway.
What move to make in the latter situation?
Suppose Leela prefers move 1, with value V1 according to it,
but value E1 (or just an upper bound, <U1; in this case call it V1 anyway
by abuse of notation) says Stockfish.
Stockfish prefers move 2, with value E2 according to it,
but value V2 says Leela.
Well I suggest choosing the move i that maximizes
K*Ei + Vi
where the weight K>0 is tuned to maximize empirical playing strength.
(If Leela is stronger than Stockfish in future I would expect 0<K<1
but if Stockfish is stronger I would expect K>1, but I could be wrong.)
--Warren D. Smith
--
You received this message because you are subscribed to the Google Groups "LCZero" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lczero+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lczero/d1dd73ef-f53c-4f9f-8dcf-122176540671%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Have a look at this paper on hybrid mats/minimax. https://dke.maastrichtuniversity.nl/m.winands/documents/mcts-minimax_hybrids_final.pdf
--
You received this message because you are subscribed to the Google Groups "LCZero" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lczero+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lczero/a15ce63b-4577-40c5-bede-46be18d88393%40googlegroups.com.
1. Have veto-power on Leela's move, to be used only if she *severely* blundered:
Avoid perpets.
Avoid mate, of course.
But simply avoiding material loss is not a legit reason for using the veto.
For example, Leela might sac for a better position, and we want to allow her that.
2. Win by tactics. If SF sees a deep tactic that wins the game or even *significantly* improves position to a close to 100% win, prefer the SF move.
Here TB also come into use.
But once getting into endgame, the full power should be granted to SF.
Leela has nothing to contribute during endgame.
This is the AB arena.
I know many disagree.
Questions are now, how do you quantify:
1. Severely blunder - that one is probably clear: avoid draw by perpets and avoid a mate.
2. Significantly improve position.
3. Midgame to endgame transition point.
Answering 2 & 3 I think is the challenge in such a hybrid.
Cheers,
S
Incorporating the search trees in more elaborate ways, sharing PV at least, should be done and should allow avoiding *some* of those 200 centipawn blunders, that you mentioned.
But even with perfectly incorporated algorithms there must *necessarily* still be cases of unresolvable disagreement between the engines, where Leela says sac the Q! and SF says we're loosing here 200 centipawns!
In midgame, as said, I listen to Leela...
Chess is not 100% safe, until resolved.
Cheers