How best to hybridize Leela & Stockfish "LeeStock"?

802 views
Skip to first unread message

Warren D Smith

unread,
Jan 8, 2019, 10:13:34 AM1/8/19
to LCZero

In this dual post to both the Leela and Fishcooking fora,

I would like to begin exploring the question of how best to create a 

HYBRID of the neural net GPU/TPU MCTS ("Leela") new-style engines with old-style 

CPU alpha-beta ("Stockfish") engines, to create something new ("LeeStock"?) 

hopefully stronger than either.


First of all, is this a GOOD IDEA?  Probably.

The very fact one mainly runs on GPU, other on CPU, alone seems enough to justify this,

i.e. they each could work largely in parallel without using up the other's compute-juice.

Also, the large stylistic differences, and somewhat disjoint weaknesses, and

1000X greater node rates for Stockfish, all look interesting.

Also from a human point of view, as humans who want to understand chess, Leela

is bothering since its understanding seems very hard to translate into human understanding.  

In contrast, Stockfish's chess knowledge is pretty human-understandable.

And the great effort that went into creating Stockfish, which is really the result

of about 50 years of development on top of 130 years of human chess-knowledge buildup, seems

surely to be worth something?


CURRENT STRENGTHS:  Right now, TCEC 14 premier division is being led by

Stockfish #1 with 18.0 and Leela #2 with 16.0, which two are the only undefeated engines.

The rest of the field, with current results from 9.0 to 14.5, is almost certainly

not going to be able to catch up.  (And further, the leader of the rest, Komodo, only

got its 14.5 by being the beneficiary of 2 crashes by KomodoMCTS, and hence its

14.5 is somewhat suspicious.  It may "really deserve" only about 13.7.)  

Although Stockfish probably will be able to hold off Leela in the present

TCEC 14 superfinal, at the rate Leela is improving it probably

will surpass Stockfish within 1 more year.


CURRENT WEAKNESS CONTRAST:

Leela seems to have superhuman hugely-incredible chess intuition, capable of playing about

1700 chess without any lookahead at all (?!?!).  But Stockfish suffers in that department,

having a too-crass and materialistic view, and an inadequate understanding of many 

positional nuances both great and small, including "thorn pawns" and bad pieces that reside

"in jail."  Leela however can get outcalculated and lose material or get mated; 

and it also seems weak in the endgame where the power of fast alphabeta with a large hash 

table can outperform it by essentially creating (to some approximation) a custom perfect 

play tablebase specialized to that endgame situation.

Plus Leela has had more trouble learning endgame understanding at all (albeit

the Leela developers now think they are on the way to overcoming that).

Also, Monte Carlo players have historically had difficulty playing positions which are 

highly likely ot be lost (or won) because their training was only about winning, 

and in 99% win situations they only get 2% training info, versus 50% positions.

So then they do dumb-looking random stuff like promoting to a rook rather than a queen, 

because, what the hell, I'm probably going to win anyway.  Or, if they think they are lost, 

then, what the hell, I'll just give up material and make my opponent's job easy,

as opposed to trying to make it maximally tough for him and hoping he'll blunder.

And both Stockfish and Leela in my opinion are pretty stupid about time management.


The question then is HOW BEST TO CREATE A HYBRID?

I would propose something like the "lazy SMP" framework.

I'll assume the existence of known transformation functions which 

  (i) cause Stockfish root-values to approximately agree with Leela root valuations.

      (The Leela team has one, based on arctangent.)   

  (ii) cause both valuations to approximate "expected game-outcome value."


Start with the search tree created (and stored) by Leela.  Call it the "inner tree."

Stockfish with 1000x Leela's node rate has the ability in the same amount of

time to search Leela's tree plus about 5 ply deeper added starting from each Leela-leaf.  

In fact let us do that.  This is easy to do.  

At present (to oversimplify) Stockfish simply searches to "depth D"

for D=1,2,3... in sequence.  (Less simply, there are all kinds of retractions & extensions,

and quiescence off the end, and it uses alpha beta with move ordering to cut down on the work...

but all that is not going to matter for my suggestion to follow.)

Define "pseudo depth" to be the same thing as depth, except that whenever you follow

a Leela tree-edge, that does not count as +1 depth, but only as +0.01 depth.

Then ask Stockfish to search out to PSEUDO-DEPTH D=1,2,3...


OK, in that way, Stockfish will benefit continually from Leela.

But we want the benefits to be bidirectional.


Each Leela tree-node (at present) stores a move-suggestion S (or even a full list of

moves, with probabilities assigned to each by neural net for the moves not yet tried) 

and a value V (it ultimately plays the root-child with the most value).

This could be supplemented with info from Stockfish:

each node will now also know a lower bound L, upper bound U, or "exact" E, value 

for that position as reckoned by Stockfish searching to depth about 5 

(or more, if that node happens to be interior to the Leela tree).

Also, each node will know Stockfish's recommended hopefully-best move M.

Leela can use this info.


Let T>0 be some threshold.  If V>U+T, or V<L-T, or |V-E|>T

then Stockfish and Leela DISAGREE about that node's value.

And if M and S disagree, they disagree about the best move.


First of all, at the tree-root, if Leela & Stockfish agree on the move choice,

that can be used to encourage search-termination.  If they disagree, and

by more than some value-threshold, that can be used to encourage

searching some more.  The higher the threshold the more encouragement to search more.

So that is a benefit for smarter time-management.


Second, at any tree node (not root), nodes with greater disagreement "T" value,

should get extra priority for deeper exploration by Leela, and the Stockfish

suggested move M, should get boosted "probability" score if it disagreed with S

and had been granted, a priori by Leela, low probability.


So in that way Leela also is continually benefitting from Stockfish.

It is bidirectional.


At some point, either we reach agreement about the best move and just do it, OR

we still have disagreement and have run out of time, hence must move anyway.

What move to make in the latter situation?


Suppose Leela prefers move 1, with value V1 according to it, 

but value E1 (or just an upper bound, <U1; in this case call it V1 anyway

by abuse of notation) says Stockfish.


Stockfish prefers move 2, with value E2 according to it, 

but value V2 says Leela.


Well I suggest choosing the move i that maximizes

  K*Ei + Vi

where the weight K>0 is tuned to maximize empirical playing strength.

(If Leela is stronger than Stockfish in future I would expect 0<K<1

but if Stockfish is stronger I would expect K>1, but I could be wrong.)


--Warren D. Smith


Alexey Eromenko

unread,
Jan 8, 2019, 10:52:32 AM1/8/19
to LCZero
I am not sure if such tricks are even needed. Because Leela gains about 300 ELO strength per *month*. Something Stockfish gains in 5 years.

Leela will be *far* stronger than Stockfish and any traditional AB engine in few months from now.

But maybe switching to AB engine for endgame with under 8 pieces makes sense to finish sooner.
Another thing: we must convince Leela to end games sooner somehow... in many games she just plays too many moves, while Fish can end it quickly. TCEC S14, DivP, game 105 is an example.
Let's see in one year if those tricks with hybrid engines are even needed. If Leela keeps growing like crazy, she will perfect all stages of the game (esp. with bigger Neural Network, like 40x512 )

Gokhan M

unread,
Jan 8, 2019, 11:26:23 AM1/8/19
to Alexey Eromenko, LCZero
End game is a perfect place to do what stockfish does (deep search with simple position eval).

Early and mid games are great for what Leela does (shallow search with advanced position eval). 

Unless Leela slowly switches to some Stockfish-like eval as game progresses, I believe its endgame struggles will persist.

--
You received this message because you are subscribed to the Google Groups "LCZero" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lczero+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lczero/d1dd73ef-f53c-4f9f-8dcf-122176540671%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dietrich Kappe

unread,
Jan 8, 2019, 12:48:39 PM1/8/19
to LCZero

Gokhan M

unread,
Jan 8, 2019, 8:20:27 PM1/8/19
to Dietrich Kappe, LCZero
Thanks! Pretty good paper. Anticipated Leela's strengths and weaknesses before it was even developed :) 

On Tue, Jan 8, 2019 at 9:48 AM Dietrich Kappe <dka...@gmail.com> wrote:
--
You received this message because you are subscribed to the Google Groups "LCZero" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lczero+un...@googlegroups.com.

Shah

unread,
Jan 8, 2019, 9:03:48 PM1/8/19
to LCZero
Hi,
Good idea...
IMO Until and including midgame, SF has only this to contribute:

1. Have veto-power on Leela's move, to be used only if she *severely* blundered:
Avoid perpets.
Avoid mate, of course.
But simply avoiding material loss is not a legit reason for using the veto.
For example, Leela might sac for a better position, and we want to allow her that.

2. Win by tactics. If SF sees a deep tactic that wins the game or even *significantly* improves position to a close to 100% win, prefer the SF move.
Here TB also come into use.

But once getting into endgame, the full power should be granted to SF.
Leela has nothing to contribute during endgame.
This is the AB arena.
I know many disagree.

Questions are now, how do you quantify:
1. Severely blunder - that one is probably clear: avoid draw by perpets and avoid a mate.
2. Significantly improve position.
3. Midgame to endgame transition point.

Answering 2 & 3 I think is the challenge in such a hybrid.

Cheers,
S

Dietrich Kappe

unread,
Jan 8, 2019, 9:11:04 PM1/8/19
to LCZero
What’s old is new. This is from an experiment I did with the old 192x15 main nets. https://github.com/dkappe/leela-chess-weights/wiki/Real-Time-Blunder-Checking

Basically I wrote a wrapper that did real time blunder checking with an ab engine. You’d expect that any ab engine that was reasonably strong would do. As it turned out, the strength of the blunder checking engine made a big difference.

If you want something stronger, you have to mix minimax and mcts in a more sophisticated way.

Dietrich Kappe

unread,
Jan 8, 2019, 9:14:48 PM1/8/19
to LCZero
I’d say in addition that leela misevaluates lots of endgames, but also outplays stockfish in things like imbalanced minor piece endgames. The “leela is not good at endgames” notion really doesn’t apply universally.

Shah

unread,
Jan 8, 2019, 9:28:23 PM1/8/19
to LCZero
Correct.
So you can keep Leela awake during endgame, and sort of switch roles:
If leela spots a move with a positional advantage, as long as SF is also fine with it, prefer Leela's move. Sure.
That should squeez a few more elos I guess.

Dietrich Kappe

unread,
Jan 8, 2019, 10:17:05 PM1/8/19
to LCZero
How do you know when 200 centipawns is a leela positional sacrifice or a blunder? You don’t. Saying “let leela pick the move unless stockfish sees it is a blunder” is easy, but making it work isn't all that straightforward.

Shah

unread,
Jan 9, 2019, 1:40:06 AM1/9/19
to LCZero
At midgame, as a rule, "in Leela we trust".
And I agree that combining the engines naivly as I suggested is oversimplified.
But the guidelines were meant to be very clear.
Let leela focus on strategy and let AB focus on tactics.
I also mentioned there are challenges.

Incorporating the search trees in more elaborate ways, sharing PV at least, should be done and should allow avoiding *some* of those 200 centipawn blunders, that you mentioned.
But even with perfectly incorporated algorithms there must *necessarily* still be cases of unresolvable disagreement between the engines, where Leela says sac the Q! and SF says we're loosing here 200 centipawns!
In midgame, as said, I listen to Leela...
Chess is not 100% safe, until resolved.

Cheers

Tyler Bischel

unread,
Jan 9, 2019, 9:28:02 AM1/9/19
to LCZero
I like the idea of independent trees, stockfish getting a PV list instead of single variation, and periodically inserting lc0's PV into stockfish's pv list. Then, on an iterative deepening run, stockfish considers leelas ideas close to the front of the search.

Warren D Smith

unread,
Jan 9, 2019, 2:14:31 PM1/9/19
to LCZero
Try training leela net with,  in addition to chessboard as inputs,
also stockfish  lookedahead eval (typically about 5 ply? or maybe its depth in ply should be another provided input), 
flags saying whether it is exact, lower bound, or upper bound; and stockfish's
suggested best move, as additional inputs.

Then leela hopefully will learn to understand in which kinds of positions, the stockfish recommendation should
be trusted more or less, than whatever its net would have suggested without SF.

This might work well.  Or might not.   Neural nets seem to me to be a black art, you cannot predict
a priori what will work well.  At least I cannot.


Reply all
Reply to author
Forward
0 new messages