I've been formulating an idea - I call it "low-entropy-driven exploration" - which would use entropy calculations to bias exploration toward "sharp" areas of the decision tree. My intuition (and a couple of casual simulations) is that search efficiency can be improved by recognizing that some nodes are "cheaper" to search than others. In chess, low-entropy nodes correspond to moves driven by checks, attacks, and sacrifices that lead to forcing sequences.
The driver is not that these sequences are more likely to be better, but that it is much quicker (requires far fewer visits) to determine whether they are good or bad compared to "branchier" nodes. This idea is obvious in other contexts. If I'm looking for my keys, even if I'm 95% sure they are somewhere in my bedroom and only 5% sure they are on my (otherwise clear) kitchen table, the table has "nowhere to hide": one glance will definitively tell me whether they're on it or not. So it's more efficient to spend a few seconds searching the 5% table before dedicating several minutes to searching the 95% bedroom. If you find yourself nodding along with this, stop and ask: how could one possibly reach that decision if the only things they were told about the search space were
* 95% likelihood bedroom
* 5% likelihood kitchen table
* 0 visits each
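The answer is that you couldn't: those three facts say nothing about how long each search takes, which is the whole point. A toy calculation (with entirely made-up costs - say a definitive glance at the table takes 5 seconds and a thorough bedroom search takes 300 - and treating each search as a single atomic check) shows how the ordering falls out once cost is known:

```python
# Made-up costs, just to illustrate the missing ingredient: how long each
# definitive check takes. Not a model of anything real.
p_table, p_bedroom = 0.05, 0.95
table_cost, bedroom_cost = 5, 300   # seconds per definitive check

# Expected total time, stopping as soon as the keys turn up.
table_first   = table_cost + p_bedroom * bedroom_cost    # 5 + 285  = 290 s
bedroom_first = bedroom_cost + p_table * table_cost      # 300 + 0.25 ≈ 300 s
print(table_first, bedroom_first)
```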
Anyway, the basic concept is to add a new variable to the exploration part of exploration+exploitation. This new variable, entropy, would track the distribution of probabilities that a playout through a node reaches a given unexplored (or terminal) leaf of its subtree; the probabilities across all such leaf nodes thus sum to 1. Consider two nodes, each with a subtree containing 10 reachable leaf nodes. The first node (based on current policy/value/visit counts) might have a 10% chance of reaching each of the 10 leaves. The other might have a 91% chance of reaching one leaf and a 1% chance for each of the other 9. The first has a much higher entropy (the mathematical maximum for 10 leaves, in fact) than the second; thus, even if the first node looks a little better than the second (say, a Q-value of 0.3 vs 0.1), the low entropy of the second node should incline Leela to prioritize searching it over the first. Again - NOT because the second node is more likely to be better, but simply because if it *is* truly better (or truly worse), it will take far fewer playouts to learn that fact than it will for the first.
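To make that comparison concrete, here is a minimal sketch (my own illustration, nothing to do with lc0's actual code) that assumes the leaf-reach probabilities for a subtree are already in hand and just measures their Shannon entropy:

```python
import math

def leaf_reach_entropy(leaf_probs):
    """Shannon entropy (in bits) of the probability of reaching each leaf."""
    return -sum(p * math.log2(p) for p in leaf_probs if p > 0.0)

uniform_node = [0.10] * 10              # 10% chance of each of 10 leaves
forcing_node = [0.91] + [0.01] * 9      # one dominant forcing line

print(leaf_reach_entropy(uniform_node))  # ~3.32 bits (the maximum for 10 leaves)
print(leaf_reach_entropy(forcing_node))  # ~0.72 bits (a much "sharper" subtree)
```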
I have no idea what the ultimate math should look like, but I expect that even a small nudge to Leela's exploration term that spends a little more time on sharp/forcing lines, as this strategy would do, would do wonders for her tactical game.
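For what it's worth, here is one entirely made-up way the nudge could look: scale the usual PUCT-style exploration term by a bonus that grows as the subtree's leaf-reach entropy shrinks. The constants, the normalization, and the function names below are my assumptions for illustration, not a proposal for lc0's actual formula:

```python
import math

def puct_with_entropy_bias(q, prior, parent_visits, child_visits,
                           subtree_entropy, max_entropy,
                           c_puct=1.5, entropy_weight=0.2):
    """PUCT-style score with a small bonus for low-entropy ("sharp") subtrees."""
    u = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    # Map entropy to a "sharpness" in [0, 1]: 1 = a single forced line, 0 = maximally branchy.
    sharpness = 1.0 - (subtree_entropy / max_entropy) if max_entropy > 0 else 0.0
    return q + u * (1.0 + entropy_weight * sharpness)

# The two example nodes from above: with these particular numbers the sharp node
# ends up with the higher score despite its lower Q, so it gets visited sooner.
sharp = puct_with_entropy_bias(q=0.1, prior=0.3, parent_visits=100, child_visits=1,
                               subtree_entropy=0.72, max_entropy=math.log2(10))
branchy = puct_with_entropy_bias(q=0.3, prior=0.3, parent_visits=100, child_visits=1,
                                 subtree_entropy=math.log2(10), max_entropy=math.log2(10))
print(sharp, branchy)
```

Because the bonus rides on the exploration term rather than on Q, it fades as visits accumulate and the value estimate takes over, which seems like the right behavior for a "spend a few cheap visits here first" heuristic.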