How does Leela get a CCRL rating of 2900 with relatively weak tactics?


Tony Asdourian

May 21, 2018, 10:29:46 AM5/21/18
to LCZero
I see that the latest versions of Leela have an approximate CCRL rating of 2900+. I also keep reading on this forum that Leela is tactically far weaker than traditional programs of that rating, and that in fact it will still occasionally blunder away a piece or miss a simple mate. Two questions: 1) Is it true that Leela is still tactically weak compared to traditional programs rated around 2900 CCRL? And if the answer to that is "yes", then my second question, which is the main point of this post: 2) How is Leela able to beat 2900 CCRL-level programs if it is relatively weak tactically? Can someone here summarize in words what compensating strengths Leela has? The analysis of AlphaZero at least made plausible sense to me: AlphaZero made surprising sacrifices that Stockfish couldn't see but that carried long-term positional advantages, yet AlphaZero wasn't also making basic tactical errors. What enables Leela to do so well against traditional 2900 programs if it is notably weaker than them tactically?

Jhor Vi

May 21, 2018, 10:41:09 AM5/21/18
to LCZero
The training results make Leela realize that she's weak in certain tactics, so she learns to find positions that avoid them, denying opponents the opportunity, or at least to choose positions with the kind of tactics that favor her.

Tony Asdourian

May 21, 2018, 10:50:06 AM5/21/18
to LCZero

Wow, that's quite interesting if it's true.  Do others here agree with Jhor?

Björn Holzhauer

May 21, 2018, 11:30:30 AM5/21/18
to LCZero
I find it fascinating, too, but I don't think the given explanation is correct. There is nothing in her that would inherently select positions that favor her stylistically. You might get that if you trained her via reinforcement learning on games against traditional engines, but she actually learns from self-play, which shouldn't even give her information about which positions she plays much better than other engines do.

However, there clearly seems to be enough room for improvement at this level (using Stockfish as a reference point, clearly at least several hundred Elo points), both tactically and strategically, that she can compensate by assessing positions and long-term compensation better than other engines of similar strength (which in turn have to be better tactically).

Joseph Ellis

May 21, 2018, 11:33:09 AM5/21/18
to LCZero
More or less, yes. LCZ is not bad at tactics because that is a property inherent to neural networks; LCZ is bad at (some) tactics because the training data has made it that way. What happens is that temperature can interfere with tactical follow-up. This means the network will not get (relatively) good results for tactics that are longer in nature and require more upfront risk (piece sacrifices, for example). That doesn't mean LCZ can't learn them, just that they will generally be more difficult to learn, and the network will learn to prefer more static advantages. The same behavior (style) was exhibited by A0 in some of the published games. This artificial (human) biasing of the network is why I created dynamic temperature: to ensure tactics are more often properly followed up on, rewarded, and thus more accurately weighted by the network. A0 used resign to help in this regard, which works but is not a complete solution. Neither of these methods will fix the game-result distribution; in fact, alone, they will exacerbate the issue.

In a nutshell, there are at least three significant issues with the current training data for the value head:
1) constant temp vs. tactics (result flipping/loss) - addressed by dynamic temp & resign
2) signal-to-noise ratio of the data (result flipping/loss) - addressed by dynamic temp, temp decay, resign, & other methods
3) statistical distribution of results (blunder biasing) - addressed by temp decay & other methods

To me this is sad, because part of the idea of LCZ is for humans not to influence its behavior, yet that is (currently) what is happening.
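
To make the temperature problem concrete, here is a minimal Python sketch (the visit counts and the exact sampling scheme are illustrative assumptions, not LCZ's actual training code):

import numpy as np

def sample_move(visit_counts, temperature=1.0):
    # Sample a move index from MCTS visit counts. temperature=1.0
    # samples proportionally to visits; temperature -> 0 approaches
    # greedy argmax selection (the limit that temperature decay moves toward).
    counts = np.asarray(visit_counts, dtype=np.float64)
    if temperature < 1e-3:
        probs = np.zeros_like(counts)
        probs[np.argmax(counts)] = 1.0
    else:
        weights = counts ** (1.0 / temperature)
        probs = weights / weights.sum()
    return np.random.choice(len(counts), p=probs)

# Hypothetical position halfway through a piece sacrifice: move 0 is
# the only follow-up that justifies the sac.
visits = [800, 120, 60, 20]

# With a constant temperature of 1.0, the correct follow-up is missed
# about 20% of the time, flipping the game result and teaching the
# value head that the sacrifice was bad.
misses = sum(sample_move(visits, 1.0) != 0 for _ in range(10_000))
print(f"follow-up missed in {misses / 10_000:.1%} of samples")

A dynamic-temperature scheme in the spirit of the message above might lower the temperature when one move dominates the visit counts, so that tactical lines are followed up and rewarded correctly.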

Trevor

May 21, 2018, 12:17:03 PM5/21/18
to LCZero
What's dynamic temperature? I thought the only type of temperature change was temperature decay, and that it's only used in match games (not training). Is that not accurate?

Also, I thought resign wasn't being used yet. Is that incorrect as well?

Another thought regarding tactics: I think that simply exploring more nodes at the root would help quite a lot (during game-play; I'm not sure what the effect would be during training if the flattening of the policy isn't "unflattened" in PUCT). There is strong theoretical justification for exploring more at the root, in the form of minimizing "simple regret" (vs. cumulative regret). This has long seemed intuitively true to me, but I only very recently learned of research addressing it. See, for example: https://arxiv.org/abs/1207.5536

Jhor Vi

May 21, 2018, 12:17:12 PM5/21/18
to LCZero
Is it possible to limit Leela's learning so that it accepts only static positions, and let a quiescence search take care of the capturing moves?

Trevor

May 21, 2018, 12:31:18 PM5/21/18
to LCZero
Regarding this paper about MCTS simple regret: it might be unclear how one would apply it to DeepMind's "PUCT" implementation. (Speaking of PUCT, after doing some research, it turns out that nobody seems to know for sure exactly where it came from; the referenced paper seems not to be it.)

However, my idea is this. To get from UCB to "UCB√" (written "UCB sub radical" in the paper), you can just apply a transformation to N (where N is the total number of visits at the current node): log(N) => sqrt(N). This is the same as substituting N => e^(sqrt(N)).

So my thought is to make that same change in PUCT at Leela's root. I.e., at the root node, use something like this algorithm:
PUCTs(N, Vi, ni) = c*PUCT(f(N), Vi, ni)

Where:
c = some configurable constant
N = Number of total visits at current node
f(N) = e^(sqrt(N))  [Transformation to force more exploration in root nodes that have a lot of visits... Another idea might be f(N) = N^2]
Vi = Average value of child node
ni = Number of visits of child node.


Sorry for the math, but the point is that this type of idea would be rather easy to try out.
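
To make it concrete, here is a minimal Python sketch of the idea (the function names, the c_puct value, and the child layout are illustrative assumptions, not Leela's actual search code; following the correction later in this thread, the constant scales only the exploration term):

import math

def puct_score(total_visits, prior, value, child_visits, c_puct=1.5):
    # Standard AlphaZero-style PUCT: Q plus a prior-weighted exploration
    # bonus. The exact constants in Leela may differ.
    exploration = c_puct * prior * math.sqrt(total_visits) / (1 + child_visits)
    return value + exploration

def root_puct_score(total_visits, prior, value, child_visits, c_puct=1.5):
    # Root-only variant: substitute N -> e^(sqrt(N)) to force extra
    # exploration (the simple-regret idea). Since sqrt(e^(sqrt(N))) =
    # e^(sqrt(N)/2), we exponentiate inside the exploration term
    # directly, which also delays floating-point overflow for large N.
    exploration = (c_puct * prior * math.exp(math.sqrt(total_visits) / 2.0)
                   / (1 + child_visits))
    return value + exploration

# Hypothetical root with three children as (prior, Q-value, visits):
children = [(0.5, 0.10, 9000), (0.3, 0.05, 800), (0.2, -0.30, 200)]
N = sum(n for _, _, n in children)
best_plain = max(range(len(children)), key=lambda i: puct_score(N, *children[i]))
best_root = max(range(len(children)), key=lambda i: root_puct_score(N, *children[i]))
print(best_plain, best_root)  # 0 vs 2

With these numbers, plain PUCT sticks with the most-visited child (index 0), while the root variant redirects search to the least-explored one (index 2) once N is large.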

Jesse Jordache

May 21, 2018, 12:32:02 PM5/21/18
to LCZero
I'm assuming the "multi-armed bandit" problem is something related to the knapsack problem?



Trevor G

May 21, 2018, 12:41:54 PM5/21/18
to Jesse Jordache, LCZero
I had to google the knapsack problem. Multi-armed bandits are just a simple conceptualization of the "exploration vs. exploitation" dilemma. If you're at a slot machine with a bunch of arms to pull (or playing chess with a bunch of possible moves), do you pull the arm with the greatest average reward so far, or do you pull the arm you've tried the least, to learn more? UCB (which UCT, and therefore PUCT, are based on) is a solution to this problem with good theoretical bounds.
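
For illustration, a toy UCB1 bandit in Python (the payout probabilities and all names here are made up for the example, nothing Leela-specific):

import math
import random

def ucb1_pick(pulls, rewards, total_pulls):
    # UCB1: pull the arm maximizing average reward plus an exploration
    # bonus that shrinks the more an arm has been tried.
    for arm, n in enumerate(pulls):
        if n == 0:  # try every arm once first
            return arm
    return max(
        range(len(pulls)),
        key=lambda a: rewards[a] / pulls[a]
        + math.sqrt(2 * math.log(total_pulls) / pulls[a]),
    )

# Three slot machines with hidden payout probabilities.
true_payout = [0.3, 0.5, 0.6]
pulls = [0, 0, 0]
rewards = [0.0, 0.0, 0.0]

for t in range(1, 5001):
    arm = ucb1_pick(pulls, rewards, t)
    pulls[arm] += 1
    rewards[arm] += 1.0 if random.random() < true_payout[arm] else 0.0

print("pulls per arm:", pulls)  # the 0.6 arm ends up pulled the most

In MCTS terms, each "arm" is a candidate move and the reward is the playout/evaluation result; UCT applies this selection rule recursively at every node of the tree.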


Albert Silver

May 21, 2018, 1:06:02 PM5/21/18
to LCZero
No, I actually completely disagree. Seriously. I have rarely seen such an aggressive and speculative engine, and none of that happens out of a clear blue sky with no complications; quite the opposite. The matter of her tactics is a complicated one. Yes, she misses 2-move tactics no engine within 1000 Elo would miss, but on the other hand she also doles out incredibly deep tactics that top engines miss altogether in a similar timeframe, so it is a very mixed bag. These swings between idiocy and brilliance in tactics are fairly unheard of in computer chess, so it obviously makes her very hard to judge.

Richard Nowak

May 21, 2018, 1:17:05 PM5/21/18
to LCZero
I dunno. It seems to me that relying on "some configurable constant" is an arbitrary injection of human bias, which, I thought, was to be avoided at all costs. I suppose we could change LeelaZero to LeelaNotQuiteZero. I'm assuming these constants are not determined by Leela, of course. Leela must learn, in the process of adding nodes to her net, that when there is no future in continuing a particular path, she should stop. It may take another neural net to do so, maybe not. The developers of AlphaZero solved the problem and probably won't divulge their tricks to anyone, not after spending $500 million on DeepMind.

Trevor G

May 21, 2018, 1:21:44 PM5/21/18
to Richard Nowak, LCZero
These algorithms always have configurable constants. All of DeepMind's did, Leela does, etc. There are ways to automatically tune them; there have been lots of recent discussions on the forums.


Albert Silver

May 21, 2018, 1:27:33 PM5/21/18
to Trevor G, Richard Nowak, LCZero
Remember there are two parts to Leela in all cases: the NN, and then the binary that deals with all the MCTS and tweaking. That part was always subject to human interference.



Richard Nowak

May 21, 2018, 1:50:54 PM5/21/18
to LCZero
Thanks. I'm new around here, and it's obvious that you guys know a lot more about this subject than I do, which is the reason I come here in the first place.


Trevor G

May 21, 2018, 2:05:08 PM5/21/18
to Richard Nowak, LCZero
On Discord a few days ago, MTGOStark wrote: "Leela's not *really* zero until we leave a motherboard in the middle of a rainforest and it learns to beat Magnus Carlsen at hopscotch."

The point is, you can only go so far with the "zero" idea.


Also, one more thing, to be a little pedantic: in my prior message, I should have written:
PUCTs(N, Vi, ni) = PUCT(c*f(N), Vi, ni)
The constant does not apply to the average-value (Vi) term.





