TCEC 17 superfinal


Warren D Smith

Apr 8, 2020, 3:10:09 PM
to LCZero
SF won first, in round 3.1.

In round 4.1 Lc0 reported it had achieved a forced mate ("M51" score reported)
but later on in the same game claimed the eval was +2.34 in its favor.

WHAT???!!!!

If your chess program is working, it should NEVER do that.


--
Warren D. Smith
http://RangeVoting.org <-- add your endorsement (by clicking
"endorse" as 1st step)

Shuo Xiang

Apr 8, 2020, 3:30:10 PM
to Warren D Smith, LCZero
🙁🙁🙁

It's also not doing well in the opening stages of the CCCC 20 final.

But this is waaaaay too early to tell. Let's see if it wakes up in the next few games.


  

--
You received this message because you are subscribed to the Google Groups "LCZero" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lczero+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lczero/CAAJP7Y2HCUbNLwLOdsrQE5HEV2637mted5hFn8cGDFy_AhCpuA%40mail.gmail.com.

Warren D Smith

Apr 8, 2020, 3:52:00 PM
to LCZero
> In round 4.1 Lc0 reported it had achieved a forced mate ("M51" score
> reported)
> but later on in the same game claimed the eval was +2.34 in its favor.
>
> WHAT???!!!!
>
> If your chess program is working, it should NEVER do that.

-- 33... Ne5 whereupon Lc0 played 34. Rc6a6 INSTANTLY.
What time control management does Lc0 do that enables it to move instantly?

gokiburi

Apr 9, 2020, 4:42:47 AM
to LCZero
It did not. If you look closely you'll see that Leela took 2 minutes and 22 seconds before playing 34.Ra6

Warren D Smith

Apr 9, 2020, 2:13:53 PM
to gokiburi, LCZero
Well, it did on my real-time viewing of TCEC.
Maybe for some reason signals got delayed so I got 2 moves at once?

>It did not. If you look closely you'll see that Leela took 2 minutes and 22 seconds before playing 34.Ra6



glbchess64

Apr 9, 2020, 5:21:39 PM
to LCZero
Sometimes the GUI lags a bit. In fact, an engine only moves instantly when there is just one legal move. All the other apparent instant moves are caused by lag.

Warren D Smith

Apr 11, 2020, 5:54:38 PM
to LCZero

Game 28.1 now happening, Lc0 seems to have a great position & Stockfish's is
disgusting.  If Lc0 can win this game it will be up by 1.  If draw, match dead even.

Warren D Smith

Apr 11, 2020, 7:51:35 PM
to LCZero
Yes, Lc0 is going to win 28.1.
This game was sort of like being trapped in a room whose walls were very very slowly closing in on you.


ronnie millsap

Apr 12, 2020, 10:16:22 AM
to LCZero

What's the link? I have the old website bookmarked, and it doesn't work anymore.

Lee

Apr 13, 2020, 4:38:20 PM
to LCZero

Warren D Smith

Apr 13, 2020, 8:37:42 PM
to LCZero
TCEC superfinal game 43.1:

In this game, Stockfish saw it had a forced win. A few moves later both kibitzers (blue & red) realized the same, and their evals all headed for the sky. But not Leela. It continued on in blissful ignorance for about 20 more moves, its eval creeping up very, very slowly.
Finally, about 3 moves after it was completely obvious even to me (a mere human) that Stockfish was about to promote TWO freaking queens on Leela's pitiful ass, giving it nothing and forcing checkmate soon after, Leela finally got the bulletin and raised its eval into "loss" territory.

Something seems brain dead in this picture.

It is hard to understand how Leela can be this deluded and still play 3400 Elo chess.

Pawel SalsaDura

Apr 13, 2020, 9:29:57 PM
to LCZero
Yes, you are right: you are just a mere human who takes things too emotionally. Leela, on the other hand, plays chess with a poker face, no emotion or feelings, and that is why she is so successful. :) I believe that you, as a human player, would finish last in every possible chess tournament, even those in kindergarten, because chess is not about being emotional but rather about having no feelings at all!

Warren D Smith

Apr 13, 2020, 10:47:48 PM
to Pawel SalsaDura, LCZero
On 4/13/20, Pawel SalsaDura <pawel....@gmail.com> wrote:
> yes you are right, You are just mere human that takes things to emotional.
> On the other hand, Leela plays chess with poker face , no emotion or
> feelings , and that is why she is so successful. :) I believe, you as a
> human player, you would finish all possible chess tournaments last, even
> those in the kindergarten, because chess is not about being emotional but
> rather having no feelings at all!

--well, I certainly am not the world's greatest chess player (!), but I'm not THAT bad.
In fact, I entered a number of USCF tournaments when I was younger, and I think the overall result was that I won more money than my entry fees.

I don't understand why Leela did that. And incidentally "kingscrusher" has made a number of YouTube videos on this superfinal (probably more coming? I usually like his videos), and in one of them he too mentioned a stunning blissful ignorance from Leela. This was a different game than the one I complained about, but the same rough scenario: SF and all the kibitzers saw a huge +eval for SF, but Leela thought it was no big deal for a long time, until eventually SF murdered it. But in that game the situation on the board seemed complicated to me, and I was unsure who was right. In contrast, in the game I was complaining about, it was clear to me that Leela was totally dead, and I saw that after Stockfish saw it but BEFORE Leela saw it. So I was, in this instance, able to outperform Leela, which may be the strongest chess player in the universe, despite the fact that I am, as Pawel so kindly remarked, no great player.

A possible hypothesis, or wild guess, is this: in both games there were passed pawns threatening maybe to queen soon. Perhaps Leela somehow gets blinded when that possibility is present? Hard to believe? But anyhow: there have been at least two superfinal losses for Leela in this TCEC, so far, in which Leela seemed to have a completely out-to-lunch view of what was going on. It was being destroyed, but seemed to think it was no big deal.

glbchess64

Apr 13, 2020, 11:26:39 PM
to LCZero
You can consider that a Leela eval close to 2.5 means practical certainty of a win (about a 90% win probability).

In fact this is not always true, because:
  • she sometimes misses tactical lines, especially if there are queens on the board,
  • when the position is rather closed, she misses some fortresses.
In all other cases there is no escape. In game 43 there is no fortress and no queen. You can consider that she gave the winning eval on move 30, when the queens were swapped.

In the two exceptional cases the eval can reach 5 or 6 and the position may still be a draw. But that is very rare.

It is not easy to give centipawns for Leela's eval, because the search algorithm does not work with centipawns. The Leela devs calibrated the function that transforms win probabilities into centipawns so that it approximately matches SF's win rate in the [-3; +3] range. If you look at this range, you will see that the best evals are given by Leela and not by SF.
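
To illustrate what such a calibration involves, here is a minimal sketch that maps a win probability to centipawns and back. It assumes a plain logistic curve pinned to the single data point mentioned above (an eval of 2.5 corresponding to about a 90% win). This is not Lc0's actual conversion formula, and the names `ANCHOR_CP`, `ANCHOR_PROB`, `SCALE_CP`, `cp_to_win_prob` and `win_prob_to_cp` are made up for the example.

```python
import math

# Assumption: a logistic curve, pinned so that +2.50 pawns (250 cp) <-> 90% win.
# Illustration only; this is not the function used by Lc0.
ANCHOR_CP = 250.0
ANCHOR_PROB = 0.90
SCALE_CP = ANCHOR_CP / math.log(ANCHOR_PROB / (1.0 - ANCHOR_PROB))


def cp_to_win_prob(cp: float) -> float:
    """Map a centipawn eval to a win probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-cp / SCALE_CP))


def win_prob_to_cp(p: float) -> float:
    """Inverse mapping: win probability in (0, 1) to centipawns."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must be strictly between 0 and 1")
    return SCALE_CP * math.log(p / (1.0 - p))


if __name__ == "__main__":
    for p in (0.5, 0.75, 0.9, 0.99):
        print(f"win prob {p:.2f} -> {win_prob_to_cp(p) / 100:+.2f} pawns")
```

The same curve can be written with tanh, since 2p - 1 = tanh(cp / (2 * SCALE_CP)). Note how steep the inverse becomes near p = 1: a tiny change in win probability moves the centipawn figure a lot, which is one reason evals in clearly won positions are hard to compare between engines.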

Warren D Smith

Apr 13, 2020, 11:39:34 PM
to glbchess64, LCZero
On 4/13/20, glbchess64 <glbch...@gmail.com> wrote:
> You can consider that when Leela eval is close to 2.5 that means absolute
> certitude of win (90% win).
>
> In fact this is not always true because :
>
> - she miss sometimes some tactical lines especially if there is queens
> on board,
> - when position are rather closed she miss some fortresses.
>
> In all over cases there is no escape. In game 43 there is no fortress and
> no queen. You can consider she gave the winning eval on move 30 when
> swapping the queen.
>
> In the two exceptional cases the eval can reach 5 or 6 an the position may
> be draw. But it is very rare.

--what happened in the game I was watching was this: when SF & the kibitzers realized SF had it won, their evals went from like +2 to like +10 to +100 in a stairstep. Meanwhile Leela's gradually rose from like +1 to about +3 over about 30 moves, and only after even I saw death was imminent did Leela's suddenly shoot up to about +100. (These figures are from memory, not checked.)

If the problem is merely that Leela has a very inaccurate transformation between two eval scales, then I do not understand why it has to be that inaccurate. I mean, for Stockfish and its ilk, quite accurate tanh transformations have been found, with about 1% accuracy. You'd think Leela, as the ultimate in machine learning, ought to be able to do at least as well.

Robert Clark

Apr 14, 2020, 1:44:04 AM
to Warren D Smith, glbchess64, LCZero
Maybe Leela's behavior in these cases is due to the horizon effect? She has great positional sensitivity, but does not look to the depth that some other programs do. And although I can't quite explain why, I wonder if it could also be due to using MCTS for the tree searching?

Robert


glbchess64

Apr 14, 2020, 3:08:16 AM
to LCZero
Above a 90% win rate it is not so easy to get a good eval, because generally almost all moves win with high probability, and Leela has no means to choose between winning moves. The --logit-q option is an attempt to address this issue, but it is not so clear that it works well. Things get better when Leela sees the mate (eval +128); version 0.25 will choose the shortest mate and report it. For now, 0.24 chooses any of the mating lines.

This issue with won games is also the reason why Leela sometimes commits suicide when she knows that the game is lost. For this reason she cannot play a good handicap match. To handicap Leela, the best approach is to take a net from the beginning of training.

ronnie millsap

Apr 15, 2020, 10:26:19 AM
to LCZero
LC0 won convincingly at chess.com as well. Noice.

Stephen Timothy McHenry

Apr 16, 2020, 10:18:03 PM
to LCZero
Gone up +3 now at TCEC, Leela must now be considered the Queen of chess until some massive SF improvement, or at least until they quit playing with the contempt setting for SF!

Warren D Smith

Apr 16, 2020, 10:56:51 PM
to Stephen Timothy McHenry, LCZero
Lc0 highly likely to win superfinal since up +3 with 2/3 of
match complete.

Unfortunately SF advocates could still argue (correctly) that this
10 wins to 7 victory is not a statistically significant result.
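
One rough way to check that claim is an exact sign test on the decisive games. The sketch below assumes each decisive game is an independent coin flip under the null hypothesis and ignores draws, opening pairing and colors; the function name `sign_test_p_value` is made up for the example.

```python
from math import comb


def sign_test_p_value(wins_a: int, wins_b: int) -> float:
    """Two-sided exact binomial (sign) test on decisive games only.

    Null hypothesis: each decisive game is equally likely to go either way.
    Draws are ignored, which is the usual convention for a sign test.
    """
    n = wins_a + wins_b
    k = max(wins_a, wins_b)
    # One tail: probability of a split at least as lopsided as k to n-k.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


if __name__ == "__main__":
    print(f"10 wins to 7: p = {sign_test_p_value(10, 7):.3f}")
```

For 10 wins to 7 this gives p of roughly 0.63, far above the usual 0.05 threshold, so under these assumptions the lead by itself proves very little.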

Warren D Smith

Apr 16, 2020, 11:02:37 PM
to Stephen Timothy McHenry, LCZero
Game 67.1 somewhat interesting in the sense that both sides think they are
ahead (e.g. at move 23). Who do you think is ahead?

Pawel SalsaDura

Apr 17, 2020, 12:27:45 AM
to LCZero
It is a statistically interesting result because of the long time control. Consider how long this 100-game tournament lasts and how many, for example, 1-minute games you could play instead within that time. You would realize that it would be thousands of games at a very short time control, and those short games would have statistical significance. If so, that only means that those 100 long-time-control games have statistical significance as well!! This logic is very simple and indisputable!

Tony Mars Rover

Apr 17, 2020, 1:45:24 AM
to Warren D Smith, LCZero, Harry Goodman, Evernote Upload Hebert
I don't concur with the SF fans in that regard, simply because in any "World Championship Match" the championship is secured by the strongest player with the winning score, not the strongest player exhibiting the best statistical score! At this stage of the tournament, when all the other contenders have dropped out and the two best players remain, I fully expect it to be close! I fully expect a slugfest! The only math that matters is the winning score!

Best - Tony Mars Rover


Warren D Smith

Apr 17, 2020, 2:07:15 AM
to Tony Mars Rover, LCZero
Sure, as a sporting event, Lc0 presumably will win. Nobody said after the
186-184 game (highest scoring NBA game) that Detroit "did not win."
But: that game did not demonstrate statistically Detroit's superiority, as
a matter of science.
The same criticism is even worse for, e.g. the Carlsen-Caruana and
Carlsen-Karjakin matches :)

Jim Glass

Apr 17, 2020, 3:22:33 PM
to LCZero

"Unfortunately SF advocates could still argue (correctly) that this
10 wins to 7 victory is not a statistically significant result."

Then tell them to add in Leela's simultaneous win by +12 in the 200-game final at CCC.

Adam Kirby

Apr 17, 2020, 4:17:00 PM
to LCZero
SF has not won head to head vs Leela in TCEC or CCC since T30. Them's the breaks.

glbchess64

Apr 18, 2020, 1:06:06 AM
to LCZero
The statistical significance of a match depends neither on the TC nor on the number of nodes per move.

It depends on:
  1. the number of games,
  2. the draw rate,
  3. the opening choice.
Long-TC games have better chess quality but not better statistical quality, except that they may give a better estimate of the engines' level at that TC (the level depends on the TC, and not all engines scale the same way when the TC increases).


Computer chess is not deterministic. The main reason is multi-threading. The order in which threads finish their tasks has consequences for the search tree, and this order depends on external factors (OS tasks, for example, or time management...). If only one move changes, the position may need a different treatment.

In game 7 SF had a better understanding of the position than Leela and would likely also win a replay (by the way, it is for me the best game of the SuFi: Leela is outclassed on her own ground, positional play). In game 33 SF won because it found a very deep combination that Leela did not see. If you change one move in that game, it could be a draw.

One might also think that Leela's wins are more stable. This is not the case. Leela really plays better when there are no queens in open space, so an early queen swap may sometimes be an advantage for her. A position that remains closed changes SF's ability to defend...


Long TCs favour draws, and the statistical significance depends on the draw rate. The greater the draw rate, the lower the error bar.


The choice of opening book is the most important factor. The TCEC opening book is great for fun but poor for evaluating engines (heavily biased).
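
The effect of the draw rate on the error bar can be sketched numerically. The code below assumes independent games and the usual logistic Elo model; the two 100-game records are hypothetical and chosen only so that both have the same 55% score, and the function names are made up for the example.

```python
import math
from typing import Tuple


def score_and_stderr(wins: int, draws: int, losses: int) -> Tuple[float, float]:
    """Mean score per game and its standard error, from a W/D/L record."""
    n = wins + draws + losses
    if n == 0:
        raise ValueError("need at least one game")
    score = (wins + 0.5 * draws) / n
    # Per-game variance of a result that is worth 1, 0.5 or 0 points.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    return score, math.sqrt(var / n)


def elo_from_score(score: float) -> float:
    """Elo difference implied by an expected score (logistic Elo model)."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


if __name__ == "__main__":
    # Two hypothetical 100-game records with the same 55% score.
    for label, record in (("few draws", (55, 0, 45)), ("many draws", (15, 80, 5))):
        score, se = score_and_stderr(*record)
        low = elo_from_score(score - 2 * se)
        high = elo_from_score(score + 2 * se)
        print(f"{label}: score {score:.3f} +/- {se:.3f}, Elo roughly {low:+.0f} to {high:+.0f}")
```

With the same number of games and the same score, the record with more draws has the smaller standard error, because each game's result varies less around the mean.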

ronnie millsap

Apr 18, 2020, 9:35:54 AM
to LCZero
Ya, this was a good match to show newcomers an example of how far behind SF and other AB engines are in strategic play compared with NNs. Leela usually outclassed SF and only lost to brute-forced tactical lines, which she barely even checks :)

Jim Glass

Apr 20, 2020, 2:26:48 PM
to LCZero
Leela is +3 with six games to go, at this writing.

My Mets have pulled worse collapses, but I don't think Leela is *that* human-like.

Warren D Smith

Apr 20, 2020, 2:35:57 PM
to Jim Glass, LCZero
Yes, with +3 and only 5 games to go, Lc0 is almost surely winning it.
Although with only +1 with 12 games to go, things were a lot less clear ...

How do you think Lc0 and SF scale with more thinking time (relatively)?
It might be interesting to see a plot with log(ThinkTime) on horizontal axis,
and relative performance on vertical axis.

Shuo Xiang

Apr 20, 2020, 3:25:39 PM
to Warren D Smith, Jim Glass, LCZero
And ladies and gentlemen, we have Lc0's first black win here!

Currently things are also looking up with Lc0 as white; if Lc0 double kills, Lc0 will have won this SuFi.

Robert Filter

Apr 20, 2020, 3:47:39 PM
to LCZero
Wow, epic match, now +4 and she's better in 96.

Champagne's in the fridge!

Great achievement, hope this drives some contributors back to the project :)

Warren D Smith

Apr 20, 2020, 5:53:43 PM
to Robert Filter, LCZero
SF's play in the French defense often (and in this game 96 especially) looks just horrible. It will probably lose (& deservedly so), and then Lc0 will have beaten SF 16 wins to 11. I think SF's evaluation function has some flaws which cause it to get some big delusions about the French.

Shuo Xiang

Apr 20, 2020, 6:47:11 PM
to Warren D Smith, Robert Filter, LCZero
Ladies and Gentlemen, we have a winner for TCEC SuFi 17!


Shuo Xiang

Apr 20, 2020, 6:54:45 PM
to Warren D Smith, Robert Filter, LCZero
Leeeeeeeeeeeeeellllllllllllllaaaaaaaaaaaaaaaaaaaaaaaaa

Warren D Smith

Apr 20, 2020, 7:31:43 PM
to Shuo Xiang, Robert Filter, LCZero
Would Leela actually have been able to win the final position in game 96 (if no tablebases)?
The neural net had to know (a) how to mate with KNBk against perfect opposition, and (b) that if the Ns were traded, resulting in a KBPk ending, that ending would have been a draw because of the "wrong rook pawn."

Just wondering :)
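
For readers unfamiliar with the "wrong rook pawn" rule, here is a small sketch of the color check behind it: a rook pawn (a- or h-file) is "wrong" when the bishop does not travel on the color of the queening square. The sketch checks only that color condition; whether the ending is actually drawn also depends on the defending king reaching the corner, which is not modeled. The function names are made up for the example.

```python
def square_is_light(file_idx: int, rank_idx: int) -> bool:
    """Files a..h map to 0..7 and ranks 1..8 map to 0..7; a1 is a dark square."""
    return (file_idx + rank_idx) % 2 == 1


def is_wrong_rook_pawn(pawn_file: str, bishop_square: str,
                       white_has_pawn: bool = True) -> bool:
    """True if the pawn is a rook pawn whose queening square the bishop cannot control."""
    if pawn_file not in ("a", "h"):
        return False  # only a- and h-pawns can be "wrong rook pawns"
    promo_file = ord(pawn_file) - ord("a")
    promo_rank = 7 if white_has_pawn else 0  # 8th rank for White, 1st for Black
    bishop_file = ord(bishop_square[0]) - ord("a")
    bishop_rank = int(bishop_square[1]) - 1
    return (square_is_light(promo_file, promo_rank)
            != square_is_light(bishop_file, bishop_rank))


if __name__ == "__main__":
    # White h-pawn queens on h8 (dark); a bishop on c4 runs on light squares.
    print(is_wrong_rook_pawn("h", "c4"))
    # White a-pawn queens on a8 (light); the same bishop controls that square.
    print(is_wrong_rook_pawn("a", "c4"))
```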

Jeff Wads

Apr 20, 2020, 10:52:59 PM
to LCZero
Hey Warren.  Your favorite chess program Stockfish is getting baked, fried, and filleted all at once.  How's that for a 3900 performance.  Yeah baby.
GG.  Cheers.





Norton Freeman

Apr 20, 2020, 11:46:24 PM
to LCZero
@Warren D Smith
Yes, you can test it on your own computer

glbchess64

Apr 21, 2020, 12:46:14 PM
to LCZero
T60 (or SV-T60) plays trivial endgames very well without TB; even T40 played them very well. TB gives "just" about 10 Elo.

Nevertheless, things are different in non-trivial endgames, which often involve a queen and are very tactical, like R vs Q, RP vs Q, QP vs Q...

RNB or RBB are trivial mates for T40 and T60: they troll until the limit and mate at move 50.

Warren D Smith

Apr 21, 2020, 1:10:59 PM
to Jeff Wads, LCZero
On 4/20/20, Jeff Wads <jeffwads...@gmail.com> wrote:
> Hey Warren. You favorite chess program Stockfish is getting baked, fried,
> and filleted all at once. How's that for a 3900 performance. Yeah baby.
> GG. Cheers.


--well, I never said SF was my "favorite" but it is a very
impressive program a lot of people can learn a lot from :)
The final TCEC 17 result is 17 wins to 12, which is not a statistically
significant bake.

The SF people are currently worried that they
have somehow introduced a bug into SF. In several games versus
leela (not necessarily TCEC games) on heavy hardware
SF managed to play a "suicidal" move for an unknown reason,
then when the SF guys try to duplicate that behavior
on their home machines, SF refuses and plays a good move
instead! So the conjecture is that somehow when there are too
many threads running in parallel for too long, something
somehow messes up, but the SF guys do not own powerful enough
hardware to test this conjecture!
So they may be in deep trouble trying to debug this!

Warren D Smith

Apr 21, 2020, 1:21:52 PM
to Jeff Wads, LCZero
> The SF people are currently worried that they
> have somehow introduced a bug into SF. In several games versus
> leela (not necessarily TCEC games) on heavy hardware
> SF managed to play a "suicidal" move for an unknown reason,
> then when the SF guys try to duplicate that behavior
> on their home machines, SF refuses and plays a good move
> instead! So the conjecture is that somehow when there are too
> many threads running in parallel for too long, something
> somehow messes up, but the SF guys do not own powerful enough
> hardware to test this conjecture!
> So they may be in deep trouble trying to debug this!


--and if so, this may actually (surprisingly?) be one of Leela's
biggest advantages.
That is: Leela is at the core fundamentally simple, and presumably its
components are comparatively easy to test and debug.
All its complexity is self-learned. In contrast, the harder the SF developers
push SF to get better, the more difficult it may become to debug.
Ultimately, that could be devastating.

Shuo Xiang

Apr 21, 2020, 1:35:34 PM
to Warren D Smith, Jeff Wads, LCZero
Duly agree with that bug hypothesis, Warren. Multi-threading is inherently non-deterministic and notoriously difficult to debug. (I did software engineering in undergrad and so am qualified (I guess) to make that statement?)
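
As a toy illustration of that non-determinism (not engine code), the sketch below starts a few threads doing identical stand-in work and records the order in which they finish. The workload and the name `run_workers` are made up for the example. Every run completes the same set of tasks, but the finishing order is decided by the scheduler and may differ between runs, which is the kind of variation that can change a parallel search tree.

```python
import threading
from typing import List


def run_workers(n_workers: int = 8) -> List[int]:
    """Start n_workers threads and record the order in which they finish."""
    finish_order: List[int] = []
    lock = threading.Lock()

    def worker(worker_id: int) -> None:
        # Stand-in for a search task; how long it takes depends on scheduling.
        total = 0
        for i in range(50_000):
            total += i * worker_id
        with lock:  # protect the shared list while appending
            finish_order.append(worker_id)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return finish_order


if __name__ == "__main__":
    for run in range(3):
        print(f"run {run}: finish order {run_workers()}")
```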


glbchess64

Apr 21, 2020, 4:24:11 PM
to LCZero
The Leela team sometimes sends a net that has not been heavily tested, but it is very conservative with the code. The team always tests the binaries a lot and sometimes sends old binaries to a tournament if the new version has only minor changes.

The SF team always prefers to send binaries that are not tested enough, and this is not the first time they have had problems.

Notice also that SF was undefeated this season before the SuFi (36 games in DivP, 16 against NNs), so the "bug", if there is one, appears only against a strong opponent when SF is in a very difficult situation. I have never seen SF give up a pawn for nothing when it has an edge, only when it is under strong pressure and close to losing the game.

Warren D Smith

Apr 21, 2020, 4:37:33 PM
to glbchess64, LCZero
Well, I think the SF team feels immunized from bugs thanks to their reliance
on "fishtest." However, fishtest is not going to protect them against a bug
related to too-heavy thread parallelism or too high performance
for them to see on their testing hardware.
I think they may have one hell of a problem trying to figure this one out.

glbchess64

Apr 22, 2020, 4:47:22 AM
to LCZero

Stephen Timothy McHenry

Apr 22, 2020, 5:30:20 PM
to LCZero
Could it not be instead of a "bug", rather with all those threads SF is seeing that it has been outplayed sooner and has a "horizon effect" move to avoid the ultimate defeat on the board? I know the problem of making the bad move would still exist, but the issue would be far different than having to recode buggy software.

123

Apr 28, 2020, 6:08:39 PM
to LCZero
Stephen Timothy McHenry:
Looks like the only bug was not to play with negative contempt -> http://talkchess.com/forum3/viewtopic.php?f=2&t=73792

LC0 really needs some big improvements in tactics. She should always train from Chess960 positions and should also play perfectly with 7 pieces on the board, even if that's not the fastest.
And maybe it's time to change to bigger nets like 30x384 or 30x512 and speed up development; the new RTX 3000 GPUs should be ready next month.

glbchess64

Apr 29, 2020, 6:58:01 AM
to LCZero
This article is not serious, and @corres explains why.