Leela performance so far in the Champions Bonus

Fahim Saharaiar

Jun 5, 2019, 5:03:21 PM
to LCZero
The Champions Bonus is almost halfway through, and Leela has lost to the SuFi 14 SF and the SuFi 12 SF without winning the reverse game. She also lost to SuFi 4 but won the reverse (bad?).
She is also in 5th place.
What is going on? Why such a poor performance from the SuFi 15 AI?

Jim Glass

Jun 5, 2019, 5:21:13 PM
to LCZero
Leela's performance rating so far is 101 Elo points below the rating given to her by TCEC at the start of the event.

Kevin Parent

Jun 5, 2019, 7:47:11 PM
to LCZero
The number of games played is really low, especially with random openings, where they probably won't play enough games to make it fair for everyone. It's basically a bonus, just for fun, which is why SuFi 12 is in first place. Don't take anything serious from these games; this is far from being a competitive format.

Shah

Jun 6, 2019, 1:32:16 AM
to LCZero
But could it still indicate that she has trouble at this TC?
Also, Leela has always performed relatively poorly against weaker opponents.
And the openings are always more of an issue for Leela than for AB engines.

srikant dash

Jun 6, 2019, 2:39:12 AM
to LCZero
There has been zero proof that Leela performs poorly against weaker opponents.

I am quoting this from another post

"During the TCEC cup rounds, Leela had the lowest draw percentage of all engines that made to a following round, every round.   She also posted the quickest match wins, and the most wins as Black.  That constitutes the last games she played before the SuFi, and are literally the last time Leela played against a field of engines."

If there is an opening where both engines win as White, but the engine Leela is facing has a much lower Elo than Leela, then Leela loses Elo points even though the result is 1-1.
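To make the 1-1 point concrete, here is a minimal sketch using the standard logistic Elo formula. The ratings (3500 and 3300) and the K-factor of 10 are made-up illustration values, and `expected_score` and `rating_change` are hypothetical helper names; the rating method TCEC actually uses may differ.

```python
def expected_score(rating_a, rating_b):
    """Expected score per game for player A under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rating_change(rating_a, rating_b, actual_score, games, k=10):
    """Elo change for A after `games` games with total score `actual_score`.

    k is an assumed K-factor, chosen only for illustration.
    """
    expected_total = games * expected_score(rating_a, rating_b)
    return k * (actual_score - expected_total)


# Hypothetical ratings: the favourite at 3500, the opponent at 3300.
# One win each as White gives a 1-1 result over two games.
change = rating_change(3500, 3300, actual_score=1.0, games=2)
print(f"Favourite's rating change after 1-1: {change:+.1f}")
```

Under these assumptions the favourite is expected to score about 1.5 out of 2, so an even split registers as a small rating loss for her and a small gain for the lower-rated engine.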

Sam Jukes

Jun 6, 2019, 6:24:39 AM
to LCZero
I'm not convinced that Leela plays worse against weaker opponents. I think the bigger factor is the openings. Leela trains deeply in openings she likes and disregards those she doesn't, which often leads to her playing worse in those openings when forced into them, since less time has been spent analyzing them. But since without a book she can choose not to go into these openings, why should she become good at them? A good example is the Sicilian Defense. AlphaZero and Leela both seem not to like it much, and as a result they get outplayed quite a lot as Black in the Sicilian.

Maybe the fact that she plays openings she seldom trains on is a contributing factor to her poorer results against weaker engines. My theory is that if she could play whatever opening she wants, her results against the weaker engines would go up drastically.

Deep Blender

Jun 6, 2019, 9:54:43 AM
to LCZero
I agree with you.
During self-play, Leela picks all the moves, and once it has learned that certain opening positions are not favorable, it tries to avoid them. When it is thrown into such an unfavorable position, it is much more likely to have trouble, because it avoided those positions during training and doesn't have much in-depth knowledge of them.

By using book openings, Leela is clearly put at a disadvantage. However, that's part of the game.
Leela would likely be in a better position if it weren't forced to play a specific opening. Instead, for example, the first ply could be forced (based on book openings), the second picked by the engine, the third forced again (based on book openings), and so on up to a certain point. This would give Leela a chance to steer the game towards better-known territory. At the same time, it would put other engines at a disadvantage, which can also be seen as unfair.

Sam Jukes

Jun 6, 2019, 10:41:33 AM
to LCZero
My opinion has always been to let them play at their relative strongest, e.g. Stockfish with a book (Brainfish perhaps) against Leela with no book. I'd love to see that.

Jim Glass

Jun 6, 2019, 3:14:36 PM
to LCZero
I don't know about small sample size or underperforming against weak opponents (are past SuFi champions so weak?).

But one thing that is clear is that Leela has been playing too many draws and getting too few decisions. She has fewer decisions than any other entrant, and is a bunch short of what she needs to compete with the top four Stockfishes. (Assuming she'd win them.)

Fahim Saharaiar

Jun 6, 2019, 5:47:26 PM
to LCZero
Guys, just look at the Champions Bonus now. Leela has already played 28 games and is in 6th place! Let that sink in. I mean, Leela is great when it comes to 1-on-1 matches, but in round robins her weakness shows. Also, the SF engines are roughly in sorted order, with the exception of SuFi 12 and 13 being out of order.

Something is very strange here... 


David Ongaro

Jun 6, 2019, 5:55:14 PM
to LCZero
> By using book openings, Leela is clearly getting a disadvantage. However, that's part of the game.

No, it's not "part of the game". By using forced openings it's not chess anymore, it's a chess variant. The starting condition is a fundamental part of the rules. You could call it part of the tournament, but then the tournament itself is not a "chess tournament" anymore. What would the players say in a normal chess tournament if forced openings were put on them?

Of course, you can say you want to find the strongest all round player who can make the most out of every situation, but that's a different goal from just finding the best "chess" player.

Dave Whipp

Jun 6, 2019, 6:02:19 PM
to Fahim Saharaiar, LCZero
A reasonable hypothesis here would be that new versions of Stockfish are hand-crafted to defeat previous versions, so it's not surprising that they do so, disproportionately to their generalized strength as chess players. This is a variant of the rock-paper-scissors effect we've seen in Leela's training, where beating a previous version is not a good indicator of increasing strength.

More likely, newer versions of Stockfish really are stronger in the general sense but, at the same time, are disproportionately strong when playing previous SuFi winners because they've been tuned to beat those specific engines. Leela doesn't have this advantage, and so appears disproportionately weaker in a multi-engine comparison.

Alternatively this could all be a small-sample-size issue ... but I wouldn't want to rely on that, especially as SF engines are ranked as expected.

Gokhan M

Jun 6, 2019, 6:57:05 PM
to Dave Whipp, Fahim Saharaiar, LCZero
The way Leela learns is very biased (Bayesian). It assumes that her opponent will play like herself ("probabilistic play") rather than minimax. During self-play, Leela also probably doesn't choose the SuFi openings nearly often enough to master them. A lot of food for thought here. This is not the first time that Leela has played marginally against weaker opponents.

glbchess64

Jun 6, 2019, 11:29:06 PM
to LCZero
There are several reasons why this kind of tournament is heavily biased:
  • The games do not begin from the start position, and some move sequences are long (my tests show that Leela is 10 Elo weaker if given 8 plies instead of 4; in this tournament some games start after 30 plies or more).
  • The openings are tested with SF (and maybe the latest H and K) so that their evaluation functions give a low value. Leela's eval function is not considered: she lost a game to a weak engine in a position she evaluated at close to +2 at the end of the book, probably losing 1 point. None of the top engines had to play such a biased position at the end of the book.
  • Not all competitors play the same openings.
There is also the small-sample-size effect (a negative score against the two SF 10 and SF 10 variant engines, although she always wins long series of games against any engine).

And lastly, Leela's real weaknesses:
  • She can play tactical blunders that even the weaker engines can detect with many cores, and she is not able to exploit some tactical weaknesses of the weak engines. For this reason SF 9, SF 10 and SF dev, which are tuned with contempt to push weaker engines to blunder, have a strong advantage.
  • The openings are chosen to be complicated tactically or positionally, but Leela only handles positional complexity well. This is also a sort of bias, since the most complicated positions in terms of positional play, the start position and the positions close to it, are not allowed by the opening book.
This tournament is just for fun, enjoy it!

Shah

Jun 7, 2019, 12:56:18 AM
to LCZero
All the reasons that people mention are very good, but I wonder why no one considers the time control, which is very different from the SuFi.
If I'm not mistaken, in a recent CCCC tourney that had a similar 30-minute time control, Leela could beat SF, but the final result was far more marginal. I hope I remember correctly.

Stephen Timothy McHenry

Jun 7, 2019, 7:25:45 AM
to LCZero
Yes, you are right

Alexey Eromenko

Jun 7, 2019, 6:18:11 PM
to LCZero
I think we should train Leela with TCEC openings and also Chess960 openings, so she becomes a more versatile chess player.

Jim Glass

Jun 7, 2019, 7:11:19 PM
to LCZero
After 31 rounds the top 4 Fishnets have won 19, 18, 18 and 13 games while losing a combined total of 2.

Leela has won just 10 and lost 3, not even near competitive.

She's performing 130 Elo points below expectation and that's getting worse.

No reason is clear to me.

Small sample size? After a slow start the Elo underperformance should be getting smaller, not larger.

Fixed openings? They didn't stop her from winning SuFi 15 convincingly and being dead even with Fish in SuFi 14.

Playing down to weaker opponents? Faster time controls? They didn't stop her from tearing through CCC 7 to win, and through CCC 8 to a very near win.

Anyone have any other ideas?

Maybe her server is overheating?


Deep Blender

Jun 7, 2019, 8:24:23 PM
to LCZero
I don't agree with you on that. Rather than forcing certain openings, it would be better to improve Leela's exploration, so that it finds its own weaknesses better.

Even though training on forced openings would most likely give a boost, it may hide potential advances in exploration, and in the worst case those would not be considered or further improved. Good in the short term, possibly harmful in the long run.

Stephen Timothy McHenry

Jun 7, 2019, 9:53:44 PM
to LCZero
This has been clearly explained to you; just read the above posts. I'm not sure why you remain perplexed about this tourney. It is not a serious tournament, just have fun with it.

Jim Glass

Jun 7, 2019, 11:32:00 PM
to LCZero
Ah, computers take it easy and don't play so hard when they know they are in "fun" tournaments?

Leela maybe, through the marvel of AI.

Not the other spoilsports though, who are lining up nicely 14 down to 1, with performance ratings spot on with their ratings as earned in their other tourney performances.

Dumb A/B engines don't know how to take a break when playing in a tourney that's "not serious", it seems.

Stephen Timothy McHenry

Jun 8, 2019, 12:00:45 AM
to LCZero
Which doesn't answer any of the substantive comments made above. I just don't think I have to repeat them.

srikant dash

Jun 8, 2019, 3:41:27 AM
to LCZero
The reason why Leela is doing badly is very simple: the latest version of a chess engine beats older versions of the same chess engine.

Stockfish 10 beats Stockfish 8 more than Leela beats Stockfish 8, because Stockfish 10 is an improvement over Stockfish 8 that knows its moves and improves on them.

Does no one here really remember 9000 Elo Leela? That came from measuring Elo by beating older versions of Leela.
I assume we would get similar results with 10 versions of Leela, going back to the start of the project, and 1 Stockfish. The reason Stockfish is doing so well is that it can beat older versions of Stockfish more easily.

glbchess64

Jun 8, 2019, 5:39:04 AM
to LCZero
I don't think so. Have a look at the crosstable. You will see that the top 4 engines (variants of SF 9 and 10) do not defeat each other heavily (otherwise Leela would be 3rd or 4th and not 5th), but they beat SF 8 and SF 6 in "the same way" they beat Houdini and the Komodos, which are at the same level.

You can read my previous post above for better reasons, in particular the use of contempt: its effect is to make the game longer, waiting for inaccuracies and blunders from the opponent. This works great against weaker engines and in open positions. When Leela shuffles, she is doing something similar, but she prefers closed positions, where inaccuracies are difficult to exploit or do not exist (in closed positions with few pieces there are often just two categories of moves, very bad ones and very good ones, with nothing in the middle, contrary to open positions). To fully benefit from weaker engines' inaccuracies, Leela needs to play from the beginning of the game, since her play is very positional.

Brian Van Tassel

Jun 8, 2019, 6:55:23 AM
to LCZero
My impression is that Leela does play less decisively than SF against weaker competition.  (Remember the most recent CCC, for example, where there was a bit of nailbiting as to whether Leela would reach the top 4).  But if someone has data that contradicts this, I'd be happy to be corrected.

But ... it makes sense, doesn't it?  Chess is draw-ish.  The objectively "best" chess move in a given situation might be one that preserves a draw, rather than opening up the risk of a loss.  The best engine might get poorer tournament results for that reason.  Ironically, an engine that plays a slightly "worse" move (but that is less draw-ish) against a weaker engine could end up winning because the weaker engine doesn't know how to respond correctly.

Similar to human play.  A stronger player will sometimes deliberately enter into a more complex / risky line against a weaker player, confident that he or she will be able to exploit it, even if "objectively", the line may be worse. I guess this is a form of "contempt".    And it is probably a good strategy for getting a Win against a weaker player, rather than playing a more draw-ish line.

So ... I think Leela will often struggle in a tournament setting with a wide range of engine strengths.  I think Leela usually does better with a smaller field of similarly strong opponents.

Thoughts?

Ivan Ivec

Jun 8, 2019, 10:42:34 AM
to LCZero
My dear friends,

there is no problem with Leela, nor with Stockfish, but with the hardware configuration for A/B engines.

16 GB of hash memory for 43 threads is extremely low, around 380 MB per thread.
As a consequence, A/B engines suffer a performance drop at very long time controls, like in the TCEC Superfinal.
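The per-thread figure is simple division; a quick sketch follows, assuming 1 GB is counted as 1024 MB. The helper name `hash_per_thread_mb` is made up for illustration, and the number is only an average, not a statement about how any engine actually allocates its hash table.

```python
def hash_per_thread_mb(total_hash_gb, threads):
    """Average hash in MB per search thread (1 GB taken as 1024 MB)."""
    return total_hash_gb * 1024 / threads


# Figures quoted in the post: 16 GB of hash shared by 43 threads.
per_thread = hash_per_thread_mb(16, 43)
print(f"{per_thread:.0f} MB per thread")
```

This comes out at about 381 MB per thread, in line with the "around 380 MB" quoted above.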

glbchess64

Jun 8, 2019, 11:42:46 AM
to LCZero
Very interesting, but probably not sufficient to explain all the observations. I also follow CCC 8 and its bonus. It seems (tell me if I am wrong) that there is a lot of memory there and the TC is faster, so according to you that gives better chances to AB engines. But you see the same problem: SF won the final because it had a better score against the weaker Houdini. Leela beat the 3 other engines head to head, but not by a sufficient margin against the weaker one. In the bonus there was no weaker engine, just variants of Leela or SF and sometimes Allie, and the results were always in Leela's favour.

Is it possible that the explanation for CCC is memory usage? What do you think?

glbchess64

Jun 8, 2019, 11:49:30 AM
to LCZero
From the TCEC chat : "S15 Sufi had 32GiB memory for both (well 30GiB for GPU and 32GB for CPU)", by @aloril42

Ivan Ivec

Jun 8, 2019, 1:20:57 PM
to LCZero
In CCC Leela benefits from a very strong GPU system, which is especially advantageous at shorter time controls.
The number of threads helps logarithmically, while the number of CUDA/tensor cores helps linearly.

About hash: I'll try to do some measurements on a smaller scale and I'll let you know my conclusions.

glbchess64

Jun 9, 2019, 1:19:58 AM
to LCZero
Theoretically the CCC hardware is better, but TCEC gives Leela (at the start of the game) ~57 knps and CCC ~61 knps. The difference does not seem very significant. And SF gets ~70 Mnps at TCEC and ~100 Mnps at CCC.

I suppose the hardware favours Leela at TCEC (taking CCC as the reference, at TCEC Leela has 93% of the nps and SF 70%). Am I wrong?

Moreover, the problem is not the results against SF 9, 10 or dev (SF leads by 1 point over 10 games, small sample size I presume) but the results against the weaker engines. And this is a well known problem.
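The 93% and 70% figures follow directly from the quoted speeds. Here is a small sketch of that arithmetic, taking CCC as the reference as the post does; `relative_speed` is a hypothetical helper name, and the inputs are only the approximate start-of-game numbers given above.

```python
def relative_speed(nps_here, nps_reference):
    """Search speed on one system as a fraction of the reference system."""
    return nps_here / nps_reference


# Approximate figures from the post, TCEC relative to CCC.
leela = relative_speed(57, 61)     # both in knps
stockfish = relative_speed(70, 100)  # both in Mnps
print(f"Leela: {leela:.0%} of CCC speed, SF: {stockfish:.0%} of CCC speed")
```

So relative to CCC, the TCEC hardware costs Stockfish a noticeably larger share of its speed than it costs Leela, which is the point being made.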

Jim Glass

Jun 9, 2019, 9:47:31 AM
to LCZero
"Moreover, the problem is not the results against SF ... but the results against the weaker engines. And this is a well known problem."

This. Against the top four Fishnets Leela is -1, she is so far behind them because she has underperformed against the weakies below her on the crosstable.

Playing down to the level of weak opposition is a well documented behavior of hers and was discussed in the Lc0 org blog at the end of last year...

http://blog.lczero.org/2018/10/leela-promotes-to-round-of-16-in-tcec.html

"It's a general observation that Leela underperforms against weaker engines and a good analysis of this can be found HERE ..."

http://www.talkchess.com/forum3/viewtopic.php?f=2&t=68517&p=775116

Ivan Ivec

Jun 9, 2019, 11:22:21 AM
to LCZero
Nevertheless, the positional style of Leela is not as good against weaker opponents as the tactical style of Stockfish.

About hash:
I ran 80 games of SF vs SF (300+3 seconds TC, 3 threads), one variant with 512 MB hash and one with only 1 MB hash. It was 12-0-68, with 512 MB winning of course.
And that was with very drawish openings...
So, at TCEC Stockfish performs slightly better in the Bonus than in the Superfinal due to hash (Leela fills the 16 GB of memory at a slower pace...).
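To put the 12-0-68 result on the Elo scale, here is a rough sketch using the standard logistic Elo model. It assumes the result reads as wins-losses-draws for the 512 MB variant; `match_score` and `elo_difference` are illustrative helper names, and with only 80 games the error bars are wide.

```python
import math


def match_score(wins, losses, draws):
    """Score fraction, counting a draw as half a point."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


def elo_difference(score):
    """Elo gap implied by a score fraction (requires 0 < score < 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


# 512 MB hash vs 1 MB hash, read as 12 wins, 0 losses, 68 draws.
score = match_score(wins=12, losses=0, draws=68)
print(f"score {score:.3f}, implied gap {elo_difference(score):+.0f} Elo")
```

A score of 46/80 corresponds to a gap of a bit over 50 Elo under this model, and that is for an extreme 512 MB vs 1 MB comparison.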

Antony Mathews

Jun 10, 2019, 1:48:47 AM
to LCZero
IMHO, Leela is not performing well because of the opening book. The explanation is that Leela has not explored or learned much in unpromising opening lines.

denochss

Jun 14, 2019, 5:40:43 PM
to LCZero
On Wednesday, June 5, 2019 at 23:03:21 (UTC+2), Fahim Saharaiar wrote:
> Champions Bonus is almost half way through and Leela has lost to Sufi 14 SF and Sufi 12 SF without winning the reverse. She also lost to Sufi 4 but won the reverse (bad?).
> She is also in 5th place. 
>
> What is going on? Why such a poor performance from a Sufi 15 AI?

In a league tournament it would be better to use only one version of each engine, because in a given tournament there may be one standout engine, and a whole group of entrants may be versions of it; in this case that is Stockfish. Just as it is said that Stockfish benefits and Leela is the one harmed, the other engines are harmed too.

Greetings.