I have for you 50 games: Leela vs. Stockfish 9 64-bit. Guess who won! :)

Margus Riimaa

Aug 29, 2018, 4:56:51 AM
to LCZero
Processor: Intel i7-8700 CPU @ 3.20GHz, 6 Core(s), 12 Logical
Memory: 16GB
Graphics: 1080Ti
Storage: 230GB SSD
Hash: 1024MB for both engines
Ponder: ON
Syzygy TB: 5-man
Time controls: (40 moves in 8 min + 1 sec/move) + (40 moves in 2 min + 1 sec/move) + (1 sec/move)
GUI: Rybka Aquarium
Opening book: max 4 moves from a Widebook
Engines: Lc0 v0.17.0-rc2 (Net-ID: I don't know where to look - can you help me? Loaded yesterday morning.)
         Stockfish 9 64-bit
Match arrangement: each opening played twice with switched sides.

Games: 50
Score: Leela 27.5
       Stockfish 22.5


Not only that - the score should actually be even more in Leela's favor, because the following strange things happened:

1. Game 4 was given to Sf because Leela (black) forfeited on time, but Leela was clearly winning - she had mate in 7.
2. Game 11 was given to Sf because Leela (white) forfeited on time, but Leela was clearly winning - she had mate in 6.
3. Game 13 was set to a draw for an unknown reason, but Leela (white) was clearly winning - she had mate in 2.
4. Game 14 was set to a draw for an unknown reason, but Leela (black) was clearly winning - she had mate in 4.

5. Game 20 was given to Leela (black) because Sf forfeited on time; Leela was ahead anyway (-1.24 : -2.10, both engines' evals favor black). Move 39, late middlegame.
6. Game 40 was given to Leela (black) because Sf forfeited on time, although the game was drawish (-0.03 : -0.54; negative values indicate a slight advantage for black). Move 39, middlegame.
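
To put a rough number on that, here is a minimal Python sketch. It assumes games 4, 11, 13 and 14 are all re-scored as Leela wins and leaves games 20 and 40 as recorded. That re-scoring is hypothetical, not an official result, and the names `RECORDED`, `CORRECTIONS` and `adjusted_score` are made up for illustration.

```python
# Recorded match score from the 50 games.
RECORDED = {"Leela": 27.5, "Stockfish": 22.5}

# Hypothetical corrections: (game number, Leela points as recorded,
# Leela points if the game were counted as a Leela win).
CORRECTIONS = [
    (4, 0.0, 1.0),   # recorded as a loss on time
    (11, 0.0, 1.0),  # recorded as a loss on time
    (13, 0.5, 1.0),  # recorded as a draw
    (14, 0.5, 1.0),  # recorded as a draw
]


def adjusted_score(recorded, corrections):
    """Shift points from Stockfish to Leela for each re-scored game."""
    shift = sum(new - old for _game, old, new in corrections)
    return {
        "Leela": recorded["Leela"] + shift,
        "Stockfish": recorded["Stockfish"] - shift,
    }


result = adjusted_score(RECORDED, CORRECTIONS)
print(f"Leela {result['Leela']} : Stockfish {result['Stockfish']}")
```

Under those assumptions the four games move 3 points from Stockfish to Leela. Counting game 40 as a draw instead would move half a point back.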

I tried to replay the broken games, but sadly I was unable to do so in the Aquarium GUI. I set up the positions, but for some reason the engines still started from the initial position, even though I had selected the other positions.

It would be delightful if Tryfon Gavriel found a game of interest here as well. I do enjoy your analysis!

Enjoy!
Original.pgn

Alexander Lyashuk

Aug 29, 2018, 5:03:01 AM
to margus...@gmail.com, LCZero
Poor kingcrusher will have a hard time covering all the Lc0 wins, given how often such games have been appearing recently. :)

Margus Riimaa

Aug 29, 2018, 5:38:18 AM
to LCZero
I sadly have to admit, as Jonathan Rosenthal kindly pointed out, that I was running Sf on just 1 core. I checked, and that is indeed what happened. So please be aware that Sf was handicapped in this match.
Sorry! I did not do that on purpose.

Matt Blakely

Aug 29, 2018, 8:12:49 AM
to LCZero
OK, that makes sense, because in my testing so far SF9 is still beating Leela overall. However, we aren't that far from reaching parity.

If you repeat the test, can you find out the net ID? It's a constant battle to determine the best IDs, and the newest aren't always the strongest.

ovi...@gmail.com

Aug 29, 2018, 8:44:33 AM
to LCZero
Margus, could you calculate the Leela ratio you were getting with Sf running on just one core?
You could compare it later with the ratio at full power.

Margus Riimaa

Aug 29, 2018, 9:43:06 AM
to LCZero
Yes, of course!

If the Leela ratio formula is (Lc0 nps / Sf9 nps) * 875, then measured on the starting position (Stockfish 9 at d=26, Lc0 at d=10):

Leela: 9 kN/s
Stockfish on 1 thread: 1993 kN/s, ratio 3.95
Stockfish on 12 threads: 14637 kN/s, ratio 0.54
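
For anyone who wants to check the arithmetic or plug in their own speeds, here is a minimal Python sketch of the ratio calculation. The function name `leela_ratio` and its parameter names are my own, not part of any tool. The 875 factor is taken from the formula as quoted in this post.

```python
def leela_ratio(lc0_nps: float, sf_nps: float, factor: float = 875.0) -> float:
    """Leela ratio as quoted in the post: (Lc0 nps / Sf nps) * 875.

    Both speeds must use the same unit (here kN/s), since only
    their quotient matters.
    """
    return lc0_nps / sf_nps * factor


lc0_knps = 9  # Lc0 speed reported above, in kN/s

# Stockfish speeds reported above, in kN/s.
for label, sf_knps in [("1 thread", 1993), ("12 threads", 14637)]:
    ratio = leela_ratio(lc0_knps, sf_knps)
    print(f"Stockfish on {label}: ratio {ratio:.2f}")
```

With the numbers above this gives 3.95 for 1 thread and 0.54 for 12 threads.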

ovi...@gmail.com

Aug 29, 2018, 9:55:24 AM
to LCZero
If you repeat the match with Sf at full strength, we will have a clue about how results scale with the ratio.

Margus Riimaa

Aug 29, 2018, 10:00:46 AM
to LCZero
Oh dear, I must be doing something very wrong again. I don't download the net from the website; I just run the training client, and it downloads and updates the net for me. Therefore I never know which net ID is currently in use, and I don't know how to find out either.

garrykli...@gmail.com

Aug 29, 2018, 1:28:31 PM
to LCZero
Very interested to see results with full SF9. Watching this project gets more exciting every week.

Alexandre Meirelles

Aug 29, 2018, 8:09:39 PM
to LCZero
Hahahaha, SF9 losing on time?! Come on... something is wrong with your PC or program.

SF9 will never lose on time with that TC.

Best regards.

ccamp81318

Aug 29, 2018, 10:43:48 PM
to LCZero
I just don't see why you folks think these 'tests' have value. If you spend any time watching Leela think, you will see that it does not scale. Past about 5 or 10 seconds per move, Leela gains virtually nothing from extra time, while all of the brute-force bots keep getting stronger. So when you run these tests at seconds per move, you are slanting the outcome drastically in Leela's favor. At a TC more like a real tournament game, say 40/120+20/60+30SD, you will see that Leela plays about the same as it does at a few seconds per move, while your favorite test subject SF9 will play about 200 points stronger. I've been watching Komodo 12 and Lc0.17.rc2 analyze in parallel for hours a day, and that is what I see. Perhaps some other 2000+ chess player sees something different. That is possible, I suppose, as all of my analysis is of similar positions.

Dietrich Kappe

Aug 30, 2018, 1:13:29 AM
to LCZero
Well, here is one counterexample:

https://github.com/dkappe/leela-chess-weights/wiki/Project-Success

Leela slaps SF8 around at higher node counts.

Dietrich Kappe

Aug 30, 2018, 1:26:33 AM
to LCZero
Note that the “second” figure refers to simulation of games on monster hardware. The “8 second” games took 10.5 hours to complete. We have a 30 second simulation underway. Stay tuned.