[New L.e.e.l.a Rating Lists] Leela ID 20755 Tests Complete: 2937 Elo


Cscuile

Sep 15, 2018, 9:42:15 AM
to LCZero
Please keep in mind this is a work in progress. Thanks!

Features of the new lists:
-Stockfish version Elo is now based on FishTest (20,000 to 40,000 games played).
-Updated Long Time Control rating list. The time control is determined by the first PoDR of both engines, with sample sizes taken into account.
-Optimal time control relative to sample size: 10:45 minutes per game.
-Reworked the overall design to use space better.
-Elo based on CCRL 40/4 (Stockfish 9 Elo set to 3560)
-All AlphaZero and Stockfish theoretical estimates are now based on Scaling.
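
As a rough illustration of what anchoring a list to a fixed reference means, here is a minimal sketch. It assumes the standard logistic Elo model and uses the Stockfish 9 = 3560 anchor stated above. The function names and the win/draw/loss counts are hypothetical, and this is not necessarily how the spreadsheet computes its ratings.

```python
import math

# Anchor value taken from the post: Stockfish 9 is fixed at 3560 Elo.
ANCHOR_ELO = 3560.0


def elo_diff_from_score(score: float) -> float:
    """Elo difference implied by a score fraction in (0, 1) under the logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)


def anchored_elo(wins: int, draws: int, losses: int,
                 anchor_elo: float = ANCHOR_ELO) -> float:
    """Rating of an engine from its match result against the anchor engine."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    return anchor_elo + elo_diff_from_score(score)


if __name__ == "__main__":
    # Hypothetical match result, for illustration only (not from the tests above).
    print(round(anchored_elo(wins=5, draws=30, losses=65)))
```

An even score against the anchor returns the anchor rating itself, and every other rating on the list shifts with whatever value the anchor is given.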


Node Ratios:

863:1 (SF9-Leela 20x256)
LeelaFish Ratio: 0.4214
AlphaZero-Stockfish 8 Ratio: 0.9863

Thank you!
-Cscuile


Cscuile

Sep 15, 2018, 10:03:10 AM
to LCZero
My mistake, I misread "633" as "755". I'm sorry for this! I just woke up this morning to my finished tests. It was a bad idea messing with Leela in the morning. Please forgive me! I am human. 

Matt Blakely

Sep 15, 2018, 10:24:56 AM
to LCZero
Great minds think alike, Cscuile! I'm starting my benchmark testing for the new nets today as well.

Keep up the good work. I will definitely be looking at your spreadsheet frequently.
 

Cscuile

Sep 15, 2018, 11:51:27 AM
to LCZero
Thanks Matt. Do you have a docs link or anything similar for when your tests are finished? I would love to see how your results compare. 

Daniel Rocha

Sep 15, 2018, 2:54:56 PM
to LCZero
I don't understand this result. The one competing on CCCC is ranked about 3300, based on the overall rankings of other engines.

Greg Mattson

Sep 15, 2018, 3:02:37 PM
to LCZero
Daniel,

They restarted the network from scratch. Anything prefixed with 2 is part of the new networks.

Matt Blakely

Sep 15, 2018, 5:23:41 PM
to LCZero
No, I wasn't this formal in the past, but I think I'll start a spreadsheet now and maybe post it online for others once I build up a few results.

My testing is perhaps a bit less scientific, in that I like to vary the opponent and sometimes the time control. But with enough games it still gives a good indication of strength.
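
To put a number on "enough games", here is a minimal sketch of how the uncertainty of an Elo estimate shrinks with sample size. It assumes the logistic Elo model and a normal approximation for the mean score. The function names and the game counts are hypothetical and not taken from anyone's tests in this thread.

```python
import math


def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction in (0, 1)."""
    return -400.0 * math.log10(1.0 / score - 1.0)


def elo_interval(wins: int, draws: int, losses: int, z: float = 1.96):
    """Return (low, estimate, high) Elo difference; z = 1.96 gives roughly 95%."""
    n = wins + draws + losses
    score = (wins + 0.5 * draws) / n
    # Per-game variance of the result, where a game scores 1, 0.5 or 0.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / n
    se = math.sqrt(var / n)  # standard error of the mean score
    # Clamp so the logistic formula stays defined at the extremes.
    low = min(max(score - z * se, 1e-9), 1.0 - 1e-9)
    high = min(max(score + z * se, 1e-9), 1.0 - 1e-9)
    return elo_from_score(low), elo_from_score(score), elo_from_score(high)


if __name__ == "__main__":
    # Same hypothetical 60% score at two sample sizes.
    for w, d, l in [(40, 40, 20), (400, 400, 200)]:
        lo, mid, hi = elo_interval(w, d, l)
        print(f"{w + d + l} games: {mid:+.0f} Elo (about {lo:+.0f} to {hi:+.0f})")
```

The interval narrows roughly with the square root of the number of games. Mixing opponents and time controls adds variation that this simple model does not capture.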