Leela vs Stockfish: comparison @ fixed nodes and depths

Jon Mike

Aug 22, 2019, 4:05:18 AM
to LCZero
How does our current top Leela network (T40B.2-160) scale versus Stockfish10x64_modern at various nodes?  
Let's find out!

--------------------

Lc0.T40B.4-160 vs SF10x64.modern
3x9 opening book:

Lc0 @ 1 fixed node per move vs SF10 @ various fixed nodes per move:

Lc0T40B.4-160_1node   - SF10x64_modern_1node              55.0 - 0.0    +55/=0/-0     100.00%
Lc0T40B.4-160_1node   - SF10x64_modern_10nodes          53.5 - 0.5    +53/=1/-0       99.07%
Lc0T40B.4-160_1node   - SF10x64_modern_100nodes        52.0 - 2.0    +52/=0/-2        96.30%
Lc0T40B.4-160_1node  - SF10x64_modern_750nodes      48.5 - 4.5    +47/=3/-3        91.51% (Old Leela Ratio, A0 vs SF8)
Lc0T40B.4-160_1node   - SF10x64_modern_1Knodes          51.0 - 4.0    +51/=0/-4        92.73%
Lc0T40B.4-160_1node   - SF10x64_modern_1.5Knodes       42.0 - 13.0    +40/=4/-11    76.36%
Lc0T40B.4-160_1node   - SF10x64_modern_2Knodes          42.0 - 11.0    +41/=2/-10    79.25%
Lc0T40B.4-160_1node   - SF10x64_modern_2.5Knodes       40.5 - 13.5    +39/=3/-12    75.00%
Lc0T40B.4-160_1node   - SF10x64_modern_3Knodes          33.5 - 21.5    +30/=7/-18    60.91%
Lc0T40B.4-160_1node   - SF10x64_modern_3.5Knodes       33.5 - 20.5    +32/=3/-19    62.04%
Lc0T40B.4-160_1node   - SF10x64_modern_3.75Knodes     29.0 - 24.0    +26/=6/-21    54.72%
Lc0T40B.4-160_1node   - SF10x64_modern_4Knodes         34.0 - 19.0    +29/=10/-14   64.15%
Lc0T40B.4-160_1node   - SF10x64_modern_4.25Knodes     29.0 - 25.0    +21/=16/-17    53.70%
Lc0T40B.4-160_1node   - SF10x64_modern_4.5Knodes      24.0 - 30.0    +19/=10/-25   44.44%
Lc0T40B.4-160_1node   - SF10x64_modern_4.75Knodes     30.0 - 24.0    +24/=12/-18    55.56%
Lc0T40B.4-160_1node   - SF10x64_modern_5Knodes         30.5 - 24.5    +26/=9/-20     55.45%
Lc0T40B.4-160_1node   - SF10x64_modern_5.25Knodes     24.5 - 29.5    +21/=7/-26     45.37%
Lc0T40B.4-160_1node   - SF10x64_modern_5.5Knodes      30.0 - 24.0    +25/=10/-19    55.56%
Lc0T40B.4-160_1node   - SF10x64_modern_5.75Knodes     30.0 - 24.0    +24/=12/-18    55.56%
Lc0T40B.4-160_1node   - SF10x64_modern_6Knodes          62.5 - 44.5    +51/=23/-33    58.41%
Lc0T40B.4-160_1node   - SF10x64_modern_6.25Knodes     17.0 - 37.0    +13/=8/-33     31.48%
Lc0T40B.4-160_1node   - SF10x64_modern_6.5Knodes       21.0 - 33.0    +18/=6/-30      38.89%
Lc0T40B.4-160_1node   - SF10x64_modern_6.75Knodes      23.0 - 31.0    +19/=8/-27     42.59%
Lc0T40B.4-160_1node   - SF10x64_modern_7Knodes          32.0 - 23.0    +26/=12/-17    58.18%
Lc0T40B.4-160_1node   - SF10x64_modern_7.25Knodes     19.0 - 35.0    +16/=6/-32      35.19%
Lc0T40B.4-160_1node   - SF10x64_modern_7.5Knodes       25.0 - 29.0    +21/=8/-25      46.30%
Lc0T40B.4-160_1node   - SF10x64_modern_7.75Knodes     24.5 - 29.5    +19/=11/-24    45.37%
Lc0T40B.4-160_1node   - SF10x64_modern_8Knodes          15.0 - 38.0    +13/=4/-36      28.30%
Lc0T40B.4-160_1node   - SF10x64_modern_8.5Knodes      18.0 - 36.0    +15/=6/-33      33.33%
Lc0T40B.4-160_1node   - SF10x64_modern_9Knodes          16.5 - 37.5    +11/=11/-32    30.56%
Lc0T40B.4-160_1node   - SF10x64_modern_9.5Knodes      14.5 - 38.5    +11/=7/-35       27.36%
Lc0T40B.4-160_1node   - SF10x64_modern_10Knodes        22.0 - 32.0    +17/=10/-27    40.74%

One year ago, the 1:750 ratio would yield about 50% results with Lc0 vs SF.

SF nodes vs Lc0 @ 1 node for equivalent strength is greater than 1:4,250 and less than 1:6,250
or about 1:4,500 nodes per move (reduces to 1:4500)
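
Each row above gives points, a +wins/=draws/-losses tally and a score percentage. A minimal sketch of that arithmetic, assuming the usual scoring of 1 for a win and 0.5 for a draw. The function names, the normal-approximation error margin and the logistic Elo conversion are illustrative additions, not something the original test used:

```python
import math

def score_summary(wins, draws, losses):
    """Return (games, points, score, stderr) for one match row."""
    games = wins + draws + losses
    points = wins + 0.5 * draws
    score = points / games
    # Per-game variance of the result (1, 0.5 or 0) around the mean score.
    var = (wins * (1.0 - score) ** 2
           + draws * (0.5 - score) ** 2
           + losses * (0.0 - score) ** 2) / games
    stderr = math.sqrt(var / games)  # standard error of the mean score
    return games, points, score, stderr

def elo_diff(score):
    """Elo difference implied by a score strictly between 0 and 1."""
    return -400.0 * math.log10(1.0 / score - 1.0)

# Row: Lc0 @ 1 node vs SF10 @ 4.25K nodes, +21/=16/-17
games, points, score, stderr = score_summary(21, 16, 17)
print(games, points, round(100 * score, 2), round(100 * 1.96 * stderr, 1))
print(round(elo_diff(score), 1))
```

For the 4.25K row the 95% margin comes out at roughly ±11 percentage points over 54 games, so neighbouring rows that bounce between about 45% and 55% are hard to tell apart statistically.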

Lc0 @ 1 fixed node per move vs SF10 @ various fixed depths per move:

Lc0T40B.4-160_1node   - SF10x64_modern_depth=5        48.5 - 5.5    +46/=5/-3        89.81%
Lc0T40B.4-160_1node   - SF10x64_modern_depth=6        48.0 - 6.0    +45/=6/-3        88.89%
Lc0T40B.4-160_1node   - SF10x64_modern_depth=7        34.0 - 20.0    +32/=4/-18    62.96%
Lc0T40B.4-160_1node   - SF10x64_modern_depth=8        25.0 - 29.0    +20/=10/-24  46.30%
Lc0T40B.4-160_1node   - SF10x64_modern_depth=9        14.0 - 39.0    +11/=6/-36     26.42%

SF depth needed for equivalent strength vs Lc0 @ 1 node:
greater than depth=7, less than depth=8

--------------------------------

I wonder how the above results compare to Lc0 @ 10 fixed nodes per move?

Lc0 @ 10 fixed nodes per move vs SF10 @ various fixed nodes per move:

Lc0T40B.4-160_10node   - SF10x64_modern_1node        54.0 - 1.0    +53/=2/-0    98.18%
Lc0T40B.4-160_10node   - SF10x64_modern_10nodes      54.0 - 0.0    +54/=0/-0    100.00%
Lc0T40B.4-160_10node   - SF10x64_modern_100nodes     50.5 - 3.5    +48/=5/-1    93.52%
Lc0T40B.4-160_10node   - SF10x64_modern_750nodes     51.0 - 3.0    +50/=2/-2    94.44%
Lc0T40B.4-160_10node   - SF10x64_modern_1Knodes      44.0 - 10.0    +41/=6/-7    81.48%
Lc0T40B.4-160_10node   - SF10x64_modern_1.5Knodes      45.5 - 8.5    +43/=5/-6    84.26%
Lc0T40B.4-160_10node   - SF10x64_modern_2Knodes      39.0 - 15.0    +35/=8/-11    72.22%
Lc0T40B.4-160_10node   - SF10x64_modern_2.5Knodes      34.0 - 20.0    +29/=10/-15    62.96%
Lc0T40B.4-160_10node   - SF10x64_modern_3Knodes      24.5 - 13.5    +24/=1/-13    64.47%
Lc0T40B.4-160_10node   - SF10x64_modern_3.5Knodes      30.0 - 24.0    +27/=6/-21    55.56%
Lc0T40B.4-160_10node   - SF10x64_modern_3.75Knodes     35.0 - 19.0    +31/=8/-15    64.81%
Lc0T40B.4-160_10node   - SF10x64_modern_4Knodes        31.5 - 22.5    +28/=7/-19    58.33%
Lc0T40B.4-160_10node   - SF10x64_modern_4.25Knodes     31.5 - 22.5    +27/=9/-18    58.33%
Lc0T40B.4-160_10node   - SF10x64_modern_4.5Knodes      33.0 - 21.0    +30/=6/-18    61.11%
Lc0T40B.4-160_10node   - SF10x64_modern_4.75Knodes     32.0 - 22.0    +26/=12/-16    59.26%
Lc0T40B.4-160_10node   - SF10x64_modern_5Knodes        23.0 - 31.0    +15/=16/-23    42.59%
Lc0T40B.4-160_10node   - SF10x64_modern_5.25Knodes     25.5 - 28.5    +18/=15/-21    47.22%
Lc0T40B.4-160_10node   - SF10x64_modern_5.5Knodes      19.0 - 35.0    +15/=8/-31    35.19%
Lc0T40B.4-160_10node   - SF10x64_modern_5.75Knodes     20.5 - 33.5    +19/=3/-32    37.96%
Lc0T40B.4-160_10node   - SF10x64_modern_6Knodes        25.5 - 28.5    +21/=9/-24    47.22%
Lc0T40B.4-160_10node   - SF10x64_modern_6.25Knodes     25.5 - 28.5    +20/=11/-23    47.22%
Lc0T40B.4-160_10node   - SF10x64_modern_6.5Knodes      21.0 - 33.0    +17/=8/-29    38.89%
Lc0T40B.4-160_10node   - SF10x64_modern_6.75Knodes     22.0 - 32.0    +19/=6/-29    40.74%
Lc0T40B.4-160_10node   - SF10x64_modern_7Knodes        18.5 - 35.5    +16/=5/-33    34.26%
Lc0T40B.4-160_10node   - SF10x64_modern_7.25Knodes     31.0 - 77.0    +24/=14/-70    28.70%
Lc0T40B.4-160_10node   - SF10x64_modern_7.5Knodes      38.5 - 69.5    +33/=11/-64    35.65%
Lc0T40B.4-160_10node   - SF10x64_modern_7.75Knodes     24.5 - 29.5    +22/=5/-27    45.37%
Lc0T40B.4-160_10node   - SF10x64_modern_8Knodes        32.0 - 76.0    +22/=20/-66    29.63%
Lc0T40B.4-160_10node   - SF10x64_modern_8.5Knodes      20.5 - 33.5    +16/=9/-29    37.96%
Lc0T40B.4-160_10node   - SF10x64_modern_9Knodes        17.5 - 36.5    +14/=7/-33    32.41%
Lc0T40B.4-160_10node   - SF10x64_modern_9.5Knodes      15.0 - 39.0    +12/=6/-36    27.78%
Lc0T40B.4-160_10node   - SF10x64_modern_10Knodes       16.5 - 37.5    +13/=7/-34    30.56%

SF nodes vs Lc0 @ 10 nodes for equivalent strength is greater than 10:4,750 and less than 10:5,000 
or about 10:4,875 nodes per move (reduces to 1:487)
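
The bracket above can be narrowed by drawing a straight line between the two rows on either side of 50%. A small sketch, assuming the score changes linearly between neighbouring node counts (a simplification; the helper name is made up for illustration):

```python
def crossover_nodes(lo_nodes, lo_score, hi_nodes, hi_score, target=0.5):
    """Linearly interpolate the SF node count at which Lc0's score hits target.

    lo_* is the row where Lc0 still scores at or above target,
    hi_* is the next row where it scores at or below target.
    """
    if not (lo_score >= target >= hi_score) or lo_score == hi_score:
        raise ValueError("rows must bracket the target score")
    frac = (lo_score - target) / (lo_score - hi_score)
    return lo_nodes + frac * (hi_nodes - lo_nodes)

# Lc0 @ 10 nodes: 32.0/54 vs SF @ 4.75K nodes, 23.0/54 vs SF @ 5K nodes
estimate = crossover_nodes(4750, 32.0 / 54, 5000, 23.0 / 54)
print(round(estimate))
```

For these two rows the estimate lands a little under 4,900 nodes, close to the 4,875 midpoint quoted above. With only 54 games per row, the noise in each score is far larger than the difference between the two estimates.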

Lc0 @ 10 fixed nodes per move vs SF10 @ various fixed depths per move:

Lc0T40B.4-160_10node   - SF10x64_modern_depth=5        49.0 - 5.0    +47/=4/-3    90.74%
Lc0T40B.4-160_10node   - SF10x64_modern_depth=6        42.0 - 12.0    +37/=10/-7    77.78%
Lc0T40B.4-160_10node   - SF10x64_modern_depth=7        38.0 - 16.0    +33/=10/-11    70.37%
Lc0T40B.4-160_10node   - SF10x64_modern_depth=8        18.0 - 36.0    +14/=8/-32    33.33%
Lc0T40B.4-160_10node   - SF10x64_modern_depth=9        15.5 - 37.5    +10/=11/-32    29.25%

SF depth needed for equivalent strength vs Lc0 @ 10 nodes:
greater than depth=7, less than depth=8

--------------------------------

What about Lc0 @ 100 fixed nodes per move?  See below.

Lc0 @ 100 fixed nodes per move vs SF10 @ various fixed nodes per move:
1x9 opening book

Lc0T40B.4-160_100node   - SF10x64_modern_1node        18.0 - 0.0    +18/=0/-0    100.00%
Lc0T40B.4-160_100node   - SF10x64_modern_10node      18.0 - 0.0    +18/=0/-0    100.00%
Lc0T40B.4-160_100node   - SF10x64_modern_100node    18.0 - 0.0    +18/=0/-0    100.00%
Lc0T40B.4-160_100node   - SF10x64_modern_750node    17.5 - 0.5    +17/=1/-0    97.22%
Lc0T40B.4-160_100node   - SF10x64_modern_1Knode     18.0 - 0.0    +18/=0/-0    100.00%
Lc0T40B.4-160_100node   - SF10x64_modern_2Knode     18.0 - 0.0    +18/=0/-0    100.00%
Lc0T40B.4-160_100node   - SF10x64_modern_3Knode     18.0 - 0.0    +18/=0/-0    100.00%
Lc0T40B.4-160_100node   - SF10x64_modern_4Knode     17.5 - 0.5    +17/=1/-0    97.22%
Lc0T40B.4-160_100node   - SF10x64_modern_5Knode     17.5 - 0.5    +17/=1/-0    97.22%
Lc0T40B.4-160_100node   - SF10x64_modern_6Knode     16.0 - 2.0    +15/=2/-1    88.89%
Lc0T40B.4-160_100node   - SF10x64_modern_7Knode     17.0 - 1.0    +16/=2/-0    94.44%
Lc0T40B.4-160_100node   - SF10x64_modern_8Knode     15.5 - 2.5    +14/=3/-1    86.11%
Lc0T40B.4-160_100node   - SF10x64_modern_9Knode     18.0 - 0.0    +18/=0/-0    100.00%
Lc0T40B.4-160_100node   - SF10x64_modern_10Knode     17.0 - 1.0    +16/=2/-0    94.44%
Lc0T40B.4-160_100node   - SF10x64_modern_15Knode     14.5 - 3.5    +13/=3/-2    80.56%
Lc0T40B.4-160_100node   - SF10x64_modern_30Knode     13.0 - 4.0    +13/=0/-4    76.47%
Lc0T40B.4-160_100node   - SF10x64_modern_40Knode     13.5 - 4.5    +13/=1/-4    75.00%
Lc0T40B.4-160_100node   - SF10x64_modern_45Knode     12.0 - 6.0    +9/=6/-3      66.67%
Lc0T40B.4-160_100node   - SF10x64_modern_50Knode     11.0 - 7.0    +10/=2/-6    61.11%
Lc0T40B.4-160_100node   - SF10x64_modern_55Knode      9.5 - 7.5     +8/=3/-6      55.88%
Lc0T40B.4-160_100node   - SF10x64_modern_57.5Knode    8.0 - 9.0    +6/=4/-7      47.06%
Lc0T40B.4-160_100node   - SF10x64_modern_60Knode       5.0 - 13.0    +3/=4/-11    27.78%
Lc0T40B.4-160_100node   - SF10x64_modern_65Knode       6.0 - 12.0    +5/=2/-11    33.33%
Lc0T40B.4-160_100node   - SF10x64_modern_75Knode       8.0 - 10.0    +7/=2/-9      44.44%
Lc0T40B.4-160_100node   - SF10x64_modern_100Knode     3.5 - 13.5    +2/=3/-12    20.59%

SF nodes vs Lc0 100 node for equivalent strength is greater than 100:55,000 and less than 100:57,500 
or about 100:56,250 nodes per move (reduces to 1:562)

Lc0 @ 100 fixed nodes per move vs SF10 @ various fixed depths per move:

Lc0T40B.4-160_100node   - SF10x64_modern_depth7         18.0 - 0.0    +18/=0/-0    100.00%
Lc0T40B.4-160_100node   - SF10x64_modern_depth8         16.5 - 1.5    +16/=1/-1     91.67%
Lc0T40B.4-160_100node   - SF10x64_modern_depth9         12.5 - 4.5    +11/=3/-3     73.53%
Lc0T40B.4-160_100node   - SF10x64_modern_depth10       12.5 - 4.5    +11/=3/-3     73.53%
Lc0T40B.4-160_100node   - SF10x64_modern_depth11        7.0 - 11.0    +5/=4/-9      38.89%
Lc0T40B.4-160_100node   - SF10x64_modern_depth12        7.5 - 10.5    +4/=7/-7      41.67%
Lc0T40B.4-160_100node   - SF10x64_modern_depth13        7.0 - 10.0    +3/=8/-6      41.18%

SF depth needed for equivalent strength vs Lc0 @ 100 nodes:
greater than depth=10, less than depth=11

--------------------------------

CONCLUSIONS:

Lc0 vs SF10 equivalency ratios:
1 node (policy head) = 1:4,500 nodes per move, or SF depth of 7 to 8
10 nodes = 1:487 nodes per move, or SF depth of 7 to 8
100 nodes = 1:562 nodes per move, or SF depth of 10 to 11

Per node, Lc0 at a single node (policy head) is worth almost 10x as many SF nodes as Lc0 at 10 or 100 nodes!
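
The per-node ratios in the conclusions can be checked with a few lines of arithmetic. This sketch just divides the estimated SF node counts quoted above by the Lc0 node counts; the variable names are illustrative:

```python
# Estimated SF nodes per move needed to match Lc0 at each fixed node count,
# taken from the "or about ..." lines in the results above.
sf_equivalent = {1: 4500, 10: 4875, 100: 56250}

# SF nodes needed per single Lc0 node.
per_node = {lc0: sf / lc0 for lc0, sf in sf_equivalent.items()}

# How much more one Lc0 node is "worth" at 1 node than at 10 or 100 nodes.
advantage = {lc0: per_node[1] / ratio for lc0, ratio in per_node.items()}

for lc0 in sf_equivalent:
    print(lc0, per_node[lc0], round(advantage[lc0], 2))
```

Under these figures the 1-node ratio is about 9.2 times the 10-node ratio and 8 times the 100-node ratio.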

Jon Mike

Aug 22, 2019, 4:06:56 AM
to LCZero
Correction: 
    • How does our current top Leela network (T40B.2-160) scale versus Stockfish10x64_modern at various nodes?  
    • Should be (T40.B4-160)

    On Thursday, August 22, 2019 at 3:05:18 AM UTC-5, Jon Mike wrote:
    ...

    Zlatko Hulama

    Aug 22, 2019, 11:09:02 AM
    to LCZero
    I find it amazing that 1-node Leela is actually stronger than 10-node Leela against Stockfish; maybe there's some kind of bad search going on at 10 nodes?

    Hypothetical:
    It spends 7 nodes on the best move, and the win probability for that move drops below that of the second- and third-best moves.
    Then it uses the last 3 nodes on the second-best move, and that move's evaluation also drops below the third-best move's.
    After those 10 nodes, it plays the third-best move, even though a couple more nodes would show it's actually the worst move possible.

    Maybe a different strategy should be used for such low node numbers, like:
    Check 1 node, check 3 top replies for that node and check 2 more nodes after each of those 3 replies (6 more nodes) for a total of 10 nodes.

    ... who can program this?:D

    Jon Mike

    Aug 22, 2019, 1:15:33 PM
    to LCZero
    I find it amazing as well!
    The policy head (1 node search) is like the intuition of a network.  
    The 1 node search is the product of all the network's experience.
    It is also the most strongly correlated reflection of the network's relative strength.

    Here is something else amazing which I posted on another thread:

    The matchup is 11248 vs 42850 @ 1, 10, 100, 1K, 10K, 100K and 1 million nodes/move.
    Starting with 1.e4 e5, the PV is the same at 1, 10, 100, 1K, 10K, 100K and 1 million nodes/move for 28 half-moves!

    1. e4 e5 2. Nf3 Nc6 3. Bb5 Nf6 4. O-O Nxe4 5. Re1 Nd6 6. Nxe5
    Be7 7. Bf1 Nxe5 8. Rxe5 O-O 9. d4 Bf6 10. Re1 Re8 11. c3 c6 12. Bf4 Rxe1 13.
    Qxe1 Ne8 14. Nd2 d5 

    Once again the first 28 half moves @ 1 Million nodes per move is exactly the same as at 1 node per move.

    If Leela could see tactics, I think her 1 node PV would be as strong as 1 million nodes.


    On Thursday, August 22, 2019 at 10:09:02 AM UTC-5, Zlatko Hulama wrote:
    ...

    Adam Kirby

    Aug 23, 2019, 9:29:59 AM
    to LCZero
    What are you using to test engines with different fixed node settings?  I can't figure out how to do that in Arena.

    Kevin Kirkpatrick

    Aug 23, 2019, 12:56:37 PM
    to LCZero
    Per these results, Leela's search algorithm does seem utterly useless up to 10 nodes (statistically, performing no better or worse than 1 node).  I've got a theory as to why this might be, but first... can you verify that you are using options which preclude multi-threading from obscuring the results?  Are you using options along the lines of:
    --minibatch-size=1
    --no-out-of-order-eval
    --threads=1
    --max-collision-visits=1
    --max-collision-events=1
    --max-prefetch=0

    Anyway, assuming everything is properly configured, my hunch is that the poor scaling at low node counts is caused by First-Play Urgency (FPU).  Per the lc0 defaults, the "reduction" FPU strategy is used with a value of 1.2.  With "reduction", a positive FPU value causes Leela to initialize each unvisited node to a Q value worse than that of the parent node.  The larger the value (and 1.2 in this context is pretty big), the worse Leela will assume these unvisited nodes are (and the less inclined she will be to visit them).  Leela's default FPU has been tuned for high-node-count competition.  However, while it may lead to strong play at higher node counts, I believe the default FPU could greatly impede Leela's play at lower node counts.

    A value of 1.2 for FPU=reduction is so high that even if the top two moves had identical policies, whichever of the two happens to be visited first (as long as it isn't found to be a 1-move blunder with a Q value much lower than the parent's) will almost certainly be visited 2 or 3 times before its "twin" move gets its first visit.  This anti-exploration effect of FPU does diminish after a few dozen visits, but it would still be hugely significant at 10-node play.  Consider: for the starting position, the latest T40 net gives 1.e4 a policy of ~0.3 and 1.d4 a policy of ~0.1.  This (loosely) means that Leela's policy head predicts a ~3:1 ratio of visits to e4 vs d4 in an 800-node search.  This proves to be fairly accurate; after 800 visits, the ratio is actually about 5:1 (a bit higher than 3:1 because the Q value after e4 winds up slightly better than the Q value after d4).  In this light, at 10 nodes, one might predict 3 or 4 or possibly even 5 visits to e4 before a first glance at d4.  But thanks to FPU=1.2, d4 is assigned an initial Q value so low that Leela actually visits e4 12 TIMES before visiting d4 once.

    I suspect that until node count gets into the 20's or 30's, for the majority of positions, this FPU setting probably locks Leela into making whatever move has the highest policy... effectively causing Leela to play no different than node=1.  I'd be interested to see the results of repeating the 10-node experiment, but running it with FPU-value of 0.2 (rather than 1.2).  For reference, with FPU=0.2, d4 is visited at N=5, not N=13.  
    [Attachment: p1.JPG]
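
The FPU effect described above can be illustrated with a toy two-move model. This is a sketch under loud assumptions, not lc0's actual search code: it uses an AlphaZero-style PUCT score, models "reduction" as parent Q minus fpu_value times the square root of the visited prior mass, fixes c_puct at 3.0, ignores every move other than e4 and d4, and gives every evaluated node the same Q as the parent. The function name and all defaults are hypothetical.

```python
import math

def visits_before_second_move(fpu_value, p_first=0.3, p_second=0.1,
                              c_puct=3.0, parent_q=0.0, max_visits=1000):
    """Count visits to the first move before the second move is selected once."""
    n_first = 0
    for _ in range(max_visits):
        total = max(1, n_first)  # visits so far (at least 1, so sqrt is nonzero)
        # Prior mass of children that already have a visit (only the first move can).
        visited_policy = p_first if n_first > 0 else 0.0
        # Toy FPU reduction: unvisited children start below the parent's Q.
        fpu_q = parent_q - fpu_value * math.sqrt(visited_policy)
        q_first = parent_q if n_first > 0 else fpu_q
        score_first = q_first + c_puct * p_first * math.sqrt(total) / (1 + n_first)
        score_second = fpu_q + c_puct * p_second * math.sqrt(total)
        if score_second > score_first:
            return n_first
        n_first += 1
    return None  # second move never selected within max_visits

for fpu in (1.2, 0.2):
    print(fpu, visits_before_second_move(fpu))
```

The exact counts depend on the assumed constants and will not match real lc0 output. The only point is the direction: a larger FPU reduction delays the second move's first visit by several nodes, which matters a great deal when the whole budget is 10 nodes.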

    Jon Mike

    Aug 23, 2019, 11:39:09 PM
    to LCZero
    Kevin,
    Thanks for the thoughtful reply.  

    I too have been pondering the unexpected results of 1 versus 10 nodes.  I figured that since 10 and 100 nodes per move share close ratios, the policy head was just an outlier.  That is, I have been assuming that the equivalency ratio is roughly 1:500 at all tested and untested numbers of nodes per move, except for 1, and that the policy head is an altogether different beast (being 10x stronger) than the actual search algorithm at any other number of nodes.  It's very intriguing how much strength is embedded in the policy head!

    As far as my lc0.exe options, I believe I am using all default values.  
    Here is the code I am using to retrieve lc0.exe:

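    # Start from a clean checkout of lc0, including its submodules
    !rm -rf lc0
    !git clone --recurse-submodules https://github.com/LeelaChessZero/lc0.git
    # Check out the last tag in the list that is not a release candidate
    !cd lc0 && git checkout $(git tag --list |grep -v rc |tail -1)
    # Configure an optimized release build (LTO on, unit tests off) and compile
    !cd lc0 && rm -rf build
    !cd lc0 && meson build --buildtype release -Db_lto=true -Dgtest=false
    !cd lc0/build && ninja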

    Then I just run it vanilla with cutechess-cli.
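
For anyone wondering what a fixed-node match looks like in cutechess-cli, here is a rough sketch that only builds and prints a command line. The thread does not show the actual invocation, so the engine paths, book file name, round count and output file are hypothetical placeholders. The `nodes=`, `depth=` and `tc=inf` engine settings are standard cutechess-cli options.

```python
import shlex

def cutechess_args(lc0_cmd, sf_cmd, lc0_limit, sf_limit, book, rounds):
    """Build a cutechess-cli argument list for a fixed-limit match.

    lc0_limit and sf_limit are per-engine limits such as "nodes=1" or "depth=8".
    """
    return [
        "cutechess-cli",
        "-engine", "name=lc0", f"cmd={lc0_cmd}", lc0_limit,
        "-engine", "name=stockfish", f"cmd={sf_cmd}", sf_limit,
        # No clock: only the node or depth limit ends each search.
        "-each", "proto=uci", "tc=inf",
        "-openings", f"file={book}", "format=pgn", "order=sequential",
        # Two games per round with -repeat: each opening is played with both colours.
        "-games", "2", "-rounds", str(rounds), "-repeat",
        "-pgnout", "match.pgn",
    ]

# Hypothetical paths and book; 27 rounds of 2 games is only a placeholder.
args = cutechess_args("./lc0", "./stockfish", "nodes=1", "nodes=4500", "book.pgn", 27)
print(shlex.join(args))
```

Swapping `"nodes=4500"` for `"depth=8"` would give the fixed-depth variant of the same match.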

    Once the policy head championship is over (almost there), I do plan on experimenting with the strongest PH nets to find the best parameter combinations at 1 node, rather than optimizing for 10 nodes.  I normally don't mess around with the settings much, so my knowledge in that area is basic.  Do you have suggestions on how I could optimize params for 1-node strength (policy head performance)?

     


    On Friday, August 23, 2019 at 11:56:37 AM UTC-5, Kevin Kirkpatrick wrote:
    ...