Leela vs Stockfish: comparison @ fixed nodes and depths

Jon Mike

Aug 22, 2019, 4:05:18 AM

to LCZero

How does our current top Leela network (T40B.2-160) scale versus Stockfish10x64_modern at various nodes?

Let's find out!

--------------------

Lc0.T40B.4-160 vs SF10x64.modern

3x9 opening book:

Lc0 @ 1 fixed node per move vs SF10 @ various fixed node counts per move:

Lc0T40B.4-160_1node - SF10x64_modern_1node 55.0 - 0.0 +55/=0/-0 100.00%

Lc0T40B.4-160_1node - SF10x64_modern_10nodes 53.5 - 0.5 +53/=1/-0 99.07%

Lc0T40B.4-160_1node - SF10x64_modern_100nodes 52.0 - 2.0 +52/=0/-2 96.30%

Lc0T40B.4-160_1node - SF10x64_modern_750nodes 48.5 - 4.5 +47/=3/-3 91.51% (Old Leela Ratio, A0 vs SF8)

Lc0T40B.4-160_1node - SF10x64_modern_1Knodes 51.0 - 4.0 +51/=0/-4 92.73%

Lc0T40B.4-160_1node - SF10x64_modern_1.5Knodes 42.0 - 13.0 +40/=4/-11 76.36%

Lc0T40B.4-160_1node - SF10x64_modern_2Knodes 42.0 - 11.0 +41/=2/-10 79.25%

Lc0T40B.4-160_1node - SF10x64_modern_2.5Knodes 40.5 - 13.5 +39/=3/-12 75.00%

Lc0T40B.4-160_1node - SF10x64_modern_3Knodes 33.5 - 21.5 +30/=7/-18 60.91%

Lc0T40B.4-160_1node - SF10x64_modern_3.5Knodes 33.5 - 20.5 +32/=3/-19 62.04%

Lc0T40B.4-160_1node - SF10x64_modern_3.75Knodes 29.0 - 24.0 +26/=6/-21 54.72%

Lc0T40B.4-160_1node - SF10x64_modern_4Knodes 34.0 - 19.0 +29/=10/-14 64.15%

Lc0T40B.4-160_1node - SF10x64_modern_4.25Knodes 29.0 - 25.0 +21/=16/-17 53.70%

Lc0T40B.4-160_1node - SF10x64_modern_4.5Knodes 24.0 - 30.0 +19/=10/-25 44.44%

Lc0T40B.4-160_1node - SF10x64_modern_4.75Knodes 30.0 - 24.0 +24/=12/-18 55.56%

Lc0T40B.4-160_1node - SF10x64_modern_5Knodes 30.5 - 24.5 +26/=9/-20 55.45%

Lc0T40B.4-160_1node - SF10x64_modern_5.25Knodes 24.5 - 29.5 +21/=7/-26 45.37%

Lc0T40B.4-160_1node - SF10x64_modern_5.5Knodes 30.0 - 24.0 +25/=10/-19 55.56%

Lc0T40B.4-160_1node - SF10x64_modern_5.75Knodes 30.0 - 24.0 +24/=12/-18 55.56%

Lc0T40B.4-160_1node - SF10x64_modern_6Knodes 62.5 - 44.5 +51/=23/-33 58.41%

Lc0T40B.4-160_1node - SF10x64_modern_6.25Knodes 17.0 - 37.0 +13/=8/-33 31.48%

Lc0T40B.4-160_1node - SF10x64_modern_6.5Knodes 21.0 - 33.0 +18/=6/-30 38.89%

Lc0T40B.4-160_1node - SF10x64_modern_6.75Knodes 23.0 - 31.0 +19/=8/-27 42.59%

Lc0T40B.4-160_1node - SF10x64_modern_7Knodes 32.0 - 23.0 +26/=12/-17 58.18%

Lc0T40B.4-160_1node - SF10x64_modern_7.25Knodes 19.0 - 35.0 +16/=6/-32 35.19%

Lc0T40B.4-160_1node - SF10x64_modern_7.5Knodes 25.0 - 29.0 +21/=8/-25 46.30%

Lc0T40B.4-160_1node - SF10x64_modern_7.75Knodes 24.5 - 29.5 +19/=11/-24 45.37%

Lc0T40B.4-160_1node - SF10x64_modern_8Knodes 15.0 - 38.0 +13/=4/-36 28.30%

Lc0T40B.4-160_1node - SF10x64_modern_8.5Knodes 18.0 - 36.0 +15/=6/-33 33.33%

Lc0T40B.4-160_1node - SF10x64_modern_9Knodes 16.5 - 37.5 +11/=11/-32 30.56%

Lc0T40B.4-160_1node - SF10x64_modern_9.5Knodes 14.5 - 38.5 +11/=7/-35 27.36%

Lc0T40B.4-160_1node - SF10x64_modern_10Knodes 22.0 - 32.0 +17/=10/-27 40.74%
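
Each line above reports points and +W/=D/-L from Lc0's side. As a rough sketch of how far a ~54-game match can swing by chance, here are the standard logistic Elo formula and a normal-approximation 95% interval on the score. The function names are my own, and the interval is only an approximation (not an SPRT or a Bayesian estimate). For the 4.5K result (+19/=10/-25), the interval runs from roughly 32% to 56%, so neighbouring node counts that flip between about 44% and 56% may be within noise.

```python
import math


def match_stats(wins, draws, losses, z=1.96):
    """Return (score, half_width) for one mini-match, from the first engine's side.

    half_width is a normal-approximation confidence half-width (z=1.96 -> ~95%).
    """
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result (1, 0.5 or 0 points) around the mean score.
    variance = (wins * (1 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0 - score) ** 2) / games
    half_width = z * math.sqrt(variance / games)
    return score, half_width


def elo_diff(score):
    """Logistic Elo difference implied by a score strictly between 0 and 1."""
    return -400 * math.log10(1 / score - 1)


# Example: Lc0 @ 1 node vs SF10 @ 4.5K nodes, +19/=10/-25.
score, half = match_stats(19, 10, 25)
print(f"score {score:.1%}, interval {score - half:.1%} to {score + half:.1%}, "
      f"Elo {elo_diff(score):+.0f}")
```

elo_diff is undefined for 0% or 100% scores (e.g. the 55.0 - 0.0 sweep), so those matches only give a bound.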

One year ago, the 1:750 ratio would have yielded about a 50% score for Lc0 vs SF.

The Lc0:SF node ratio for equivalent strength with Lc0 @ 1 node is greater than 1:4,250 and less than 1:6,250, or about 1:4,500 nodes per move.

Lc0 @ 1 fixed node per move vs SF10 @ various fixed depths per move:

Lc0T40B.4-160_1node - SF10x64_modern_depth=5 48.5 - 5.5 +46/=5/-3 89.81%

Lc0T40B.4-160_1node - SF10x64_modern_depth=6 48.0 - 6.0 +45/=6/-3 88.89%

Lc0T40B.4-160_1node - SF10x64_modern_depth=7 34.0 - 20.0 +32/=4/-18 62.96%

Lc0T40B.4-160_1node - SF10x64_modern_depth=8 25.0 - 29.0 +20/=10/-24 46.30%

Lc0T40B.4-160_1node - SF10x64_modern_depth=9 14.0 - 39.0 +11/=6/-36 26.42%

SF depth needed for equivalent strength vs Lc0 @ 1 node:

greater than depth=7, less than depth=8

--------------------------------

How do the above results compare to Lc0 @ 10 fixed nodes per move?

Lc0 @ 10 fixed nodes per move vs SF10 @ various fixed node counts per move:

Lc0T40B.4-160_10node - SF10x64_modern_1node 54.0 - 1.0 +53/=2/-0 98.18%

Lc0T40B.4-160_10node - SF10x64_modern_10nodes 54.0 - 0.0 +54/=0/-0 100.00%

Lc0T40B.4-160_10node - SF10x64_modern_100nodes 50.5 - 3.5 +48/=5/-1 93.52%

Lc0T40B.4-160_10node - SF10x64_modern_750nodes 51.0 - 3.0 +50/=2/-2 94.44%

Lc0T40B.4-160_10node - SF10x64_modern_1Knodes 44.0 - 10.0 +41/=6/-7 81.48%

Lc0T40B.4-160_10node - SF10x64_modern_1.5Knodes 45.5 - 8.5 +43/=5/-6 84.26%

Lc0T40B.4-160_10node - SF10x64_modern_2Knodes 39.0 - 15.0 +35/=8/-11 72.22%

Lc0T40B.4-160_10node - SF10x64_modern_2.5Knodes 34.0 - 20.0 +29/=10/-15 62.96%

Lc0T40B.4-160_10node - SF10x64_modern_3Knodes 24.5 - 13.5 +24/=1/-13 64.47%

Lc0T40B.4-160_10node - SF10x64_modern_3.5Knodes 30.0 - 24.0 +27/=6/-21 55.56%

Lc0T40B.4-160_10node - SF10x64_modern_3.75Knodes 35.0 - 19.0 +31/=8/-15 64.81%

Lc0T40B.4-160_10node - SF10x64_modern_4Knodes 31.5 - 22.5 +28/=7/-19 58.33%

Lc0T40B.4-160_10node - SF10x64_modern_4.25Knodes 31.5 - 22.5 +27/=9/-18 58.33%

Lc0T40B.4-160_10node - SF10x64_modern_4.5Knodes 33.0 - 21.0 +30/=6/-18 61.11%

Lc0T40B.4-160_10node - SF10x64_modern_4.75Knodes 32.0 - 22.0 +26/=12/-16 59.26%

Lc0T40B.4-160_10node - SF10x64_modern_5Knodes 23.0 - 31.0 +15/=16/-23 42.59%

Lc0T40B.4-160_10node - SF10x64_modern_5.25Knodes 25.5 - 28.5 +18/=15/-21 47.22%

Lc0T40B.4-160_10node - SF10x64_modern_5.5Knodes 19.0 - 35.0 +15/=8/-31 35.19%

Lc0T40B.4-160_10node - SF10x64_modern_5.75Knodes 20.5 - 33.5 +19/=3/-32 37.96%

Lc0T40B.4-160_10node - SF10x64_modern_6Knodes 25.5 - 28.5 +21/=9/-24 47.22%

Lc0T40B.4-160_10node - SF10x64_modern_6.25Knodes 25.5 - 28.5 +20/=11/-23 47.22%

Lc0T40B.4-160_10node - SF10x64_modern_6.5Knodes 21.0 - 33.0 +17/=8/-29 38.89%

Lc0T40B.4-160_10node - SF10x64_modern_6.75Knodes 22.0 - 32.0 +19/=6/-29 40.74%

Lc0T40B.4-160_10node - SF10x64_modern_7Knodes 18.5 - 35.5 +16/=5/-33 34.26%

Lc0T40B.4-160_10node - SF10x64_modern_7.25Knodes 31.0 - 77.0 +24/=14/-70 28.70%

Lc0T40B.4-160_10node - SF10x64_modern_7.5Knodes 38.5 - 69.5 +33/=11/-64 35.65%

Lc0T40B.4-160_10node - SF10x64_modern_7.75Knodes 24.5 - 29.5 +22/=5/-27 45.37%

Lc0T40B.4-160_10node - SF10x64_modern_8Knodes 32.0 - 76.0 +22/=20/-66 29.63%

Lc0T40B.4-160_10node - SF10x64_modern_8.5Knodes 20.5 - 33.5 +16/=9/-29 37.96%

Lc0T40B.4-160_10node - SF10x64_modern_9Knodes 17.5 - 36.5 +14/=7/-33 32.41%

Lc0T40B.4-160_10node - SF10x64_modern_9.5Knodes 15.0 - 39.0 +12/=6/-36 27.78%

Lc0T40B.4-160_10node - SF10x64_modern_10Knodes 16.5 - 37.5 +13/=7/-34 30.56%

The Lc0:SF node ratio for equivalent strength with Lc0 @ 10 nodes is greater than 10:4,750 and less than 10:5,000, or about 10:4,875 nodes per move (reduces to about 1:487).

Lc0 @ 10 fixed nodes per move vs SF10 @ various fixed depths per move:

Lc0T40B.4-160_10node - SF10x64_modern_depth=5 49.0 - 5.0 +47/=4/-3 90.74%

Lc0T40B.4-160_10node - SF10x64_modern_depth=6 42.0 - 12.0 +37/=10/-7 77.78%

Lc0T40B.4-160_10node - SF10x64_modern_depth=7 38.0 - 16.0 +33/=10/-11 70.37%

Lc0T40B.4-160_10node - SF10x64_modern_depth=8 18.0 - 36.0 +14/=8/-32 33.33%

Lc0T40B.4-160_10node - SF10x64_modern_depth=9 15.5 - 37.5 +10/=11/-32 29.25%

SF depth needed for equivalent strength vs Lc0 @ 10 nodes:

greater than depth=7, less than depth=8

--------------------------------

What about Lc0 @ 100 fixed nodes per move? See below.

Lc0 @ 100 fixed nodes per move vs SF10 @ various fixed node counts per move:

1x9 opening book

Lc0T40B.4-160_100node - SF10x64_modern_1node 18.0 - 0.0 +18/=0/-0 100.00%

Lc0T40B.4-160_100node - SF10x64_modern_10node 18.0 - 0.0 +18/=0/-0 100.00%

Lc0T40B.4-160_100node - SF10x64_modern_100node 18.0 - 0.0 +18/=0/-0 100.00%

Lc0T40B.4-160_100node - SF10x64_modern_750node 17.5 - 0.5 +17/=1/-0 97.22%

Lc0T40B.4-160_100node - SF10x64_modern_1Knode 18.0 - 0.0 +18/=0/-0 100.00%

Lc0T40B.4-160_100node - SF10x64_modern_2Knode 18.0 - 0.0 +18/=0/-0 100.00%

Lc0T40B.4-160_100node - SF10x64_modern_3Knode 18.0 - 0.0 +18/=0/-0 100.00%

Lc0T40B.4-160_100node - SF10x64_modern_4Knode 17.5 - 0.5 +17/=1/-0 97.22%

Lc0T40B.4-160_100node - SF10x64_modern_5Knode 17.5 - 0.5 +17/=1/-0 97.22%

Lc0T40B.4-160_100node - SF10x64_modern_6Knode 16.0 - 2.0 +15/=2/-1 88.89%

Lc0T40B.4-160_100node - SF10x64_modern_7Knode 17.0 - 1.0 +16/=2/-0 94.44%

Lc0T40B.4-160_100node - SF10x64_modern_8Knode 15.5 - 2.5 +14/=3/-1 86.11%

Lc0T40B.4-160_100node - SF10x64_modern_9Knode 18.0 - 0.0 +18/=0/-0 100.00%

Lc0T40B.4-160_100node - SF10x64_modern_10Knode 17.0 - 1.0 +16/=2/-0 94.44%

Lc0T40B.4-160_100node - SF10x64_modern_15Knode 14.5 - 3.5 +13/=3/-2 80.56%

Lc0T40B.4-160_100node - SF10x64_modern_30Knode 13.0 - 4.0 +13/=0/-4 76.47%

Lc0T40B.4-160_100node - SF10x64_modern_40Knode 13.5 - 4.5 +13/=1/-4 75.00%

Lc0T40B.4-160_100node - SF10x64_modern_45Knode 12.0 - 6.0 +9/=6/-3 66.67%

Lc0T40B.4-160_100node - SF10x64_modern_50Knode 11.0 - 7.0 +10/=2/-6 61.11%

Lc0T40B.4-160_100node - SF10x64_modern_55Knode 9.5 - 7.5 +8/=3/-6 55.88%

Lc0T40B.4-160_100node - SF10x64_modern_57.5Knode 8.0 - 9.0 +6/=4/-7 47.06%

Lc0T40B.4-160_100node - SF10x64_modern_60Knode 5.0 - 13.0 +3/=4/-11 27.78%

Lc0T40B.4-160_100node - SF10x64_modern_65Knode 6.0 - 12.0 +5/=2/-11 33.33%

Lc0T40B.4-160_100node - SF10x64_modern_75Knode 8.0 - 10.0 +7/=2/-9 44.44%

Lc0T40B.4-160_100node - SF10x64_modern_100Knode 3.5 - 13.5 +2/=3/-12 20.59%

The Lc0:SF node ratio for equivalent strength with Lc0 @ 100 nodes is greater than 100:55,000 and less than 100:57,500, or about 100:56,250 nodes per move (reduces to about 1:562).

Lc0 @ 100 fixed nodes per move vs SF10 @ various fixed depths per move:

Lc0T40B.4-160_100node - SF10x64_modern_depth7 18.0 - 0.0 +18/=0/-0 100.00%

Lc0T40B.4-160_100node - SF10x64_modern_depth8 16.5 - 1.5 +16/=1/-1 91.67%

Lc0T40B.4-160_100node - SF10x64_modern_depth9 12.5 - 4.5 +11/=3/-3 73.53%

Lc0T40B.4-160_100node - SF10x64_modern_depth10 12.5 - 4.5 +11/=3/-3 73.53%

Lc0T40B.4-160_100node - SF10x64_modern_depth11 7.0 - 11.0 +5/=4/-9 38.89%

Lc0T40B.4-160_100node - SF10x64_modern_depth12 7.5 - 10.5 +4/=7/-7 41.67%

Lc0T40B.4-160_100node - SF10x64_modern_depth13 7.0 - 10.0 +3/=8/-6 41.18%

SF depth needed for equivalent strength vs Lc0 @ 100 nodes:

greater than depth=10, less than depth=11

--------------------------------

CONCLUSIONS:

Lc0 vs SF10 equivalency ratios:

1 node (policy head) = 1:4,500 nodes per move, or SF depth of 7 to 8

10 nodes = 1:487 nodes per move, or SF depth of 7 to 8

100 nodes = 1:562 nodes per move, or SF depth of 10 to 11

Per node, Lc0 at a single node (policy head) is almost 10x stronger than in any other comparison: 1:4,500, versus 1:487 at 10 nodes and 1:562 at 100 nodes!
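
The reduced ratios above are just the equivalent SF10 node count divided by the Lc0 node count. A short sketch of that arithmetic (the dictionary simply restates the central estimates from the three experiments) shows where the "almost 10x" comes from: about 9.2x versus 10-node play and 8x versus 100-node play.

```python
# Central SF10 node estimates judged equal in strength to each fixed Lc0 node
# budget, copied from the three experiments above.
EQUIV_SF_NODES = {1: 4_500, 10: 4_875, 100: 56_250}


def sf_nodes_per_lc0_node(lc0_nodes):
    """SF10 nodes 'matched' by each Lc0 node at this budget."""
    return EQUIV_SF_NODES[lc0_nodes] / lc0_nodes


ratios = {n: sf_nodes_per_lc0_node(n) for n in EQUIV_SF_NODES}
print(ratios)  # {1: 4500.0, 10: 487.5, 100: 562.5}
print(f"{ratios[1] / ratios[10]:.2f}x vs 10 nodes, "
      f"{ratios[1] / ratios[100]:.2f}x vs 100 nodes")  # 9.23x vs 10 nodes, 8.00x vs 100 nodes
```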

Jon Mike

Aug 22, 2019, 4:06:56 AM

to LCZero

Correction:

How does our current top Leela network (T40B.2-160) scale versus Stockfish10x64_modern at various nodes?

Should be (T40B.4-160)

On Thursday, August 22, 2019 at 3:05:18 AM UTC-5, Jon Mike wrote:

...

Zlatko Hulama

Aug 22, 2019, 11:09:02 AM

to LCZero

I find it amazing that 1-node Leela is actually stronger than 10-node Leela against Stockfish; maybe some kind of bad search is going on with 10 nodes?

Hypothetical:

It spends 7 nodes on the best move, and the win probability for that move drops below that of the second- and third-best moves.

Then it uses the last 3 nodes on the second-best move, and that move's output also drops below the third-best move.

After those 10 nodes, it plays the third-best move, even though a couple more nodes would show it's actually the worst move possible.

Maybe a different strategy should be used for such low node counts, like:

Check 1 node, check the top 3 replies for that node, and check 2 more nodes after each of those 3 replies (6 more nodes), for a total of 10 nodes.

... who can program this? :D
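
The proposed 10-node budget (1 root eval, 3 candidate moves, 2 replies each) can be sketched as a tiny fixed-shape search. Everything here is hypothetical: TOY_TREE stands in for the network (policy priors plus a value for the side to move), and picking the candidate whose worst evaluated reply is best (a 2-ply minimax) is my own assumption, since the post doesn't say how the final move is chosen. This is not how lc0's search works.

```python
from typing import Dict, Tuple

Path = Tuple[str, ...]

# Hypothetical toy "network": each position (moves from the root) maps to
# (policy priors over moves, value in [-1, 1] for the side to move).
TOY_TREE: Dict[Path, Tuple[Dict[str, float], float]] = {
    (): ({"a": 0.5, "b": 0.3, "c": 0.2}, 0.0),
    ("a",): ({"x": 0.6, "y": 0.4}, 0.0),
    ("a", "x"): ({}, 0.2),
    ("a", "y"): ({}, -0.8),  # a refutation the policy alone would miss
    ("b",): ({"x": 0.5, "y": 0.5}, 0.0),
    ("b", "x"): ({}, 0.1),
    ("b", "y"): ({}, 0.0),
    ("c",): ({"x": 0.7, "z": 0.3}, 0.0),
    ("c", "x"): ({}, -0.3),
    ("c", "z"): ({}, 0.5),
}


def top_k(priors, k):
    """The k moves with the highest prior (stable order on ties)."""
    return sorted(priors, key=priors.get, reverse=True)[:k]


def ten_node_pick(evaluate, root=(), k_moves=3, k_replies=2):
    """1 root eval + k_moves candidate evals + k_replies evals per candidate."""
    nodes = 0

    def ev(path):
        nonlocal nodes
        nodes += 1  # every network evaluation costs one node
        return evaluate(path)

    root_priors, _ = ev(root)  # the policy-head node
    best_move, best_score = None, float("-inf")
    for move in top_k(root_priors, k_moves):  # 3 candidate nodes in total
        child = root + (move,)
        reply_priors, child_value = ev(child)
        replies = top_k(reply_priors, k_replies)
        if not replies:
            score = -child_value  # child_value is from the opponent's view
        else:
            # After a reply it is our move again, so values are from our view;
            # assume the opponent picks the reply that is worst for us.
            score = min(ev(child + (r,))[1] for r in replies)  # 2 nodes here
        if score > best_score:
            best_move, best_score = move, score
    return best_move, nodes


print(ten_node_pick(TOY_TREE.__getitem__))  # ('b', 10)
```

In this toy tree the policy alone would play "a", while the 10-node scheme finds the refutation "y" and plays "b" instead.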

Jon Mike

Aug 22, 2019, 1:15:33 PM

to LCZero

I find it amazing as well!

The policy head (1 node search) is like the intuition of a network.

The 1-node search is the product of all the network's experience.

It is also the measure most strongly correlated with the network's relative strength.

Here is something else amazing which I posted on another thread:

The matchup is 11248 vs 42850 @ 1, 10, 100, 1K, 10K, 100K and 1 million nodes/move.

Starting with 1.e4 e5, the same move is in the PV at 1, 10, 100, 1K, 10K, 100K and 1 million nodes/move for 28 half-moves!

1. e4 e5 2. Nf3 Nc6 3. Bb5 Nf6 4. O-O Nxe4 5. Re1 Nd6 6. Nxe5 Be7 7. Bf1 Nxe5 8. Rxe5 O-O 9. d4 Bf6 10. Re1 Re8 11. c3 c6 12. Bf4 Rxe1 13. Qxe1 Ne8 14. Nd2 d5

Once again, the first 28 half-moves at 1 million nodes per move are exactly the same as at 1 node per move.

If Leela could see tactics, I think her 1-node PV would be as strong as her 1-million-node PV.
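
To check claims like "the same 28 half-moves at every node count" on other lines, a small hypothetical helper can count how many leading plies two move lists share. It only compares SAN text and does not check legality (a library such as python-chess would be needed for that); it also assumes White-first movetext without "14..." style Black move numbers.

```python
import re

# The 28-ply line quoted above.
LINE = ("1. e4 e5 2. Nf3 Nc6 3. Bb5 Nf6 4. O-O Nxe4 5. Re1 Nd6 6. Nxe5 Be7 "
        "7. Bf1 Nxe5 8. Rxe5 O-O 9. d4 Bf6 10. Re1 Re8 11. c3 c6 12. Bf4 Rxe1 "
        "13. Qxe1 Ne8 14. Nd2 d5")


def plies(movetext):
    """Split movetext into SAN plies, dropping move numbers such as '1.' or '13.'."""
    return [tok for tok in movetext.split() if not re.fullmatch(r"\d+\.", tok)]


def shared_prefix_plies(line_a, line_b):
    """Number of leading plies on which two lines agree."""
    count = 0
    for a, b in zip(plies(line_a), plies(line_b)):
        if a != b:
            break
        count += 1
    return count


print(len(plies(LINE)))  # 28
```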

On Thursday, August 22, 2019 at 10:09:02 AM UTC-5, Zlatko Hulama wrote:

...

Adam Kirby

Aug 23, 2019, 9:29:59 AM

to LCZero

What are you using to test engines with different fixed node settings? I can't figure out how to do that in Arena.

Kevin Kirkpatrick

Aug 23, 2019, 12:56:37 PM

to LCZero

Per these results, Leela's search algorithm does seem utterly useless up to 10 nodes (statistically, performing no better or worse than 1 node). I've got a theory as to why this might be, but first... can you verify that you are using options which preclude multi-threading from obscuring the results? Are you using options along the lines of:

--minibatch-size=1

--no-out-of-order-eval

--threads=1

--max-collision-visits=1

--max-collision-events=1

--max-prefetch=0

Anyway, assuming everything is properly configured, my hunch is that the poor scaling at low node play is caused by First-Play Urgency (FPU). Per the lc0 defaults, the "reduction" FPU strategy is used with value of 1.2. With "reduction", a positive FPU value causes Leela to initialize each unvisited node to a Q value worse than that of the parent node. The larger the value (and 1.2 in this context is pretty big) the worse Leela will assume these unvisited nodes are (and the less inclined she will be to visit them). Leela's default FPU has been tuned for high-node-count competition. However, while it may lead to strong play at higher node counts, I believe the default FPU could greatly impede Leela's play at lower node counts.

A value of 1.2 for FPU=reduction is so high that even if the top two moves had identical policies, whichever of the two is visited first by chance (as long as it isn't found to be a 1-move blunder with a Q value much lower than the parent's) will almost certainly be visited 2 or 3 times before its "twin" move gets its first visit. This anti-exploration effect of FPU does diminish after a few dozen visits, but it would still be hugely significant at 10-node play. Consider: for the starting position, the latest T40 net gives 1.e4 a policy of ~0.3 and d4 a policy of ~0.1. This (loosely) means that Leela's policy head predicts a ~3:1 ratio of visits to e4 vs d4 in an 800-node search. This proves to be fairly accurate; after 800 visits, the ratio is actually about 5:1 (a bit higher than 3:1 because the Q value after e4 ends up slightly better than the Q value after d4). In this light, at 10 nodes, one might predict 3, 4 or possibly even 5 visits to e4 before a first glance at d4. But thanks to FPU=1.2, d4 is assigned an initial Q value so low that Leela actually visits e4 12 TIMES before visiting d4 once.

I suspect that until the node count gets into the 20s or 30s, for the majority of positions, this FPU setting probably locks Leela into playing whichever move has the highest policy... effectively causing Leela to play no differently than at node=1. I'd be interested to see the results of repeating the 10-node experiment with an FPU value of 0.2 (rather than 1.2). For reference, with FPU=0.2, d4 is visited at N=5, not N=13.

p1.JPG
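
The FPU argument above can be illustrated with a toy PUCT loop at the root. This is a simplified sketch, not lc0's code: it assumes only two legal moves (e4 with prior 0.3, d4 with prior 0.1), fixed Q values, cpuct=3.0, and my understanding of the "reduction" rule (an unvisited child gets the parent's Q minus fpu_value times the square root of the summed priors of already-visited children). Real lc0 has more legal moves, updates Q values as it searches and scales cpuct differently, so the toy counts (9 vs 3 e4 visits before d4) differ from the numbers measured on the real net, but the direction matches.

```python
import math


def e4_visits_before_d4(fpu_value, cpuct=3.0, parent_q=0.05,
                        priors=(0.3, 0.1), qs=(0.05, 0.04), max_visits=1000):
    """Toy root search with two moves: index 0 = e4, index 1 = d4.

    Returns how many visits e4 receives before d4 is selected for the first time.
    Assumptions (not lc0's exact code): visited children keep a fixed Q of qs[i];
    unvisited children use FPU "reduction":
        parent_q - fpu_value * sqrt(sum of priors of visited children).
    """
    visits = [0, 0]
    for _ in range(max_visits):
        parent_n = 1 + sum(visits)  # the root's own evaluation counts as one visit
        visited_policy = sum(p for p, n in zip(priors, visits) if n > 0)
        fpu_q = parent_q - fpu_value * math.sqrt(visited_policy)
        scores = []
        for p, q, n in zip(priors, qs, visits):
            q_eff = q if n > 0 else fpu_q
            u = cpuct * p * math.sqrt(parent_n) / (1 + n)  # PUCT exploration term
            scores.append(q_eff + u)
        choice = 0 if scores[0] >= scores[1] else 1
        if choice == 1:
            return visits[0]  # d4 picked for the first time
        visits[0] += 1
    return None


print(e4_visits_before_d4(1.2), e4_visits_before_d4(0.2))  # 9 3
```

Raising fpu_value lowers the starting Q of every unvisited move, so the exploration term of d4 needs more parent visits to overcome it; that is the "locks Leela into the top-policy move" effect at low node counts.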

Jon Mike

Aug 23, 2019, 11:39:09 PM

to LCZero

Kevin,

Thanks for the thoughtful reply.

I too have been pondering the unexpected results of 1 versus 10 nodes. I figured that since 10 and 100 nodes per move had close ratios, the policy head was just an outlier. That is, I am working under the assumption that the equivalence ratio is roughly 1:500 at all tested and untested numbers of nodes per move, except for 1. I was assuming the policy head is an altogether different beast (being 10x stronger) than the actual search algorithm at any other number of nodes. It's very intriguing how much strength is embedded in the policy head!

As far as my lc0.exe options, I believe I am using all default values.

Here is the code I am using to retrieve lc0.exe:

!rm -rf lc0

!git clone --recurse-submodules https://github.com/LeelaChessZero/lc0.git

!cd lc0 && git checkout $(git tag --list --sort=version:refname | grep -v rc | tail -1)

!cd lc0 && rm -rf build

!cd lc0 && meson build --buildtype release -Db_lto=true -Dgtest=false

!cd lc0/build && ninja

Then I just run it vanilla with cutechess-cli.

Once the policy head championship is over (almost there), I plan on experimenting with the strongest PH nets to find the best parameter combinations at 1 node, rather than optimizing for 10 nodes. I normally don't mess around with the settings much, so my knowledge in that area is basic. Do you have suggestions on how I could optimize params for 1-node strength (policy head performance)?