400,000 Nodes Per Move Scaling ID 10480 [Leela's Limit Reached]

Cscuile

Aug 22, 2018, 12:42:47 PM

to LCZero

The second point of diminishing returns has been reached! For a 20x256 network, ELO growth seems to diminish heavily above 400,000 nodes per move (1-7 ELO per doubling of NPM).

https://docs.google.com/spreadsheets/d/1ZAIuHR6n-5JTxKQc0XUSx1jyUrgVEcj8DNLKA7-urBw/edit?ts=5b7d9014#gid=1374094180
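
For a sense of scale, here is a minimal sketch of the arithmetic behind a figure like "1-7 ELO per doubling". It uses the standard logistic Elo formula to turn a match score into a rating difference, plus a small helper that divides an Elo gain by the number of doublings between two node counts. The function names and the sample numbers are made up for illustration and are not taken from the spreadsheet.

```python
import math

def elo_diff_from_score(score):
    """Elo difference implied by an average score in (0, 1), logistic model."""
    return -400.0 * math.log10(1.0 / score - 1.0)

def elo_per_doubling(elo_low, elo_high, nodes_low, nodes_high):
    """Average Elo gained per doubling of nodes between two test points."""
    doublings = math.log2(nodes_high / nodes_low)
    return (elo_high - elo_low) / doublings

# Hypothetical match result: a 51% score is only a single-digit Elo edge.
print(round(elo_diff_from_score(0.51), 1))

# Hypothetical ratings at 400k and 1.6M nodes, which are two doublings apart.
print(elo_per_doubling(3000.0, 3008.0, 400_000, 1_600_000))
```

A gain of a few Elo per doubling corresponds to scoring only about 50.5-51% against the previous node count, so it takes a lot of games to tell it apart from noise.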

Kostya M

Aug 22, 2018, 3:19:21 PM

to LCZero

If 400K nodes is the max, why does it play badly in TCEC? It mostly generates >800K nodes there, so shouldn't it be playing at full strength?

pw31

Aug 22, 2018, 3:31:50 PM

to LCZero

That's kind of a worrying result, isn't it? The trend you have observed seems to be confirmed by this test:

https://docs.google.com/spreadsheets/d/18UWR4FVhPi0vNwwPreu_avd9ycujGQ5ayR2LzJOWP4s/htmlview?sle=true#

See the "scaling" tab. Although it was only tested up to 75,000 nodes, its author says that lc0 shows that saturation effect more than lczero used to.

Cscuile

Aug 22, 2018, 5:25:42 PM

to LCZero

Hopefully for 800k we will see a higher ELO difference.

Robert Clark

Aug 22, 2018, 5:39:27 PM

to joc...@gmail.com, LCZero

Maybe the reason for lc0 showing saturation effect earlier is that its evaluation is stronger. Maybe if your evaluation is REALLY good, the benefit of going even farther down the tree is reduced. If that is true, it's good news, because it would tend to reduce the hardware requirements for really good performance.

--
You received this message because you are subscribed to the Google Groups "LCZero" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lczero+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lczero/a2697c29-f959-4b4b-9f76-50b9350a27b6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ovi...@gmail.com

Aug 22, 2018, 5:59:39 PM

to LCZero

Thank you for these tests. They are really interesting!
Some questions:
(a) I know a lot of effort in lc0 went into aggressive pruning (if you do not have enough time, you have to cut the search tree). How could this affect the results? (If you allow an unrestricted search, maybe you find better moves.)
(b) How does this compare with AlphaZero chess? They played at 80 knps and 1 minute per move. That is about 5M nodes in total. Was there no ELO gain from thinking that long?
(c) How does this compare with a/b engines? Do they saturate at about the same amount of thinking time?

Cscuile

Aug 22, 2018, 6:30:39 PM

to LCZero

Indeed! =D

Cscuile

Aug 22, 2018, 6:36:21 PM

to LCZero

1. I'm sorry, I'm not sure how to test that in Arena.
2. Perhaps Leela at 2 million+ nodes will reach a point where she starts to gain more ELO again. A point of increased growth?
3. I've wanted to test this with Stockfish 9. However, Arena has a bug where, if you test past 50M nodes per move, it glitches out and calculates forever. I've been told cutechess-cli can fix this problem for me, but I haven't been able to set it up properly. If someone can point me in the right direction I will gladly test Stockfish 9 scaling! =D

Graham Jones

Aug 23, 2018, 2:59:03 AM

to LCZero

See Fig 2 in the A0 paper: https://arxiv.org/pdf/1712.01815.pdf. A0 continued to improve until about 5M nodes at least. In the graph, the times marked on the x-axis correspond to 8k, 80k and 800k nodes.
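
The node counts above are just speed multiplied by thinking time. Here is a small sketch of that conversion, assuming the 80 knps figure quoted earlier in this thread. The tick times of 0.1 s, 1 s and 10 s are inferred from the node counts given here, and the function name is made up for illustration.

```python
def nodes_per_move(nodes_per_second, seconds_per_move):
    """Approximate nodes searched per move at a fixed speed and thinking time."""
    return round(nodes_per_second * seconds_per_move)

A0_NPS = 80_000  # speed quoted earlier in the thread, treated as an assumption

# Thinking times: three inferred axis ticks plus the 1 minute per move match setting.
for seconds in (0.1, 1, 10, 60):
    print(f"{seconds:>5} s/move -> {nodes_per_move(A0_NPS, seconds):,} nodes")
```

At 1 minute per move this gives 4.8M nodes, which is where the "about 5M" figure comes from.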

Rattlestone

Aug 23, 2018, 3:22:06 AM

to LCZero

I wonder if there's still progress on tactics tests with deeper searches.

pwa128

Aug 23, 2018, 6:40:15 AM

to LCZero

Excellent test and I have to say I find this a most surprising result as the search does deepen appreciably between 200k and 400k.

It would be lovely to think that Leela had such a brilliant eval that it did not need to search much but we are clearly a long way from that point assuming it could be reached at all.

Perhaps the result will improve with Lc0 v0.17 in which the pruning has been changed.

Nico van Dijk

Aug 23, 2018, 7:04:41 AM

to LCZero

So how does the pruning work now, and how will it change? I was only aware of the MCTS search. The pruning may also affect the training if it is used there too, right?

pwa128

Aug 23, 2018, 7:30:23 AM

to LCZero

My understanding (a bit sketchy) is that the MCTS search does not go through to the end of the game; branches get chopped off if the network returns a severe enough score.

This is from the release notes for v0.17-rc1

Old smart pruning flag is gone. Instead there is --futile-search-aversion flag.
--futile-search-aversion=0 is equivalent to old --no-smart-pruning.
--futile-search-aversion=1 is equivalent to old --smart-pruning.
Now default is 1.47, which means that engine will sometimes decide to stop search earlier even when there is theoretical chance (but not very probable) that best move decision could be changed if allowed to think more.

Having read this again I realise it could mean two things and I don't know which it does mean. It could mean branches get pruned more often with the default setting or it could mean the whole search gets stopped.
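
To make the release-note wording concrete, here is a toy model of the "whole search gets stopped" reading. It is only a sketch under assumptions: the function name, the visit-count rule and the way the flag value is applied are guesses for illustration, not Lc0's actual code.

```python
def should_stop_early(best_visits, second_visits, remaining_nodes, aversion=1.47):
    """Toy rule: stop once the runner-up move looks unable to catch the best move.

    aversion=0 never stops early (like the old --no-smart-pruning).
    aversion=1 stops only when catching up is impossible even if every
    remaining node went to the runner-up (like the old --smart-pruning).
    aversion>1 discounts the remaining budget, so the search may stop while
    a catch-up is still theoretically possible.
    """
    if aversion <= 0:
        return False
    effective_remaining = remaining_nodes / aversion
    return second_visits + effective_remaining < best_visits

# Hypothetical visit counts: the runner-up could still just overtake with 700 nodes left.
print(should_stop_early(1000, 400, 700, aversion=1.0))
print(should_stop_early(1000, 400, 700, aversion=1.47))
```

Under this model, the higher default only matters when the engine is deciding whether the rest of its budget is worth spending.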

Graham Jones

Aug 23, 2018, 8:54:44 AM

to LCZero

I think it means the whole search gets stopped, but I also think it is a time-control thing, which shouldn't affect Cscuile's node-count-based tests.

Cscuile

Aug 23, 2018, 2:41:13 PM

to LCZero

Graham,

Figure 2 shows A0's ELO strength up to 10 seconds only. With 4 TPUs this will be 800,000 nodes per move. From my tests, Leela is far stronger than Stockfish at very short time controls; however, given longer thinking time and more nodes, SF outscales a 20-block Leela ID 480 (a presumption from other scaling tests).

Graham Jones

Aug 23, 2018, 3:44:00 PM

to LCZero

The last time marked on the x axis is 10 seconds, but the graph goes well beyond that, and it is logarithmic. I reckon the right end of the graph is close to 5M nodes per move.

Anyway, it seems from your tests (and your last comment) that the 20x256 nets with LC0 are behaving quite differently from both A0 and the 15x192 nets with LeelaZero. That's interesting and may be important. I'm puzzled.

Cscuile

Aug 23, 2018, 5:54:16 PM

to LCZero

Indeed! The 15x192 net scales better than the 20x256, which is expected since the 20b has a higher default ELO at lower nodes. Also, DeepMind never tested AlphaZero at higher node counts. While you can extrapolate with the best-fit equation from their short time control tests, it is not accurate.

Dan Kelly

Aug 23, 2018, 7:35:12 PM

to csc...@gmail.com, LCZero

What do you mean by long tc's? 40/15? 90/30?

pawnslinger

Aug 24, 2018, 2:25:12 AM

to LCZero

I was playing game in 60. But game in 120 was my next step. Since Leela couldn't win the K+R v K ending... I didn't see the point of going longer. In some events, the endgame might have been adjudicated in Leela's favor, I guess. Being up a whole Rook and then unable to bring it home... a loss in my book.

How many games do you have to train Leela thru before she starts to understand these basic positions? The B and N endgames are even more difficult. Most of the AB engines rely on TB for this, and it helps them a lot. However, if Leela is claiming to be an AI, then looking up stuff in the TB without true understanding... well, that's a lot like cheating to me.

John D

Aug 24, 2018, 4:19:07 AM

to LCZero

This is where it's critical to remember the elo gains are all relative to one opponent, 12 openings, and a specific hardware configuration... in particular the last, as SF was bottlenecked wrt hash table at longer TCs.

Dan Kelly

Aug 24, 2018, 8:46:25 AM

to John D, LCZero

I tested Leela in all the basic mate endings, and she succeeded in all of them at TC 3/2 on my pitifully slow computer.

Dan Kelly

Aug 24, 2018, 8:49:30 AM

to John D, LCZero

PS. That's with no TB of course.

pwa128

Aug 29, 2018, 11:07:25 AM

to LCZero

I have been doing a fair bit of analysis of my games and openings using Leela over the past 3 days. I started using Lc0 v0.16 and network 10776. It very quickly (5 seconds) latched onto its best move and I don't think it ever changed its mind even if I ran it for several minutes (well over 400k nodes). What's more, the first 16 plies of the PV rarely changed after 20 to 30 seconds. My experience in doing analysis was entirely consistent with these scaling results.

Today I switched to Lc0 v0.17 and network 11149 and my experience has been very different. The PV changes a lot more and sometimes the best first move will also change 30 seconds, 1 minute or 2 minutes into the search. Not often, but then even Stockfish doesn't change its first move very often after the first 5 or 10 seconds (and from my work on correspondence chess a few years back I believe Stockfish has a superb search algorithm). Whilst I haven't done a side-by-side comparison of v0.16 and v0.17, I did go back over the same games and the difference to my mind was quite pronounced. I am assuming it comes from the version of Lc0 rather than the net.

If you did the tests using v0.16, Cscuile, I would be pretty optimistic of the results being better with v0.17

Wolf_Pawn

Aug 29, 2018, 1:44:29 PM

to LCZero

I'm curious as to why scaling with Id 10480 is very useful when its endgame evaluation is less evolved. I've seen a lot of improvement in the endgame with newer Ids. SF is stronger in the endgame than Id 10480, which could be the main reason for its better scaling. Lc0 is starting to outplay/better evaluate some endgame positions. My point is: might a newer and better Id go beyond the limit that 10480 hit?

Dietrich Kappe

Aug 29, 2018, 2:11:29 PM

to LCZero

Smart pruning off, I hope?

ovi...@gmail.com

Aug 29, 2018, 3:17:17 PM

to LCZero

I do not know if smart pruning was the default in v16 and not in v17... better to set the flag explicitly to be sure and not trust the default.

Fahim Saharaiar

Aug 29, 2018, 3:42:10 PM

to LCZero

What was the Alpha Zero network size?

Cscuile

Aug 29, 2018, 5:57:46 PM

to LCZero

"If you did the tests using v0.16, Cscuile, I would be pretty optimistic of the results being better with v0.17"

Interesting, thanks I'll keep that in mind for when I start scaling the strongest ID.

Cscuile

Aug 29, 2018, 5:59:20 PM

to LCZero

Wolf_Pawn,

This was the strongest network at the time. Please keep in mind these scaling tests can take a very long time to finish, especially with ID tests in between.

Cscuile

Aug 29, 2018, 6:01:03 PM

to LCZero

Dan Kelly,

Around those time controls. 40-60 minute TCs would be a good estimate.
