40x256 network bootstrapped off of t40 data


Joe MD

unread,
May 17, 2019, 4:56:35 PM5/17/19
to LCZero
If anyone wants to donate some GPU time to training a 40x256 (40b) network, which was on par with 41800 (I haven't tested it recently), you're more than welcome to. I have been training it for quite some time now and hope to continue training it into the future.

http://157.230.189.191:8080/

Any help would be appreciated. It's still in its infancy of training off self-play data: about 150k games in the 500k window are self-play; the rest are t40 games.

M MUSTERMANN

unread,
May 17, 2019, 5:18:19 PM5/17/19
to LCZero
Joe MD:
If anyone wants to donate some GPU time to training a 40x256 (40b) network, which was on par with 41800 (I haven't tested it recently), you're more than welcome to. I have been training it for quite some time now and hope to continue training it into the future.

http://157.230.189.191:8080/

Any help would be appreciated. It's still in its infancy of training off self-play data: about 150k games in the 500k window are self-play; the rest are t40 games.



Thank you very much for doing this.

Can you explain to me why you decided to go for 40 x 256 instead of 40 x 512?
20 x 256 was the old one.

Something like 40 x 384 could be interesting too.
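For rough scale, the residual tower's compute grows linearly in blocks but quadratically in filters, so, as a back-of-the-envelope sketch in Python (not a benchmark):

print((512 / 256) ** 2)  # 4.0:  a 40x512 would be ~4x the conv work of 40x256
print((384 / 256) ** 2)  # 2.25: a 40x384 would be ~2.25x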
 

Joe MD

unread,
May 17, 2019, 6:01:29 PM5/17/19
to LCZero
It would take too long to train a bigger network; this network already took me almost a month to train to parity with 41800. Transitioning to self-play generation might have lowered the strength a little, but I haven't tested it recently.
Network specs:
40x256
Conv policy head
WDL value head
SE
0.002 LR
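For a sense of scale, a back-of-the-envelope weight count for the residual tower only (a sketch in Python that ignores the heads, SE blocks, and batch norm):

def tower_weights(blocks, filters):
    # two 3x3 convolutions per residual block, filters -> filters channels
    return blocks * 2 * (3 * 3 * filters * filters)

print(tower_weights(40, 256) / 1e6)  # ~47.2M weights for this net
print(tower_weights(20, 256) / 1e6)  # ~23.6M for a T40-sized 20x256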

Trevor G

unread,
May 17, 2019, 9:49:51 PM5/17/19
to Joe MD, LCZero
It’s cool you’re doing this, but given computing constraints, my guess would’ve been that 20x512 might have been better than 40x256. Intuitively, depth would be more about finding more complex hierarchies of patterns, but that seems a bit limited on an 8x8 chess board. However, increasing the number of channels would increase the “pattern repertoire” at each layer, and it seems to me that that would be more helpful.

Furthermore, a 20x512 network would probably train much faster than a 40x256, which has so many layers to propagate error through.

I might be completely wrong, and I *can* think of reasons why deeper could be better than wider (eg, maybe deeper allows for more “abstract” patterns).

If you have any evidence showing that deeper is definitely the way to go, I’d be very interested in hearing.




123

unread,
May 18, 2019, 5:19:14 AM5/18/19
to LCZero
Trevor:
It’s cool you’re doing this, but given computing constraints, my guess would’ve been that 20x512 might have been better than 40x256. Intuitively, depth would be more about finding more complex hierarchies of patterns, but that seems a bit limited on an 8x8 chess board. However, increasing the number of channels would increase the “pattern repertoire” at each layer, and it seems to me that that would be more helpful.

Furthermore, a 20x512 network would probably train much faster than a 40x256, which has so many layers to propagate error through.

I might be completely wrong, and I *can* think of reasons why deeper could be better than wider (eg, maybe deeper allows for more “abstract” patterns).

If you have any evidence showing that deeper is definitely the way to go, I’d be very interested in hearing.


On Fri, May 17, 2019 at 6:01 PM Joe MD <piru...@gmail.com> wrote:
It would take too long to train a bigger network; this network already took me almost a month to train to parity with 41800. Transitioning to self-play generation might have lowered the strength a little, but I haven't tested it recently.
Network specs:
40x256
Conv policy head
WDL value head
SE
0.002 LR


I can tell you that depth is completely unimportant. At least if you have an RTX GPU.

123

unread,
May 18, 2019, 5:25:00 AM5/18/19
to LCZero
Joe MD:
How long would it take, in your opinion, to train a 40 x 512 network compared to your 40 x 256 network?
Of course you needed a month, but even if the 40 x 512 network takes more time, together we will get there much faster.
I will support your 40 x 256 network until a 40 x 512 network arrives :)

How do I run the client to donate games to your network?

Joe MD

unread,
May 18, 2019, 7:46:36 AM5/18/19
to LCZero
Google used 256x40 with success and Leela Zero (Go) uses 256x40 with success. I believe 512 filters and 20 blocks would take much longer to train than 256 filters and 40 blocks.

If anyone wants to connect to my server, paste this into the client:

./client_linux (or client.exe) --hostname=http://157.230.189.191:8080 --user=[username] --password=[password]

Joe MD

unread,
May 18, 2019, 7:55:51 AM5/18/19
to LCZero
I'll do some testing of 512x40 today and post the training speed compared to 256x40.

M MUSTERMANN

unread,
May 18, 2019, 9:09:55 AM5/18/19
to LCZero
Joe MD:
I'll do some testing of 512x40 today and post the training speed compared to 256x40.

I would prefer if you could give us the possibility to support a 40 x 512 network :)
You don't need to cancel the 40 x 256 network; we can compare both networks after a month or two.

I'm not really interested in training speed; I'm fine even if the training speed is only 1/20.
I'm interested in chess results, better chess understanding, and more Elo, and in the long run the 40 x 512 network should win. So I'm all in here with my 2x RTX 2080 Ti GPUs.

123

unread,
May 18, 2019, 6:56:28 PM5/18/19
to LCZero
Joe MD:
Google used 256x40 with success and Leela Zero (Go) uses 256x40 with success. I believe 512 filters and 20 blocks would take much longer to train than 256 filters and 40 blocks.

If anyone wants to connect to my server, paste this into the client:

./client_linux (or client.exe) --hostname=http://157.230.189.191:8080 --user=[username] --password=[password]


What do I need to write if I want to use the client with two GPUs?

Joe MD

unread,
May 18, 2019, 7:06:42 PM5/18/19
to LCZero
We're going to keep training the 40x256; I was just talking about testing how many pos/sec a 40x512 trains at. This net has the AlphaZero policy head, so it's even slower than just doubling the blocks. The TensorBoard graphs are currently changing a lot: policy accuracy is skyrocketing and value accuracy is tanking.

Joe MD

unread,
May 18, 2019, 7:06:53 PM5/18/19
to LCZero
You can do this:
./client_linux / client.exe --hostname=http://157.230.189.191:8080 --user=[username] --password=[password] --parallelism=64 --backend-opts="(backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1)"

Trevor G

unread,
May 18, 2019, 11:51:10 PM5/18/19
to Joe MD, LCZero
Oops, I actually meant something more like 20x362 — which, like 40x256, should approximately double the amount of computation of a 20x256 network. I just wasn’t thinking that n-filters gets squared in terms of computation.
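To make that scaling concrete, counting only the tower's 3x3 convolutions (a sketch, not measured speed):

def rel_cost(blocks, filters, base_blocks=20, base_filters=256):
    # residual-tower convolution cost scales roughly as blocks * filters^2
    return (blocks * filters ** 2) / (base_blocks * base_filters ** 2)

for b, f in [(40, 256), (20, 512), (20, 362)]:
    print(f"{b}x{f}: {rel_cost(b, f):.2f}x a 20x256")  # 2.00x, 4.00x, 2.00x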

In 19x19 go, it does make a bit more sense to me why 40 blocks could help. The board is bigger, and you need network depth to get a holistic view of the board when each convolution is only 3x3, and perhaps to make sense of things at various scales.

With chess, I feel that 256 channels might be the limiting factor with all of the various types of piece interactions and tactical combinations. It just seems like that should require network width more than depth.

Anyway, I hope the 40x256 effort yields some great results. I also hope somebody will try something similar with a width-only-increased network (perhaps bootstrapping from the 40x256 data), so we can know which one is more important vs Leela’s current 20x256 size.




Joe MD

unread,
May 20, 2019, 4:34:46 AM5/20/19
to LCZero
Thanks for the support. I uploaded a partial TensorBoard at http://157.230.189.191:6006

Self-play started at 132k steps; policy accuracy has been increasing nicely since then.

123

unread,
May 21, 2019, 1:10:24 PM5/21/19
to LCZero
Joe MD:
I'll do some testing of 512x40 today and post the training speed compared to 256x40.

Can you show us the results of the 40 x 512 training-speed test?


It was obviously a very good idea to donate GPU power; we have our first extremely good results here :)

The training is already 66 Elo better than id 3, which was tested.
At the moment it is only possible to donate GPU power to this training.

Sean Koors

unread,
May 21, 2019, 1:58:07 PM5/21/19
to LCZero
Did anyone get this to work in Colab yet? I tried adding the suggested URL and keep getting errors...

Sean Koors

unread,
May 21, 2019, 2:03:17 PM5/21/19
to LCZero
Wait... I got it...

!cd lc0/build && ./client_linux --hostname=http://157.230.189.191:8080 --user 'Google Colab' --password googlecolab

Joe MD

unread,
May 22, 2019, 12:02:09 PM5/22/19
to LCZero
Sorry, I've been too busy with the training server to test the training speed of any new net architectures.

I added an adaptive resign rate that duplicates the resign behavior of AlphaZero: the resign threshold is adjusted to hold a 5% false-positive resign rate, so the resign percentage will change throughout the day.

You can find the training parameters here:
http://157.230.189.191:8080/training_runs
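In case anyone wonders what holding a 5% false-positive rate means in practice, here is a minimal sketch of the calibration idea (illustrative names, not the server's actual code; it assumes a sample of recent games played to the end with resignation disabled):

def calibrate_resign_threshold(sample_games, max_fp_rate=0.05):
    # sample_games: (min_eval, lost) pairs, where min_eval is the lowest eval
    # one side saw during the game and lost says whether that side lost anyway
    best = -1.0  # -1.0 effectively means "never resign"
    for threshold in sorted({e for e, _ in sample_games}):
        resigned = [(e, lost) for e, lost in sample_games if e <= threshold]
        false_pos = sum(1 for _, lost in resigned if not lost)
        if resigned and false_pos / len(resigned) <= max_fp_rate:
            best = max(best, threshold)
    return best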

Brian Richardson

unread,
May 23, 2019, 8:53:27 AM5/23/19
to LCZero
Looking at the self-Elo graph and wondering about the 0.002 learning rate.
I know the net was started from T40 data, so it seems like this LR might just be fine-tuning from there.

How about trying CLR (cyclical learning rates), which might enable finding a new minimum?
There is a branch here which does run:

Thanks.

SxB

unread,
May 24, 2019, 5:57:33 AM5/24/19
to LCZero
Sorry, I'm donating via Colab but not sure how to donate from Windows with a 2060.
I currently just double-click client.exe, but that server is down, and I don't see any config where I can enter host details.

Shaun

M MUSTERMANN

unread,
May 24, 2019, 7:43:27 AM5/24/19
to LCZero
SxB:
Sorry, I'm donating via Colab but not sure how to donate from Windows with a 2060.
I currently just double-click client.exe, but that server is down, and I don't see any config where I can enter host details.

Shaun


1. Press Windows key + R.
2. Type cmd.exe and press Enter.
3. In the new black window, navigate to where you have client.exe, for example:
3.1 Type C: and press Enter.
3.2 Now type: cd LC0 (or cd Leela).
3.3 Now (you are in your LC0 folder) copy something like this:
client.exe --hostname=http://157.230.189.191:8080 --user=XXXXX --password=XXXXX --parallelism=64 --backend-opts="(backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1)"
If you are using only one GPU, use instead: client.exe --hostname=http://157.230.189.191:8080 --user=XXXXX --password=XXXXX --parallelism=32 --backend-opts="(backend=cudnn-fp16,gpu=0)"
Right-click to paste it into the black window and press Enter.

Joe MD

unread,
May 24, 2019, 5:27:53 PM5/24/19
to LCZero
6-man tablebase rescoring, which was previously absent, has been added to the 40b experimental server. I'll keep you posted on other improvements in the near future.

123

unread,
May 25, 2019, 8:35:12 AM5/25/19
to LCZero
Joe MD:
6-man tablebase rescoring, which was previously absent, has been added to the 40b experimental server. I'll keep you posted on other improvements in the near future.

Very good idea.
Tablebases mean perfect play, and that's why I expect a very big Elo gain.

But I would prefer 7-man tablebase rescoring!

Sean Koors

unread,
May 25, 2019, 11:42:59 AM5/25/19
to LCZero
Glad to hear this got implemented... I had the same thoughts before. And yes, 7-man is of course better, but maybe painful.

Joe MD

unread,
May 27, 2019, 5:31:52 AM5/27/19
to LCZero
After changing the training parameters two days ago to match those of AlphaGo Zero, training finally appears to be stabilizing after the initial wild swings. It should be a little smoother from here on.

The main changes were a temperature of 1 for the first 30 moves and an endgame temperature of 0.
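For reference, "temperature" here controls how the self-play move is picked from the MCTS visit counts. Roughly, in Python (a sketch of the scheme, not the actual lc0 code):

import random

def select_move(moves, visits, move_number, temp_moves=30):
    # temperature 1 for the first temp_moves moves: sample in proportion to
    # visit counts; temperature 0 afterwards: always play the most-visited move
    if move_number <= temp_moves:
        return random.choices(moves, weights=visits, k=1)[0]
    return max(zip(moves, visits), key=lambda mv: mv[1])[0]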

Edward Panek

unread,
May 27, 2019, 10:16:20 AM5/27/19
to LCZero
Now testing 30 minute TC 40b 38 vs SF 8 

M MUSTERMANN

unread,
May 27, 2019, 5:42:24 PM5/27/19
to LCZero
Joe MD:
After changing the training parameters two days ago to match those of AlphaGo Zero, training finally appears to be stabilizing after the initial wild swings. It should be a little smoother from here on.

The main changes were a temperature of 1 for the first 30 moves and an endgame temperature of 0.


I also think it's better to use 7-man tablebase rescoring :)

Joe MD

unread,
May 28, 2019, 9:21:43 AM5/28/19
to LCZero
I agree 7-man would be ideal. Does anyone have any data on which 7-man TBs are most likely to be used in rescoring? I could get just the most commonly rescored ones (which are not necessarily the most common 7-man positions) instead of all of them.
I also fixed the server error that was giving people problems:
2019/05/23 13:49:56 client_http.go:41: Bad JSON from http://157.230.189.191:8080/next_game -- Invalid training run

Training is going well right now.

Keep you updated.

Joe MD

unread,
May 30, 2019, 1:21:37 AM5/30/19
to LCZero
Made some changes to the training parameters.

1) Reverted to the original FPU of AlphaZero:
"--fpu-strategy=absolute", "--fpu-value=-1.0"
2) Reverted resign back to the normal style instead of --resign-wdlstyle
3) Increased the false-positive resign threshold back to 5% from 3.5%

The net went through a major recovery phase after the transition to the AlphaGo Zero temperature settings. This might bring some more changes to the structure of the weights, but it shouldn't be as dramatic as the previous change. Hopefully this will be the last change for a while.

M MUSTERMANN

unread,
Jun 1, 2019, 3:00:47 AM6/1/19
to LCZero
Joe MD:
Is 7-man tablebase rescoring possible now?

Congrats on the training :)
It looks like the Elo graph is improving very fast.
We have already reached 3280 Elo with the 40x256 while the little 20x256 is only at 3242 Elo, and it looks like the 20x256 isn't improving as well or as fast.
Some days ago we were 150 Elo behind.
The 20x256 training gets 10x more training games per day than the 40x256, so this looks like a disaster for it.

Joe MD

unread,
Jun 3, 2019, 10:28:20 AM6/3/19
to LCZero
We're making good progress on the 40b network. The question is which 7-man tablebases to include, because the full 14 TB is a lot when most of those endgames are probably heavily favored toward one side and won't benefit from rescoring. I can manually download them and test which ones rescore the most games, but it isn't a trivial task.

glbchess64

unread,
Jun 3, 2019, 2:27:41 PM6/3/19
to LCZero
Maybe you can download the files as needed: when you reach a 7-man position, first look at your disk; if the file is there, use it; if not, download it automatically (or put the game on standby while you download the file manually).

RaffaR

unread,
Jun 3, 2019, 3:19:26 PM6/3/19
to LCZero

Hi Joe, I don't remember exactly, but the list of the most common 7-man endings may have been posted on Discord.

Brian Richardson

unread,
Jun 3, 2019, 4:48:49 PM6/3/19
to LCZero
A while ago:

jhorthos 05/10/2019
@Occyroexanthub and @Mardak they are split on two SSDs, but here is the relevant list:
KBPPPvKB.rtbw
KBPPPvKP.rtbw
KBPPPvKR.rtbw
KBPPvKBP.rtbw
KBPPvKNP.rtbw
KBPPvKPP.rtbw
KBPPvKRP.rtbw
KNPPPvKP.rtbw
KNPPPvKR.rtbw
KNPPvKBP.rtbw
KNPPvKNP.rtbw
KNPPvKPP.rtbw
KNPPvKRP.rtbw
KPPPPvKP.rtbw
KPPPvKBP.rtbw
KPPPvKNP.rtbw
KPPPvKPP.rtbw
KPPPvKRP.rtbw
KQBPPvKP.rtbw
KQNPPvKP.rtbw
KQPPPvKP.rtbw
KQPPvKPP.rtbw
KQPPvKQP.rtbw
KQRNPvKP.rtbw
KQRPPvKP.rtbw
KQRPvKPP.rtbw
KRBPPvKP.rtbw
KRBPPvKR.rtbw
KRBPvKPP.rtbw
KRBPvKRB.rtbw
KRNPPvKP.rtbw
KRPPPvKP.rtbw
KRPPvKBP.rtbw
KRPPvKNP.rtbw
KRPPvKQP.rtbw
KRRPvKRP.rtbw

I think they can be downloaded from here:

Álvaro Begué

unread,
Jun 3, 2019, 5:49:02 PM6/3/19
to LCZero
Why not have a server for 7-man positions? You only need to query it once per game that reaches a 7-man position. So even if 1 million games per day reach a 7-man position, that's something like 11 requests per second. That sounds perfectly manageable.
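The arithmetic, assuming one query per qualifying game:

seconds_per_day = 24 * 60 * 60      # 86,400
print(1_000_000 / seconds_per_day)  # ~11.6 queries per second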

 

Trevor G

unread,
Jun 4, 2019, 6:28:04 PM6/4/19
to Álvaro Begué, LCZero
I thought (7 man) tablebase rescoring means all 7-man positions are rescored? Training games aren’t truncated when rescored, right? Maybe I’m wrong...

On Mon, Jun 3, 2019 at 5:49 PM Álvaro Begué <alvaro...@gmail.com> wrote:
Why not have a server for 7-men positions? You only need to query it once per game that reaches a 7-men position. So even if 1 million games per day reach a 7-men position, you only get something like 11 requests per second. That sounds perfectly manageable.

 


Álvaro Begué

unread,
Jun 4, 2019, 6:38:10 PM6/4/19
to LCZero
Ah, it looks like you are right. The rescoring procedure is nicely described here: http://blog.lczero.org/2018/09/tb-rescoring.html

So there would be as many lookups as there are irreversible moves (pawn pushes and captures) after a 7-man position is reached.
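In code the shape would be roughly this (a sketch with made-up helper names, glossing over side-to-move sign flips and 50-move-rule details):

def rescore_game(positions, final_result, probe_wdl, is_irreversible):
    # walk backwards; each irreversible move inside tablebase territory costs
    # one WDL probe, which fixes the result for the preceding reversible segment
    result = final_result
    for pos in reversed(positions):
        if is_irreversible(pos.last_move):
            wdl = probe_wdl(pos)  # None if more than 7 men on the board
            if wdl is not None:
                result = wdl
        pos.training_result = result
    return positions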

Álvaro.


On Tuesday, June 4, 2019 at 6:28:04 PM UTC-4, Trevor wrote:
I thought (7 man) tablebase rescoring means all 7-man positions are rescored? Training games aren’t truncated when rescored, right? Maybe I’m wrong...
On Mon, Jun 3, 2019 at 5:49 PM Álvaro Begué <alvar...@gmail.com> wrote:
Why not have a server for 7-men positions? You only need to query it once per game that reaches a 7-men position. So even if 1 million games per day reach a 7-men position, you only get something like 11 requests per second. That sounds perfectly manageable.

 


Joe MD

unread,
Jun 5, 2019, 5:03:51 PM6/5/19
to LCZero
Thanks for the list, but the discrepancy I described was the difference between the most common 7-man positions and the 7-man positions most likely to be rescored. For example, KQRNPvKP.rtbw is on the most-common list, but there's no chance it would ever be rescored. Finding the most commonly rescored ones would require me to download the tablebases and check each against a set of games to see which come up most often in rescoring.

123

unread,
Jun 6, 2019, 7:18:29 AM6/6/19
to LCZero
Joe MD:
Thanks for the list, but the discrepancy I described was the difference between the most common 7-man positions and the 7-man positions most likely to be rescored. For example, KQRNPvKP.rtbw is on the most-common list, but there's no chance it would ever be rescored. Finding the most commonly rescored ones would require me to download the tablebases and check each against a set of games to see which come up most often in rescoring.

I would prefer to use all the 7-man TBs from the cloud, to be on the safe side.

Fash Z

unread,
Jun 7, 2019, 9:31:22 AM6/7/19
to LCZero
I started to contribute to this project... t40 training is unstable.

Has this net been tested for strength? I think a 40b distilled down to a 20b would be stronger.

PiotrekL

unread,
Jun 8, 2019, 3:23:32 AM6/8/19
to LCZero
Did anyone test it against the current T40 nets? I am interested to know if it's stronger at the same number of nodes.
I've run quite a bit of analysis with it, and my feeling is that it's stronger when time is not an issue (which it isn't if you use it for analysis and have a finite amount of RAM).

Joe MD

unread,
Jun 8, 2019, 6:30:19 AM6/8/19
to LCZero

Notice: please update your engine.

We have changed the default draw adjudication from 450 ply to 250 ply to reduce the number of redundant positions. No other changes were made except a version bump to 40 to identify the engine.

Linux users: build from https://github.com/joeismad/lc0

SxB

unread,
Jun 9, 2019, 1:50:16 PM6/9/19
to LCZero
Hi,
I'm currently using Google Colab to donate. Since the URL changed, is it possible to use the new exes? Thanks, Shaun

Weber Yan

unread,
Jun 9, 2019, 2:12:51 PM6/9/19
to LCZero


On Monday, June 10, 2019 at 1:50:16 AM UTC+8, SxB wrote:
Hi,
I'm currently using Google Colab to donate. Since the URL changed, is it possible to use the new exes? Thanks, Shaun

I cannot load the 40-block net on lc0 0.21.2. Has anyone experienced that? It says: error Invalid weight file: parse error. The 20x256 network works fine.

Joe MD

unread,
Jun 9, 2019, 2:24:36 PM6/9/19
to LCZero
Just change your script line for Google Colab that starts with "git clone" to "git clone https://github.com/joeismad/lc0.git".

Joe MD

unread,
Jun 9, 2019, 2:29:37 PM6/9/19
to LCZero

Not sure why it wouldn't work, might ask in the Discord.

Joe MD

unread,
Jun 9, 2019, 2:34:13 PM6/9/19
to LCZero
Partial 7 man tablebase rescoring has also been implemented on the 40b experimental server.

SxB

unread,
Jun 9, 2019, 3:35:51 PM6/9/19
to LCZero
Thanks

123

unread,
Jun 9, 2019, 5:06:00 PM6/9/19
to LCZero
PiotrekL:
Did anyone test it against the current T40 nets? I am interested to know if it's stronger at the same number of nodes.
I've run quite a bit of analysis with it, and my feeling is that it's stronger when time is not an issue (which it isn't if you use it for analysis and have a finite amount of RAM).

I did this a week ago.
TC was only 1 second per move, which favors the 20x256 network: with more time, 40x256 plays better.
I used one GPU per engine, which again favors the 20x256 network: with two GPUs, 40x256 gains more Elo than 20x256 does.
The result was +5 Elo for the 20x256 network.

40x256 gets only 1/3 of the kn/s of the 20x256.
Note that doubling the kn/s brings +50 Elo or maybe more.
So 40x256 at 2/3 (66%) of the kn/s would be about +45 Elo stronger than 20x256 at 100% kn/s.

But since then the 20x256 networks have improved by ~0 Elo according to the graph, and the 40x256 networks have improved by ~40 Elo according to the other graph.

So even now, under bad conditions, 40x256 could already be much better, or equal, or at worst slightly worse.
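Turning that rule of thumb into arithmetic (a sketch that takes the +50 Elo per kn/s doubling at face value):

import math

def speed_elo(speed_ratio, elo_per_doubling=50):
    # each doubling (halving) of kn/s is worth about +50 (-50) Elo
    return elo_per_doubling * math.log2(speed_ratio)

print(round(speed_elo(1 / 3)))  # -79: the handicap at a third of the speed
print(round(speed_elo(2 / 3)))  # -29: the handicap at two thirds of the speed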

123

unread,
Jun 9, 2019, 5:07:07 PM6/9/19
to LCZero
Joe MD:
Partial 7 man tablebase rescoring has also been implemented on the 40b experimental server.

That's great :)

You can give us more information about it if you want to ;)

Joe MD

unread,
Jun 10, 2019, 3:33:20 AM6/10/19
to LCZero
For the 7-man tablebases I used the list Brian Richardson posted above, and removed any tables where one side is up a full piece (i.e. a knight, bishop, etc.), since the likelihood of those being rescored is probably close to 0 (a rough sketch of this filter follows the list below). It ended up being about 400 GB of the most common 7-man TBs.

The list is as follows:
KRNPvKRN.rtbw
KRPPvKRP.rtbw
KBPPPvKR.rtbw
KBPPvKBP.rtbw
KBPPvKNP.rtbw
KBPPvKRP.rtbw
KNPPPvKR.rtbw
KNPPvKBP.rtbw
KNPPvKNP.rtbw
KNPPvKRP.rtbw
KPPPvKBP.rtbw
KPPPvKNP.rtbw
KPPPvKPP.rtbw
KQPPvKQP.rtbw
KRBPvKRB.rtbw
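The sketch mentioned above: one way to express that filter over the .rtbw names, treating "up a full piece" as a non-pawn material edge of a minor piece or more (the exact threshold is my guess at the criterion):

all_tables = ["KQRNPvKP", "KRNPvKRN", "KBPPPvKR"]  # sample names
VALUES = {"Q": 9, "R": 5, "B": 3, "N": 3}          # non-pawn material only

def non_pawn_diff(table):
    # table like "KQRNPvKP", i.e. the filename without the .rtbw extension
    strong, weak = table.split("v")
    total = lambda side: sum(VALUES.get(c, 0) for c in side)
    return total(strong) - total(weak)

keep = [t for t in all_tables if abs(non_pawn_diff(t)) < 3]
print(keep)  # ['KRNPvKRN', 'KBPPPvKR']: the lopsided KQRNPvKP is dropped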

Joe MD

unread,
Jun 12, 2019, 11:07:58 AM6/12/19
to LCZero
Windows users, please upgrade to the new 225-ply draw adjudication threshold. Hardly any games were hitting 250, so I moved the threshold to 225 shortly after the build. Most Linux users should be at the 225 window if they built from my repo.


I'm considering lowering the threshold to 200, but I need more people to move from 250 to 225 first to see how it affects the data.

Zlatko Hulama

unread,
Jun 12, 2019, 1:35:18 PM6/12/19
to LCZero
Did anyone test a couple of the latest 40b nets against some of the best T40 20b nets?
Preferably one time-controlled test and one test at the same number of nodes.

Karol Majewski

unread,
Jun 12, 2019, 1:47:06 PM6/12/19
to LCZero
A TC match doesn't make sense, as 20x256 will always be ahead. What is extremely interesting is a same-number-of-nodes match. I'll put my money on 40x256 there.

M MUSTERMANN

unread,
Jun 12, 2019, 2:44:21 PM6/12/19
to LCZero
Karol Majewski:
A TC match doesn't make sense, as 20x256 will always be ahead. What is extremely interesting is a same-number-of-nodes match. I'll put my money on 40x256 there.

Did anyone test a couple of the latest 40b nets against some of the best T40 20b nets?
Preferably one time-controlled test and one test at the same number of nodes.

Same number of nodes is not so interesting; maybe only when we talk about memory and/or related problems.
+75 Elo for the 40x256 at the same number of nodes was good enough for me.

M MUSTERMANN

unread,
Jun 12, 2019, 2:47:35 PM6/12/19
to LCZero
Zlatko Hulama:
Did anyone test a couple of the latest 40b nets against some of the best T40 20b nets?
Preferably one time-controlled test and one test at the same number of nodes.

I have done some time-control tests.
With more time, the 40x256 gains more and more strength.
It is better, at least at long time controls.

Karol Majewski

unread,
Jun 12, 2019, 3:26:39 PM6/12/19
to LCZero
Well, the whole idea of 40x256 is to create a net that holds more knowledge and patterns at the cost of speed. So if you want to compare 20x256 to 40x256, what better way than a fixed-node match? And how do you know it's +75 Elo? I wouldn't trust these Elo graphs.


M MUSTERMANN wrote:

Zlatko Hulama

unread,
Jun 13, 2019, 12:51:40 AM6/13/19
to LCZero
It's better to compare them at normal time controls; however, a fixed-node match shows whether 40b is actually "smarter", and distilling 40b to 20b (or 22b or 18b or whatever) should in theory create the strongest net, as well as showing how many blocks give the best balance between speed and accuracy!
We're currently mostly training 20b, but we don't know if perhaps 19b is just as accurate yet enough faster to pull ahead. Or perhaps 21b is slower but so much smarter that it's a better choice than 20b? Nobody has run those tests yet; we're just using approximate numbers.
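For what distillation would mean concretely, the usual loss is something like this (illustrative names; as far as I know nothing like this exists in the training pipeline today):

import math

def distill_loss(teacher_policy, student_policy, teacher_value, student_value, alpha=1.0):
    # student's policy cross-entropy against the teacher's soft targets,
    # plus a value-head mean-squared-error term
    ce = -sum(t * math.log(max(s, 1e-12))
              for t, s in zip(teacher_policy, student_policy))
    return ce + alpha * (teacher_value - student_value) ** 2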

Shah

unread,
Jun 13, 2019, 1:27:29 AM6/13/19
to LCZero
I think if you compare just fixed nodes you'd know which is smarter, but only at that fixed node count.
It could well be that the other net scales better and would hence prove smarter in a test with a higher fixed node count.

About the balance between speed and accuracy:
I think that in practice you are correct.
But in theory, if it were feasible to train an unlimited, huge net, you could end up with an (absurd...) "1-node-only" net that would be as strong as you wish...

Some say Leela is already ~2200 Elo at 1 node (hard for me to believe).
For what it's worth, human GMs inspect tens of nodes...

Joe MD

unread,
Jun 13, 2019, 6:51:19 AM6/13/19
to LCZero
If it were true that bigger nets are weaker because of the loss of NPS, then the 128x10 network would surely be stronger than the 256x20. Claiming that a 256x40 cannot beat a 256x20 at equal time, because of the neural network structure, is pure speculation without any evidence to support it.

Zlatko Hulama

unread,
Jun 13, 2019, 9:08:58 AM6/13/19
to LCZero
That's exactly the thing: how do we know that 80x512 is not even stronger, with even less NPS? We don't.
What about 160x1024, or even bigger?

There are two main reasons for the "sweet spot network size" hypothesis:

1. A lot of nodes and depth must be searched because of tactics. A bigger net with a low NPS number might have tactical blind spots, so there is probably some kind of "sweet spot" in speed vs smarts.
2. By increasing the net size, "smarts" increase logarithmically (it could even be a horizontal asymptote), while the speed drops linearly. Even without tactical blind spots, there is a sweet spot in speed vs smarts.

Some raw numbers showed us that we get around 50 Elo when we double the node count.
For smarts, we get 250-300 Elo by moving from 10x128 to 20x256.
We also lose 4x speed by moving from 10x128 to 20x256.
That gives us a loss of 200 Elo from speed and a gain of 250-300 Elo from smarts, for a net gain of 50-100 Elo.

If we get less than 50 Elo of smarts from the 40b net, 20b is probably going to be stronger because it's more than 2x faster. BUT it could be stronger on extremely strong hardware and very long time controls, so it should still be very useful for TCEC, and probably for distilling stronger, smaller (faster) nets for us mere mortals with weak computers.

PrinceZappa

unread,
Jun 14, 2019, 10:09:07 PM6/14/19
to LCZero
The law of diminishing returns applies here, both in net size, as the Elo improvements lessen significantly with each step up, and in nodes computed. For instance, in the TCEC superfinal, unless running low on time in the endgame, Leela probably played the best moves with the nodes she had. So the point about a 200 Elo loss due to the speed compromise vs 10x128 is kind of moot at that long time control.

If you really care about Elo improvements, the most efficient way would be through Pohl's opening book or the like. Part supervised learning it may be, but it would quite easily add another 50 Elo, I reckon.

glbchess64

unread,
Jun 14, 2019, 10:44:40 PM6/14/19
to LCZero
Someone (@nps2060) published a test on Discord at 3000 nodes per move (an advantage to 40b_106); here are the results:

#  PLAYER            :  RATING  ERROR  POINTS  PLAYED   W   L   D  D(%)  CFS(%)
1  lc0.net.42550     :     0.0   20.8    60.5     100  30   9  61    61     100
2  lc0.net.40b_106   :   -75.1   20.8    39.5     100   9  30  61    61     ---

You can see there is a lot to do to reach T40 level.

Álvaro Begué

unread,
Jun 14, 2019, 10:54:19 PM6/14/19
to LCZero
A bigger network should have better performance at a fixed number of nodes, but that's only true if the training is done properly. For instance, the larger network has larger capacity for overfitting, so it probably needs more games or smaller learning rates and longer training than the smaller network.

Training the standard size network is expensive enough, so I don't see much chance of success for this 40x256 project in the next few years, even if it is intrinsically a good idea.

Alexander Lyashuk

unread,
Jun 15, 2019, 2:13:19 AM6/15/19
to Joe MD, LCZero
Hi Joe MD,

Could you change the suffix of your custom builds in version.inc to something which is not "" or "rc1" or "rc2", so that they're not confused with official releases and not accepted by lczero.org?

Thanks!


123

unread,
Jun 15, 2019, 8:47:24 AM6/15/19
to LCZero
Karol Majewski:
He was talking about the same number of nodes and +75 Elo. I think he was referring to his own tests (+75 Elo), not to the Elo from the graphs.

123

unread,
Jun 15, 2019, 9:30:25 AM6/15/19
to LCZero
Zlatko Hulama:
When we talk about speed (not a well-chosen word in this case) we mean depth!
It is a question of depth vs accuracy.

1. A bigger net is always more accurate.
2. But it reaches lower depth.

1. A perfectly trained 40x256 net has its own perfect accuracy, and perfect stays perfect: you can't increase perfect!
2. Depth is all about which GPUs we are using.
That's why an ideal balance between depth and accuracy doesn't exist.
Or in other words: if I have a perfectly trained 40x256 net and 1x RTX 2080 Ti, and that's the ideal balance between depth and accuracy, then no other person will have the same balance unless they also have the 40x256 net and 1x RTX 2080 Ti.

1. The main point is: how much depth does Lc0 need per move?
2. Someone will answer that he has a perfectly trained 20x256 net and 1x GTX 1080 and this is the perfect balance between depth and accuracy. Okay, it may be true, but:
3. Some weeks later the RTX cards arrive and he buys one.
4. Now he gets much more depth per move, and in terms of depth he is overpowered.
5. He still has a perfectly trained 20x256 net, but he is out of balance, so the solution is to use a bigger 40x256 net.
6. Now he has more accuracy, and depth is back to what he had before. He is again at a perfect balance between depth and accuracy.


1. In the next 5 years we will have new GPUs which will be much stronger, and I don't mean by something like 10%, but by a factor of 2, 3, 4, 5, 10 times stronger for artificial intelligence.
= It doesn't make any sense for someone with an RTX GPU to donate training games to 10x128 or 20x256.
Maybe only in some special cases, to prove some special things for a short time.

By the way, I have 2x RTX 2080 Ti GPUs and it doesn't make any sense to use a 20x256 net: the depth would increase, thanks to the second GPU, from maybe depth 30 to depth 32, and my Elo benefit would be something like +2 Elo.

123

unread,
Jun 15, 2019, 9:37:58 AM6/15/19
to LCZero
Joe MD:
If it were true that bigger nets are weaker because of the loss of NPS, then the 128x10 network would surely be stronger than the 256x20. Claiming that a 256x40 cannot beat a 256x20 at equal time, because of the neural network structure, is pure speculation without any evidence to support it.

That's correct.

The second point was maybe a joke, I think. Of course a bigger net can win at equal time.
If somebody doesn't believe it, they can try something like 8 RTX 2080 Ti GPUs for the 40x256 and the same for the 20x256, at 30 seconds per move with ponder on for both, and see how the 20x256 gets destroyed.

123

unread,
Jun 15, 2019, 9:44:16 AM6/15/19
to LCZero
Zlatko Hulama:
Of course we know that 80x512 would be stronger with even less NPS!
NPS is depth, so with enough depth there is no problem using bigger nets!
Try something like 8x RTX 2080 Ti vs 8x RTX 2080 Ti at 30 seconds per move with ponder on; it should be easy for you to prove bigger is better.

123

unread,
Jun 15, 2019, 9:59:58 AM6/15/19
to LCZero
glbchess64:
Someone (@nps2060) published a test on Discord at 3000 nodes per move (an advantage to 40b_106); here are the results:

#  PLAYER            :  RATING  ERROR  POINTS  PLAYED   W   L   D  D(%)  CFS(%)
1  lc0.net.42550     :     0.0   20.8    60.5     100  30   9  61    61     100
2  lc0.net.40b_106   :   -75.1   20.8    39.5     100   9  30  61    61     ---

You can see there is a lot to do to reach T40 level.

This was a joke.
Why should anybody run an artificial intelligence with only 3000 nodes and a toaster GPU?
On my hardware that would be something like 1/20 or 1/30 of a second!
This test only proves that if you have no money and no time, you should not use 40x256 networks and should stay with 20x256 or even 10x128 networks.

glbchess64

unread,
Jun 15, 2019, 11:45:58 AM6/15/19
to LCZero
@123 I in no way intend to cause controversy here; I just posted these results to give evidence to the debate, because there are a lot of assumptions in this thread but little solid data.

Explain to me, with real evidence, why 3000 nodes per move is not a serious test. Generally, when we test Leela nets at 3000 nodes per move against SF dev with a Leela ratio close to 1 (better results if close in level), we get a very good idea of the results at longer TC.

Maybe the better approach would be to post a 100-game match between the same nets at 30,000 nodes and at 300,000. It seems you have the hardware to do this. It would be very convincing if such tests gave very different results.

Porkinson

unread,
Jun 15, 2019, 12:10:04 PM6/15/19
to LCZero
@glbchess64
Don't take that guy too seriously; he is basically a charlatan. He likes to flaunt how much money he has, yet fails to do anything remotely productive with it, except for shitposting, I guess. I wonder why he hasn't bought 100 RTX 2080s and proved his completely baseless claims. A shame that this project attracts those types of clowns, but then again, maybe he is a troll.

PiotrekL

unread,
Jun 15, 2019, 4:19:53 PM6/15/19
to LCZero
I would like to run some tests at a fixed number of nodes vs current T40 nets. Could someone summarize how to set it up? Is there a standard tool or tutorial for running such tests?

glbchess64

unread,
Jun 15, 2019, 5:03:10 PM6/15/19
to LCZero
@PiotrekL: I use cutechess-cli, running 20,000 games in 500-game series (250 openings, colour-reversed). I adapted a script found on Discord. You will have to modify some parameters if you change the number of nodes (nncache for Leela and hash for SF).

The batch:
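rem 40 series of 500 games each (250 rounds x 2 colours) = 20,000 games total;
rem %%s advances the opening-book start index by 250 per series so no opening is reused.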
for /l %%s in (1,250,9751) do (
cutechess-cli.exe ^
-engine name="Stockfish.dev" cmd="stockfish_19042720_x64_bmi2.exe" dir="C:\path\to\Stockfish-11-win" ^
option.Threads=4 option.Hash=512 nodes=1700000 ^
option.SyzygyPath=C:\path\to\TB\syzygy ^
-engine name="lc0.net.42547" dir="C:\path\to\LC0_v0.21" ^
option.WeightsFile="C:\path\to\LC0_v0.21\weights_42547.pb.gz" ^
cmd="lc0.exe --backend=cudnn-fp16 --syzygy-paths=C:\path\to\TB\syzygy" ^
nodes=3000 ^
-debug > Debug.txt -pgnout "match.pgn" min ^
-each proto=uci tc=inf -recover -concurrency 1 -tournament gauntlet ^
-draw movenumber=50 movecount=5 score=8 -resign movecount=5 score=1000 ^
-games 2 -rounds 250 ^
-openings file="C:\path\to\Books\openings-8ply-10k.pgn" format=pgn order=sequential start=%%s ^
-tb C:\path\to\TB\syzygy -repeat
)

M MUSTERMANN

unread,
Jun 19, 2019, 3:33:11 AM6/19/19
to LCZero
Joe MD:
Is there a reason not to use all the 7-man TBs?

I'm missing KBBP vs KNN and KNNP vs KBB; many of these endgames are won.
What about KRRP vs KRR? Some of them are easily won.
And also KPPPP vs KB or KN: these are very difficult.
I have seen lots of these endgames recently.

PiotrekL

unread,
Jun 24, 2019, 3:00:36 AM6/24/19
to LCZero
Possible bug; consider this position:
8/5kp1/5p2/8/8/8/6K1/8 b - - 1 7

Black has a king and 2 pawns vs a king, which is an easy win. With tablebases disabled, the newest Lc0 with the 256x40 net peaks at a -2.01 evaluation (83.7% for Black).
The same Lc0 with the TCEC SF network shows an eval of -41.00 (although it somehow decreases as the search goes on, which might indicate a bug in the search or in how the eval is calculated).

I don't understand how Lc0 works well enough to venture a guess as to why this occurs, but it is very worrying.

Deep Blender

unread,
Jun 24, 2019, 3:37:30 PM6/24/19
to LCZero
As far as I know, all the training is done with tablebases, and those positions are not included in the training (please correct me if I am wrong). If that is the case, the evaluation is not too surprising, as it is basically unknown territory the network has never seen.

PiotrekL

unread,
Jun 25, 2019, 7:55:58 AM6/25/19
to LCZero
Does it mean one needs all the tablebases used in training, including the 7-man ones, to use this net? That sounds like a pretty tough barrier.

M MUSTERMANN

unread,
Jun 28, 2019, 6:07:37 AM6/28/19
to LCZero
PiotrekL:
Is it possible to do some 8-man TB rescoring? That could help a lot.

I think 40 blocks is good enough for now, but the 256 filters should be increased to 512 (40x512). Then Lc0 would see much more tactics.

It looks like Lc0 40x256 is stronger than 20x256 at middle and long time controls and equal in this blitz mode:

PiotrekL

unread,
Jun 28, 2019, 12:37:29 PM6/28/19
to LCZero
Maybe the games should be played out according to the tablebases and those positions included in training, at least those with 8-7-6 pieces, so people don't need 150+ GB of tablebases to get reasonable assessments in the endgame?
I now use only the 40x256 network for analysis. The main reason is that it's much stronger tactically. 20x256 still doesn't get some quite simple positions in the opening, like:

1.e4 e5 2.Nf3 Nc6 3.Bc4 Nf6 4.d4 exd4 5.O-O Nxe4 6.Re1 d5 7.Bxd5 Qxd5 8.Nc3

8...Qc4 is a blunder here (because 9.Nd2 gives White a big advantage), but it takes forever for the 20x256 network, even on a strong GPU, to see it as even slightly better for White, while the 40x256 has Qc4 as better for White (though only slightly, until you actually make the move and let it think) almost instantly.

Edward Panek

unread,
Jun 29, 2019, 12:04:16 PM6/29/19
to LCZero
https://www.twitch.tv/edosani

Testing 40b160 vs SF DEV (6/27/19)

On Friday, May 17, 2019 at 4:56:35 PM UTC-4, Joe MD wrote:
If anyone wants to donate some GPU time to training a 40x256 (40b) network, which was on par with 41800 (I haven't tested it recently), you're more than welcome to. I have been training it for quite some time now and hope to continue training it into the future.

http://157.230.189.191:8080/

Any help would be appreciated. It's still in its infancy of training off self-play data: about 150k games in the 500k window are self-play; the rest are t40 games.

Joe MD

unread,
Jul 2, 2019, 10:04:53 PM7/2/19
to LCZero
Hello LCZero community,

I'm going on vacation. Hopefully when I get back we can restart training a bigger network and use the knowledge we gained here in the training of a new network. Thanks for understanding.

Joe MD

Tomas Spark

unread,
Jul 3, 2019, 12:32:14 PM7/3/19
to LCZero
We are going to wait for you to come back.

Spark9_RTX