Testing new parameters for Leela


NuclearPawn

Mar 3, 2020, 10:49:49 PM
to LCZero
Hello friends.
I have recently been testing a wide array of configuration settings for Leela. Inspired by Kiudee's findings, I set out to find a better combination of parameters that makes the most of Leela's potential. I tested all my experimental parameters against T60 Kiudee and T40 Kiudee. After a lot of tests I stumbled upon 2 very powerful configurations, at least in bullet.
These 2 configurations crush T60 & T40 Kiudee at bullet TC (1m+1s).
I have attached the PGN for this test. Each engine played the other 60 times, with random 6-move openings played by both in each color. Their performance results are as follows (all engines start at 3600 Elo):

    Program                  Elo    +    -   Games   Score   Av.Op.   Draws

  1 Lc0 62502 Value2       : 3639   48   44     60   57.5 %    3587   71.7 %
  2 Lc0 62502 Value1       : 3635   47   43     60   56.7 %    3588   73.3 %
  3 Lc0 62502              : 3569   42   45     60   44.2 %    3610   75.0 %
  4 Lc0 40850              : 3556   55   57     60   41.7 %    3615   60.0 %
 
Since I don't have many resources, my tests run too slowly at long TCs, so I would like to ask testers to help test these settings at longer TCs and share the results here. Hopefully we can make Leela even stronger.

Here are the two configurations:

Lc0 Value1 (Configuration 1)
CPuct: 2.231000
CPuctAtRoot: 2.231000
CPuctBase: 19126.000000
CPuctBaseAtRoot: 19126.000000
CPuctFactor: 2.865000
CPuctFactorAtRoot: 2.865000
Temperature: 0.000010
TempEndgame: 0.000010
FpuValue: 0.451000
FpuStrategyAtRoot: 0.451000
PolicyTemperature: 1.590000
SmartPruningFactor: 1.350000

Lc0 Value2 (Configuration 2)
CPuct: 1.831000
CPuctAtRoot: 1.831000
CPuctBase: 17126.000000
CPuctBaseAtRoot: 17126.000000
CPuctFactor: 2.256000
CPuctFactorAtRoot: 2.256000
Temperature: 0.000100
TempEndgame: 0.000100
FpuValue: 0.411000
FpuStrategyAtRoot: 0.411000
PolicyTemperature: 1.100000
SmartPruningFactor: 1.600000
Lc0 values Tests.pgn
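
For anyone who wants to try these values from a script rather than a GUI, here is a minimal sketch that turns "Name: value" lines like the ones above into UCI `setoption` commands. The helper name `to_setoption` is made up for illustration, and the three sample lines are copied from Configuration 1 exactly as posted; whether the engine accepts every option name in the lists above is not checked here.

```python
def to_setoption(settings_text):
    """Turn 'Name: value' lines into UCI 'setoption' commands."""
    commands = []
    for line in settings_text.strip().splitlines():
        name, sep, value = line.partition(":")
        if not sep or not value.strip():
            continue  # skip headings and anything that is not 'Name: value'
        # Values are passed through verbatim, exactly as written in the post
        commands.append(f"setoption name {name.strip()} value {value.strip()}")
    return commands


# Three lines copied from Configuration 1 (names and values as posted)
example = """
FpuValue: 0.451000
PolicyTemperature: 1.590000
SmartPruningFactor: 1.350000
"""

for command in to_setoption(example):
    print(command)
```

The resulting lines can be sent to the engine's standard input before the game starts, or pasted into whatever per-engine option dialog your GUI offers.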

glbchess64

Mar 4, 2020, 6:46:01 AM
to LCZero
  1. The TC gives no information unless you also name the graphics card. The relevant figure is the average nodes per move (it can be estimated from the GPU and the TC, but it is best to record it in the PGN; cutechess-cli does this). As posted, it is not possible to tell what these tests mean.
  2. There are not enough games to be sure that these settings are better than Kiudee's. If you test many settings over few games, it is normal that some of them seem better than Kiudee's: that is a consequence of statistical fluctuation, and reporting only those results introduces a strong selection bias. So even with the missing information, the results are likely unreliable. (Over 200 games with the same openings at 1s per move on an RTX 2060, repeating the test 5 times, I often saw 40 Elo of variation, sometimes more; you can imagine the variation over 60 games.)

    Now some secondary points:

  3. The Kiudee setting is a priori far from optimal for 42850, or for T40-based nets in general (Kiudee was designed for T58, T59 and T60; it may be better than the default for other nets, but there is no reason it should be very good for them, because the training parameters are too different).
  4. The latest T60 with the Kiudee setting at 1500 nodes per move is close to the level of T40 with the default setting at 3000 nodes per move (about equal TC). So if you have an RTX card and play enough games at 1s per move, you will have at least 5x the nodes per move, and I don't think that T40 could be a serious opponent for T60 even with the best setting. Maybe SV-20x256-T40-1541, which is about 25 Elo stronger than 42850, is still a challenger at 1s per move, but I am not sure.
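
To put a number on point 2, here is a rough sketch of how wide the uncertainty is for a 60-game result. It assumes the usual logistic Elo model and a normal approximation of the mean score, which is not necessarily what the rating tool behind the original table does. The +13 =43 -4 split is simply what 57.5 % with 71.7 % draws over 60 games works out to.

```python
import math


def elo_from_score(score):
    """Elo difference implied by a score fraction in (0, 1), logistic model."""
    return 400.0 * math.log10(score / (1.0 - score))


def elo_interval(wins, draws, losses, z=1.96):
    """Return (elo, low, high) from a normal approximation of the mean score."""
    games = wins + draws + losses
    score = (wins + 0.5 * draws) / games
    # Per-game variance of the result, which is 1, 0.5 or 0
    variance = (wins * (1.0 - score) ** 2
                + draws * (0.5 - score) ** 2
                + losses * (0.0 - score) ** 2) / games
    margin = z * math.sqrt(variance / games)
    return (elo_from_score(score),
            elo_from_score(score - margin),
            elo_from_score(score + margin))


# 57.5 % with 71.7 % draws over 60 games corresponds to +13 =43 -4
elo, low, high = elo_interval(13, 43, 4)
print(f"{elo:+.0f} Elo, approx. 95% interval [{low:+.0f}, {high:+.0f}]")
```

For this input the interval reaches roughly 45 Elo to either side of the estimate, so a 60-game result cannot separate settings that differ by 10 or 20 Elo. The margin shrinks only with the square root of the number of games.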

NuclearPawn

Mar 4, 2020, 4:33:50 PM
to LCZero
Hi glbchess64

1. My GPU is an NVIDIA GeForce GTX 1080 4gb. The nodes per second vary between 4K and 24K depending on the position. In the opening phase I have observed speeds between 6K and 13K. In highly tactical positions I have observed the lowest node counts, down to 2K in some instances. I know my graphics card might not be the best, but I think it is decent enough.

2. I agree. Not enough games, and that is why I'm asking for help from testers willing to try those settings so we can build a larger game data to either support my claim or discard it. But it all comes down to willing testers.

3. If Kiudee is especially good for T60s then my settings convincingly beat the Kiudee settings with the same net (we still fall to your second point though)

4. I don't have RTX so maybe some willing testers can help me out.

Thank you for your valid points :)
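
As a back-of-the-envelope check on point 1, the speeds above can be converted into nodes per move. This is only a sketch: the 60-move game length is a made-up round number, not something measured in these tests, and it assumes the engine spreads its whole clock evenly over the game.

```python
def seconds_per_move(base_s, increment_s, moves):
    """Average thinking time per move if the whole clock is used over `moves` moves."""
    return base_s / moves + increment_s


def nodes_per_move(nps, base_s, increment_s, moves):
    """Rough nodes searched per move at a constant speed of `nps` nodes per second."""
    return nps * seconds_per_move(base_s, increment_s, moves)


GAME_LENGTH = 60  # hypothetical average game length in moves (assumption)

for nps in (6_000, 13_000):  # opening-phase speeds quoted in point 1
    n = nodes_per_move(nps, base_s=60, increment_s=1, moves=GAME_LENGTH)
    print(f"{nps} nps -> about {n:,.0f} nodes per move at 1m+1s")
```

Recording the real node counts in the PGN, as suggested earlier in the thread, would replace this estimate with actual figures.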

Jesse Jordache

Mar 8, 2020, 9:03:14 AM
to LCZero
It's true - exploring hyperparameters in play - also in the training phase - is sorely needed.

Stefan Albrecht

Mar 11, 2020, 5:41:37 PM
to LCZero
Hello,

if somebody is already testing the new params, please let us know. No need to test them twice. If not,
I am curious to test them with 1000 games at TC 1+1 to reduce the statistical fluctuations.

My idea is a gauntlet with the latest SF dev vs the latest Leela T60 net with 5 settings:

- former default (baseline)
- Kiudee
- NuclearPawn Config. 1
- NuclearPawn Config. 2
- Kayra2

est. duration: 15 days (5000 games at TC 1+1), Opening Book: Balsa 5 move with 500 positions,
SF on i7 8700 - 12 threads / Lc0 on RTX 2060; Lc0 Ratio about 1 (with a 20x256 net), Arena GUI

Does that sound good / any suggestions before I start it?

Greets, Stefan

NuclearPawn

Mar 12, 2020, 2:38:27 AM
to LCZero
That would be great, Stefan!
I am currently testing long TCs, so please feel free to test 1+1. As I mentioned, my bullet tests did not cover many games, but from what I have tested, the new parameters, especially Configuration 1, seem very strong and give more balanced results.

Please let us know of the results.
Cheers,
Jesse

mehmet1921

Mar 12, 2020, 3:30:55 AM
to LCZero
This test is a good idea. I think one engine (with new settings) per author is enough.

Stefan Pohl

Mar 12, 2020, 9:22:58 AM
to LCZero
With such short time controls, I recommend a 20x256 net, not a (bigger) T60 net. More calculated nodes per played move are better at a short time control, because you get a bigger search tree, and the main changes in these parameter settings are modifications of the search (tree). So the parameter changes should have more "impact" on the results.
Leelenstein 13 (it is free) would be a good choice. Or the latest T40 net (42850). Or the strong S. Vieri T40-1541 net.

Stefan Albrecht

Mar 12, 2020, 1:40:58 PM
to LCZero
Hello Stefan,

thanks for your input. I totally agree with your argument that a smaller net size gives a larger search tree.
However, I have read more than once in this forum that the Kiudee parameters are designed specifically for T60 (and may be useful for T58 & T59) and are worse when used with older nets...
Additionally, the params from NuclearPawn were also designed/tested with the T60 net.
What is your experience from your NN tests? Are the Kiudee parameters better than the former standard ones with other test runs/nets?

Stefan Pohl

Mar 12, 2020, 2:39:56 PM
to LCZero
I did one testrun with LS 12.2 (20x256)  with old default parameters and one testrun with Kiudee Settings. +31 Elo. Kiudee works fine with 20x256 nets.

4  Lc0 0.23.1k LS 12.2 (20x256)   : 3593 300 (+166,=  0,-134), 55.3 %
7  Lc0 0.23.1 LS 12.2 (20x256)    : 3562 300 (+153,=  0,-147), 51.0 %

Stefan Albrecht

Mar 12, 2020, 5:00:05 PM
to LCZero
I think we can kill two birds with one stone by using the latest T59 net instead of T60.
First, it will provide double the speed of a 20x256 net and 4 times the speed of T60;
second, its use is recommended with the Kiudee parameters.

Stefan, in your tests T59 is about 60 Elo behind T40. Then SF9 would probably be the right opponent for T59.

NuclearPawn, Mehmet and Stefan, what do you think?

NuclearPawn

Mar 12, 2020, 5:42:30 PM
to LCZero
I have never tested T59, so I am not familiar with it. Go with whatever is best. I'm mostly testing with the T60 nets.

Stefan Albrecht

Mar 12, 2020, 6:52:29 PM
to LCZero
Supplemental:
note that t59 is the little sister of t60. T60 is actually trained with the params of t59. So parameter changes should have a similar effect to both nets with the same amount of nodes/move.

Stefan Pohl

Mar 13, 2020, 12:58:52 AM
to LCZero
SF 9? Perhaps a little too weak?!? Hard to say. Perhaps a dev-version between SF 9 and SF 10 would be better?!?

mehmet1921

Mar 13, 2020, 1:32:48 AM
to LCZero
Maybe Sfish 10 or Sfish 11 but with less cores is suitable.

Stefan Albrecht

Mar 15, 2020, 3:53:13 PM
to LCZero
After some short tests it turns out that T59 is very strong in bullet and SF dev is a level opponent.
So on Saturday I started testing Lc0 v24.0 ID 591226 on an RTX 2060 vs SF 090320 on 12 threads (6 cores) @ 3.1 GHz.

The first parameters I am currently testing are Kiudee's, to set a baseline.
So far this match is very level. After 500 games Lc0 has +1 elo point.

My schedule is

2. NuclearPawn config. 1
3. Kayra 2
4. NuclearPawn config. 2
5. Former standard

I will let you know when I have news :-)

Tony Mars Rover

Mar 15, 2020, 11:05:28 PM
to LCZero
Hi! Interesting discussion! I was gonna say that, in my opinion, the best possible opponent in this case should be SF9-4 CPU or SF10-4 CPU, with book and 6-man EGTB, and similar GPU config! My math may be off, but I believe we would get a clearer picture of the actual playing strength of Lc0 24.x w/ 59xxxx, with book and 6-man EGTB, if we tested both engines running on 4 CPU HW platforms - everything else being kinda-sorta equal.

Stefan Albrecht

Mar 16, 2020, 6:03:37 PM
to LCZero
Hello Tony,

the objective here is to find out if the suggested parameters from NuclearPawn and Kayra2 are better than the standard ones (Kiudee).
Therefore it is good when the baseline test, here Kiudee's parameters, is about level with its opponent.

If the aim of this test were to compare both engines on the same HW, you would be right.

Greets Stefan 

Stefan Albrecht

Mar 17, 2020, 4:18:35 PM
to LCZero
The baseline is set.
I am impressed how strong t59 is in bullet.

Conditions:
1000 games at TC 1+1, Opening Book: Balsa 5 move with 500 positions, Fritz 16 GUI,
SF 090320 on i7 8700 - 12 threads/6 cores @ 3.1 GHz (turbo boost off) vs Lc0 v0.24 on RTX 2060; 6-men TB

Kiudee's parameters scored
+123/=755/-122 50.05%  500.5/1000 +0 Elo (+-11 Elo)

Next test will be NuclearPawn Config. 1...

Stefan Albrecht

Mar 21, 2020, 3:29:05 PM
to LCZero
Here are the updated test results, now with NuclearPawns Configuration 1:

Kiudee's parameters:  +0 Elo (+-11 Elo)   +123/=755/-122 50.05%  500.5/1000

NuclearPawn Config1: -5 Elo (+-11 Elo)    +104/=778/-118 49.30%  493.0/1000

Next test will be Kayra2 parameters...good luck!


Stefan Albrecht

Mar 24, 2020, 8:21:33 PM
to LCZero
Update Kayra2 parameter:

Kiudee's parameters:  +0 Elo (+-11 Elo)   +123/=755/-122 50.05%  500.5/1000
Kayra2 parameter:      -1 Elo (+-11 Elo)    +120/=756/-124 49.80%  498.0/1000
NuclearPawn Config1: -5 Elo (+-11 Elo)    +104/=778/-118 49.30%  493.0/1000

....so we are still searching for better parameters than Kiudee's...

Next test will be NuclearPawn Config. 2... good luck!

Stefan Pohl

Mar 25, 2020, 8:57:38 AM
to LCZero
Very interesting. Keep it up!

Stefan Albrecht

Mar 26, 2020, 4:21:18 PM
to LCZero
Update NuclearPawn Config. 2:


Kiudee's parameters:  +0 Elo (+-11 Elo)   +123/=755/-122 50.05%  500.5/1000
Kayra2 parameter:      -1 Elo (+-11 Elo)    +120/=756/-124 49.80%  498.0/1000
NuclearPawn Config 1: -5 Elo (+-11 Elo)    +104/=778/-118 49.30%  493.0/1000
NuclearPawn Config 2: -28 Elo (+-15 Elo)    +52/=401/-97 45.91%  252.5/550;   Fritz16 GUI stats: 99.7% --> [-70, -12] Elo

I stopped the match after 550 games because, according to EloStat and Fritz, it is very, very unlikely that the NP2 setting can be superior to Kiudee's parameters.

At the end of the day, none of the three tested parameter settings seems to be superior to Kiudee's. The search goes on...
If you have new promising settings you would like to have tested, let me know.

Next test will be the former standard parameters, to compare them with Kiudee's under these test conditions.
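
The decision to stop the NP2 match early can be sanity-checked with the common likelihood-of-superiority (LOS) approximation, which uses only wins and losses. This is a sketch of that textbook formula, not necessarily the calculation EloStat or the Fritz GUI performs. Note that it measures Lc0 with NP2 against the Stockfish opponent in this match (which the Kiudee baseline held level), so it is an indirect comparison with Kiudee.

```python
import math


def likelihood_of_superiority(wins, losses):
    """LOS under a normal approximation; draws do not enter the formula."""
    decisive = wins + losses
    if decisive == 0:
        return 0.5  # no decisive games, no evidence either way
    return 0.5 * (1.0 + math.erf((wins - losses) / math.sqrt(2.0 * decisive)))


# NuclearPawn Config 2 after 550 games: +52 =401 -97
los = likelihood_of_superiority(52, 97)
print(f"LOS for NP2 vs the opponent: {los:.4%}")
```

With 45 more losses than wins out of 149 decisive games, the LOS comes out far below 1 %, which agrees with the choice to end the run at 550 games.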

Stefan Pohl

Mar 27, 2020, 8:23:10 AM
to LCZero
Laskos on talkchess had good results with Kiudee plus CPuct (and CPuctAtRoot for lc0 0.24.1) = 1.9 for short thinking time. Perhaps you could try this out?!

NuclearPawn

Mar 27, 2020, 10:38:08 AM
to LCZero
I would suggest you try my 1st parameters against Kiudee and see the difference.

Stefan Albrecht

Mar 30, 2020, 3:37:18 PM
to LCZero
Update former standard parameters:


Kiudee's parameters:  +0 Elo (+-11 Elo)    +123/=755/-122 50.05%  500.5/1000
Former standard:         -16 Elo (+-11 Elo)  +111/=732/-157 47.70%  477.0/1000

Kayra2 parameter:      -1 Elo (+-11 Elo)     +120/=756/-124 49.80%  498.0/1000
NuclearPawn Config 1: -5 Elo (+-11 Elo)    +104/=778/-118 49.30%  493.0/1000
NuclearPawn Config 2: -28 Elo (+-15 Elo)     +52/=401/-97 45.91%  252.5/550;   Fritz16 GUI stats: 99.7% --> [-70, -12] Elo

@NuclearPawn: To be honest with you I don't like Leela vs Leela matches that much. IMO even if a parameter setting would be superior against another one in a Leela vs Leela competition, what is it worth if it is worse in Leela vs other engine competitions?

Next test will be Stefan's suggestion of Kiudee + CPuct and CPuctAtRoot = 1.9, from Laskos on talkchess...

Stefan Albrecht

Apr 3, 2020, 3:31:22 PM
to LCZero
Update Laskos parameters (Kiudee + CPuct and CPuctAtRoot = 1.9):



Kiudee's parameters:  +0 Elo (+-11 Elo)    +123/=755/-122 50.05%  500.5/1000
Former standard:         -16 Elo (+-11 Elo)  +111/=732/-157 47.70%  477.0/1000

Kayra2 parameter:      -1 Elo (+-11 Elo)     +120/=756/-124 49.80%  498.0/1000
Laskos parameter:   -2 Elo (+-11 Elo)     +120/=754/-126 49.70%  497.0/1000

NuclearPawn Config 1: -5 Elo (+-11 Elo)    +104/=778/-118 49.30%  493.0/1000
NuclearPawn Config 2: -28 Elo (+-15 Elo)  +52/=401/-97 45.91%  252.5/550;   Fritz16 GUI stats: 99.7% --> [-70, -12] Elo

@Stefan Pohl: It seems that on my system Laskos' settings are not superior to Kiudee's at 1+1.
My next test may also be interesting for you:
I will test the former standard params vs Kiudee's with ID 42850. So we will find out whether Kiudee's parameters are really worse for T40...

Stefan Pohl

Apr 4, 2020, 6:09:58 AM
to LCZero
Interesting, thank you. Especially because a small test run of mine on CPU (so Lc0 was much slower) gave a measurable Elo gain:

4 different lc0 0.24.1 CPU (dnnl version) nets/versions running single-threaded, 2'+2'', 5-move human openings (no Armageddon), 1080 games.
CP 1.9 means CPuct=1.9 and CPuctAtRoot=1.9
 
     Program                       Elo    +    -   Games   Score   Av.Op.  Draws
   1 Lc0 0.24.1 591226 CP 1.9    : 3259   16   16   540    60.7 %   3180   48.1 %
   2 Lc0 0.24.1 591226           : 3228   16   16   540    55.0 %   3191   53.7 %
   3 Lc0 0.24.1 700953           : 3214   16   16   540    52.5 %   3195   50.2 %
   4 Lc0 0.24.1 J64-210          : 3099   17   17   540    31.8 %   3234   40.2 %

Stefan Albrecht

Apr 4, 2020, 8:49:08 AM
to LCZero
Interesting, Stefan. So the Laskos parameters seem to be a good choice for CPU-only users. For (modern) GPU users, Kiudee is probably still the best setting.

Stefan Albrecht

Apr 14, 2020, 3:27:21 PM
to LCZero
Update ID 42850 former standard vs Kiudee's parameter:

Former standard: +6 Elo (+-11 Elo)     +113/=791/-96  50.85%  508.5/1000
Kiudee's params:+14 Elo (+-11 Elo)   +151/=739/-110 52.05%  520.5/1000

The results are too close to make a definitive statement about which setting is better for T40.
However, it seems that Kiudee's parameters also work well with ID 42850 at short TC.
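
For readers who want to see why these two results count as "too close", here is a sketch of a simple two-sample z-test on the score difference. It treats the two 1000-game matches as independent, which is only an approximation since both used the same opponent and opening book, and the function names are illustrative.

```python
import math


def score_stats(wins, draws, losses):
    """Mean score per game and the variance of that mean."""
    games = wins + draws + losses
    mean = (wins + 0.5 * draws) / games
    per_game_var = (wins * (1.0 - mean) ** 2
                    + draws * (0.5 - mean) ** 2
                    + losses * mean ** 2) / games
    return mean, per_game_var / games


def z_for_difference(first, second):
    """z statistic for the score difference between two independent matches."""
    mean_a, var_a = score_stats(*first)
    mean_b, var_b = score_stats(*second)
    return (mean_b - mean_a) / math.sqrt(var_a + var_b)


former = (113, 791, 96)    # former standard: +113 =791 -96
kiudee = (151, 739, 110)   # Kiudee's params: +151 =739 -110
z = z_for_difference(former, kiudee)
print(f"z = {z:.2f} (|z| below 1.96 is not significant at the 95% level)")
```

The statistic lands well below 1.96, so the 12-point gap between 508.5 and 520.5 is within what chance alone would produce over 1000 games per setting.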

Greets, Stefan