cPUCT=3.0 is +66 Elo


Albert Silver

May 6, 2018, 2:38:46 PM
to LCZero
Since Leela is already so strong in positional play, it seemed to me that working on anything that might improve the tactics would outweigh a small positional loss. As a result, I tested the opposite of what was introduced in v0.8. In the special cuDNN version of the engine, which has a default cPUCT of 1.7, I increased the value quite a bit, to 3.0. Using 1.7 with NN246, I had a rough rating performance of 2927 CCRL. In identical conditions, with cPUCT set to 3.0, it scored 2993. Both were tested at 10+0 over 100 games. I have not tested other values to see whether they do better or worse.

alvaro...@gmail.com

May 6, 2018, 2:51:21 PM
to LCZero
What did you test the program against? Can you post the exact W/D/L counts? I can then try to assess if there is any statistical significance to what you measured.

Albert Silver

May 6, 2018, 3:16:44 PM
to LCZero
Default:

lc0cdunn-246-disasterarea-10-SS, B  2018
1 The Lc0 chess engine.   +33/=35/-32 50.50% 50.5/100
2 DisasterArea-1.65w64   +32/=35/-33 49.50% 49.5/100

CPUCT=3.0


lc0cdunn-30-246-disasterarea-10-SS  2018
1 The Lc0 chess engine. +45/=30/-25 60.00% 60.0/100
2 DisasterArea-1.65w64 +25/=30/-45 40.00% 40.0/100

The Openings Suite is attached.
silversuite.pgn
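
For reference, the score percentages above can be converted into Elo differences with the standard logistic Elo formula. The following is a minimal Python sketch; the function name `elo_from_score` is illustrative and not part of any engine or GUI tooling. Under that formula the two results differ by roughly 67 Elo, in line with the 2927 and 2993 performance figures quoted in the first post.

```python
import math

def elo_from_score(score: float) -> float:
    """Elo difference implied by a score fraction in (0, 1), logistic model."""
    if not 0.0 < score < 1.0:
        raise ValueError("score must be strictly between 0 and 1")
    return -400.0 * math.log10(1.0 / score - 1.0)

# Score fractions taken from the two 100-game matches against DisasterArea.
default_elo = elo_from_score(0.505)  # cPUCT=1.7
tuned_elo = elo_from_score(0.600)    # cPUCT=3.0

print(round(default_elo, 1), round(tuned_elo, 1), round(tuned_elo - default_elo, 1))
```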

FWCC1

May 6, 2018, 3:17:43 PM
to LCZero
What GPU did you use?

Albert Silver

May 6, 2018, 3:20:19 PM
to LCZero
I have a GTX1060 6GB

On Sunday, May 6, 2018 at 4:17:43 PM UTC-3, FWCC1 wrote:
> What GPU did you use?

Jesse Jordache

May 6, 2018, 3:20:44 PM
to LCZero
My theory about this, and I've posted it elsewhere so I apologize if I'm being repetitive, is that a low Puct is better for training, and a... not so low Puct might be better for a "tournament setting".



alvaro...@gmail.com

May 6, 2018, 5:36:20 PM
to LCZero
That’s a t-value around 1.65, which is not that great. That means that it could just be luck that 3.0 did better than the other value. Two hundred games is not nearly enough, especially if you don’t match the two versions against each other.

I would run 1000 games between the two versions, and then you might have enough statistical significance (if the Elo difference is big enough).
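
One plausible way to arrive at a t-value of this size is a two-sample comparison of the mean scores from the two matches against the reference engine. The sketch below assumes each game is an independent result worth 1, 0.5 or 0, and uses hypothetical helper names (`score_stats`, `two_match_t`); the exact method used in the post above is not stated. With the posted W/D/L counts it gives about 1.66.

```python
import math

def score_stats(wins: int, draws: int, losses: int) -> tuple[float, float]:
    """Return (mean score, variance of the mean) for a W/D/L record."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # Per-game variance of a result that takes the values 1, 0.5 or 0.
    per_game_var = (wins * (1.0 - mean) ** 2
                    + draws * (0.5 - mean) ** 2
                    + losses * (0.0 - mean) ** 2) / n
    return mean, per_game_var / n

def two_match_t(a: tuple[int, int, int], b: tuple[int, int, int]) -> float:
    """t-like statistic for the difference in mean score of two matches."""
    mean_a, var_a = score_stats(*a)
    mean_b, var_b = score_stats(*b)
    # Variances add because the two matches are independent.
    return (mean_a - mean_b) / math.sqrt(var_a + var_b)

# cPUCT=3.0 record versus cPUCT=1.7 record, both against DisasterArea.
t = two_match_t((45, 30, 25), (33, 35, 32))
print(round(t, 2))
```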

Gian-Carlo Pascutto

May 7, 2018, 5:23:57 AM
to lcz...@googlegroups.com
On 06-05-18 23:36, alvaro...@gmail.com wrote:
> especially if you don’t match the two versions against each other.

The default PUCT was optimized for the self-play settings, and it passed
an SPRT (with an estimated +93 Elo or so) under those settings.

I think that in the topic people observed that it didn't necessarily
seem better against other engines. So if you are aiming for a more solid
result, trying a self-match is probably not the best way to spend your time.

I haven't tried to do a tuning run for PUCT against other opponents;
IMHO self-play is the most important until there's at least 500 extra
Elo :-)

--
GCP

alvaro...@gmail.com

May 7, 2018, 8:42:17 AM
to LCZero
Let me remind people of an arithmetic fact that should inform this discussion.

If you have two programs, A and B, and you try to measure their relative performance by testing against a reference opponent R, you need 4 times as many games to get the same error bar in the measurement as if you had played games between A and B directly. This is because the variance of the Elo measurement is inversely proportional to the number of games played and because the variance of the difference of two independent random variables is the sum of the variances.

Variance of A -vs- B after k games =~ Constant / k

Variance of A -vs- R after 2k games =~ Constant / (2k)
Variance of B -vs- R after 2k games =~ Constant / (2k)
Variance of (A -vs- R) minus (B -vs- R) =~ Constant / k

In practice you can assume a factor even bigger than 4 if A and B are similar versions of the same program, because self-play exaggerates Elo differences (and this is a good thing when you are trying to determine which version is better).
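
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the idealized argument: it assumes every game contributes the same per-game variance (the "Constant" above, set to 1.0 here) and that the two matches against R are independent. The function names are made up for illustration.

```python
import math

def variance_direct(games: int, per_game_var: float = 1.0) -> float:
    """Variance of the A-vs-B measurement after `games` direct games."""
    return per_game_var / games

def variance_via_reference(games_each: int, per_game_var: float = 1.0) -> float:
    """Variance of (A vs R) minus (B vs R), with `games_each` games per match."""
    # The variance of a difference of independent measurements is the sum.
    return per_game_var / games_each + per_game_var / games_each

k = 250
direct = variance_direct(k)                # k games in total
indirect = variance_via_reference(2 * k)   # 2k + 2k = 4k games in total

total_direct = k
total_indirect = 2 * (2 * k)
print(direct, indirect, total_indirect / total_direct)
```

Both designs end up with the same variance, but the indirect one spends four times as many games to get there.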

I am pretty sure GCP knows this, but I suspect a lot of people in this group don't.

Albert Silver

May 8, 2018, 9:28:53 AM
to LCZero
I ran a 254-game match between the two settings in lc0-cuDNN of 1.7 and 3.0:

lcocudnn-17-30test-20-large, Blitz 5m  2018

1 The Lc0 chess engine. Puct=3.0 +40 +57/=169/-28 55.71% 141.5/254
2 The Lc0 chess engine. Puct=1.7 -40 +28/=169/-57 44.29% 112.5/254

The games were 5m+0s. Puct 3.0 had a +40 Elo performance. The reason for the odd 254 number is that Fritz and Co. have a 255-game limit for engine tournaments.

Albert

alvaro...@gmail.com

May 8, 2018, 9:59:00 AM
to LCZero
That’s much better. The t-value for that is about 3.5.
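
For a head-to-head match, a comparable statistic can be computed from a single W/D/L record by testing the mean score against 0.5. The sketch below is one such estimate, with a hypothetical function name and the assumption of independent games. With the 254-game result it comes out somewhat above 3; the exact method behind the figure quoted above is not stated, so a small difference is to be expected.

```python
import math

def head_to_head_t(wins: int, draws: int, losses: int) -> float:
    """t-like statistic for a mean score differing from 0.5 in a direct match."""
    n = wins + draws + losses
    mean = (wins + 0.5 * draws) / n
    # E[x^2] for results worth 1, 0.5 and 0, then Var = E[x^2] - mean^2.
    second_moment = (wins + 0.25 * draws) / n
    per_game_var = second_moment - mean ** 2
    return (mean - 0.5) / math.sqrt(per_game_var / n)

# Puct=3.0 versus Puct=1.7, from the 254-game match.
t = head_to_head_t(57, 169, 28)
print(round(t, 2))
```

The large share of draws lowers the per-game variance, which is part of why a direct match reaches a clearer signal than the two 100-game matches against a third engine.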

Gian-Carlo Pascutto

May 8, 2018, 10:08:20 AM
to lcz...@googlegroups.com
On 08-05-18 15:28, Albert Silver wrote:
> I ran a 254-game match between the two settings in lc0-cuDNN of 1.7 and 3.0:
>
> lcocudnn-17-30test-20-large, Blitz 5m  2018
>
> 1 The Lc0 chess engine. Puct=3.0 +40 +57/=169/-28 55.71% 141.5/254
> 2 The Lc0 chess engine. Puct=1.7 -40 +28/=169/-57 44.29% 112.5/254
>
> The games were 5m+0s. Puct 3.0 had a +40 Elo performance. The reason for
> the odd 254 number is that Fritz and Co. have a 255-game limit for
> engine tournaments.
>

I did a sanity check here. I tested the leela-chess client (which
defaults to puct=0.6, corresponding to 1.2 in the lc0-cuDNN version) versus
the same client with puct=1.5 (which matches your 3.0).

The time control was 10s + 0.5s, so that it does about 2000 visits on my
machine, fairly in line with the visits used for training.

I stopped it at +82 =12 -6 for the default settings. So I'm fairly
confident they are good...

I don't know why your results are so different; either it's the longer
time control or something with lc0-cuDNN.

--
GCP

Jesse Jordache

May 8, 2018, 11:04:26 AM
to LCZero
They're linked, I would think.  The cuDNN versions traverse the tree so much faster (3x faster in my case) that it makes for a big ol' increase in effective time control.

Gian-Carlo Pascutto

May 8, 2018, 11:07:30 AM
to lcz...@googlegroups.com
On 08-05-18 17:04, Jesse Jordache wrote:
> They're linked I would think.  The cudNN versions traverse the tree much
> faster (3x faster in my case) that it makes for a big 'ol increase in
> time control.

If the ideal PUCT is correlated with the time control, it indicates that
not the constant, but the growth curve should be adjusted. That is, the
term currently grows by a factor of puct * sqrt(parent_visits). It seems
that something that grows faster might be good, then.
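
To make the point concrete, here is a rough Python sketch of an AlphaZero-style exploration term with a pluggable growth function. The prior and the 1 / (1 + child_visits) factor follow the published PUCT formula; whether the lc0 or leela-chess code matches this exactly is an assumption, and `faster_growth` with its 0.6 exponent is purely hypothetical.

```python
import math
from typing import Callable

def exploration_term(c_puct: float, prior: float, parent_visits: int,
                     child_visits: int,
                     growth: Callable[[float], float] = math.sqrt) -> float:
    """Exploration bonus: c_puct * prior * growth(parent) / (1 + child)."""
    return c_puct * prior * growth(parent_visits) / (1 + child_visits)

def faster_growth(n: float) -> float:
    """Hypothetical alternative that grows faster than sqrt(n)."""
    return n ** 0.6

def effective_constant(c_puct: float, parent_visits: int,
                       growth: Callable[[float], float]) -> float:
    """The constant a sqrt-based term would need to match `growth` here."""
    return c_puct * growth(parent_visits) / math.sqrt(parent_visits)

for visits in (2_000, 20_000, 200_000):
    print(visits,
          round(effective_constant(1.7, visits, math.sqrt), 3),
          round(effective_constant(1.7, visits, faster_growth), 3))
```

With sqrt growth the effective constant stays the same at every visit count, while a faster-growing curve behaves like a small constant in short searches and a larger one in long searches.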

--
GCP

Albert Silver

May 8, 2018, 12:34:10 PM
to Gian-Carlo Pascutto, LCZero
I can run a default LCZv08 against the cuDNN version with 3.0 and see.

Albert



--
You received this message because you are subscribed to a topic in the Google Groups "LCZero" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/lczero/0BLPr9xpA4U/unsubscribe.
To unsubscribe from this group and all its topics, send an email to lczero+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lczero/d9f24598-832e-8275-ef68-91ffa0355cf8%40sjeng.org.
For more options, visit https://groups.google.com/d/optout.

Jesse Jordache

May 8, 2018, 12:37:21 PM
to LCZero
I understand.  I misread the OP - sorry for making you explain things to me like a halfwit: you're a very good sport for doing so.

Has that ever been tested, by the way?  Intuitively it seems a higher Puct would be more appropriate for longer times, maybe at log rate: certainly not linearly(!), but I can't back that statement up.

Albert Silver

May 8, 2018, 12:42:08 PM
to Jesse Jordache, LCZero
It might also be linked to the cuDNN version specifically. I will test the default LC0v8 against itself with PUCT=1.5 and against the cuDNN with PUCT=3.0 to see whether they get similar results or there is a clear disparity. Same 100-game set, same openings, and same 5-minute TC.

Albert


Pera Kojot

May 8, 2018, 12:54:08 PM
to LCZero
Check two things. First, you should run your games in LittleBlitzer so you can observe the average nps that LC0-cudnn gets with each PUCT value. What I've noticed is a big slowdown at lower PUCT values.
Second, check the number of losses on time, or ideally filter out games with time losses, especially since you are playing without increment.

Gian-Carlo Pascutto

May 8, 2018, 12:57:28 PM
to lcz...@googlegroups.com
On 08-05-18 18:37, Jesse Jordache wrote:
> I understand.  I misread the OP - sorry for making you explain things to
> me like a halfwit: you're a very good sport for doing so.
>
> Has that ever been tested, by the way?  Intuitively it seems a higher
> Puct would be more appropriate for longer times, maybe at log rate:
> certainly not linearly(!), but I can't back that statement up.

The idea would not be to change the coefficient itself. The actual
formula for the UCT algorithm contains a growth factor, and in the
DeepMind version that is sqrt(x).

--
GCP

Albert Silver

May 8, 2018, 1:50:36 PM
to Pera Kojot, LCZero
In 500 games there have not been any time losses.


Kevin Kirkpatrick

May 8, 2018, 2:18:35 PM
to LCZero
Statistics question:

If the hypothesis is "increasing PUCT from 1.7 to 3.0 improves Elo", and you have a "budget" of 500 games, wouldn't you get much better statistics from a regression test, e.g.:

100 games => PUCT 1.7
100 games => PUCT 2.0
100 games => PUCT 2.3
100 games => PUCT 2.6
100 games => PUCT 2.9

By "better statistics" I mean "lower chance of false-negative or false-positive"?  Yes, the individual five 100-game Elo measures will be less precise than the individual two 250-game Elo measures.  But wouldn't the overall regression among the 5 measures have more statistical weight (with respect to the hypothesis) than just two measures at the extremes?

Besides, sampling at just two values really only addresses the specific performance of 1.7 vs 3.0.  It could be 100% true that 3.0 is better than 1.7.  But if the actual model had peak performance at 2.7 (and steadily declined from there), any extrapolation beyond those two specific PUCT values would be wrong.

Disclaimer: I am 25+ years removed from any formal statistics study.  This is an appeal to intuition; and I'm more than happy for someone to set me straight if I'm way off the mark here.
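
One way to put numbers on this question is to compare how precisely each design estimates the slope of a straight-line fit. The sketch below is a simplification: it assumes performance is linear in PUCT with the same per-game noise everywhere, and uses the textbook least-squares result that the slope variance is the noise variance divided by the weighted spread of the x values. The function and variable names are illustrative only.

```python
def slope_variance(design: dict[float, int], per_game_var: float = 1.0) -> float:
    """Variance of the fitted slope for {puct_value: games_played}."""
    total_games = sum(design.values())
    mean_x = sum(x * n for x, n in design.items()) / total_games
    # Weighted spread of the design points around their mean.
    sxx = sum(n * (x - mean_x) ** 2 for x, n in design.items())
    return per_game_var / sxx

two_point = {1.7: 250, 3.0: 250}
five_point = {1.7: 100, 2.0: 100, 2.3: 100, 2.6: 100, 2.9: 100}

print(slope_variance(two_point), slope_variance(five_point))
```

Under those assumptions the two-point design gives the smaller slope variance, because all games sit at the extremes. The five-point design gives up some of that power in exchange for being able to show a peak somewhere in between, which is the concern raised in the second paragraph of the post.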

Jesse Jordache

May 8, 2018, 5:12:40 PM
to LCZero
A ha!  log base 2.  So it's like, already a feature.

Neat.