
Program equity question


Phill Skelton

Sep 20, 1996

I have often seen equities from Jellyfish or TD-gammon thrown
around in discussions of positions, and in annotated matches.
I never see any mention of the uncertainties in these numbers.
e.g.: move 1 +0.457
      move 2 +0.455

ergo move 1 is 'clearly' better. I have trouble believing that
the estimates from a neural net are going to be accurate to
1 part in 1000. I wouldn't trust the numbers beyond the first two
digits, to be honest, making both equities above the same, as
accurately as the bots can judge them.
Does anyone have any numbers on the uncertainties
for the various programs (like +/- 0.004 or whatever)?

Also, I seem to recall reading somewhere Kit saying that
TD-gammon has a systematic bias in its equities, leading to
some wrong cube decisions. Does anyone have some hard numbers
on the various bots' systematic errors?

Phill

Chuck Bower

Sep 25, 1996

In article <32428A01...@sun.leeds.ac.uk>, Phill Skelton wrote:

You mention systematic errors, which is clearly what you are asking for.
Before sidestepping the question, I think it might help to try and define
(and show the difference between) "systematic" and "random" uncertainty.
NOTE: "error" and "uncertainty" are taken as having equivalent meaning
here.

Random error results from a finite number of trials (rollouts) and
is proportional to 1/sqrt(N) where "N" is the number of trials. More
rollouts lead to smaller random errors. You can get the random error
to an arbitrarily small value if you have enough (computer) time. A
typical value for a random uncertainty for a "typical" backgammon position
is 0.014 (in equity units) for about 10,000 trials. (NOTE: one can
quickly calculate from this that 100 trials gives about 0.14 equity
random uncertainty since sqrt(10,000/100) = 10.)
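That 1/sqrt(N) scaling can be checked in a couple of lines (a sketch; the 0.014 figure is the "typical" value quoted above):

```python
import math

# Random rollout error scales as 1/sqrt(N): going from 10,000 trials
# down to 100 trials inflates the standard error by sqrt(10000/100) = 10.
sd_10k = 0.014                                # typical SD for 10,000 trials
sd_100 = sd_10k * math.sqrt(10_000 / 100)
print(round(sd_100, 2))  # 0.14
```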

Jellyfish rollout results DO have the uncertainty listed in the results
window. I have checked these on occasion with my own calculations and
found them to agree, so I trust them. Now this "random error" is actually
a "standard deviation", which is kind of abstract to a non-statistician,
but one can convert standard deviation to confidence level, which should
be more understandable. For example, one standard deviation is equivalent
to 84% confidence. Two standard deviations is 98% confidence. Another
way to look at this is to ask the question: "If I do a HUGE (let's say
1,000,000,000,000) number of rollouts, what is the chance that the order
of the answer would be different?" (That is, what is the chance that the
"worse" play comes out better?) The answer is one minus the confidence
level.
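The SD-to-confidence conversion is just the normal distribution's cumulative probability, which can be sketched with the standard error function (assuming rollout errors are roughly normal):

```python
import math

def sd_to_confidence(n_sd):
    """Convert a gap measured in standard deviations into the chance
    that the apparently better play really is better (normal model)."""
    return 0.5 * (1.0 + math.erf(n_sd / math.sqrt(2.0)))

print(round(sd_to_confidence(1), 2))  # 0.84  (one SD -> 84% confidence)
print(round(sd_to_confidence(2), 2))  # 0.98  (two SDs -> ~98%)
```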

OK, since many of you are probably totally confused by all this jargon,
let's just look at an example. The following are REAL rollout results of
JFv1.0 level 5 cubeless rollouts:

            Number of rollouts    equity    standard deviation

  play A         10,368           0.134          0.013
  play B         10,368           0.122          0.013

So, play A is better than play B by 0.012 units of cubeless equity. But,
is it really better? Well, first note that the difference is less than
one standard deviation, since BOTH rollouts have random error and the SD
of the difference is SD = sqrt(0.013^2 + 0.013^2) = 0.018. The gap is
thus only about 0.65 standard deviations, so we are only about 74%
confident that a HUGE number of rollouts would lead to play A being
better, and conversely, 26% confident that a HUGE number of rollouts
would lead to a reversal (that is, B better than A). That is large enough
to require more rollouts. However, maybe that will be futile, since the
"systematic error" may be larger.
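Running those numbers through the arithmetic (a sketch, assuming the two rollouts' errors are independent and roughly normal):

```python
import math

# Play A vs play B from the JFv1.0 rollouts above; both SDs are 0.013.
eq_a, eq_b, sd = 0.134, 0.122, 0.013

# The SD of the difference combines both rollouts' random errors
sd_diff = math.sqrt(sd**2 + sd**2)           # SDs add in quadrature
z = (eq_a - eq_b) / sd_diff                  # gap in standard deviations
conf = 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(round(sd_diff, 3), round(conf, 2))     # 0.018 0.74
```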

"Systematic error" is the uncertainty of a result based on INCORRECT PLAY
during the rollouts. For example, take a look at the following position:

------||654321
|| oo
||
||
|| x x
------||654321

X to play a single 1 (assume other half of roll already played). If every
time the rollout robot reached this position, it moved 5/4 instead of 3/2,
a "systematic bias" = "systematic error" would result. Of course if it is
equally likely that "o" would have a similar position (and the robot misplays
that side as well) then the errors would cancel out. This is very unlikely,
though.

Now, finally, to answer Phill's question: "what are the systematic
uncertainties for the robots' equity numbers?", I'd have to say "I don't
know!" In fact, it's safe to say that "NO ONE KNOWS" for sure. In order
to know, you'd have to compare the robot results with "PERFECT PLAY" and
unless someone is keeping a secret, this last creature has yet to be
discovered.

Still, people may have a "feel" for the uncertainties. For JFv2.01
level 7 evaluation and level 6 rollouts, I think of a difference of 0.04
and larger as "important". So I guess somehow I assign 0.04 equity units
to JF's systematic error. But I'm really only guessing. Maybe the
programmers (Frederick, Harald, and Gerry) have a better answer.


Chuck
bo...@bigbang.astro.indiana.edu

Phill Skelton

Sep 26, 1996

Chuck Bower wrote:

> You mention systematic errors, which is clearly what you are asking
> for. Before sidestepping the question, I think it might help to try
> and define (and show the difference between) "systematic" and "random"
> uncertainty. NOTE: "error" and "uncertainty" are taken as having
> equivalent meaning here.

I was looking for both the random and systematic uncertainties.
I was going to ask what the standard deviation of the equity was, but
I figured that this would just confuse half the people reading this.


> Jellyfish rollout results DO have the uncertainty listed in the
> results window.

I don't have Jellyfish, so I didn't know this. Why oh why
do people never quote this when giving the equity for a position!?


>
> Now, finally, to answer Phill's question: "what are the
> systematic uncertainties for the robots' equity numbers?", I'd have to
> say "I don't know!" In fact, it's safe to say that "NO ONE KNOWS" for
> sure. In order to know, you'd have to compare the robot results with
> "PERFECT PLAY" and unless someone is keeping a secret, this last
> creature has yet to be discovered.
>
> Still, people may have a "feel" for the uncertainties. For
> JFv2.01 level 7 evaluation and level 6 rollouts, I think of a
> difference of 0.04 and larger as "important". So I guess somehow I
> assign 0.04 equity units to JF's systematic error. But I'm really
> only guessing. Maybe the programmers (Frederick, Harald, and Gerry)
> have a better answer.

Granted that all of this is true for rollouts. What about
the equities that I am assuming the programs use during a match.
I assume that they use their happy little networks to make some
estimate of the equity during a game (or rollout) so as to choose
which move to make or what doubling decision is correct. I figure
it has to do something like this or it would never be able to do
rollouts at all.

What are the uncertainties in these numbers? Random and
systematic are perhaps inaccurate descriptions for these, but are
good analogies.
Random : how far apart do two equities have to be before
         you can say that one play is better than another?
         (again, 1 standard deviation if possible)

Systematic : If the program gives two equities for two plays,
             say 0.45 and 0.42, and the difference between them
             is significant (sigma = 0.005, say), then how accurate
             are these numbers in terms of the number of points
             won and lost?

example: If you divided all the equities by two, then the program
would still continue to make all the correct moves, but when it
quotes an equity of -0.06, you would be losing 0.12 points/game.
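Phill's scaling example can be sketched in a few lines: a uniform positive scaling of all equities leaves the move ranking (and hence play selection) unchanged, but miscalibrates the quoted points-per-game. The play names and numbers are made up for illustration:

```python
# Suppose the true equities are these, but the program quotes every
# equity divided by two (a purely hypothetical 2x calibration bias).
true_equities = {"play A": 0.90, "play B": 0.84, "play C": -0.12}
quoted = {play: e / 2 for play, e in true_equities.items()}

best_true = max(true_equities, key=true_equities.get)
best_quoted = max(quoted, key=quoted.get)
print(best_true == best_quoted)   # True: same move chosen either way
print(quoted["play C"])           # -0.06 quoted, but you lose 0.12/game
```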

i.e.: is there a systematic bias between the numbers that the network
produces when evaluating a board position (without a rollout) compared
with the equities produced by a rollout? (A program might
underestimate the danger of a gammon or redouble in its evaluation,
but these would inevitably be accounted for in a rollout.)

I hope you can see what I am trying to ask.

Phill

Chuck Bower

Sep 26, 1996

In article <324A4620...@sun.leeds.ac.uk>,
Phill Skelton <ph...@sun.leeds.ac.uk> wrote:
(snip)

> I was looking for both the random and systematic uncertainties.
>I was going to ask what the standard deviation of the equity was, but
>I figured that this would just confuse half the people reading this.

Well, you can't NOT confuse all of the people all of the time,
but maybe you can NOT confuse some of the people some of the time.
I figured YOU understood the meanings (you did use "systematic error")
but this is a newsgroup with lots of readers, and many may wonder about
the same things we wonder about, so I gave a not so brief intro.

(Bower wrote:)


>> Jellyfish rollout results DO have the uncertainty listed in the
>> results window.

(Skelton replied:)


> I don't have Jellyfish, so I didn't know this. Why oh why
>do people never quote this when giving the equity for a position!?

"Never" is a long time. Actually some of us do quote the standard
deviations for JF rollouts results. However, many of the JF quotes are
for evaluations only, which don't have a random error. As you have
pointed out, they do have systematic error, but we need to get Frederick
to estimate that for us. I'll try to e-mail him and get him to post.

(snip)
(then Skelton continues:)


> Granted that all of this is true for rollouts. What about
>the equities that I am assuming the programs use during a match.
>I assume that they use their happy little networks to make some
>estimate of the equity during a game (or rollout) so as to choose
>which move to make or what doubling decision is correct. I figure
>it has to do something like this or it would never be able to do
>rollouts at all.

You are right...

> What are the uncertainties in these numbers? Random and
>systematic are perhaps inaccurate descriptions for these, but are
>good analogies.
> Random : how far apart do two equities have to be before
> you can say that one play is better than another?
> (again, 1 standard deviation if poss.)
>
> Systematic : If the program gives two equities for two plays,
> say 0.45 and 0.42, and the difference between them
> is significant (sigma=0.005 say), then how accurate
> are these numbers in terms of the number of points
> won and lost.
>
>example: If you divided all the equities by two, then the program
>would still continue to make all the correct moves, but when it
>quotes an equity of -0.06, you would be losing 0.12 points/game.

I don't think I understand what you are saying here, but...

>ie: is there a systematic bias between the numbers that the network
>produces when evaluating a board position (without a rollout) compared
>with the equities produced by a rollout. (A program might under-
>estimate the danger of a gammon or redouble in its evaluation,
>but these would be accounted for inevitably in a rollout).

Actually, I do have some limited data on comparing JF evaluations
(level 7) with JF rollouts (level 5 and level 6) for blitz positions.
For about 200 positions, I was surprised to find that the AVERAGE of
the evaluation equities differed from the AVERAGE of the rollout
equities by only 0.02. Now, part of this difference is statistical
(i.e., random) and part is systematic (sounding like a broken record?).
I didn't figure out the statistical part (I can do so), but I think it
is safe to say
that for blitz positions, the systematic error (between rollouts and
evaluation) is no more than 0.02 in equity units. PDG IMHO!
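For what it's worth, the "statistical part" of such an average can be sketched quickly (an assumption-laden estimate: each rollout's SD is taken as 0.013, as in the earlier JF example, and evaluations carry no random error):

```python
import math

# Random error of an AVERAGE over ~200 positions. Only the rollouts
# contribute random error; averaging shrinks it by sqrt(n_positions).
n_positions = 200
per_rollout_sd = 0.013

sem = per_rollout_sd / math.sqrt(n_positions)
print(round(sem, 4))  # 0.0009: tiny next to the 0.02 average difference
```

If the random part of the 0.02 average is that small, nearly all of the observed difference would be systematic, consistent with Chuck's reading.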


Chuck
bo...@bigbang.astro.indiana.edu

Chuck Bower

Oct 3, 1996

>In article <324A4620...@sun.leeds.ac.uk>,

>Phill Skelton <ph...@sun.leeds.ac.uk> wrote:
> (snip)
>> I was looking for both the random and systematic uncertainties.
>>I was going to ask what the standard deviation of the equity was, but
>>I figured that this would just confuse half the people reading this.
(snip)

I wrote Fredrik and asked him your questions. Here is what he wrote back:


From: Fredrik Dahl <fred...@ifi.uio.no>
Date: Wed, 2 Oct 1996 08:33:03 +0200
To: "Chuck Bower" <bo...@bigbang.astro.indiana.edu>
Subject: Re: Systematic Uncertainty of Jellyfish


Saw parts of the discussion on rgb.
The systematic error is rather dependent on the type of position.
In most middlegame or opening positions the evaluation error,
compared to an infinite level 5 rollout, is less than 0.05 in cubeless
equity on level 6 and 7. Level 5 disagrees with level 6 evaluations by
0.023 on average for complete games.

Of course, the relative error for all levels is lower than the absolute,
so normally I consider errors of 0.02 to be meaningful on level 6 and 7.

Deep and well timed backgames are underestimated systematically,
sometimes even giving an absolute error of 0.2 for level 7.
(In fact, for those positions the level does not make a very big difference
in correcting the misevaluations, because it's rooted further down the road,
when the attacker starts leaving shots.)
Even here play vs play equities are usually ok.

Any rollout (with low enough sd) is better than any evaluation,
so even a truncated level 5 rollout is better than level 7 evaluation.

Level 6 rollouts have (obviously) less systematic error than level 5 ones,
but for most 'normal' positions the diff is less than 0.02.
In tricky positions where one side has all the hard plays the diff is bigger,
for example deep backgames are often improved by 0.05 for the backgame side.

Please note that all of this is based upon experience, so anyone who has
tested a lot can have a well-founded opinion.
Except the 0.023 average difference between level 5 and 6 evaluations;
that I have sampled.

All the best
Fredrik.

