
confidence intervals for rollouts


Gerry Tesauro

Feb 2, 1994, 2:32:45 PM
The following article is by Stig Eide; I'm posting it for him
because he doesn't have posting privileges himself.

--------------------------------------------------------------------

CAN WE TRUST THE ROLLOUTS?

Now that a growing number of backgammon programs play a decent game,
we have a powerful tool: the rollout feature.
I want to present a statistical tool that should accompany any rollout:
the confidence interval.
What is a confidence interval? After you have performed a rollout,
you have an estimate of the probability of a 'success'. A success can be
a win or a loss; it doesn't matter which you count. The confidence interval
is an interval centred on the estimate, together with a statement of how
sure you can be that the true probability lies inside it.

The formula:
a = z*sqrt(y/n*(1-y/n)/n)     (sqrt is the square root)

The variables:
n is the number of games rolled out.
y is the number of 'successes' that occurred during the rollout.
y/n is the estimated probability that a 'success' occurs.
a is the deviation from the estimated probability y/n (the half-width of the interval).
The confidence interval is (y/n-a,y/n+a).
z determines the reliability of the confidence interval.
You can choose z to be any positive real number and get any confidence
level you want, but here are the three most commonly used values of z and
their respective confidence intervals:

z=1.96 gives you a 95% confidence interval
z=2.17 gives you a 97% confidence interval
z=3 gives you a 99.73% confidence interval
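
For readers who want to check or extend the list above, here is a small Python sketch. It assumes the estimate is approximately normally distributed, which is the assumption behind the formula, and uses the standard error function to turn a z into a confidence level. The function name confidence_level is made up for this illustration.

```python
import math

def confidence_level(z):
    """Two-sided confidence level for a given z, assuming a normal distribution.

    This is the probability that a standard normal variable lies
    between -z and +z, which equals erf(z / sqrt(2)).
    """
    return math.erf(z / math.sqrt(2))

# Print the confidence level for the three z values in the list above.
for z in (1.96, 2.17, 3.0):
    print(f"z = {z:.2f} gives a {100 * confidence_level(z):.2f}% confidence interval")
```

The printed levels should agree with the list to within rounding.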

So, if you choose z to be 1.96, then you can be 95% sure that the
probability of a success is between y/n-a and y/n+a.
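
The formula and variables above can be sketched in a few lines of Python. This is only an illustration: the function name rollout_confidence_interval is invented here, and the calculation relies on the same normal approximation as the formula, so it is most trustworthy when n is large and y/n is not very close to 0 or 1.

```python
import math

def rollout_confidence_interval(y, n, z=1.96):
    """Return (low, high) for the success probability after a rollout.

    y: number of successes, n: number of games rolled out,
    z: reliability factor (1.96 corresponds to about 95%).
    """
    if n <= 0 or not 0 <= y <= n:
        raise ValueError("need n > 0 and 0 <= y <= n")
    p = y / n                            # estimated probability of a success
    a = z * math.sqrt(p * (1 - p) / n)   # deviation (half-width) from the formula
    return p - a, p + a

# Hypothetical rollout, not taken from the article: 500 successes in 1000 games.
low, high = rollout_confidence_interval(500, 1000, z=1.96)
print(f"95% interval: ({low:.3f}, {high:.3f})")
```

For this made-up rollout the estimate is 0.5 and the interval is roughly 0.469 to 0.531.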

EXAMPLE:
You have rolled out 4000 games from a position that occurred during
a game. The computer tells you that when it played both your side and your
opponent's, you won 3037 of those 4000 games (75.925%). You want
to make a confidence interval that is 97% reliable. The variables:

z is 2.17
y is 3037
n is 4000
a = z*sqrt(y/n*(1-y/n)/n) = 2.17*sqrt(3037/4000*(1-3037/4000)/4000) = 0.0147

The 97% confidence interval is now (0.75925-0.0147,0.75925+0.0147) or
(0.745,0.774). This means that you can be 97% sure that the chance of
winning the position is between 74.5% and 77.4%. That interval still
contains 75%, the usual dividing line between a take and a drop, so this
rollout cannot settle the question. If you want to claim
that this position is either a drop or a take, you have to perform a
new rollout with more than 4000 games, because that will narrow
the confidence interval (give you a smaller a).
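
To get a feel for how many more games are needed, the formula can be turned around. The Python sketch below is an illustration only: the names margin and games_needed are invented, the target of 0.01 is an arbitrary choice, and it assumes the win rate seen in a longer rollout stays close to the 3037/4000 observed here, which a real rollout will not guarantee.

```python
import math

def margin(p, n, z):
    """Deviation a for estimated probability p after n games."""
    return z * math.sqrt(p * (1 - p) / n)

def games_needed(p, z, target_margin):
    """Smallest n for which a <= target_margin, solving the formula for n."""
    return math.ceil(z * z * p * (1 - p) / target_margin ** 2)

p, z = 3037 / 4000, 2.17   # estimate and z from the example above

# a shrinks like 1/sqrt(n): four times as many games halves it.
for n in (4000, 16000, 64000):
    print(f"n = {n:6d}  a = {margin(p, n, z):.4f}")

# Games needed for a hypothetical target of a = 0.01.
print("games needed for a = 0.01:", games_needed(p, z, 0.01))
```

Because a falls only with the square root of n, narrowing the interval a little is cheap, but each further halving costs four times as many games.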

Stig Eide (stig...@avh.unit.no)
