Rollouts

Andrew Bokelman

Oct 9, 1998, 3:00:00 AM

I saw a questionable move by Snowie so I rolled it out. The rollout
found that it was not the best move. But I only rolled 100 games. Is this
too few to get me into the ballpark?

Hank Youngerman

Oct 10, 1998, 3:00:00 AM

I'm not specifically familiar with Snowie, but I can answer that
question in general:

1. The "standard" rollout is 1296 trials, rotating the first roll for
each side. Prior to computers, the standard rollout was smaller, of
course. On a computer I wouldn't think of doing a rollout that was
not 1296, or some multiple thereof.

2. If you are in a position that is truly 50-50 for either side to
win, and you roll it out 100 times, the odds are about 5% that the
rollout will show one side or the other winning at least 60 times. So
statistically, 100 is not really enough. Actually, 1296 isn't enough
if the two checker plays are really close, say, within .03 equity of
each other (I'm approximating here, not doing any statistical
calculations). .03 in equity equates to 1.5% winning chances, or
about a difference of 20 games in 1296. (I'm simplifying of course,
ignoring the cube, gammons, etc.) It's quite possible for the
inferior move to be 20 wins luckier than the superior one over a
sample of 1296 games, even with the first-roll control.
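The figures in points 1 and 2 can be checked with a minimal Python sketch. It assumes each game is an independent 50-50 coin flip (no gammons, no cube, no first-roll control), which is a simplification; the function names are made up for illustration.

```python
import math
from itertools import product


def two_sided_tail(n: int, k: int) -> float:
    """Chance that one side or the other wins at least k of n games
    when the position is truly 50-50 (requires k > n / 2)."""
    one_side = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return 2 * one_side  # the two tails cannot overlap while k > n / 2


def win_count_std_dev(n: int, p: float = 0.5) -> float:
    """Standard deviation of the number of wins in n independent games."""
    return math.sqrt(n * p * (1 - p))


# Every ordered roll of two dice (36 of them) for one side, paired with
# every ordered roll for the other side: the "rotated" first rolls.
first_rolls = list(product(range(1, 7), repeat=2))
first_roll_pairs = list(product(first_rolls, repeat=2))

print(f"P(either side wins >= 60 of 100): {two_sided_tail(100, 60):.3f}")
print(f"first-roll combinations: {len(first_roll_pairs)}")
print(f"std dev of wins in 1296 games: {win_count_std_dev(1296):.1f}")
```

Under these assumptions the standard deviation of the win count in 1296 games is 18, so a 20-game swing is roughly one standard deviation for a single rollout, and less than one for the difference between two independent rollouts.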

Is there a reason you did only 100 games? Is Snowie slow?

Andrew Bokelman

Oct 11, 1998, 3:00:00 AM

Chuck,

Thanks for your offer to go over this in such detail. I went back and looked
at how Snowie was set to roll out and realized that I cannot give enough
information. Since doing the rollout I fiddled with the settings and I'm not
sure where they were before. The only thing I can say, which probably
doesn't help, is I took the out of the box settings, made sure it was
3-ply, switched it to cubeful, and there I went.

So I understand this is not enough for you to tell me how many games to roll.

But let me ask you this. Given the position below, how would you roll out
what Snowie listed as the top three choices?

--------------------------------------------------------------------
| AndrewB (X) vs. Snowie (O) |
| 9 point Match |
--------------------------------------------------------------------

Match to 9. Score X-O: 3-7

-------------------------- Move 16 O -------------------------
O to play (5 1)
+-1--2--3--4--5--6--------7--8--9-10-11-12-+
| O X O O O | | O O O |
| X O O O | | O O O |
| X | | O | S
| | | | n
| | | | o
| |BAR| | w
| 6 | O | | i
| X X | | | e
| X X | | |
| X X X | | |
| X X X | | |
+24-23-22-21-20-19-------18-17-16-15-14-13-+
Pipcount X: 88 O: 113 X-O: 3-7/9 (6)
CubeValue: 1

* 1. 2 bar/20 7/6 1.028
2. 2 bar/20 9/8 0.953 (-0.076)
3. 2 bar/19 0.943 (-0.086)
4. 1 bar/20 8/7 0.872 (-0.157)
5. 1 bar/20 6/5 0.839 (-0.189)

Chuck Bower

Oct 12, 1998, 3:00:00 AM

In article <uPRFb2$89GA...@ntawwabp.compuserve.com>,
Andrew Bokelman <73457...@CompuServe.COM> wrote:

>I saw a questionable move by Snowie so I rolled it out. The rollout
>found that it was not the best move. But I only rolled 100 games. Is this
>too few to get me into the ballpark?

The generic answer to this question is NEI (not enough information).
And there are two reasons:

1) The statistical power of a rollout depends on what kind of rollout was
performed. If truncation or 'variance reduction' type rollouts are used
then the statistical significance of the rollout is (much) higher than for
standard (e.g. hand) rollouts.

2) Regardless of the type of rollout, the statistical confidence of the
result depends upon BOTH the statistical power of the rollouts AND the
difference in equity between the rollout results. A mathematician might
comment "for arbitrarily small equity difference, an indefinitely large
number of rollouts will be required".
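Point 2 can be made concrete with a rough Python sketch. It assumes two independent rollouts of equal length, a known per-game standard deviation of equity (the value 1.0 below is only a placeholder, not a Snowie figure), and a target of z combined standard errors; `trials_needed` is a hypothetical helper.

```python
import math


def trials_needed(equity_gap: float, per_game_sd: float, z: float = 2.0) -> int:
    """Games per candidate so that equity_gap is at least z standard errors
    of the difference between two independent rollouts."""
    # Standard error of the difference with n games each: per_game_sd * sqrt(2 / n).
    # Requiring equity_gap >= z * SE gives n >= 2 * (z * per_game_sd / equity_gap) ** 2.
    return math.ceil(2 * (z * per_game_sd / equity_gap) ** 2)


for gap in (0.10, 0.03, 0.01):
    print(f"gap {gap:.2f}: about {trials_needed(gap, per_game_sd=1.0)} games per play")
```

The required length grows with the inverse square of the equity gap, so halving the gap roughly quadruples the number of games. Variance reduction or truncation would lower the effective per-game standard deviation, which is why the type of rollout matters.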

Bottom line, Andrew: give us the rollout details. Then we can tell
you whether or not you need to do more rollouts. "Details" means type
of rollout including level, cubeless or cubeful, "checker play according
to match score" switch on or off, equity of the various candidate plays,
"equiv. to" numbers, (and maybe more). Hey, you've got ahold of the wheel
of a Ferrari. Better make sure you know how to drive it!!

Chuck
bo...@bigbang.astro.indiana.edu
c_ray on FIBS

Chuck Bower

Oct 14, 1998, 3:00:00 AM

In article <#TL2KiY9...@nih2naaa.prod2.compuserve.com>,
Andrew Bokelman <73457...@CompuServe.COM> wrote:

(snip)
>...Let me ask you this. Given the position below, how would you roll out

Good question. Snowie is such a new tool AND has so many options that
I (and probably a lot of other analysts) aren't sure what the best approach
is when doing rollouts. I just tonight got the new update (v1.1) loaded and
it is noticeably faster (as advertised) so maybe my recommendation below will
be obsolete in a few days. Still, this is how I would approach the problem:

1) Run cubeless money rollout at 3-ply truncated (at 5 rolls) and 144 trials
for candidates 1, 2, 3.

2) Run cubeFUL, "play according to match score" rollouts for play 1 and one
of the other (2 or 3). 3888 trials (and expect this to run for quite a
while)!

BTW, I checked with the dwarves' houseguest and it showed that the
numbers shown above are 2-ply lookahead. At 3-ply lookahead a couple fresh
things are worth noting:

a) although the order of the plays didn't change, the difference between
candidates 1 and 2 was only 0.004 and between 1 and 3 only 0.015.

b) SW saw 2% fewer wins with candidate 1, BUT considerably more gammons,
which is surely the reason for its choice of plays.

I'd be surprised if voluntarily breaking the prime in this position
is correct, and I'll be surprised if your rollouts keep this same order.
Then again, I'm often surprised. ;)

Michael J. Zehr

Oct 14, 1998, 3:00:00 AM

In article <#TL2KiY9...@nih2naaa.prod2.compuserve.com>,
Andrew Bokelman <73457...@CompuServe.COM> wrote:

>Chuck,
>
>Thanks for your offer to go over this in such detail. I went back and looked
>at how Snowie was set to roll out and realized that I cannot give enough
>information. Since doing the rollout I fiddled with the settings and I'm not
>sure where they were before. The only thing I can say, which probably
>doesn't help, is I took the out of the box settings, made sure it was
>3-ply, switched it to cubeful, and there I went.
>
>So I understand this is not enough for you to tell me how many games to roll.
>

>But let me ask you this. Given the position below, how would you roll out

The first thing I always do in analyzing a position is to get the best
evaluation possible with the tool, and only then decide if it needs
rollouts. If I have a second tool in my toolbox (Jellyfish), I
usually use that as well:

                    S 3-ply   J L7
1. bar/20 7/6        .991     .862
2. bar/20 9/8        .989     .883
3. bar/19            .976     .888
4. bar/20 6/5        .911     .744
5. bar/20 8/7        .888     .800
6. bar/20 5/4        .873     .796

Most of the time I'd call these differences too small to be worth doing
a rollout. However this position is different because there are two
very different strategies: come home safely and break points eventually;
or voluntarily break the prime and try to force X off the anchor, hoping
to close out 1-4 checkers eventually. (In other words, you gave us a
great sample position!)

This might make a good reference position for these strategies since the
evaluations are close. (The rollouts might show huge differences
however.) But we have to be aware that rollouts are not always the best
tool for such analysis. If one strategy is better than another and the
neural nets evaluate it incorrectly, the best we can do is force a
neural net to make the desired play on the first move. For example,
Jellyfish might remake the 7point on the next play when we roll out
bar/20 7/6, and Snowie might break the 7point next turn when we roll
out bar/20 9/8.

Next step in the analysis would be to pick one play from each of the
candidate strategies. Later I might want to compare the choices for
each strategy, but I can do that after determining which strategy I want
to examine more closely. Just eyeballing the plays I'd choose bar/20
9/8 and bar/20 7/6. (If "eyeballing" isn't a good enough justification,
consider these reasons: 9/8 prepares to eventually break the 9 without
having to break a point with a 6, so that builder is in a better
position on the 8 than the 9. That's why I'd prefer that to bar/19.
Either one leaves one slightly awkward roll (55 or 66). For the other
choice, I wouldn't break the 5 or 6 because I want those for when I hit
X. Given that I don't want to break the 5 or 6 but I'm willing to break
an outside point, 7/6 gives me an extra builder. Furthermore 7/6 gives
a chance to hit a 4th blot on a 42, whereas 51 doesn't expose a blot
after 8/7. After 7/6 X can't get a blot to safety on a 44, but after
8/7 he can on a 55. (It might not be right to run a blot all the way,
but unless I'm positive it's wrong, why give X that choice?))

As part of analyzing this I note that the 2-ply and 3-ply evaluations
show a big change. Because the ordering of plays is the same, a 2-ply
rollout is probably okay, but I make a mental note to consider a 3-ply
rollout later. And since I'm still in the early stages of deciding how
to do this I'll do some short rollouts first before doing the longer
ones. (I agree with Chuck's statements about 1296 being a minimum
length rollout for any serious analysis. But I believe the real art and
science of doing rollouts is interpreting the results afterwards.
Anyone can hit a problem over the head with a 1296-pound hammer, but
knowing whether you have data or information (i.e. whether the results
are only precise or if they're accurate) is much more difficult and
requires a lot of experience.)

I'll start with short (216 game) rollouts just to get a feel for whether
I'm getting anything useful out of the computer before running an
overnight job:

JF L6 pure: .950 (0.5 18.3 88.3 11.7 0.3 0.0) std dev .014
JF L6 impure: .915 (0.4 20.1 85.8 14.2 0.5 0.0) std dev .016

If I were using only Jellyfish I'd have to do my own analysis of how
the cube changes things (since JF doesn't have true live cube
rollouts). With the pure play JF says O is too good to double, but no
longer has a double after 7/6 and X rolling a 4. Since after the pure
play the equity will change very slowly, O is likely to be able to cash
a fair number of the 11.7% losses. But after the impure play, many of
the 14.2% losses come after X rolls an immediate 4, and now it's X that
gets benefit of the cube. So I would expect to see the pure play be
that much more ahead of the impure play with the cube live.

Since the difference in equity between the two plays is .035, which is
greater than the sum of the std devs (.030), I don't expect to see the
order of the plays change after longer rollouts (Chuck could tell you
exactly how likely it would be for the order to change), but I'll do a
JF 1296-game rollout anyway.
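For a rough estimate of how likely a gap of this size is to be noise, here is a minimal Python sketch using the 216-game figures above. It assumes the reported "std dev" is the standard error of each mean equity, that the two rollouts are independent, and that the errors are roughly normal; none of this is confirmed by Jellyfish's documentation, and `chance_gap_is_noise` is a made-up name.

```python
import math


def chance_gap_is_noise(equity_a: float, sd_a: float,
                        equity_b: float, sd_b: float) -> float:
    """One-sided normal tail probability of seeing a gap at least this
    large between two rollouts if the plays were really equal."""
    gap = abs(equity_a - equity_b)
    combined_sd = math.hypot(sd_a, sd_b)  # sqrt(sd_a**2 + sd_b**2)
    z = gap / combined_sd
    return 0.5 * math.erfc(z / math.sqrt(2))


# JF L6, 216 games: pure .950 (std dev .014), impure .915 (std dev .016)
p = chance_gap_is_noise(0.950, 0.014, 0.915, 0.016)
print(f"tail probability: {p:.3f}")
```

Note that independent errors combine as the square root of the sum of squares, which is smaller than the plain sum of the two standard deviations, so comparing the gap against the sum is a somewhat conservative rule of thumb.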

Meanwhile I'll check the short Snowie rollout (2-ply, large search
space):

pure: 1.069 (1.4 20.9 87.4 12.6 0.9 0.0)
impure: 1.073 (2.2 23.4 86.0 14.0 1.6 0.0)

Snowie agrees with Jellyfish that O is too good to double before rolling
the 51, but thinks O has a double/pass after the impure play followed by
X rolling any 4 but 44 or 42 (no double after 44, too good after 42).
This means we can expect the cubeful results to be quite different from
the cubeless ones.

Back to the JF results of 1296 games:
pure: .923 (0.4 17.6 87.4 12.6 0.5 0.0) std dev .006
impure: .907 (0.5 19.5 85.7 14.3 0.8 0.0) std dev .006

The difference in the plays is still only a little bit greater than the
sum of the standard deviations, so the chances that an even longer
rollout would change the order haven't gone down much.

Before we do level 5 rollouts with settlements we need to do some
thinking with the cube to set the settlement limits, which might be
different for each side, complicating things. But you can also do an L5
rollout and check the cubeless results against the L6. If they disagree
then no matter what you set the settlement limit the results are going
to be suspect.

JF L5, 1296 games:
pure: .903 (0.5 15.9 87.1 12.9 0.4 0.0) std dev .023
impure: .846 (0.5 15.8 84.5 15.5 0.7 0.0) std dev .025

From this we can conclude that Jellyfish level 5 plays the pure position
pretty well, but the impure position is harder to play. (That's not too
surprising -- the choices are much more complicated after breaking the
prime.)
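One way to quantify the L5 versus L6 agreement is to express each discrepancy in combined standard deviations. This is only a sketch under the assumption that the quoted std devs are standard errors of independent rollouts; `z_score` is a hypothetical helper.

```python
import math


def z_score(eq1: float, sd1: float, eq2: float, sd2: float) -> float:
    """Gap between two rollout equities in combined standard deviations."""
    return abs(eq1 - eq2) / math.hypot(sd1, sd2)


# (L6 equity, L6 std dev), (L5 equity, L5 std dev), 1296 games each
results = {
    "pure": ((0.923, 0.006), (0.903, 0.023)),
    "impure": ((0.907, 0.006), (0.846, 0.025)),
}

for play, ((l6, l6_sd), (l5, l5_sd)) in results.items():
    print(f"{play}: L5 vs L6 differ by {z_score(l6, l6_sd, l5, l5_sd):.2f} std devs")
```

With these numbers the pure play's L5 and L6 results sit within about one combined standard deviation of each other, while the impure play's results are more than two apart.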

Because we can't get a reliable result out of L5 for the impure play, it
isn't worth figuring out the correct settlement limits (other than as an
example for other problems where L5 plays well enough).

Before starting a long Snowie rollout we need to decide if 2-ply is
adequate or if we should be using 3-ply. We can check this by doing a
shorter 3-ply rollout (36 games) and comparing that to the 216 game
2-ply rollout:

pure: 1.056 (2.1 18.4 88.6 11.4 0.7 0.0)
impure: .997 (1.4 20.1 85.1 14.9 2.3 0.1)

This is a bit surprising because we expect the impure play to be harder
to play and hence gain for O when played 3-ply instead of 2-ply.
However that might just be the shorter rollout, or it might be something
else -- the correct play might be to play pure now and then break the 7
point next turn when the extra builder is placed better.

It would take a long time to get reliable 3-ply cubeful results, so the
next step is to do a 1296 game 2-ply cubeful rollout. Depending on the
results of that we might do more analysis at 3-ply.

Stay tuned for the next part....

Michael J. Zehr

Michael J Zehr

Oct 15, 1998, 3:00:00 AM

In article <362557C5...@michaelz.com>,

Michael J. Zehr <mich...@michaelz.com> wrote:
>In article <#TL2KiY9...@nih2naaa.prod2.compuserve.com>,
>Andrew Bokelman <73457...@CompuServe.COM> wrote:
>>

>> [How would one analyze the following position?]

[Explanation of how to start, including 3-ply evaluation and short
rollouts at both 2-ply and 3-ply.]

>Stay tuned for the next part....

The next step is to look at the 2-ply cubeful results. (As described in
the first post, having a live cube is likely to make a big difference.
In general O can take risks to increase gammon chance provided O still
has a cash if the risk doesn't pay off. So with a live cube, O can make
a play that decreases cubeless equity but increases cubeful equity.)

Snowie 2-ply, 1296 games
pure: 89.85% match winning chances
impure: 89.93% match winning chances

Because the live cube increases variance, these values are too close to
be statistically significant. (By way of reference, Snowie has a value
of 89.1% for O winning 1 point (and of course 100 for winning 2 points),
so 89.85 is like winning 1 point and winning .75/10.9 = .07 of a second
point, for an equivalent equity of 1.07. 89.93 would be equivalent to
1.08. But standard deviation on 1296 games with a cube is greater than
.01.)
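The conversion from match winning chances to an equivalent equity is a linear interpolation between the value of winning 1 point and the value of winning 2 points. A minimal Python sketch, with the 89.1% and 100% anchors taken from the figures above and `equivalent_equity` as an illustrative name:

```python
def equivalent_equity(mwc: float,
                      mwc_win_1: float = 89.1,
                      mwc_win_2: float = 100.0) -> float:
    """Interpolate match winning chances (in percent) between the value
    of winning 1 point and the value of winning 2 points."""
    return 1.0 + (mwc - mwc_win_1) / (mwc_win_2 - mwc_win_1)


for label, mwc in (("pure", 89.85), ("impure", 89.93)):
    print(f"{label}: {mwc}% mwc is about {equivalent_equity(mwc):.2f} equity")
```

The interpolation only holds for results between a 1-point and a 2-point win at this match score; other scores need their own anchor values.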

We have already seen that the 2-ply and 3-ply evaluations are quite
different, so the picture might change if we do 3-ply rollouts:

Snowie 3-ply, 1296 games
pure: 89.85% mwc
impure: 90.06% mwc

Now we have equivalent equities of 1.07 and 1.09, but that isn't
statistically significant either.

The simple conclusion to draw is that there isn't much of a difference
between the strategies. However this is where one has to delve deeper
into computer results to ferret out what is really happening. The
strategies we're comparing are keeping the prime and playing purely
vs. trying to force X off the anchor to win a gammon.

If you look at how Snowie plays the position, it often breaks the 7point
on the second play. So we're not comparing two different strategies.
We're comparing trying to force X off the anchor this turn vs. waiting
until next turn. That's part of the explanation for why the plays are
so close in equity.

With computer rollouts, not only is it important to have a consistent
method for doing the rollouts, but it's also important to know how to
interpret the results. One of the real challenges in doing computer
rollouts is to convince the computer to try one strategy over another
and follow through on that strategy for enough moves that you can get a
valid comparison.

In a position like this if the computer's evaluation picks 7/6 over
9/8, then rolling out the 9/8 position only forces the computer to keep
the prime for one turn, and then it will break the prime anyway. On the
other hand if the computer's evaluation picks 9/8 over 7/6 then you can
get a better result. One third of the time the computer is forced into
playing on with a broken prime, though after a miss by X it might try to
remake the prime.

Five years ago Kent Goulding wrote that computers weren't going to
replace human analysis any time soon, and despite the huge advances in
neural net technology since then, it's still true. Computers can give
us lots of data, but it still takes humans to turn that into
information.

-Michael J. Zehr

Andrew Bokelman

Oct 16, 1998, 3:00:00 AM

Thanks for the rollout suggestions. I've started doing them and I'll
report the results.

Andrew Bokelman

Oct 16, 1998, 3:00:00 AM

>Most of the time I'd call these differences too small to be worth doing
>a rollout. However this position is different because there are two
>very different strategies: come home safely and break points eventually;
>or voluntarily break the prime and try to force X off the anchor, hoping
>to close out 1-4 checkers eventually.

This is what intrigued me about this position. Also the fact that I would
not have thought to apply the Snowie strategy in this position. But, as I
was sitting there behind the prime, I also realized that with my speed board
and a decent escape I could win.

As it turned out I did hit Snowie on the next roll and was ready to escape
at least that one checker. As I brought it around Snowie hit me back. And
when I entered from the bar I felt I was worse off than before because
one checker was sitting alone and ready for Snowie to attack. I was also faced with
breaking the stack of two and this would leave two checkers inside for Snowie
to attack.

As you point out rollouts don't tell everything. I learned how things
could get worse for me even though I hit and did not get closed out. So
something might be said for testing a position by playing it several times to see
the ways you can end up 10 rolls later.

>If one strategy is better than another and the neural nets evaluate it
>incorrectly, the best we can do is force a neural net to make the desired play
>on the first move. For example, jellyfish might remake the 7point on the next
>play when we roll out bar/20 7/6, and Snowie might break the 7point next turn
>when we roll out bar/20 9/8.

Right. There is also the element of style-of-play. I started
introducing some of the preferred openers into my game on FIBS and
ended up losing more. I think the reason is that I don't have all the other
skills that go along with dealing with what follows. So even if this is
the best move for Snowie to make, it might not be the best one for me to
make.

Anyway, I've read over your other comments and some of them have my head
swimming. I just recently found out about Jellyfish and Snowie and hope to
eventually understand more of the number crunching.