How best to do Jellyfish rollouts?

Farhan Malik

Apr 6, 1996, 3:00:00 AM

I just bought the Jellyfish analyzer 2.0 and am trying to
figure out the best way to perform rollouts. Depending on how I set
the variables I get very different and even conflicting results. As
an example I used the opening position from Barclay Cooke's _Paradoxes
and Probabilities._ Red opened with 4 1 and played 13/9 6/5. White
now has 4 4 to play. I want to compare these two possible replies:

1) 24/20* 20/16* 13/9 13/9
2) 24/20* 20/16* 8/4 8/4

Move two seems superior to me but Jellyfish Level 7 verified
lookahead likes move one (.486 vs .461).

I rolled it out 36 times on level 6 and got these results:
Move 1 .516
Move 2 .413

I changed the random number seed and did another 36 rollouts:
Move 1 .451
Move 2 .559

Since these results are conflicting I assumed 36 games is not
enough to draw a conclusion. I rolled it out 106 times at Level 6
with a complete new seed:
Move 1 .439
Move 2 .530

I then tried 7776 truncated rollouts on Level 5 with Horizon 20:
Move 1 .484
Move 2 .491

I don't like the idea of truncated rollouts because they rely
heavily on JF's evaluation of the position. If it is incorrectly
evaluating the position then the results are not worth much. It doesn't
seem to evaluate backgames well and the above position easily turns into
one.

I'm new to this rollout business and am not making much out of
the above results. I'm also starting to think that JF rollouts are
way overrated. I studied the JF rollouts of Robertie's Advanced
Backgammon and I find Robertie's logic far more convincing than the
rollouts in the vast majority of the problems. I seriously doubt JF
is a better player than Robertie and it disagrees with him quite
often.

I would appreciate some advice from those more experienced
with rollouts as to how better utilize the program. What parameters
work best for the above rollout? I thought move 2 would be better by
a wide margin but I could easily be wrong. What are the thoughts as
to whether JF rollouts are overrated? I often see posts where the JF
rollout is treated like the last word as to which move is best.

scriabin on FIBS

David Montgomery

Apr 7, 1996, 4:00:00 AM

In article <4k6siv$2...@nyx10.cs.du.edu> fma...@nyx10.cs.du.edu (Farhan Malik) writes:
> I just bought the Jellyfish analyzer 2.0 and am trying to
>figure out the best way to perform rollouts. Depending on how I set
>the variables I get very different and even conflicting results.

[ rolling out 2 plays:
Play 1) 24/20/16* 13/9(2)
Play 2) 24/20/16* 8/4(2)
after an opening 4-1 played 13/9 6/5 ]

[ results so far:

Play 1) JF7 evaluation              .486
        Level 6 (36x)               .516
        Level 6 (36x)               .451
        Level 6 (106x)              .439
        Level 5 truncated (7776x)   .484

Play 2) JF7 evaluation              .461
        Level 6 (36x)               .413
        Level 6 (36x)               .559
        Level 6 (106x)              .530
        Level 5 truncated (7776x)   .491
]

> I don't like the idea of truncated rollouts because they rely
>heavily on JF's evaluation of the position. If it is incorrectly
>evaluating the position then the results are not worth much. It doesn't
>seem to evaluate backgames well and the above position easily turns into
>one.

Well, it's true that truncated rollouts rely on JF's evaluations,
but most of the time, and for most positions, this isn't much of
a problem. This is because the errors in JF's evaluation will
in large part cancel out -- sometimes the evaluation will be too
high and other times too low. And JF evaluations are really pretty
good. Better than human evaluations, anyway. Some error may remain
if the game tends to develop into positions in which there is some
bias in JF's evaluation. By itself, this usually isn't too much
of a problem, because most positions tend to branch out into a wide
variety of types of positions, and the positions which don't, and
for which JF's evaluations are off, are often positions that you
can't trust JF with anyway. If you review the rollouts of Robertie's
_Advanced_Backgammon_, you can get a good feeling for the amount
of error that typically arises from using truncated rollouts.

For the position in question, there should be very little trouble
with using truncated JF rollouts. JF understands opening checker
play very well, and the game is likely to evolve into a wide
variety of different kinds of positions, so there should be
relatively little bias due to truncation. I disagree that this
position will "easily" become a backgame. The first player should
generally be very much trying to avoid this scenario, and will usually
succeed. Certainly, with JF at the helm, this will very rarely
become a backgame.

The main advantages of truncated rollouts are two:
1) they are faster, and
2) they have lower variance. That is, they converge toward the
"infinite rollout" equity with fewer trials, on average.

Item two just means that you need fewer trials to get your
answer, so the advantage of truncated rollouts comes down
to just one thing, which is that they are faster.
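
To put a rough number on "fewer trials": the standard error of an
average over n games is the per-game standard deviation divided by
sqrt(n), so the trials needed scale with the square of that deviation.
The sketch below only illustrates that relation. The two per-game
standard deviations are made-up figures, not measurements from JF.

```python
import math

def trials_needed(per_game_sd, target_se):
    """Games needed so the rollout average has the target standard error."""
    return math.ceil((per_game_sd / target_se) ** 2)

# Hypothetical per-game standard deviations (assumptions, not JF output).
FULL_SD = 1.3        # complete rollout, played to the end
TRUNCATED_SD = 0.65  # truncated rollout, settled by an evaluation

full = trials_needed(FULL_SD, 0.02)
truncated = trials_needed(TRUNCATED_SD, 0.02)
print(f"full: {full}  truncated: {truncated}  ratio: {full / truncated:.1f}")
```

Halving the per-game deviation cuts the required trials to about a
quarter, which is why lower variance and speed amount to the same
advantage.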

The disadvantage of truncated rollouts is that sometimes they
are biased. This is less of a problem in a checker play rollout
(which is also when speed is more of a concern), but very
important for cube rollouts. But the more significant disadvantage
to truncated rollouts is that JF does not give you "live cube"
figures with truncated rollouts, which it does with non-truncated
rollouts. This is obviously a problem when you are rolling out
a cube action problem, but also a factor in many checker play
problems (see, for example, Jeremy Bagai's excellent article in
the Jan-Feb Inside Backgammon, or the solution to Inside Backgammon
quiz problem #110). For these reasons, I almost always do
complete rollouts, but truncated rollouts are not as suspect
as you think.

> I'm new to this rollout business and am not making much out of
>the above results. I'm also starting to think that JF rollouts are
>way overrated. I studied the JF rollouts of Robertie's Advanced
>Backgammon and I find Robertie's logic far more convincing than the
>rollouts in the vast majority of the problems.

Well, my guess is that you're overrating Robertie's logic. The
fact is, most interesting backgammon problems cannot be tackled
by logic. Over the board, we reason as best we can, but ultimately
we are just guessing based on our experience. Robertie recognizes
this himself. A few years back he sharply criticized a problem
solution by Kleinman (which was based on reasoning from general
principles), and backed up his criticism with (hand) rollouts.
Robertie wrote that backgammon was not "an exercise
in deductive logic" but rather, at least for correctly analyzing
positions, an exercise in empirical science. Rollout data is
exactly what is needed to determine the correct play, most
of the time.

The fact is that many of Robertie's solutions are after-the-fact.
Long propositions were played, and Robertie learned the result
and saved the position. In his book, he justifies the solution
based on logic or reasoning or breaking down the rolls or
emphasizing one very important feature of the position. In doing
this, he is showing the reader how one might approach the problem
over the board, which is exactly what you want to know to play
better backgammon. But the important thing to realize is that
the empirical data came first, and the reasoning to point you to
the correct play is derivative. Kit Woolsey has also often
emphasized this point, by saying how he has learned a lot from
trying to figure out rollout results which at first seemed unintuitive.

Now, as to whether JF rollouts are overrated -- I guess it depends
on the person and the position. JF rollouts are a tremendous source
of empirical data for a wide variety of positions. But they do
have their limitations. First of all, any rollout is subject to
statistical variation. So when results come out very close, there
is very good reason to be skeptical about the results' significance.
JF gives the standard deviations of the rollouts it performs, so
this can be a guide for that.
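
As a sketch of how one might use those standard deviations: express
the gap between two rollouts in units of the standard deviation of
their difference. This assumes the two rollouts are independent, which
is not strictly true when they share a seed. The equities are the
7776-game truncated results quoted above. The standard errors of 0.010
are hypothetical, since they were not posted.

```python
import math

def sd_units(equity_a, se_a, equity_b, se_b):
    """Gap between two independent rollouts, in standard deviations."""
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return (equity_a - equity_b) / se_diff

# Equities from the post; both standard errors are assumed values.
z = sd_units(0.491, 0.010, 0.484, 0.010)
print(f"play 2 minus play 1: {z:.2f} standard deviations")
```

With those assumed errors the gap is well under one standard
deviation, so the rollout would not separate the two plays.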

Secondly, any position can be misplayed. Putting aside for the
moment major thematic errors, small mistakes can be made favoring
one side or the other, and these small mistakes should add a little
more doubt to the significance of close results, even in positions
that we believe JF handles well.

Now, turning to the question of thematic errors, it's well documented
that JF has a few of these. Here are the ones that come to mind
right now:

- JF gets low results with outside primes -- the further outside,
the more irrelevant the results. JF doesn't completely understand
how to walk a prime home against a single trapped checker.
- JF doesn't understand well how and when to try for a second checker
after a bearoff hit.
- JF gets high results in many backgames. However, I think this
bias has been overemphasized. In backgames nearing resolution,
where the timing issue has been resolved, as is the case in many
forward (e.g., 34 or 45) backgames, JF's results are not that
far off. In these cases, JF may give up a little due to having
to walk its prime home after a hit, but probably not much. In
deeper backgames, JF gives up more because capturing a second checker
may be a significant consideration. Also, JF doesn't always
understand when to split its rear checkers to generate more shots.
In positions where the timing issue is not yet resolved, or where
there is still significant forward equity, as in a two-way game,
JF *may* give up significant equity because it often will avoid
the backgame strategy that a human would choose. I emphasize may
because I think JF is often right in avoiding the backgame, and that
human players are often wrong about this. JF probably gives up
the most in well-timed deep backgames where the leader is still
a long ways from the bearin.
- JF can get weird results in noncontact positions. This problem
has probably been reduced by the bearoff database in JF2.0, but
JF still isn't the best tool for these kinds of positions.
- JF gets low results for many priming positions against one back,
even when the prime is deep in the board. This is especially
true when slotting the back of the prime is important. JF very
often doesn't do this when it is correct.
- Wilcox Snellings thinks JF gets high results vs deep anchor
games, especially vs ace point games. I don't know whether this
is true or not, but it's plausible. Part of the equity of acepoint
games comes from capturing a second checker after a late hit.
- JF can get results that are off in what I call "runaround" positions.
These are positions where one side is trying to navigate the last
few checkers around the opposition. An example is: side A has
4 checkers each on the 1, 2, and 3 points, and 1 checker each on
the 4, 17 and 18 points; side B has a closed board, and 1 checker each
on the 18, 19, and 20 points. JF doesn't count shots, so sometimes
it makes significant checker play errors when rolling these positions
out.
- JF gets low results in bearoff hit positions in which there is
a lot of play. For example,

X O O . X . | | . . . . X . [2]
O O | |
O O | |
O O | |
| |
X X | | X X X
. . O X O X | | X X X . X X

X's home board. O has 5 off

With O owning a 2-cube, X's equity is about 0.70. JF gets
.261 cubeless, .345 after doubling to 2 (3888 trials).
Interestingly, humans tend to overrate the value of these
positions.
- many purely technical decisions are less amenable to rollouts,
whether by JF or humans. This is especially true if the technical
decision tends to repeat itself.
- because of the way JF uses the cube in live cube rollouts, sometimes
its cube numbers are way off. A common example is a position where
the trailer has a busted board, one checker back at the edge of a
five prime, and the leader has checkers back in the trailer's home
board. In this situation, the trailer may leap the prime and obtain
a double-in (which JF doesn't recognize), only to obtain a huge
cash one roll later. In general, if the trailer has only one common
recube variation, and this variation yields mostly weak doubles-in,
JF's live cube algorithm will not give accurate results.
- Another common live cube error ends up with the cube owner doing
*worse* owning the cube. Apparently when this happens JF has
erroneously played on for the gammon some of the time.

So yes, JF rollouts cannot be trusted implicitly. However, for
most positions JF rollouts are the best source for equities, and
considered carefully, the best tool for improving your game.

An interesting corollary to the fact that JF misplays the above
situations is that JF plays other types of positions *better*
than a human of overall equivalent strength would. This shows
up most prominently in the play of attacking positions, where
JF frequently gets results that are higher than humans get.

> I would appreciate some advice from those more experienced
>with rollouts as to how better utilize the program. What parameters
>work best for the above rollout?

Here's my advice:
-Always do rollouts in multiples of 36 (unlike the 106 game rollout)
and in multiples of 1296 if doing level 5 rollouts.
-If you have time and a fast enough computer, do complete rollouts.
This way you avoid any bias and get the cube numbers as well.
-When doing checker play rollouts, set the seed the same for all
the plays under consideration.
-Don't regard checker play results that are within 2 standard
deviations as anything significant. If you don't want to bother
to look at the standard deviations, as a rule of thumb, consider
differences of .10 significant for rollouts of 1296, .07 for
2592, .06 for 3888.
-There are decreasing returns as you roll positions out more times.
You will reduce the standard deviation, but if the equities are
still close, the errors in checker play are probably more significant
than the random error. I usually don't roll plays out more than
3888 times.
-When rolling out checker plays, go ahead and roll out all those
plays that fit the themes of the position, even if you don't think
they are candidate plays. Occasionally one of these plays you
didn't like will actually turn out to be best, and you'll learn
something. If you're short on time, do small short truncated
rollouts first to identify the real candidates.
-Look at both the cube numbers and the cubeless numbers.
-If you really want to understand what's going on in a cube action
situation, roll out several variations of the position, so that
you can see how they affect the equity. Use the same seed for
all of these rollouts.
-DON'T just believe the rollout results as though they came
from on high. But try to understand the sorts of positions
where the results are off, and why, so that you can know
when you can trust JF and when you should be skeptical, and
the probable direction of the error.
-If you suspect that the rollout is biased, you can look at how
JF plays the first numbers, or set up a few important variations
to see how it plays those. You may find that with level 6 it
does a better job, in which case use that. If it still seems
to be playing the position wrong on level 6, use the interactive
rollout feature. One approach would be to play it 36x with
you playing one side, JF level 6 the other, and then another
36x with you playing the other side and JF level 6 the first.
If you're right that JF is screwing the position up (and you're
not), you'll see it in the results.
-Be careful about interpreting the rollout results for a
particular match score. JF does all its rollouts based on
choosing the best cubeless plays, with gammons and backgammons
counting (and counting equally for both sides), so the results
may not be valid in a match situation. For many match scores,
there is no satisfactory way to set the JF cashing parameter
to give a reasonable match live cube rollout, so you are better
off interpreting the cubeless numbers.
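
One way to read the rule-of-thumb figures above (.10 for 1296, .07 for
2592, .06 for 3888) is to ask what per-game standard deviation would
make each one equal to 2 standard deviations of the difference between
two rollouts. The sketch below does that arithmetic. It assumes the two
rollouts are independent and of equal length, which is an interpretation
and not a derivation given in the post.

```python
import math

# Rule-of-thumb thresholds from the advice above: trials -> difference.
RULE_OF_THUMB = {1296: 0.10, 2592: 0.07, 3888: 0.06}

def implied_per_game_sd(trials, threshold, n_sd=2.0):
    """Per-game SD for which `threshold` is n_sd SDs of the difference
    between two independent rollouts of `trials` games each."""
    # SD of the difference = sqrt(2) * per_game_sd / sqrt(trials)
    return threshold * math.sqrt(trials) / (n_sd * math.sqrt(2))

implied = {n: implied_per_game_sd(n, t) for n, t in RULE_OF_THUMB.items()}
for trials, sd in implied.items():
    print(trials, round(sd, 2))
```

All three thresholds imply a per-game standard deviation of roughly
1.3, so under these assumptions the rule of thumb is internally
consistent. The same function can be inverted to estimate a threshold
for other rollout lengths.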

My experience is mostly with using JF level 5 rollouts, but it
may well be better to use JF level 6 by default. It certainly
plays better on level 6, and with JF's variance reduction algorithm,
it's not a lot slower, effectively.

>scriabin on FIBS

Hope this is of some use to you,
David Montgomery
monty on FIBS

Kit Woolsey

Apr 7, 1996, 4:00:00 AM

Farhan Malik (fma...@nyx10.cs.du.edu) wrote:
: I just bought the Jellyfish analyzer 2.0 and am trying to
: figure out the best way to perform rollouts. Depending on how I set
: the variables I get very different and even conflicting results. As
: an example I used the opening position from Barclay Cooke's _Paradoxes
: and Probabilities._ Red opened with 4 1 and played 13/9 6/5. White
: now has 4 4 to play. I want to compare these two possible replies:

: 1) 24/20* 20/16* 13/9 13/9
: 2) 24/20* 20/16* 8/4 8/4

: Move two seems superior to me but Jellyfish Level 7 verified
: lookahead likes move one (.486 vs .461).

: I rolled it out 36 times on level 6 and got these results:
: Move 1 .516
: Move 2 .413

: I changed the random number seed and did another 36 rollouts:
: Move 1 .451
: Move 2 .559

: Since these results are conflicting I assumed 36 games is not
: enough to draw a conclusion. I rolled it out 106 times at Level 6
: with a complete new seed:
: Move 1 .439
: Move 2 .530

: I then tried 7776 truncated rollouts on Level 5 with Horizon 20:
: Move 1 .484
: Move 2 .491

: I don't like the idea of truncated rollouts because they rely
: heavily on JF's evaluation of the position. If it is incorrectly
: evaluating the position then the results are not worth much. It doesn't
: seem to evaluate backgames well and the above position easily turns into
: one.

: I'm new to this rollout business and am not making much out of
: the above results. I'm also starting to think that JF rollouts are
: way overrated. I studied the JF rollouts of Robertie's Advanced
: Backgammon and I find Robertie's logic far more convincing than the
: rollouts in the vast majority of the problems. I seriously doubt JF
: is a better player than Robertie and it disagrees with him quite
: often.

: I would appreciate some advice from those more experienced
: with rollouts as to how better utilize the program. What parameters
: work best for the above rollout? I thought move 2 would be better by
: a wide margin but I could easily be wrong. What are the thoughts as
: to whether JF rollouts are overrated? I often see posts where the JF
: rollout is treated like the last word as to which move is best.

: scriabin on FIBS

As you are finding out, you can need a pretty big sample size to get
accurate results with a full rollout. The luck factor can be pretty
large. For example, when I first had access to a rollout program I tried
rolling out an opening 4-2 1296 times and got the startling result that
13/11, 13/9 was a bit better than 8/4, 6/4! As you might guess this was
way off the end of the bell curve and a longer rollout quickly set things
straight, but it does give an idea of how large a sample size one might
need to be comfortable with full rollout results.

Truncated rollouts have two advantages. First of all, they obviously
take much less time. Secondly, the luck factor is cut down
considerably. This is because you aren't dependent on the lucky rolls at
the end of the game which determine the winner -- that is factored into
the jellyfish estimates. My experience has been that 1296 trials,
truncation 7, is quite sufficient for most play vs. play problems and
leads to good results. As an experiment, try taking some play vs. play
problem (avoid backgames -- JF does have problems there), and roll out
the two plays 1296 times truncation 7 (same seed to get the duplicate
dice, of course). Then try rolling them each out 10,000 times on a full
rollout (again, same seed). I predict that the relative results will be
very similar -- i.e. if the truncated rollout says that play A is .03
better than play B then the full rollout will say about the same. Note
that the truncated rollouts may give bad estimates for absolute equities
for various reasons, but for play vs. play problems they are very good.
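
The effect of duplicate dice can be seen in a toy simulation. This is
not backgammon and not how JF works internally. The game model, the
0.12 edge, and the seeds are all invented for illustration. Each toy
game is won or lost on the dice, and one play gains a small edge in
about a quarter of the games, for a true difference of .03.

```python
import random
import statistics

def toy_rollout(edge, seed, trials=1296):
    """Average result of a toy 'play' over `trials` games.

    Toy model only: each game is +1 or -1 from the dice, and in about
    a quarter of the games the play adds `edge`.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        luck = rng.choice([-1.0, 1.0])
        bonus = edge if rng.random() < 0.25 else 0.0
        total += luck + bonus
    return total / trials

def spread_of_difference(same_seed, repeats=100):
    """SD of (play A - play B) across repeated pairs of rollouts."""
    diffs = []
    for i in range(repeats):
        seed_a = i
        seed_b = i if same_seed else 10_000 + i
        diffs.append(toy_rollout(0.12, seed_a) - toy_rollout(0.0, seed_b))
    return statistics.pstdev(diffs)

paired = spread_of_difference(same_seed=True)
unpaired = spread_of_difference(same_seed=False)
print(f"same seed: {paired:.4f}   different seeds: {unpaired:.4f}")
```

With the same seed the luck cancels and the measured difference hardly
moves between repeats. With different seeds the noise is larger than
the .03 being measured. In a real rollout the two plays lead to
different games, so the cancellation is only partial.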

There are many things that can screw up a JF rollout. The most common is
that it is making some big thematic mistake on the first roll or two,
which it will be repeating over and over. If you are really curious
about a position I suggest you see how it plays the first couple of
moves. Also the program may have trouble handling the overall position
decently -- this is often a problem in some end-game positions,
particularly backgames. In general, however, the program plays quite
well even on level 5 (which you have to use to get the fast rollouts),
and for most normal positions the results are very accurate.

As for not believing rollouts, you have to be careful. Sure it is easy
to be convinced by an expert's arguments. He is probably convinced
himself. The problem is that his arguments may be based on false
premises, which can lead to false conclusions. Except for certain
end-games or technical plays it is very difficult to *prove* that a play
is correct -- one can argue the plusses and minuses of a play, but if the
weighting of the parameters is wrong you can get the wrong result.

Let's look at your actual example: the 4-4 response to the 4-1 opening.
We might find Robertie saying:
Making the four point is clear. When the opponent has two men on the
bar, it is a must to go for the throat. The swing if he rolls a four is
enormous. The tactical gains outweigh the slight positional disadvantage
of giving up the eight point.
On the other hand, we might find Woolsey saying:
Making the nine point is clear. You have a very strong advantage, and it
is time to solidify things. Bringing the builders down gives you
ammunition to pounce wherever your opponent enters, and the solidity of
the nine and eight points will hold your advantage whatever happens. The
positional gains from this play outweigh the slight temporary advantage of
making the four point.
Both arguments are reasonable, but which one is right? The answer is we
don't know! Only our judgment and experience can guide us here. Your
initial impression was that Robertie's arguments are correct, and the
tactical considerations are overriding. The rollouts showed the plays to
be very close (which, in fact, I think they are). You have learned
something -- in this sort of position you are overweighing the tactical
considerations. Now that you have this benchmark result down pat, you
can use it to help you with other similar plays. For example, suppose
your opponent opens with a 5-4, plays 13/9, 13/8, and you roll 4-4.
Should you play 24/16*, 8/4(2) or 24/16*, 13/9(2)? My guess from your
comment about the original position is that, before seeing the rollout
results, you would have considered this a close call. Not any more! If
it was close with two men on the bar, then with only one man on the bar
the positional play of making the nine point must be considerably better
(which I believe it is). This is the way you improve your backgammon play.

When a rollout result is considerably different than what you would
expect, don't be quick to disbelieve it. Think about the position, and
see if maybe your weights of the relevant parameters are not correct.
Look hard -- there is almost always a reason for an unexpected rollout
result and if you can find that reason you will have learned a lot. I
have seen many experts (myself included) make plays which were .150 or
more worse than another play without having any idea that they were
making an error. There is a lot we have to learn about backgammon, and
the tool of the jellyfish rollout is by far the most valuable tool we
have today. The results are definitely not always gospel, but there is
often a lot of truth in them.

Kit

Joseph A. Wetherell

Apr 7, 1996, 4:00:00 AM

Hi,

I tried to get on this server by putting the software in my C: disk but
to no avail. I think they should send detailed instructions on how to
hook on to their Backgammon server.

Regards,

Joe Joeweth on FIBS...does this still exist...haven't been able to
access all weekend.
