
JF and New Ideas in BG


N Gibbions

Mar 7, 1998

The following may be of interest to owners of Woolsey's book, "New Ideas in Backgammon".

Having recently upgraded my copy of Jellyfish from version 2.01 to version 3.0, I
decided to see how many problems in the book each version got right playing at levels 5, 6
and 7. The idea was to get some sort of a handle on the following issues:

(1) The relative merits of the two versions of Jellyfish.

(2) A comparison of JF's standard of checker play with my own.

(3) A feel for the relative playing strengths of different levels within the same
version of JF.

Here are the results:

                 JF2.01                    JF3.0
         Wrong  Equity  Average    Wrong  Equity  Average
Level 5    40   3.058    0.029       41   3.006    0.029
Level 6    24   1.724    0.017       23   1.622    0.016
Level 7    18   1.388    0.013        8   0.508    0.005

The heading "Wrong" refers to the number of problems (out of a total of 104) each
program got wrong (taking the solutions in the book, which are based upon extensive
rollouts, and expert opinion, as given). "Equity" refers to the total equity sacrificed
as a result of these errors. And "Average" refers to the average equity sacrificed per
problem (i.e. total equity sacrificed, divided by 104).
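
For anyone who wants to run the same tally against their own answers, the bookkeeping
is simple. Here is a minimal Python sketch; the per-problem error values in it are
made-up placeholders for illustration, not JF's actual results:

NUM_PROBLEMS = 104

# errors[i] holds the equity given up on problem i+1, with 0.0 meaning
# the play matched the book's solution. The two entries set below are
# hypothetical placeholders, not Jellyfish's actual per-problem errors.
errors = [0.0] * NUM_PROBLEMS
errors[12] = 0.020   # e.g. problem 13 misplayed at a cost of 0.020 (made up)
errors[50] = 0.102   # e.g. problem 51 misplayed at a cost of 0.102 (made up)

wrong = sum(1 for e in errors if e > 0)   # "Wrong": problems missed
equity = sum(errors)                      # "Equity": total equity sacrificed
average = equity / NUM_PROBLEMS           # "Average": total divided by 104

print(f"Wrong: {wrong}  Equity: {equity:.3f}  Average: {average:.3f}")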

Some general conclusions from the above results:

(1) The improvement between levels within the same version of JF is quite significant.
In both versions, level 6 gets many problems right that level 5 gets wrong, and the
average equity sacrificed over this series of 104 difficult problems almost halves as a
consequence. The improvement between levels 6 and 7 is not so pronounced in JF2.01, but
in JF3.0 it is quite remarkable.

(2) At levels 5 and 6, the two versions of JF do equally well in terms of the number of
problems they get right, but version 3.0 has a slight edge, in that its mistakes tend
to be less significant than version 2.01's. Hence JF3.0 sacrifices less equity overall
than version 2.01, and has a slightly lower average at these levels. On level 7,
however, JF3.0 does much better than the previous version.

The results for JF3.0 playing on level 7 are well worth pondering. I obtained them by
setting the timing factor to 30 on a Pentium 100MHz machine. Increasing the timing
factor further did not alter any of JF's plays on my machine. In each case, JF made its
move in just a few seconds. When I think about how I sweated and scratched my head over
these problems, and still got MANY of them wrong, I am impressed indeed that JF3.0 makes
just 8 mistakes. In his introduction to the book, Woolsey says that anyone who averages
an error of 0.03 or less over these problems must be playing backgammon at a very high
level. Either version of JF, at any of these three levels, is therefore playing at a very
high level, but JF3.0 level 7's average of 0.005 seems truly frightening. I'm sure that
there's a lesson somewhere in all of this for those people who complain that JF cheats.

In case anyone is interested, the problems which JF3.0, level 7 gets wrong are the
following: 13, 49, 51, 63, 66, 93, 97, 103.

I did some further rollouts on these positions just to see if they agreed with the
results cited in the book. Although some of my results (using JF3.0 level 5 and level 6
rollouts) suggest that the differences between JF's chosen play and the solution given
in the book are not as great as indicated (in the case of problem 13, the difference
seems quite marginal), I'm satisfied that the above 8 are indeed positions in which even
JF3.0 level 7 makes an error.
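
For readers unfamiliar with the term, a rollout is just a Monte Carlo estimate: the
program plays the position out to the end many times, choosing every move itself, and
averages the outcomes. The following self-contained Python sketch illustrates the idea
on a bare pip-count race rather than real backgammon; the game model is deliberately
simplified and has nothing to do with JF's internals:

import random

def roll():
    """One roll's pip total: doubles count four times the die value."""
    a, b = random.randint(1, 6), random.randint(1, 6)
    return 4 * a if a == b else a + b

def play_out(my_pips, opp_pips):
    """Play a pure race to completion; +1 if we win, -1 otherwise.
    A real rollout would also track gammons, backgammons and the cube."""
    while True:
        my_pips -= roll()
        if my_pips <= 0:
            return +1
        opp_pips -= roll()
        if opp_pips <= 0:
            return -1

def rollout_equity(my_pips, opp_pips, trials=10000):
    """Monte Carlo equity estimate: average result over many playouts."""
    return sum(play_out(my_pips, opp_pips) for _ in range(trials)) / trials

# Example: on roll, trailing 100 pips to 90 in a straight race.
print(rollout_equity(my_pips=100, opp_pips=90))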

Finally, just so people without the book don't feel totally left out, here is what turns
out to be JF3.0's biggest error on level 7, Problem 51:

[ASCII board diagram of the Problem 51 position; its column alignment was lost in
archiving, so the exact checker placement is no longer recoverable here.]

Money Game, O owns cube. X to play 66

JF3.0 level 7 plays 24/18(2), 7/1*(2), the middle way, for an equity of 0.610. The
solution, as supported by rollouts (including mine), is to go the whole hog with 7/1*(2),
8/2(2). Those loose blots in the outfield just give X too many gammon opportunities to
make it worth worrying about defensive structure with this roll. Still, JF3.0 level 7 is
in good company: five out of the eight experts assembled to vet the problems for the
book made the same mistake (and, for what it's worth, so did I).

I'd be interested to hear any comments on the above results, especially on the strength
of JF3.0, playing at its highest level.

Best Wishes

Nigel Gibbions.

Chuck Bower

Mar 7, 1998

In article <350164...@sheffield.ac.uk>,
N Gibbions <N.Gib...@sheffield.ac.uk> wrote:

>The following may be of interest to owners of Woolsey's book, "New
>Ideas in Backgammon". Having recently upgraded my copy of Jellyfish from
>version 2.01 to version 3.0, I decided to see how many problems in the book
>each version got right playing at levels 5, 6 and 7. The idea was to get
>some sort of a handle on the following issues:
>
>(1) The relative merits of the two versions of Jellyfish.
>
>(2) A comparison of JF's standard of checker play with my own.
>
>(3) A feel for the relative playing strengths of different levels within
> the same version of JF.
>
>Here are the results:
>
>                 JF2.01                    JF3.0
>         Wrong  Equity  Average    Wrong  Equity  Average
>Level 5    40   3.058    0.029       41   3.006    0.029
>Level 6    24   1.724    0.017       23   1.622    0.016
>Level 7    18   1.388    0.013        8   0.508    0.005

(snip)

A nice study, IMHO. A couple of things I can add to Nigel's comments
(some of which I have snipped):

a) It looks like JF hasn't improved much from v2.01 to v3.0 at level-5 and
   level-6. However, at level-7 the improvement is quite noticeable
   in these problems.

b) One should keep in mind that comparison of JF evaluations with JF
rollouts is a biased study. On the other hand, that is the only
choice MOST of us humans have (as far as using robots is concerned)
because the only mechanical competitors for JF are either outdated
(like Expert Backgammon), don't have rollout/evaluation capabilities
(like TD-Gammon), or are available only to a select few (e.g. their
authors, like Snowie, M-Loner, Motif, and their cousins). I'd make
a plea for new, STRONG, commercially available robots, but it seems
like someone has already done that on this newsgroup lately. ;)

c) The book (Woolsey and Heinrich's "New Ideas...") is not yet outdated!
That's more than can be said about most BG books a couple years
after their release. (Many are outdated BEFORE release!)


Chuck
bo...@bigbang.astro.indiana.edu
c_ray on FIBS


Kit Woolsey

Mar 7, 1998

N Gibbions (N.Gib...@sheffield.ac.uk) wrote:


<snip>

: The results for JF3.0 playing on level 7 are well worth pondering. I
: obtained them by setting the timing factor to 30 on a Pentium 100MHz
: machine. Increasing the timing factor further did not alter any of JF's
: plays on my machine. In each case, JF made its move in just a few seconds.
: When I think about how I sweated and scratched my head over these
: problems, and still got MANY of them wrong, I am impressed indeed that
: JF3.0 makes just 8 mistakes. In his introduction to the book, Woolsey says
: that anyone who averages an error of 0.03 or less over these problems must
: be playing backgammon at a very high level. Either version of JF, at any
: of these three levels, is therefore playing at a very high level, but
: JF3.0 level 7's average of 0.005 seems truly frightening. I'm sure that
: there's a lesson somewhere in all of this for those people who complain
: that JF cheats.

Thanks for the excellent analysis. It should be noted that Jellyfish
probably isn't playing quite as well as these results would indicate.
The reason is that we are using Jellyfish's own rollouts to determine the
best play. While Jellyfish plays well enough at any level that most
rollouts can be trusted (and Hal and I tried to avoid positions where we
thought there might be problems with the rollouts), there are bound to be
a few positions in the book which Jellyfish misplays badly enough that the
rollouts give erroneous results. Since the same program is doing the
rollouts and choosing the moves, the bias will be in the same direction,
so Jellyfish's opinions are likely to echo any false rollout results.
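
This shared-bias effect is easy to demonstrate with a toy simulation: if the same
systematic misjudgment enters both the program's move selection and the rollouts used
to grade it, the error rate measured against its own rollouts understates the error
rate against the truth. A Python sketch, with every magnitude invented purely for
illustration:

import random

random.seed(0)

N = 10000            # simulated close two-way checker-play decisions
EVAL_BIAS = 0.05     # assumed systematic overrating of play A when evaluating
ROLLOUT_BIAS = 0.03  # the same misjudgment, attenuated, in self-rollouts
NOISE = 0.02         # per-decision evaluation noise

missed_vs_truth = 0
missed_vs_rollout = 0

for _ in range(N):
    true_a = random.uniform(-1.0, 1.0)
    true_b = true_a + random.uniform(-0.1, 0.1)   # the two plays are close

    # The net leans toward play A both when it evaluates and (in weaker
    # form) when it plays the position out itself.
    eval_a = true_a + EVAL_BIAS + random.gauss(0, NOISE)
    eval_b = true_b + random.gauss(0, NOISE)
    rollout_a = true_a + ROLLOUT_BIAS
    rollout_b = true_b

    picked_a = eval_a > eval_b
    if picked_a != (true_a > true_b):
        missed_vs_truth += 1      # an error against the real equities
    if picked_a != (rollout_a > rollout_b):
        missed_vs_rollout += 1    # an error as scored by its own rollouts

print(f"error rate vs truth:        {missed_vs_truth / N:.3f}")
print(f"error rate vs own rollouts: {missed_vs_rollout / N:.3f}")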

It should also be noted that the positions in the book are the types of
positions which expert human players have trouble with, while the neural
nets do well on these sorts of positions. They are largely "judgment"
problems, where one has to weigh conflicting priorities and
come up with the right balance. This is the area where the neural nets
are very strong. If we were looking at different types of positions
which were of a more technical nature, human experts would outscore
Jellyfish.

Despite all this, you are quite correct: The program plays a damned good
game of backgammon.

Kit

Unknown

Mar 9, 1998

On Sat, 07 Mar 1998 15:16:26 +0000, N Gibbions
<N.Gib...@sheffield.ac.uk> wrote:

>The following may be of interest to owners of Woolsey's book, "New Ideas in Backgammon".
>
>Having recently upgraded my copy of Jellyfish from version 2.01 to version 3.0, I
>decided to see how many problems in the book each version got right playing at levels 5, 6
>and 7. The idea was to get some sort of a handle on the following issues:
>
>(1) The relative merits of the two versions of Jellyfish.
>
>(2) A comparison of JF's standard of checker play with my own.
>
>(3) A feel for the relative playing strengths of different levels within the same
>version of JF.
>
>Here are the results:
>
>                 JF2.01                    JF3.0
>         Wrong  Equity  Average    Wrong  Equity  Average
>Level 5    40   3.058    0.029       41   3.006    0.029
>Level 6    24   1.724    0.017       23   1.622    0.016
>Level 7    18   1.388    0.013        8   0.508    0.005

Could you please post the numbers of the 10 problems which JF3.0
managed to correct?

Thanks


>The heading "Wrong" refers to the number of problems (out of a total of 104) each
>program got wrong (taking the solutions in the book, which are based upon extensive
>rollouts, and expert opinion, as given). "Equity" refers to the total equity sacrificed
>as a result of these errors. And "Average" refers to the average equity sacrificed per
>problem (i.e. total equity sacrificed, divided by 104).
>
>Some general conclusions from the above results:
>
>(1) The improvement between levels within the same version of JF is quite significant.
>In both versions, level 6 gets many problems right that level 5 gets wrong, and the
>average equity sacrificed over this series of 104 difficult problems almost halves as a
>consequence. The improvement between levels 6 and 7 is not so pronounced in JF2.01, but
>in JF3.0 it is quite remarkable.
>
>(2) At levels 5 and 6, the two versions of JF do equally well in terms of the number of
>problems they get right, but version 3.0 has a slight edge, in that its mistakes tend
>to be less significant than version 2.01's. Hence JF3.0 sacrifices less equity overall
>than version 2.01, and has a slightly lower average at these levels. On level 7,
>however, JF3.0 does much better than the previous version.
>

(snip)


>In case anyone is interested, the problems which JF3.0, level 7 gets wrong are the
>following: 13, 49, 51, 63, 66, 93, 97, 103.

My JF3.0 plays problem 63 OK:
6/3 14/13   eq -.031
8/5 6/5     eq -.035
