Level=7 Time factor=1000
Bearoff database option=selected
Use cube option=not selected
Animation, automization options=all blank/not selected
Seed=999999999 during all games
Notes:
- Counters for the first three games are the way they
are because I wanted to see if it would wrap around
at 100000. Starting with game 4, it's set to 0 and
left alone for the remainder of the games.
- Points column indicates gammons also (just in case
it matters to anybody, for any purpose).
- The rightmost columns show the winner and score as
  they would happen if JF had played 100 games (each
  starting at the same seed/counter values) against
  itself.
- "*" indicates games where not only the outcome of
the actual game matches the simulated game, but
the number of moves was also exactly the same (but
I haven't compared to see if all of the moves were
also exactly the same, though).
Now here are the results:
                            JF against JF
Game# Winner Points Counter Winner/Points
----- ------ ------ ------- -------------
01 MK 2 99990 JF 1
02 MK 1 100062 JF 1
03 MK 1 100112 MK 1
04 JF 1 0000 JF 2
05 JF 1 0106 MK 1
06 MK 2 0159 JF 3
07 JF 1 0208 JF 1
08 MK 1 0271 JF 1
09 JF 1 0371 JF 1
10 MK 1 0423 MK 2
11 JF 1 0506 MK 1
12 JF 1 0570 JF 1
13 JF 2 0651 JF 1
14 MK 1 0697 JF 1
15 MK 1 0754 MK 1
16 JF 1 0799 MK 1
17 JF 2 0858 JF 1
18 JF 1 0902 MK 1
19 MK 1 0947 MK 1
20 JF 2 1004 JF 1*
21 JF 1 1067 JF 1
22 MK 1 1156 JF 1
23 MK 1 1254 JF 2
24 MK 1 1307 MK 1
25 MK 1 1352 JF 1
26 MK 1 1406 MK 1*
27 MK 1 1450 MK 1
28 JF 1 1527 MK 1
29 MK 1 1573 JF 1
30 MK 1 1633 JF 1
31 MK 1 1676 JF 1
32 MK 1 1741 MK 2
33 JF 1 1854 JF 1
34 JF 2 1919 JF 1
35 MK 1 1960 MK 2
36 JF 1 2009? JF 2
37 MK 1 2058 MK 1
38 JF 2 2110 MK 1
39 JF 1 2162 MK 1
40 MK 1 2207 MK 1
41 MK 1 2258 MK 1
42 MK 1 2301 JF 2
43 MK 1 2349 JF 1
44 JF 1 2393 JF 1
45 JF 2 2445 MK 1
46 JF 1 2496 JF 1*
47 MK 2 2540 JF 1
48 JF 2 2590 MK 1
49 JF 2 2655 JF 1
50 MK 1 2699 JF 1
51 MK 2 2151 MK 1
52 MK 1 2202 MK 2
53 JF 1 2251 JF 1
54 JF 1 2303 JF 1
55 MK 1 2365 JF 1
56 JF 3 2413 MK 1
57 MK 1 2520 MK 2
58 JF 1 2564 JF 1
59 JF 1 2609 JF 2
60 JF 1 2666 JF 2
61 JF 1 2732 MK 1
62 JF 2 2780 JF 2
63 MK 1 2825 MK 2
64 JF 2 2880 MK 1
65 MK 1 2951 MK 1
66 MK 1 2987 MK 1
67 JF 1 3042 MK 2
68 MK 1 3085 JF 1
69 MK 2 3130 MK 2
70 JF 1 3180 MK 1
71 MK 1 3230 MK 1
72 JF 3 3287 JF 1
73 JF 2 3328 JF 2*
74 JF 1 3371 MK 2
75 MK 1 3471 JF 1
76 MK 1 3517 MK 1
77 JF 2 3555 MK 1
78 JF 2 3603 MK 1
79 JF 1 3660 MK 1
80 MK 1 3705 JF 2
81 MK 1 3748 MK 1
82 MK 1 3827 MK 1
83 JF 1 3885 MK 1
84 MK 1 3957 JF 1
85 MK 1 4007 JF 2
86 MK 1 4063 MK 1
87 JF 1 4137 JF 2
88 JF 1 4201 MK 1
89 MK 1 4259 JF 2
90 JF 2 4323 JF 1
91 JF 1 4369 MK 2
92 MK 1 4416 MK 1
93 JF 2 4455 JF 2
94 MK 1 4507 JF 1
95 JF 1 4546 JF 2
96 MK 2 4585 JF 1
97 MK 1 4686 JF 1
98 MK 1 4746 MK 1*
99 JF 1 4788 JF 1
00 JF 1 4827 MK 1
Games won by MK=51 (gammons=6)
Games won by JF=49 (gammons=17)
I was expecting/hoping to win around 55-60 games.
At the mid-point (50th game) I had 27 wins and it
looked as though that could/would happen. But, at
level 7-1000, I guess I should be satisfied enough
with these results (especially since, according to
JF, things should have gone the other way around
by almost exactly the same margin).
When you let JF auto-play 100 games against itself
starting at the same seed/counter values,
MK wins 48 games (10 gammons)
JF wins 52 games (15 gammons)
Thus, this double-stitches that what I achieved
against JF is better than even what JF could
have achieved playing against itself, under the
same circumstances.
One interesting thing is that the outcomes of only
52 games correspond in the two situations, which
means almost half of the games went in the opposite
direction (real-life compared to JF-against-JF). I
think this is very significant! For one thing, it's
a definite indication that there is more than one
way to "skin a Jellyfish":) (Sorry if some readers
miss this pun).
What's more significant is that this shows the "bad",
"wrong", "incorrect", "weird", "unusual", "not the
best", "whatever else" moves I make against JF work
consistently well enough. In fact, one should say
more than well enough, because at level 7-1000, if
you mess around with a neural-net-trained opponent,
you should get smashed pretty badly (shouldn't you?).
It's true that many times I lost a certain game much
worse than I should have because of some moves I made.
But overall, not often enough to affect the outcome.
JF didn't even get to win a significantly higher number
of gammons because of this (only 17 instead of 15, if
it were to play against itself). If anything, such
moves of mine seem to have affected the outcome in the
opposite direction, to my benefit (which would perhaps
be the "unexpected direction" for the folks eager to
bet on JF against me:).
Is 100 games proof enough of anything here? I think
it is. Here comes a "nobody-amateur" whose name had
never been heard of in backgammon circles, who
doesn't lose against JF 15 to 85, doesn't lose 37
to 63, doesn't lose 48 to 52 (which is how many games
even JF itself would have lost playing against
itself!), but goes on to actually win by 2 points...!
Obviously JF itself could not have made better use of
the dice that it rolled for me. Obviously I could not
have won against JF at level 7-1000 by merely having
made "wrong-but-lucky" moves dozens, if not hundreds,
of times.
So, what can be concluded from this experiment? I'll
leave this to the readers. But, I wouldn't mind
suggesting some possibilities:
A- None of this can possibly be true. No way. I just
have nothing better to do than concoct some very
elaborate stories and pull everyone's legs in this
newsgroup.
B- I just got another long lasting stream of luck. If
we play another 100 games, JF will beat me for sure
(in fact, it will beat me so badly that, the people
offering to let me pick whatever stakes, will make
lots of money off of me, if I make the mistake of
betting against them/JF).
C- As my previous comments in this newsgroup had proven
to some folks already:), I'm not that good of a player
at all, but I'm onto something about JF (that they
haven't discovered yet), from which I can draw an
advantage. (I have a feeling that not too many JF
"groupies" will go for this option:)...
D- Just out of nowhere, I simply happen to be as good
as or even slightly better than JF.
I don't intend to over-rate myself, but if the last
case is true, there may be something to be learned
from my moves against JF also. I say this simply
because JF is praised as a great tutor for beginners,
etc. and by simple deduction it follows that at least
some beginners, etc. may be able to learn something
equally useful from my style. I don't know whether
anyone will choose that last option as the
reasonable one, but if anyone does and would like
to see the mat files, they are all neatly named
in DOS filename format (2 digits for game number, 1
letter for winner initial, 1 digit for actual points,
4 digits for counter value, plus ".gam" extension)
and I'd be glad to make them available.
Initially, I had intended to run some stats on these
games but I'm not sure how much more useful they would
be beyond the comments I already made above, in addition
to letting JF auto-play each game and comparing the two
outcomes. However, if there are some automated ways of
deriving stats from gam files, I'd be glad to spend some
time doing that as well. But I just can't justify another
X-many hours of manually compiling stats.
MK
Thanks very much for posting these results! It's good to see some useful
data -- from time to time I see people post statistics of games, but they're
often so vague about how the experiment was performed that the numbers are
useless, or provide so few games that the result isn't statistically
meaningful. One point games are definitely best, because the variance is
much smaller.
> Level=7 Time factor=1000
> Bearoff database option=selected
> Use cube option=not selected
Just one query -- I'm not familiar with Jellyfish's configuration, does this
mean that 1-point matches were being played (gammons don't count) or money
games without a cube (gammons _do_ count)? I'll assume it's one point
matches, please correct me if I'm wrong -- if gammons ARE significant, then
please disregard the rest of this article; the conclusions will all be wrong.
> Now here are the results:
[snip]
> Games won by MK=51 (gammons=6)
> Games won by JF=49 (gammons=17)
OK, we conclude that the sample mean is that you win 51% of games. For
100 games between equal players, the standard deviation is 2.5 games so
it's very likely (assume 2 std. dev'ns for all the figures I give -- roughly
a 96% confidence interval, ie. if a large number of such experiments are
performed then we expect the true value to lie within the given range 96%
of the time) that you `truly' win between 46% and 56% of all games against JF.
Assuming Jellyfish playing 1-point matches has a FIBS rating of 2000, then
winning 46% of games against it would give you a rating of 1861 (strong
intermediate to advanced), consistently winning 51% would give you 2035
(high advanced) and 56% would rate you at 2209 (new world champion!)
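As a rough cross-check of those figures, here is a minimal sketch in Python.
It assumes the standard FIBS rating formula for an n-point match,
P(win) = 1 / (1 + 10^(-D*sqrt(n)/2000)) with D the rating difference, and
Jellyfish pegged at 2000; those assumptions, and the function name, are mine
rather than anything stated above:

  import math

  def rating_from_winrate(p_win, opponent=2000.0, match_length=1):
      # Invert the FIBS formula P = 1 / (1 + 10**(-D*sqrt(n)/2000)) for D,
      # the rating difference, then add it to the opponent's rating.
      d = -2000.0 / math.sqrt(match_length) * math.log10(1.0 / p_win - 1.0)
      return opponent + d

  for p in (0.46, 0.51, 0.56):
      print(p, round(rating_from_winrate(p)))   # roughly 1861, 2035, 2209

The three printed ratings reproduce the figures quoted above.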
Unfortunately it's hard to be more accurate than that without more
data. I would be inclined to guess that the strong intermediate/advanced
conclusion is more likely than the world champion simply from the observation
that there are far more strong intermediates than there are world champions;
however this is purely a personal interpretation and there's nothing in
the data to justify making a decision either way.
> When you let JF auto-play 100 games against itself
> starting at the same seed/counter values,
>
> MK wins 48 games (10 gammons)
> JF wins 52 games (15 gammons)
>
> Thus, this double-stitches that what I achieved
> against JF is better than even what JF could
> have achieved playing against itself, under the
> same circumstances.
I doubt this means very much at all. It's generally agreed there's very
little correlation between different games played with duplicate dice -- the
games tend to diverge very early on and then comparisons become meaningless.
> One interesting thing is that the outcomes of only
> 52 games correspond in the two situations, which
> means almost half of the games went in the opposite
> direction (real-life compared to JF-against-JF). I
> think this is very significant!
Actually I think it is exactly what you expect. If the dice were completely
different, you'd expect half the games to "go in the opposite direction"
anyway. Obtaining the same result with duplicate dice only goes to show
that duplicate dice really don't make much (any?) difference. I recommend
you search on Backgammon Galore (www.bkgm.com) for earlier articles about
experiments with duplicate dice.
> Is 100 games proof enough of anything here?
It's evidence of something. More is always better (in the sense of drawing
more accurate conclusions), but 100 is significant.
> So, what can be concluded from this experiment? I'll
> leave this to the readers. But, I wouldn't mind
> suggesting some possibilities:
>
> A- None of this can possibly be true. No way. I just
> have nothing better to do than concoct some very
> elaborate stories and pull everyone's legs in this
> newsgroup.
I'm willing to believe you.
> B- I just got another long lasting stream of luck. If
> we play another 100 games, JF will beat me for sure
> (in fact, it will beat me so badly that, the people
> offering to let me pick whatever stakes, will make
> lots of money off of me, if I make the mistake of
> betting against them/JF).
There's a certain amount of luck in any result, but out of a series of 100
games it's very unlikely that luck would count for more than 5 games either
way (2 standard deviations again).
> C- As my previous comments in this newsgroup had proven
> to some folks already:), I'm not that good of a player
> at all, but I'm onto something about JF (that they
> haven't discovered yet), from which I can draw an
> advantage. (I have a feeling that not too many JF
> "groupies" will go for this option:)...
Judging from this result, I would conclude you are a strong player (in the
case of the 1-point matches which you are investigating). It's very unlikely
that a typical beginner or intermediate player (somebody under 1800, say)
would win 51/100 against Jellyfish.
> D- Just out of nowhere, I simply happen to be as good
> as or even slightly better than JF.
The evidence you gave certainly supports this (more accurately, the data
provides insufficient evidence to demonstrate either you or Jellyfish are
better than the other). It could be argued that Jellyfish plays 1-point
matches somewhat weaker than money games (since that was what its evaluator
was trained for) but it's not clear how much of an influence this is.
I don't know how much effort it is for you to come up with this data and
record matches, but if you accumulate more data I'd be keen to see it.
Unfortunately the std dev'n decreases with the square root of the number
of samples (so to be twice as accurate, you need to play 4 times as many
games). Another thing you might be interested in trying is playing 100
1-point matches against Jellyfish with manual dice -- if you won 58 or
more games with manual dice (against 51 with its dice), that would be
significant evidence that you were more likely (again at a 96% confidence
interval) to win with manual dice, and perhaps JF was `cheating'. If
you won less than 58 games, that would tend to support the hypothesis that
Jellyfish does not cheat (or possibly that it does, but by an amount
too small to be detected with this test).
Thanks again for the data, and congratulations on your wins -- whatever
else you conclude, winning 51/100 matches against Jellyfish is an
excellent achievement.
Cheers,
Gary.
--
Gary Wong, Department of Computer Science, University of Arizona
ga...@cs.arizona.edu http://www.cs.arizona.edu/~gary/
MK, I echo Gary Wong's "thanks" for conducting this experiment and posting the
results.
> > Games won by MK=51 (gammons=6)
> > Games won by JF=49 (gammons=17)
>
> OK, we conclude that the sample mean is that you win 51% of games. For
> 100 games between equal players, the standard deviation is 2.5 games so
> it's very likely (assume 2 std. dev'ns for all the figures I give -- roughly
> a 96% confidence interval, ie. if a large number of such experiments are
> performed then we expect the true value to lie within the given range 96%
> of the time) that you `truly' win between 46% and 56% of all games against JF.
Oops! The standard deviation is actually sqrt(100 * 0.5 * 0.5), which is 5
rather than 2.5. This error invalidates most of the statistical judgments you
made from this point forward.
The truth is that a 100-game match says little about the relative skills of
the players, since even a 60-40 outcome is within the range of believability.
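For readers who want to redo the arithmetic, a minimal sketch (treating each
game as a fair coin flip, which is the null assumption being used here):

  import math

  n, observed = 100, 51
  sd = math.sqrt(n * 0.5 * 0.5)            # binomial standard deviation = 5 wins
  low, high = observed - 2 * sd, observed + 2 * sd
  print(sd, low, high)                     # 5.0, 41.0, 61.0

With the corrected standard deviation of 5, the rough two-standard-deviation
band around 51 wins runs from about 41 to 61 wins, which is why even a 60-40
result proves little.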
> > So, what can be concluded from this experiment? I'll
> > leave this to the readers. But, I wouldn't mind
> > suggesting some possibilities:
> >
> > A- None of this can possibly be true. No way. I just
> > have nothing better to do than concoct some very
> > elaborate stories and pull everyone's legs in this
> > newsgroup.
>
> I'm willing to believe you.
I also believe you.
I have played over 3000 games against JF. These games were played against
levels 5, 6 and 7. (I agree that I have better things to do with my time.) I
have made a few observations that run counter to expectations.
First is that I did not get trounced by any means. I had an insignificant net
plus against level 6 and level 7, and an insignificant deficit against level
5.
This result is surprising to most people, since JF level 7 obviously whumps JF
level 5 in direct competition. Why shouldn't my results against Level 7 be
worse than against Level 5?
Actually, from my experience in computer Scrabble and computer Chess I was
able to predict this outcome. Specifically, the true skill difference between
successive levels of a game program is smaller in real games than it appears
to be through self-play games.
The reason is that Level 6 plays, by its very construction, specifically to
defeat Level 5. Recall that Level 6 uses "two-ply" lookahead, whereas Level 5
uses a "one-ply" lookahead. By means of looking one-ply deeper than Level 5,
Level 6 is in some sense *selecting* positions that Level 5 will misplay.
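To make the one-ply/two-ply distinction concrete, here is a small generic
sketch in Python of that kind of dice-averaging lookahead. It is only an
illustration of the idea, not Jellyfish's actual code; the `evaluate` and
`legal_moves` callables are hypothetical names to be supplied by the caller,
and `evaluate` is assumed to return equity from the viewpoint of the player
on roll in the given position:

  ROLLS = [(i, j) for i in range(1, 7) for j in range(i, 7)]  # 21 distinct rolls

  def lookahead_value(position, roll, ply, evaluate, legal_moves):
      """Equity of the best play of `roll` for the player on roll,
      searching `ply` plies: 1 ply evaluates the position after our
      move statically; 2 plies also averages the opponent's best
      replies over all 36 dice outcomes.
      (Simplification: assumes at least one legal move exists.)"""
      best = None
      for after in legal_moves(position, roll):   # opponent is on roll in `after`
          if ply <= 1:
              value = -evaluate(after)            # their equity is our loss
          else:
              expected = sum((1 if a == b else 2) *
                             lookahead_value(after, (a, b), ply - 1,
                                             evaluate, legal_moves)
                             for a, b in ROLLS) / 36.0
              value = -expected
          if best is None or value > best:
              best = value
      return best

A deeper player built this way is, in effect, steering the game toward
positions that the shallower static evaluator misjudges, which is the effect
described above.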
My play, on the other hand, is motivated by my own internal model of
backgammon, and that happens to involve positional concepts (e.g. timing,
priming, backgames, and cube-awareness) that give backgammon programs fits.
I observe, for example, that Level 5 plays its checkers loosely (but
generally not rashly). My error rate rises in loose positions, since it is
easy to miscalculate or misjudge something. Level 6 and Level 7 play much
more circumspectly, and I find that my positional instincts are generally
correct in solid positions.
My second observation is that JF thinks I make a lot of errors. Several per
game, in fact. JF believes that I should lose about 0.1 points per game,
cubeless, against it. While I don't have enough data to know whether I can
play as well as JF overall, I do have enough data to state with 100%
confidence that JF is dead wrong in its evaluation of my moves.
It got to the point that I turned off its feedback messages. I started looking
at JF's feedback messages this way: JF is simply saying it disagrees with me,
but it is just as likely that JF's move is wrong. If I win substantially the
same number of games as JF, then why should I trust its opinion over mine?
My experience suggests that when JF complains about a move, I shouldn't give
it a moment's thought unless it thinks I am losing 0.05 or more. Neural
networks are simply not sensitive enough to detect differences less than that
with any degree of consistency.
My experiments with backgammon neural networks confirm the extent of the
difficulty networks have in judging small differences. I recorded 200 games
against JF Level 5, then annotated JF's moves using an independently developed
neural network that has skill equal to JF Level 5. When the second network
disagreed with JF, I submitted the resulting positions to rollouts. I observed
that each network made dozens of errors that cost over 0.05 in equity,
including some that cost over 0.150, plus innumerable smaller errors as well.
The cumulative cost of these errors is between 0.05 and 0.1 points per game,
cubeless.
I have two conclusions.
First, backgammon is far from solved by neural network technology. There
remains a considerable gap between network performance and the best-possible
performance.
Second, we overweigh JF's evaluations in judging moves. Our own skills are
higher than we think.
Warm Regards,
Brian Sheppard
Aaargh, silly me! Thanks for the correction, Brian -- I don't know where
the factor of 2 slipped out. A case of engaging keyboard before brain was
in gear...
The rest of my article turned out to be measuring one standard
deviation (which is significantly weaker than the two it claimed).
Replace "very likely" with "likely", and "96% confidence interval"
with "60%". Alternatively, if the same proportion of results were
obtained with _four_ hundred samples, then the conclusion would have
been valid.
Also, in order to provide evidence to support that the results are
different between manual and computer-generated dice you would
actually need to win _65_ games out of 100 with manual dice (or a
smaller proportion of a larger sample of games).
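A minimal sketch of the arithmetic behind that 65-game figure, under the
usual normal approximation and the null assumption that the true win rate is
the same with either dice source:

  import math

  n, observed = 100, 51
  sd_one_sample = math.sqrt(n * 0.5 * 0.5)      # 5 wins per 100-game sample
  sd_difference = math.sqrt(2) * sd_one_sample  # ~7.1 for the gap between two samples
  threshold = observed + 2 * sd_difference      # ~65 wins
  print(sd_difference, threshold)

So the manual-dice result would need to beat the computer-dice result by
roughly two standard deviations of the difference (about 14 games) before the
gap looks like more than noise.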
The fact that 17 of Jellyfish's 49 wins were gammons still leaves me
with a nagging doubt that it was playing to win gammons (or perhaps
some other factor was involved, eg. Murat favouring extreme back games
to exploit a Jellyfish weakness). Can anybody confirm or deny that
Jellyfish was set up to play one-point matches rather than cubeless
money games? If Jellyfish _was_ playing to win gammons (and the
score was interpreted that way), then it won by a reasonable margin
(although the test would not have been at all fair to Murat in that
case).
> I have played over 3000 games against JF. These games were played against
> levels 5, 6 and 7. (I agree that I have better things to do with my time.) I
> have made a few observations that run counter to expectations.
That's a decent sized sample! Do you have the data available?
> First is that I did not get trounced by any means. I had an insignificant net
> plus against level 6 and level 7, and an insignificant deficit against level
> 5.
>
> This result is surprising to most people, since JF level 7 obviously whumps JF
> level 5 in direct competition. Why shouldn't my results against Level 7 be
> worse than against Level 5?
That's very interesting. How much better does JF level 7 perform playing
against itself at level 5?
> The reason is that Level 6 plays, by its very construction, specifically to
> defeat Level 5. Recall that Level 6 uses "two-ply" lookahead, whereas Level 5
> uses a "one-ply" lookahead. By means of looking one-ply deeper than Level 5,
> Level 6 is in some sense *selecting* positions that Level 5 will misplay.
Yes, that's quite a sound conclusion. I had never really considered
that effect before. If it's true, it implies that the excess
computing power we're developing isn't being used effectively if it's
only to search the tree deeper -- instead, we should be looking to
improve the static evaluator (perhaps adding more neurons to the
hidden layer; adding more layers; providing more training; increasing
the number of custom features provided as inputs; using different
training strategies to eliminate weaknesses in the net; or
investigating more alternatives to neural nets). It also reinforces
the point that `skill' (points per game, probability of winning, etc.)
is NOT transitive -- comparing the skill of players by measuring them
against a common third party is very dangerous, and is NOT equivalent
to the true measurement of having them play each other.
> Second, we overweigh JF's evaluations in judging moves. Our own skills are
> higher than we think.
Yes, that would seem to be the case. Why are the neural nets rated so highly
on FIBS, then? Is it because they tend to play short matches against lowly
ranked players to a greater extent than humans with comparable ratings do?
(FIBS ratings are generally known to favour stronger players somewhat in
short matches and weaker players in long matches, but that's another
story...) Or is Jellyfish better at exploiting weaknesses of weaker
players than humans are? (This is the opposite of what I would expect.)
To add yet more numbers into the fray: I don't have Jellyfish and I'm
sure I couldn't compete with level 5 and above as well as Murat and
Brian have, but I have saved statistics on about 500 games against
Motif. (They were all played `naturally', ie. not deliberately
seeking positions that Motif misevaluates and letting the cube
skyrocket.) I came out ahead, although the result is insignificant
because the variance is very high (I won one game at 32, and lost two
at 16, which has a massive effect on the variance). If anybody is
interested in modelling the results of backgammon games, the average
number of points won per game was 1.996; the mean number of net points
won by either player should be assumed to be zero and the sample
standard deviation was 2.9. 1.1% of games were backgammons (I don't
know about gammons sorry -- I only saved the number of points per
game, and 1.1% of those contained a factor of 3).
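To put a number on "insignificant" here, a quick sketch using the figures
just given (roughly 500 games, per-game standard deviation of 2.9 points):

  import math

  games, sd_per_game = 500, 2.9
  sem = sd_per_game / math.sqrt(games)   # standard error of the mean, ~0.13 points/game
  print(sem, 2 * sem)                    # a 2 s.d. band of roughly +/- 0.26 points/game

In other words, a net average anywhere inside about a quarter of a point per
game either way is indistinguishable from an even result over 500 games.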
>In article <wt67goa...@brigantine.CS.Arizona.EDU>,
> Gary Wong <ga...@cs.arizona.edu> wrote:
>> mu...@cyberport.net (Murat Kalinyaprak) writes:
>>> So, what can be concluded from this experiment? I'll
>>> leave this to the readers. But, I wouldn't mind
>>> suggesting some possibilities:
>>> A- None of this can possibly be true. No way. I just
>>> have nothing better to do than concoct some very
>>> elaborate stories and pull everyone's legs in this
>>> newsgroup.
>> I'm willing to believe you.
>I also believe you.
I appreciate that at least some readers do and that
makes the time I spend in this group less of a waste.
>First is that I did not get trounced by any means. I had an
>insignificant net plus against level 6 and level 7, and an
>insignificant deficit against level 5.
>This result is surprising to most people, since JF level 7
>obviously whumps JF level 5 in direct competition. Why
>shouldn't my results against Level 7 be worse than against
>Level 5?
If I had read this two days ago, it would have surprised
me also. But after somebody used level 5 as a basis when
explaining to me how the ratings were calculated, I
thought I should play a few games at level 5 also, to
see what would happen. If I could beat JF at level 7,
I expected that I would beat it much more easily at
level 5. But after 10 games, it was winning 6-4. I didn't
go on to play more games, but even this much was enough
to make me think something wasn't adding up. It's good to
hear about other people's experiences/observations, as it
helps us know we aren't alone in feeling a certain way
regarding some aspects of JF.
>My experiments with backgammon neural networks confirm the
>extent of the difficulty networks have in judging small
>differences. I recorded 200 games against JF Level 5, then
>annotated JF's moves using an independently developed neural
>network that has skill equal to JF Level 5. When the second
>network disagreed with JF, I submitted the resulting positions
>to rollouts. I observed that each network made dozens of errors
>that cost over 0.05 in equity, including some that cost over
>0.150, plus innumerable smaller errors as well. The cumulative
>cost of these errors is between 0.05 and 0.1 points per game,
>cubeless.
And such things lead some of us to suspicions, which
seem to grow on you after you try some mistake on JF
and feel that you can win against it by beating it at
its own tricks...
>I have two conclusions.
>First, backgammon is far from solved by neural network
>technology. There remains a considerable gap between network
>performance and the best-possible performance.
>Second, we overweigh JF's evaluations in judging moves. Our
>own skills are higher than we think.
I tend to agree with both, and rather strongly with the
last one.
MK
> mu...@cyberport.net (Murat Kalinyaprak) writes:
>> When you let JF auto-play 100 games against itself
>> starting at the same seed/counter values,
>> MK wins 48 games (10 gammons)
>> JF wins 52 games (15 gammons)
>> Thus, this double-stitches that what I achieved
>> against JF is better than even what JF could
>> have achieved playing against itself, under the
>> same circumstances.
> I doubt this means very much at all. It's generally agreed there's very
> little correlation between different games played with duplicate dice -- the
> games tend to diverge very early on and then comparisons become meaningless.
Gary, let me first say I like reading your comments. If we
were talking about two human players here, I would have
agreed with this comment also. For example, I make even the
opening moves differently from one game to the next. Humans
may do this for many reasons ranging from avoiding monotony
to feeling lucky that this time around the opponent won't
roll XX, etc. So, games may diverge not only "early on",
but even "from the very first move on".
Yet, JF arguably plays every roll according to the board,
which applies to all of its moves from the first one to the
very last. And JF gets so much reverence in this newsgroup
that every time I log on I wonder if I'm going to see an
article posted by a kid saying "When I grow up, I'm going
to play backgammon just like a Jellyfish"... :)
So, from this perspective, I'll insist that my observation
is very significant for this particular experiment (based
on the fact that JF will never diverge from itself).
>> One interesting thing is that the outcomes of only
>> 52 games correspond in the two situations, which
>> means almost half of the games went in the opposite
>> direction (real-life compared to JF-against-JF). I
>> think this is very significant!
> Actually I think it is exactly what you expect. If the dice were completely
> different, you'd expect half the games to "go in the opposite direction"
> anyway. Obtaining the same result with duplicate dice only goes to show
> that duplicate dice really don't make much (any?) difference. I recommend
> you search on Backgammon Galore (www.bkgm.com) for earlier articles about
> experiments with duplicate dice.
I think you misunderstood, maybe because I didn't
explain it clearly. In this case the dice were the
same but half of the games went the other way...
> games). Another thing you might be interested in trying is playing 100
> 1-point matches against Jellyfish with manual dice -- if you won 58 or
> more games with manual dice (against 51 with its dice), that would be
> significant evidence that you were more likely (again at a 96% confidence
> interval) to win with manual dice, and perhaps JF was `cheating'. If
> you won less than 58 games, that would tend to support the hypothesis that
> Jellyfish does not cheat (or possibly that it does, but by an amount
> too small to be detected with this test).
When referring to manual dice, I was talking about using
some freeware dice rolling programs. I'm not sure if they
are good enough or if anyone knows of a good dice roller
program. Or maybe I should roll dice physically on my
desk? That would be even more of a pain than using a dice
roller program, but it may be a true test of things...?
MK
> The fact that 17 of Jellyfish's 49 wins were gammons still leaves me
> with a nagging doubt that it was playing to win gammons (or perhaps
> some other factor was involved, eg. Murat favouring extreme back games
> to exploit a Jellyfish weakness).
Or maybe some of my "trick" moves backfiring...? :)
> Can anybody confirm or deny that
> Jellyfish was set up to play one-point matches rather than cubeless
> money games?
I specified at the beginning of my post how all the options
(available in the version I was using) were set. If there is
any other way to specify that you want to play gammonless
1-point games, I don't know how to do it.
> If Jellyfish _was_ playing to win gammons (and the
> score was interpreted that way), then it won by a reasonable margin
> (although the test would not have been at all fair to Murat in that
> case).
Yes, then I would have played a little differently also. If
there is a way to set JF-player to play gammonless games and
if somebody tells me how, I would appreciate it.
MK
While I agree with both of these statements, the situation is a lot more
complicated than that. While the neural nets are far from perfect,
there's a lot of empirical evidence (FIBS ratings, sessions against
experts, etc.) suggesting that the neural nets are as good as everyone
else. Humans tend to do better at technical positions, neural nets at
positional play. (Look at the various bearoff "bug" threads -- these
are positions for which any top BG player already knows the correct cube
action, which someone who doesn't know it can easily calculate in a
minute, and which the computers nevertheless get wrong. But contrast
those simple mistakes with "New Ideas in Backgammon." These are
positions that most humans will get wrong and yet Jellyfish 3.01 and
Snowie Professional 1.0 score VERY highly on these problems.)
We also should keep in mind that we've learned a great deal from the
neural nets over the past few years. Anyone who could take Jellyfish or
Snowie back in time eight years and enter them in tournaments would no
doubt amass a fortune. Backgammon theory has come a long way in the
90s.
When computers disagree with us we should neither assume they're always
wrong nor assume we're always wrong, but instead ask what the neural net
sees that we don't. Even if it's wrong we can learn from it.
Consider the following position:
X to play (1 1)
+24-23-22-21-20-19-------18-17-16-15-14-13-+
| O O O O O | | O O X |
| O O O O | | O X |
| O | | | S
| | | | n
| | X | | o
| |BAR| | w
| | | | i
| X X | | | e
| X X | | |
| X X X X | | O |
| X X X X | | O |
+-1--2--3--4--5--6--------7--8--9-10-11-12-+
Pipcount X: 97 O: 102 X-O: 0-0/1 (1)
Think about your play....
Did you consider anything other than b-23* 6-5(2)? I sure didn't....
but both Jellyfish and Snowie say this is a mistake. They claim making
the ace point is better!
The blind thing to do is believe the neural nets and decide I goofed.
(-.047 according to Snowie, -.008 according to Jellyfish.)
A slightly less blind thing to do is to perform rollouts. Jellyfish
says making the five point wins 58.6%, making the ace point wins 55.4%.
Snowie says 57.4 and 55.7 respectively. So this is a position humans
will get right and the neural nets will get wrong.
But we shouldn't stop there, because chances are we humans were still
wrong about one thing -- that making the ace point is even close in
equity to making the five point. Why is that? What is the computer
seeing that we aren't?
Both neural nets think that if O dances, X is better off with the five
point open. Turns out that they're wrong, but not by an awful lot. The
balancing factors are that with the five point open X has two rolls that
make it instead of one (44 is added), and that X has pick and pass
numbers on the five point but not on the ace point. If X rolls a 6
before O comes in, X doesn't need to close O out at this score and in
fact is safer with the five point open because X can get past O faster
if O anchors.
With some changes in the position it might turn out to be better to make
the ace point -- something that most humans would probably never
consider.
Tools shouldn't be followed blindly and neural net BG evaluators aren't
an exception. They aren't perfect, they make mistakes, but we can still
learn from them if we take the time to ask them enough questions.
They're a bit like a grizzled veteran BG player who, when asked what to
do in a position, says "play this" but will only give the explanation
"because it looks better to me", and yet will patiently answer as you
show it variation after variation on the same theme.
-Michael J. Zehr
In the first message on this subject, Murat wrote:
> Played on a computer with 200Mhz CPU, 32Mb RAM
> Against Jellyfish Player version 3.01
>
> Level=7 Time factor=1000
> Bearoff database option=selected
> Use cube option=not selected
> Animation, automization options=all blank/not selected
Unless you selected "play match" and set the match length to 1,
Jellyfish believes that gammons count. Deselecting "Use cube" merely turns
off the cube; it doesn't change the value of a gammon.
(To see this, set up a position where most of one side's wins are gammon
and evaluate it with control-e. Try it playing a 1-point match and with
"play match" deselected but everything else set the way Murat
indicated.)
-Michael J. Zehr
You will also get a different message at the end of a game when a gammon or
backgammon occurs.
Setting #1: "Use Cube" off, "Play Match" off:
When JF wins a BG, the message will say
"Jellyfish wins 3 points"
Setting #2: "Use Cube" off, "Play match" on,
set up for 1 point match:
When JF wins a BG, the message will say "Jellyfish wins the match! Score 1-0"
Presumably JF will play to maximize expected points/game with Setting #1, and
play to maximize probability of winning the match (=game) with Setting #2.
This will presumably lead to changes in the checker play in positions where
gammon or backgammon is possible but risky to play for.
David Brotherton
>>Unless you selected "play match" and set the match length to 1,
>>Jellyfish believes that gammons count. Deselecting merely
>>turns off the cube, it doesn't change the value of a gammon.
Thanks to M. Zehr for giving this info.
>You will also get a different message at the end of a game when
>a gammon or backgammon occurs.
>Setting #1: "Use Cube" off, "Play Match" off:
>When JF wins a BG, the message will say
>"Jellyfish wins 3 points"
>Setting #2: "Use Cube" off, "Play match" on,
>set up for 1 point match:
>When JF wins a BG, the message will say "Jellyfish wins the match!
>Score 1-0"
But in the ".mat" file it still reports the actual
points (i.e. 2 for a gammon). I guess this doesn't
effect the claim that it will play differently with
the above different settings suggested...?
MK
Just wondering...
RODRIGO
> Gary, let me first say I like reading your comments. If we
> were talking about two human players here, I would have
> agreed with this comment also. For example, I make even the
> opening moves differently from one game to the next. Humans
> may do this for many reasons ranging from avoiding monotony
> to feeling lucky that this time around the opponent won't
> roll XX, etc. So, games may diverge not only "early on",
> but even "from the very first move on".
Yes, but if I remember correctly, you were amazed that Jellyfish did
not do better than you, playing against itself with the same dice as
you. But this is not so strange, because if Jellyfish and you do not
play the opening roll in the same way, then the game, even though it
is played with the same dice, is likely to diverge so much that it's
useless to compare the outcome.
>
>
> Yet, JF arguably plays every roll according to the board,
> which applies to all of its moves from the first one to the
> very last. And JF gets so much reverence in this newsgroup
> that every time I log on I wonder if I'm going to see an
> article posted by a kid saying "When I grow up, I'm going
> to play backgammon just like a Jellyfish"... :)
If someone did say that, and had the endurance to play as consistently
and make as few errors as Jellyfish does, both regarding moves and
cube handling, and also had an aptitude for the game, it's my belief
that the person in question would become an extremely strong player.
As an interesting remark, Jerry Grandell, Olympic Champion 96, World
Champion 97, Istanbul Giant Jackpot Winner 97 and Istanbul World Cup
Challenge Winner 1998, seems to be doing quite well playing like the
"fish". Rumour has it that in the finals of the last event, Jerry's
moves, all of them, were agreed upon by Jellyfish. And he's sometimes
called JerryFish for his tendency to play like Jellyfish.
--
______________________________________________________________________
Claes Thornberg Internet: cla...@it.kth.se
Dept. of Teleinformatics URL: NO WAY!
KTH/Electrum 204 Voice: +46 8 752 1377
164 40 Kista Fax: +46 8 751 1793
Sweden
>
> When you let JF auto-play 100 games against itself
> starting at the same seed/counter values,
>
> MK wins 48 games (10 gammons)
> JF wins 52 games (15 gammons)
>
When you let it auto-play, that's Jellyfish Level 5
that takes over for both sides. Level 7 certainly
couldn't play that quickly.
Jellyfish does play quite differently under the 1-point
match condition, right from the start of the game. It
will often pass up attacking/blitzing opportunities in
favour of positional play or running its back men.
As a simple example, suppose you start the game with a
roll of 5-1 and you make the normal 24-23,13-8 play.
If JF rolls 5-5, the move it will make depends on the match
conditions. If playing with gammons, it will attack you
with 8-3,8-3,6-1,6-1. However, in a 1 pt. match it
will not attack, and instead play the more simplistic 13-3,13-3.
Another interesting case is its response to an opening 2-1.
In match play, given a roll of 5-5 it of course attacks by
making the 3 and 1 pts. In a 1 point match, however,
JF will respond to a 5-5 by playing the weird looking
13-8,13-8,6-1,6-1, a sort of "safe" attack that doesn't expose
any blots. Extended rollouts of this position would seem to
indicate that this *is* the best play in a 1 pt. match.
Anyway. From your point of view, you outscored JF 51-49 in
the 100 game session (well done btw). But from JF's point of
view, since I believe it was using money play (gammons count),
it thinks (if it could think) that it outscored you 68-57.
So, unless you too play differently under different match
conditions, one might assume that the 68-57 margin of victory
was telling.
>Do you play on GamesGrid as Murat1??
No. And neither do I play anywhere else on the
Internet, under any other name. The only time
I had, was about a year and a half ago at a new
free/amateur site that was still in development.
I had played there a few dozen games at the most,
and gave them some feedback about their JavaScript
errors, etc.
>If you do, then I just can't believe you beat JF l7 factor
>1000. This Murat1 guy really sucks... I asked him if he
>was the Murat who posted on r.g.b and he didn't even know
>what I was talking about. His profile says "Winning Isn't
>Everything." so you have a slight idea of the strong world
>champ I was playing.
>Just wondering...
"Murat" is a very common Muslim name. In Arabic
or Persian speaking countries, it's usually
spelled/pronounced "Murad". Spelling with a "t"
at the end indicates a Turkish, Eastern European,
etc. usage.
MK
I have been following this thread with interest but have
refrained from commenting until I feel that I have got a
useful contribution to make. It looks to me like we have
style differences here causing some degree of confusion.
Murat was obviously brought up like me playing without a
cube, no backgammons, and only one pair of dice per set.
I hadn't even heard of match play until I found the FIBS
server. Therefore, I suspect that Murat may not be using
Jellyfish in the way Europeans and Americans expect, and
so giving rise to some unforeseen results. One JF option
which is not defined in the list above is whether Muarat
selected File...New...Single Game, or File...New...Match
...1 Point when playing his 100 games. If the former, JF
will take note of gammons and backgammons and try to win
extra points. If the latter then gammons and backgammons
only count for one point and JF will not take chances to
win them. We appear to have a situation in which the two
players are playing to achieve different goals, and each
is winning by their own way of reckoning. Please, Murat,
clarify the settings you are using, so that you will all
be interpreting the results in the same way. Perhaps the
next version of JF will include settings for backgammons
to count as two points, and for the loser to buy the coffee!
Another rule that people I know play with is that you may
only have five men on a point. That screws you up with an
opening 6 5, I can tell you. But that is another story. I
am going to finish now; this blocked text does my nut in!
Regards,
Ian Shaw
In <6pk2nl$d...@news3.force9.net> Ian Shaw wrote:
>so giving rise to some unforeseen results. One JF option
>which is not defined in the list above is whether Murat
>selected File...New...Single Game, or File...New...Match
>....1 Point when playing his 100 games. If the former, JF
>will take note of gammons and backgammons and try to win
>extra points. If the latter then gammons and backgammons
>only count for one point and JF will not take chances to
>win them. We appear to have a situation in which the two
>players are playing to achieve different goals, and each
>is winning by their own way of reckoning. Please, Murat,
>clarify the settings you are using, so that you will all
>be interpreting the results in the same way.
The incorrect settings have been pointed out earlier,
and the results have been considered pretty much
useless. I wasn't trying to win or prevent gammons
either and in addition to that, looking back, I feel
that I probably wasn't taking full advantage of the
idea of "gammonless games", just because I had never
played the game with such stategy in mind before.
Some observations can still be made by looking at
the individual games, but nothing really meaningful
can be concluded from it and it would be a worthless
thing to do. The best way to go about it is to play
another 100 games with the correct settings, for which
I will report the results soon.
>Perhaps the
>next version of JF will include settings for backgammons
>count as two points, and that the loser buys the coffee!
That would be nice... :) As it is now, the coffee is
in the kitchen, already made and free, but it won't
even go fetch me a cup...
>Another rule that people I know play with is that you may
>only have five men on a point. That screws you up with an
>opening 6 5, I can tell you.
I never played it that way but I can imagine that it
would be really interesting to try.
MK
Claes Thornberg <cla...@cuchulain.it.kth.se> wrote:
>Murat Kalinyaprak <mu...@cyberport.net> writes:
>> ...... So, games may diverge not only "early on",
>> but even "from the very first move on".
>Yes, but if I remember correctly, you were amazed that Jellyfish did
>not do better than you, playing against itself with the same dice as
>you. But this is not so strange, because if Jellyfish and you do not
>play the opening roll in the same way, then the game, even though it
>is played with the same dice, is likely to diverge so much that it's
>useless to compare the outcome.
I would have no problem with each of the 100 games
played with the same dice rolls, between two pairs of
humans or between "human vs. JF" and "JF vs. JF"
pairs, diverging at any stage and producing different
results, if there wasn't an issue of "best/correct"
moves. Otherwise, it becomes difficult to avoid a
series of arguments like the following:
- First move being best/right or lesser/wrong
doesn't determine the outcome of a game
- Second move being best/right or lesser/wrong
doesn't determine the outcome of a game either
- Third move being best/right or lesser/wrong
doesn't determine the outcome of a game either
- ...
- Last move being best/right or lesser/wrong
doesn't determine the outcome of a game either
In the case that we were talking about, almost
half of the game results were reversed. I think
when you reach such numbers/ratios, then the
question becomes: "What are the concepts/claims
of such things as best/right moves good for"...?
BTW: in the meantime, somebody posted an article
in this group saying that JF's auto-play is done
at level 5. If so, most of these arguments may
have been for nothing anyway.
>> Yet, JF arguably plays every roll according to the board,
>> which applies to all of its moves from the first one to the
>> very last. And JF gets so much reverence in this newsgroup
>> that every time I log on I wonder if I'm going to see an
>> article posted by a kid saying "When I grow up, I'm going
>> to play backgammon just like a Jellyfish"... :)
>As an interesting remark, Jerry Grandell, Olympic Champion 96, World
>Champion 97, Istanbul Giant Jackpot Winner 97 and Istanbul World Cup
>Challenge Winner 1998, seems to be doing quite well playing like the
>"fish". Rumour has it, that in the finals of the last event, Jerry's
>moves, all of them, was agreed upon by Jellyfish. And he's sometimes
>called JerryFish for his tendency to play like Jellyfish.
Had I known this story, I would have used it
myself to illustrate my point, because it does
fit very nicely.
If Jellyfish and Jerryfish play so much alike
(i.e. twins of a human and a machine), and if
Jerryfish played against Jellyfish, then Jerryfish
would have lost...
What I keep trying to convey is this impression
of mine that it may be easier to beat Jellyfish
by playing *unlike* Jellyfish, rather than by
playing *like* Jellyfish. And I even go beyond
this, to say that *unlike Jellyfish* may not
necessarily have to mean *like Mr. John Doe,
the world champion*, but it may mean knowingly
making lesser/wrong moves *like me*... (on top of
the lesser/wrong moves that I make unknowingly:)
Now, my thinking that I can outsmart JF by such
"trick moves" may be no more than my imagination.
Those moves may actually be doing my game more
harm than good, and maybe I'm somehow still able
to compensate for them later on in the game and
manage to come out on top, despite the harm they do
(this possibility is hard even for me to believe).
You need to understand that I'm not trying to
prove I can beat JF or brag about myself, but
that I have a real dilemma here. I see that
I'm doing rather well against JF, and trying
to find an explanation for it that wouldn't
lead to the conclusion that I'm a world-class
player. I never thought before and still don't
think that I could do nearly as well against a
human world-class (or even lesser than that)
player. Is it possible that I never knew my own
strength? Since I never played in any serious
money or title tournaments, my skills have never
been really "tested to their limits". Maybe there is
truth in the adage that one only plays as well as
his opponent, and that my "dormant" skills are
just coming out against JF?
In trying to find an answer, the reasons that I
may be inventing seem to sound too far-fetched to
people who firmly believe in JF. But the opposite
possibility (i.e. that I may be as good as a
world-class player like JF) sounds too far-fetched
to me. Maybe in time things will get clearer
one way or the other...
MK
> I would have no problem with each of the 100 games
> played with the same dice rolls, between two pairs of
> humans or between "human vs. JF" and "JF vs. JF"
> pairs, diverging at any stage and producing different
> results, if there wasn't an issue of "best/correct"
> moves.
Only in your two most recent posts have you mentioned the issue of
best/correct moves, as far as I remember.
> Otherwise, it becomes difficult to avoid a
> series of arguments like the following:
>
> - First move being best/right or lesser/wrong
> doesn't determine the outcome of a game
>
> - Second move being best/right or lesser/wrong
> doesn't determine the outcome of a game either
>
> - Third move being best/right or lesser/wrong
> doesn't determine the outcome of a game either
>
> - ...
>
> - Last move being best/right or lesser/wrong
> doesn't determine the outcome of a game either
You don't seem to understand the idea of a best/correct move. A move
doesn't necessarily determine the outcome of a single game, except
in some special circumstances. The idea of a best/correct move is, I
believe, quite simple. Suppose we have a position A, where you can
choose between five different moves. These moves lead to positions
A1, A2, A3, A4, A5. If we play each of these positions hundreds of
times, if not thousands of times, and get the result that position
A1 wins more points than any other position, we'd believe that the
best move was the move which led to position A1. So the best move
is the move which gives you the highest expected equity (a small
rollout sketch follows the list below). But since this is the
cruelest of games, you can lose despite making the correct move,
and you can win even though you did not make one single best
move. So the list above could be written as:
- First move being best/right or lesser/wrong
doesn't determine the outcome of a game, but it gives you the
best chances of winning this game.
- Second move being best/right or lesser/wrong
doesn't determine the outcome of a game either, but it gives
you the second best chance of winning this game.
- Third move being best/right or lesser/wrong
doesn't determine the outcome of a game either, but it gives
you the third best chance of winning this game.
- ...
- Last move being best/right or lesser/wrong
doesn't determine the outcome of a game either, but it gives
you the smallest chance of winning this game.
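Here is the promised sketch of that "highest expected equity" idea as a
rollout, with `play_out` (a hypothetical name) standing in for whatever plays
a single game to completion from a position and returns the points won or
lost:

  def best_by_rollout(candidate_positions, play_out, trials=1000):
      # candidate_positions: dict mapping each candidate move to the
      # position it produces; play_out(position) -> points won (negative
      # if the game is lost).  Returns the move with the highest average
      # result over `trials` played-out games, i.e. the best estimated equity.
      averages = {}
      for move, position in candidate_positions.items():
          averages[move] = sum(play_out(position) for _ in range(trials)) / trials
      best_move = max(averages, key=averages.get)
      return best_move, averages[best_move]

The move picked this way can still lose any particular game; it is only the
choice with the best long-run average, which is exactly the point of the
list above.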
> In the case that we were talking about, almost
> half of the game results were reversed. I think
> when you reach such numbers/ratios, then the
> question becomes: "What are the concepts/claims
> of such things as best/right moves good for"...?
If you think about it, there is nothing strange about changing the
outcome in so many single games. If Jellyfish doesn't play the opening
moves like you, or doesn't respond to an opening move the way you do,
then this is about what one would expect, I guess.
> >> Yet, JF arguably plays every roll according to the board,
> >> which applies to all of its moves from the first one to the
> >> very last. And JF gets so much reverence in this newsgroup
> >> that every time I log on I wonder if I'm going to see an
> >> article posted by a kid saying "When I grow up, I'm going
> >> to play backgammon just like a Jellyfish"... :)
>
> >As an interesting remark, Jerry Grandell, Olympic Champion 96, World
> >Champion 97, Istanbul Giant Jackpot Winner 97 and Istanbul World Cup
> >Challenge Winner 1998, seems to be doing quite well playing like the
> >"fish". Rumour has it, that in the finals of the last event, Jerry's
> >moves, all of them, was agreed upon by Jellyfish. And he's sometimes
> >called JerryFish for his tendency to play like Jellyfish.
>
> Had I known this story, I would have used it
> myself to illustrate my point, because it does
> fit very nicely.
>
> If Jellyfish and Jerryfish play so much alike,
> (i.e. twins of a human and a machine), and if
> Jerryfish played against Jellyfish, then Jerryfish
> would have lost...
Could you please explain how you came to this conclusion?
Sorry Murat, but I am off to lunch and I'll comment on the rest of
your post later. Maybe.
Regards,
Claes Thornberg