
Beginner's question


james

Jun 16, 2003, 10:13:53 AM
I am new to backgammon. I have the free Jellyfish download and would
like to know how strong the different settings are.


James

Michael Howard

Jun 16, 2003, 7:01:07 PM
The Fish is world class on level 7 (around 2000 ELO).
Levels 5 & 6 will probably be 150 and 75 points less, respectively. Maybe
someone has worked it out at some point?

If you're learning, though, you should only use level 7 anyway. Why bother
being taught by an inferior player?

Good luck,
M

"james" <jame...@hotmail.com> wrote in message
news:e5870d4.03061...@posting.google.com...

Scott Steiner

Jun 17, 2003, 5:35:24 AM
james wrote:
>
> I am new to backgammon.I have the free Jellyfish download and would
> like to know how strong the different settings are.

Tip: if you want to become more serious about backgammon, you will
soon need a more sophisticated program than the free JF - use Gnubg
instead.

http://home.online.no/~oeysteij/

Paul Ashby

Jun 17, 2003, 8:47:53 AM
"Michael Howard" <mi...@howard666.freeserve.co.uk> wrote in message news:<bcli7q$co7$1...@newsg3.svr.pol.co.uk>...

> The Fish is world class on level 7 (around 2000 ELO).
> Level 5 & 6 will probably be 150 and 75 points less respectively. Maybe
> someone has worked it out at some point?
>
> If your learning though, you should only use level 7 anyway. Why bother
> being taught by an inferior player?
>

I have just started to learn backgammon, using JellyFish Tutor. It
really seems to me like it cheats!! Every time I have him on the bar and
there is only one escape, he rolls the right number. Maybe I'm
imagining it, but it certainly seems that way.

Sprozza

Jun 17, 2003, 11:02:12 AM
"Michael Howard" <mi...@howard666.freeserve.co.uk> wrote in message news:<bcli7q$co7$1...@newsg3.svr.pol.co.uk>...
>
> If your learning though, you should only use level 7 anyway. Why bother
> being taught by an inferior player?

Because you need feedback! You need to see the very basics of strategy
paying off in terms of the occasional win before you move up a level.
To a rank beginner, Level 7 appears to have mysterious, god-like
powers, not easily learnt tactics.

If you're beginning to learn the piano, you don't start with
Rachmaninov Concerti, and if Sampras was giving you your first tennis
lesson he'd probably play left-handed!

Sprozza

Tom Keith

Jun 17, 2003, 12:26:55 PM
This is a common query/complaint about all backgammon programs, but as far
as I know there is no major backgammon program that cheats with its dice.
For more information, see http://www.bkgm.com/rgb/rgb.cgi?menu+computerdice

Tom Keith
http://www.bkgm.com

Michael Howard

Jun 17, 2003, 2:42:52 PM
You'd need to explain why level 5 would appear any different to a beginner.
Level 7 took me from novice to expert-ish in less than a year with the help
of a couple of books by Magriel, Dwek, and Robertie. I would say you learn
almost subliminally from board patterns and strong cube decisions.
Involving other, weaker levels will just lengthen the learning process.
I'm willing to listen to your ideas, but you'll have to come up with better
than piano-grading metaphors to convince me.
Rgds,
M
"Sprozza" <spamf...@wimbolt.demon.co.uk> wrote in message
news:e8753bd8.03061...@posting.google.com...

Scott Steiner

Jun 17, 2003, 4:40:54 PM
Michael Howard wrote:
>
> You'd need to explain why level 5 would appear any different to a beginner.
> Level 7 took me from novice to expert-ish in less than a year with the help
> of a couple of books by Magriel, Dwek, and Robertie. I would say you learn
> almost subliminally from board patterns and strong cube decisions.
> Involving other weaker levels will just lengthen the learning process.
> Willing to listen to your ideas but you'll have to come up with better than
> Piano grading metaphors to convince me.
> Rgds,
> M

IMHO, the whole discussion is quite off the point. When it comes to
bots, the beginner is not going to learn by merely playing JF at level
5, 6, or 7, but rather by having his matches analysed after play at an
expert level and reviewing the analyses carefully. Since the poster
mentions he is using the free version of JF, he does not have this
feature. As a result, he and every other beginner will lose badly
at whatever level he plays (5, 6, or 7). That is why I suggested using
gnubg instead of JF Lite.

Paul Ashby

Jun 18, 2003, 9:21:13 AM
Thanks Tom. After reading the articles at that link I feel embarrassed
to have asked the question. I really must be rubbish at backgammon -
JellyFish beats me about 5 times out of 4 on Level 4. Better start
studying...


Tom Keith <tom...@ETEbkgm.com> wrote...

Sprozza

Jun 19, 2003, 5:57:13 AM
"Michael Howard" <mi...@howard666.freeserve.co.uk> wrote in message news:<bcnnfh$2ft$1...@newsg1.svr.pol.co.uk>...

> You'd need to explain why level 5 would appear any different to a beginner.
> Level 7 took me from novice to expert-ish in less than a year with the help
> of a couple of books by Magriel, Dwek, and Robertie. I would say you learn
> almost subliminally from board patterns and strong cube decisions.
> Involving other weaker levels will just lengthen the learning process.
> Willing to listen to your ideas but you'll have to come up with better than
> Piano grading metaphors to convince me.
> Rgds,
> M

Well, I admire your determination - and I'd agree that there is no
substitute for reading and following top-level matches - something I
should do a lot more of still. And having the tutor option of some of
the bots is invaluable. If someone is the sort of person who really
does see every set-back as an "opportunity" then maybe digging in for
the long haul against the best their bot can do is right for them.

I merely think that for many people, playing against a world-class
opponent when they're learning will be like bashing their heads
against a brick wall - even if (or maybe, especially because) it's
just a bot you're losing to, the psychological impact of losing time
and time again isn't very conducive to thinking about how your game's
improving.

Sprozza

Mike Howard

Jun 19, 2003, 2:59:36 PM
>
> Well, I admire your determination - and I'd agree that there is no
> substitute for reading and following top-level matches - something I
> should do a lot more of still. And having the tutor option of some of
> the bots is invaluable. If someone is the sort of person who really
> does see every set-back as an "opportunity" then maybe digging in for
> the long haul against the best their bot can do is right for them.
>
> I merely think that for many people, playing against a world-class
> opponent when they're learning will be like bashing their heads
> against a brick wall - even if (or maybe, especially because) it's
> just a bot you're losing to, the psychological impact of losing time
> and time again isn't very conducive to thinking about how your game's
> improving.
>
> Sprozza
>
Well, I was determined. I was simply fascinated I guess. Using
the Lite version and Dwek's book I played for 3 weeks solid before I
won my first 3 pt match. It was amazing to me to see primes being
built before my eyes. I had seen these in the book but assumed they
were rare. Gradually I saw the reason for the importance of the 5pt,
etc. Once I started to win a little more I purchased the full
product. Playing a couple of hours every day, I began to 'intuitively'
sense the moment to double and built a mental database to help me
understand when to accept/reject. More books, more theory, and I
started to make real progress. Able to hold my own on FIBS/Netgammon
etc.

I love the Fish but sadly it is now old stock and GNU will be the only
game in town once the graphics are sorted.

Good luck,
M

amni

Jun 22, 2003, 7:44:19 AM
JF level 7\1000 seems to me much stronger than GNUBG 13.1 (I haven't yet
installed GNUBG 14 with the new weights, which is claimed to be better than 13.1).

I don't see a point in playing against weak opponents
(less than level 7\1000). Psychologically,
if one wins 1 of 10 matches and gradually improves his
results, he shouldn't be frustrated. And winning 1 of 10 matches
is possible even for "beginners" (not "absolute" beginners).

I prefer JF for most playing because its graphics are superior.
If one plays 2 hours a day or more, this is a strong factor.
I have played only with the free version 3.5 Lite.
If someone enjoys it, the paid versions might be worth the
money.

I have definitely played more than 50000 games,
maybe 100000 games.
I never read books or "masterpiece games".
This is based on my experience with chess books:
most of them are boring, and the best way to learn
is to learn right from the board.
I believe that learning from software (TUTOR MODE or so)
is more interesting and effective than learning from books.
Books are a somewhat archaic style of learning.
By the way, one of the top BG players told my friends that
he learned mostly from playing against SNOWIE.

I run GNUBG in TUTOR mode from time to time (warning only on BAD mistakes),
everything at GRANDMASTER level (3-ply, no error noise).
The reason for this is to detect some serious bad decisions
which I was oblivious to.
I don't accept many GNUBG TUTOR evaluations because they do not convince me,
based on my experience with JellyFish. (By the way, I win at least 50
percent of the money-game sessions with GNUBG 13.1 GRANDMASTER,
and I guess I've played at least 5000 money games with that
GNUBG, so my low respect for GNUBG Tutor advice
is not as silly as some of you would think.)


"Michael Howard" <mi...@howard666.freeserve.co.uk> wrote in message news:<bcli7q$co7$1...@newsg3.svr.pol.co.uk>...

Albert Silver

Jun 22, 2003, 8:43:00 PM
am...@hotmail.com (amni) wrote in message news:<e3504803.03062...@posting.google.com>...

If you prefer JF over GNU go for it. One thing: I'd play Supremo and
not Grandmaster level. There are irregularities in the odd-plies that
crop up more often than not.

Albert Silver

amni

Jun 23, 2003, 4:00:50 AM
As I said, I _do use_ GNU as a crude TUTOR since I didn't have
any commercial program (JF or SNOWIE) with TUTORING facilities.
I believe that the TUTORING accelerates the learning,
because it points at basic mistakes which are
not detected intuitively.

As for level tuning in GNU --- it is possible to use a "user defined"
level, not necessarily the pre-defined levels.
I don't understand your claim about "odd ply" problems
in GNU, but it can be resolved (partly) by choosing in the "user defined"
level:
non-cube action evaluation ==> 3 ply (no noise)
cube action evaluation ==> 4 ply (no noise)

My objection to any noise in learning mode is that noise
makes learning harder (you never know if the move is a clever
move or a noise error).

Albert Silver

Jun 23, 2003, 9:11:05 AM
am...@hotmail.com (amni) wrote in message news:<e3504803.0306...@posting.google.com>...

> As I said, I _do use_ GNU as a crude TUTOR since I didn't have
> any commercial (JF or SNOWIE) with TUTORING facilities.

Snowie 4 doesn't have any tutoring mode like the one in Jellyfish or
GNU.

> I beleve that the TUTORING accelerate the study,
> because it points at some basic mistakes which are
> not detected intuitively.

Yes, I like it too, but prefer to study by reviewing my moves, ALL my
moves, in the post-mortem analysis.

>
> As for level tuning in GNU --- it is possible to use "user defined"
> level, not necesarilly the pre-defined levels.
> I don't understand your claim about "odd plys" problems

Due to the way the neural nets were developed, it has been noted that
the odd-plies, such as GNU 1-ply and GNU 3-ply, are somewhat erratic
and produce irregular results. You're more likely to get GNU's true
playing ability using the even plies such as GNU's 2-ply. Grandmaster
is a 3-ply level and Supremo is the 2-ply. Also, the 3-ply grandmaster
level doesn't analyze that many moves at 3-ply. Only 2 at the most.

As to the level of GNU, it bears mentioning that GNU 0.13 and GNU 0.14
are considerably stronger than Jellyfish, and as strong as Snowie 4,
possibly even a fraction stronger.

Albert Silver

Scott Steiner

Jun 23, 2003, 12:09:43 PM
Albert Silver wrote:
[...]

> If you prefer JF over GNU go for it. One thing: I'd play Supremo and
> not Grandmaster level. There are irregularities in the odd-plies that
> crop up more often than not.
[...]

Hi Albert,

I asked a while ago about differences between odd ply and even ply
analysis and whether odd ply produces bad results at times. Here's the
thread:

http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&threadm=3DF9FEE9.7D510825%40nospam.nospam&rnum=1&prev=/groups%3Fhl%3Den%26lr%3D%26ie%3DISO-8859-1%26q%3Dgroup%253Arec.games.backgammon%2Binsubject%253Aodd%2Binsubject%253Aply%2Binsubject%253Aeven%2Bauthor%253AScott%2Bauthor%253ASteiner

As you can see, Gary Wong answered (I think he is one of the early
developers of gnu, right?) and said that there is nothing to suggest
that even ply is in any way better than odd ply. He further states
that in general one can expect n+1 ply to be better than n ply.

After playing for 18 months now (which is obviously not that much, mind
you ;-)), I myself have found Gary Wong's above statement to be
correct. From how I see it and from my experience, 3-ply seems to be
better than 2-ply. I notice a relatively big difference in cube
decisions and only slight differences when it comes to checker play. I
therefore choose to analyse my matches with 3-ply on cubing and 2-ply on
checker play, with occasional further 3-ply analysis on a checker play
which I suspect to be wrong. My rollouts usually confirm cube decisions
done on 3-ply (I hope my rollout settings are right; BTW I am using the
recommended rollout settings you give in the manual).

Any thoughts on why you think that odd ply isn't that reliable are very
welcome since this subject interests me.

Thanks!

Kees van den Doel

Jun 23, 2003, 2:38:50 PM
In article <f9846eb9.03062...@posting.google.com>,
Albert Silver <silver...@hotmail.com> wrote:

>As to the level of GNU, it bears mentioning that GNU 0.13 and GNU 0.14
>are considerably stronger than Jellyfish,

What is this statement based on?


Kees (Maggie and cover many proofs that actually I know.)

Albert Silver

Jun 23, 2003, 6:41:12 PM
kvan...@xs2.xs4all.nl (Kees van den Doel) wrote in message news:<3ef7493a$0$123$e4fe...@dreader4.news.xs4all.nl>...

> In article <f9846eb9.03062...@posting.google.com>,
> Albert Silver <silver...@hotmail.com> wrote:
>
> >As to the level of GNU, it bears mentioning that GNU 0.13 and GNU 0.14
> >are considerably stronger than Jellyfish,
>
> What is this statement based on?

A number of factors really. One would be the empirical comments of top
experts who judge Jellyfish to be inferior to Snowie 3, and more so to
Snowie 4. As shown in another thread, an analyzed series of matches
between GNU 0.13 and Snowie 4, though won by GNU, actually showed them
to be of equal strength once the luck factor was taken into account.
You can also find hundreds of matches played between Snowie 3 and
Jellyfish at Tony Lezard's site
(http://www.jobstream.com/~tony/backgammon/).

Finally, there has been a very long, fantastic series of analyzed games
and positions posted in the GammonLine forum
(http://www.gammonline.com) by Michael Depreli. This is the
result of 500 money games played between GNU 0.12 and Snowie 3. The
games were then analyzed by Jellyfish, Snowie 3 and 4, and GNU 0.12,
0.13, and the new 0.14. Each position that was highlighted as an error
by at least one of the bots was then rolled out extensively, and the
results presented with the choices of each bot. Obviously, in each
position, at least one bot got it wrong, and in one case, ALL. So far
more than 400 positions have been published, and here are the results
using Snowie's system of calculating errors. The first number is the
error rate, and the number in parentheses is the number of positions
gotten wrong. Note that if you see a larger error rate but a smaller
number gotten wrong, it is because of the size (equity loss) of the
errors made.

The results after 406 positions:

             Overall          Checker Errors (>0.039)   Total Cube Errors
GNU 0.14      6.963  (85)      4.991  (61)               1.972 (24)
GNU 0.13      7.212  (88)      5.210  (64)               2.002 (24)
Snowie 4      7.872  (85)      5.407  (63)               2.465 (22)
GNU 0.12      9.874 (126)      6.825  (88)               3.049 (38)
Snowie 3     14.200 (167)      9.888 (123)               4.312 (45)
Jellyfish    18.842 (225)     13.835 (175)               5.007 (50)

(results and analysis courtesy of Michael Depreli)

Albert Silver

Albert Silver

Jun 23, 2003, 8:27:59 PM
Scott Steiner <big_poppa...@yahoo.com> wrote in message news:<3EF72635...@yahoo.com>...

It has produced odd results; however, Michael Depreli ran a very long
series of games between GNU 0-ply and GNU 1-ply which he then analyzed
with Snowie 4 3-ply precise. The main reason for this last choice was that
David Montgomery suggested that analyzing with GNU's 2-ply might tend
to favor the even-ply analysis. The conclusion was that the 1-ply
checker play did in fact play very slightly better than the 0-ply
checker play, but that the 1-ply cube decisions were very poor. He
posted these results in the Gammonline Forum.

Albert Silver

>
> Thanks!

Kees van den Doel

Jun 24, 2003, 3:50:31 AM

>The results after 406 positions:

>              Overall          Checker Errors (>0.039)   Total Cube Errors
> GNU 0.14      6.963  (85)      4.991  (61)               1.972 (24)
> GNU 0.13      7.212  (88)      5.210  (64)               2.002 (24)
> Snowie 4      7.872  (85)      5.407  (63)               2.465 (22)
> GNU 0.12      9.874 (126)      6.825  (88)               3.049 (38)
> Snowie 3     14.200 (167)      9.888 (123)               4.312 (45)
> Jellyfish    18.842 (225)     13.835 (175)               5.007 (50)

>(results and analysis courtesy of Michael Depreli)

Forgive me for being blunt, but I conclude from your answer that the
statement "GNU 0.13 and GNU 0.14 are considerably stronger than
Jellyfish" is based on NOTHING.

It's like saying "I once went swimming and I got bitten by this
jellyfish which looked just like a slug but when I went hiking the slugs
didn't bite me at all so the slugs must be a lot smarter."


Kees (The message should disbelieve as by similar phenomena.)

Mike Howard

Jun 24, 2003, 4:50:43 AM
>
> The results after 406 positions:
>
>              Overall          Checker Errors (>0.039)   Total Cube Errors
> GNU 0.14      6.963  (85)      4.991  (61)               1.972 (24)
> GNU 0.13      7.212  (88)      5.210  (64)               2.002 (24)
> Snowie 4      7.872  (85)      5.407  (63)               2.465 (22)
> GNU 0.12      9.874 (126)      6.825  (88)               3.049 (38)
> Snowie 3     14.200 (167)      9.888 (123)               4.312 (45)
> Jellyfish    18.842 (225)     13.835 (175)               5.007 (50)
>
> (results and analysis courtesy of Michael Depreli)
>
> Albert Silver
>
> >
> >
Fascinating stuff. Who rolled out the positions though? I frequently
get hugely different results in rollouts when using JellyFish and
Snowie on a tricky position. If you are using just one of these bots
each time it will skew the reuslt immensely and reduce the scientific
authenticity. IMHO.

Rgds,
M

Albert Silver

Jun 24, 2003, 8:42:44 AM
kvan...@xs2.xs4all.nl (Kees van den Doel) wrote in message news:<3ef802c7$0$132$e4fe...@dreader4.news.xs4all.nl>...

As long as NOTHING equals match results, analysis, and expert opinion,
then I agree.

>
> It's like saying "I once went swimming and I got bitten by this
> jellyfish which looked just like a slug but when I went hiking the slugs
> didn't bite me at all so the slugs must be a lot smarter."

That's one way of looking at it. Another is to actually look at the
data. That's ok though, if you want to believe that Jellyfish is the
king of the hill, you won't find me insisting otherwise.

Albert Silver

Albert Silver

Jun 24, 2003, 8:52:02 AM
mi...@howard666.freeserve.co.uk (Mike Howard) wrote in message news:<c40114a8.03062...@posting.google.com>...

It probably depends on the settings you use too though. In this case,
the vast majority of the positions were rolled out using Snowie 3.2 at
2-ply Huge and 3-ply cube with 1296 trials in a full rollout. There
were positions where he used Snowie 4 and even GNU 0.13. The reason
for these cases was precisely because the author felt the Snowie 3
rollouts could be suspect due to improper play in a position it didn't
understand, or improper cube handling. The results were often
questioned, BTW, by many, including Backgammon Giant Neil Kazaross and US
master Chuck Bower (an ardent user of Jellyfish), but their own
extensive rollouts using Snowie 4 or even Jellyfish always confirmed
the results. The only thing that varied was the size of the equity
error from one move to the next, but that's to be expected. It was
rather interesting to see the weaknesses of the bots highlighted this
way. Some of Jellyfish's errors were pretty astounding to be honest,
but it also got some tough positions right to be fair.

Albert Silver

>
> Rgds,
> M

jthyssen

Jun 24, 2003, 9:54:02 AM
kvan...@xs2.xs4all.nl (Kees van den Doel) wrote in message

[snip]


> Forgive me for being blunt, but I conclude from your answer that the
> statement "GNU 0.13 and GNU 0.14 are considerably stronger than
> Jellyfish" is based on NOTHING.
>
> It's like saying "I once went swimming and I got bitten by this
> jellyfish which looked just like a slug but when I went hiking the slugs
> didn't bite me at all so the slugs must be a lot smarter."

You have to explain that analogy to me... :-)

In your opinion, what's wrong with the numbers produced by Michael
Depreli? -- and why can't you conclude that Jellyfish is weaker than
the other bots based on these numbers?

Jørn

Mike Howard

Jun 24, 2003, 3:05:14 PM
>
> It probably depends on the settings you use too though. In this case,
> the vast majority of the positions were rolled out using Snowie 3.2 at
> 2-ply Huge and 3-ply cube with 1296 trials in a full rollout. There
> were positions where he used Snowie 4 and even GNU 0.13. The reason
> for these cases was precisely because the author felt the Snowie 3
> rollouts could be suspect due to improper play in a position it didn't
> understand, or improper cube handling. The results were questioned
> OFTEN BTW, and many, including Backgammon Giant Neil Kazaross and US
> master Chuck Bower (an ardent user of Jellyfish), but their own
> extensive rollouts using Snowie 4 or even Jellyfish always confirmed
> the results. The only thing that varied was the size of the equity
> error from one move to the next, but that's to be expected. It was
> rather interesting to see the weaknesses of the bots highlighted this
> way. Some of Jellyfish's errors were pretty astounding to be honest,
> but it also got some tough positions right to be fair.
>
> Albert Silver
Thanks for the clarification. I wish someone would post the accepted
most reliable settings for rollouts using the 3 bots here, as I find it
frustrating to get inconsistent results when using the supposedly most
accurate analysis methodology. By the way, 1296 games isn't really
enough, is it? 3888 would be a better minimum.
My big gripe and slight scepticism with this is that we are blithely
using figures to 3 decimals as if they are really meaningful. Yet a
tiny tweak to the settings of the bots or rollouts can alter these
figures massively. For me this invalidates the figures completely.
I'm not saying the above results were wrong in concept - just in
the number detail, probably.
The order of the strength of the bots is of course exactly what I
would have expected.
Thanks again,
> > Rgds,
> > M

Kees van den Doel

Jun 24, 2003, 3:06:06 PM
In article <36775ed0.03062...@posting.google.com>,
jthyssen <j...@chem.sdu.dk> wrote:

>> Forgive me for being blunt, but I conclude from your answer that the
>> statement "GNU 0.13 and GNU 0.14 are considerably stronger than
>> Jellyfish" is based on NOTHING.

>> It's like saying "I once went swimming and I got bitten by this
>> jellyfish which looked just like a slug but when I went hiking the slugs
>> didn't bite me at all so the slugs must be a lot smarter."

>You have to explain that analogy to me... :-)

>In your opinion, what's wrong with the numbers produced by Michael
>Depreli? -- and why can't you conclude that Jellyfish is weaker than
>the other bots based on these numbers?

Well, maybe one can, but I got kinda irritated by the long ramblings
about expert opinion, so I sorta assumed the data must be gibberish too.

It's like claiming that Murat is taller than Zare and then trying to
prove it by saying all the length experts agree he is taller and their
medical records show that Murat sustained more head injuries by bumping
his head against the ceiling, instead of simply measuring their heights.

The claim just begs to be settled by letting them (GNUBG and JF) play
each other until the issue is decided. Of course it's more interesting to
play GNU-Snowie, since at least JellyFish has an honest free version, so
we know its ethics are at least superior.


Kees (You mean, but, logically speaking, you probably know your demand
that out is, tik druken.)

amni

Jun 24, 2003, 6:41:41 PM
They make questionable assumptions about measurements.

I would believe only direct measurements of substantial samples, like:
* a session of 1000 money games where one bot is better than the other by
an average of 0.1 point per game,
* or a session of 5000 money games where one bot is better than the other by
an average of 0.05 point per game,
* or 500 matches to 25 points where one bot wins at least
25 more matches than the other.

Again, as I said above, my personal experience is that Jelly Fish
(Level 7\1000 units) is substantially stronger than
GNUBG 13.1 GRANDMASTER
(3-ply checker play no noise, 3-ply cube decisions no noise).


j...@chem.sdu.dk (jthyssen) wrote in message news:<36775ed0.03062...@posting.google.com>...

Tom Keith

Jun 24, 2003, 6:41:41 PM
Kees van den Doel wrote:
>
> The claim just begs to be settled by letting them (GNUBG and JF) play
> each other till the issue is settled.

Go for it, Kees! If you have a Windows computer, the tools you need are
all easily available:

Jellyfish Player (plays a great game of backgammon) is at
http://jelly.effect.no/

Gnu Backgammon for Windows (also plays a great game of backgammon
and includes analysis tools) is at
http://home.online.no/~oeysteij/

Dueller (a program that acts as an intermediary, automatically playing
Jellyfish, GnuBG, and/or Snowie against one another) is at
http://www.jobstream.com/~tony/backgammon/

(Many thanks to Jellyfish AS, the GnuBG Team, and Tony Lezard for
making these wonderful programs available for free!)

Tom Keith
http://bkgm.com

Tom Keith

Jun 24, 2003, 7:20:45 PM
amni wrote:
>
> Again, as I said above, my personal experience is that Jelly Fish
> is (Level 7\ 1000 units) is substantially stronger that
> GNUBG 13.1 GRANDMASTER

This might be possible. Each bot has its own strengths and weaknesses.
For your particular style of play, JF might be a stronger opponent than
GnuBG.

But don't underestimate the strength of GnuBG. It is a very strong
player. Or have you found some weakness in GnuBG that you are able to
exploit? If you have, please post a position!

Tom Keith
http://bkgm.com

Tom Keith

Jun 24, 2003, 7:31:58 PM
The differences, I think, are the ones listed on this page:
http://jelly.effect.no/jelly35.htm

The most important difference is that JF 3.5 Lite is free.

Tom

LostVegan wrote:


>
> On Tue, 24 Jun 2003 22:41:41 GMT, Tom Keith <t...@bkgm.com> wrote:
>
> >Jellyfish Player (plays a great game of backgammon) is at
> > http://jelly.effect.no/
> >

> What's the difference between JF 3.5 Lite and JF 3.0 Player?
>
> --
> Marty (to respond via email, drop 'your.pants')
>
> "to be yourself, in a world that tries, night and day, to make
> you just like everybody else - is to fight the greatest battle
> there ever is to fight, and never stop fighting" -- e.e. cummings

Tom Keith

Jun 24, 2003, 7:39:57 PM
Or maybe both versions are free. I see on page
http://jelly.effect.no/whatis.htm#differences
that the Player version includes position editing and file saving
and loading.

Tom

Douglas Zare

Jun 24, 2003, 8:01:03 PM

amni wrote:

> they make questionable assumptions about measurments.

Exactly which assumptions do you find questionable?
I think there are flaws in Depreli's procedure, but they
seem rather subtle. I don't see anything that should
produce a huge bias against Jellyfish.

> I would believe only to direct measurments of substantial samples like:
> * a sessin 0f 1000 money games where one bot is better than the other by
> average 0.1 point per game,

That's not enough. The variance of a money game
is about 10 square points per game, so after
1000 games, 100 points is 1 standard deviation.

> Again, as I said above, my personal experience is that Jelly Fish
> is (Level 7\ 1000 units) is substantially stronger that
> GNUBG 13.1 GRANDMASTER
> (3 ply checker play no noise, 3 ply cube decision no noise).

Could you post some positions that you feel gnu
gets wrong on those settings and Jellyfish gets right?

Douglas Zare
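
A quick numerical check of the variance figure above - a sketch only, treating
Zare's estimate of roughly 10 square points of variance per money game as an
assumption rather than a measured value:

import math

VAR_PER_GAME = 10.0  # assumed variance of a single money game, in points^2

for n_games in (1000, 5000, 20000):
    sd_total = math.sqrt(VAR_PER_GAME * n_games)  # SD of the total score
    print(n_games, "games -> 1 SD of the total is about", round(sd_total), "points")

# 1000 games -> 1 SD of the total is about 100 points
# 5000 games -> 1 SD of the total is about 224 points
# 20000 games -> 1 SD of the total is about 447 points

So a 100-point lead after 1000 games is only about one standard deviation, as
stated above.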

Kees van den Doel

Jun 25, 2003, 1:31:59 AM
In article <3EF8D4C7...@ETEbkgm.com>, Tom Keith <t...@bkgm.com> wrote:
>Kees van den Doel wrote:

>Go for it, Kees! If you have a Windows computer, the tools you need are
>all easily available:

>Dueller (a program that acts as an intermediary, automatically playing


>Jellyfish, GnuBG, and/or Snowie against one another) is at
> http://www.jobstream.com/~tony/backgammon/

For some bizarre reason it does not work on Windows ME, which is what I
have.


Kees (Explaining for keesing my notes, and Pops n cutie-pie right
decision while standing on news.groups, which includes the
demented forger picks his shoes.)

jthyssen

Jun 25, 2003, 3:25:01 AM
kvan...@xs2.xs4all.nl (Kees van den Doel) wrote in message news:<3ef8a11d$0$131$e4fe...@dreader4.news.xs4all.nl>...

> In article <36775ed0.03062...@posting.google.com>,
> jthyssen <j...@chem.sdu.dk> wrote:
>
> >> Forgive me for being blunt, but I conclude from your answer that the
> >> statement "GNU 0.13 and GNU 0.14 are considerably stronger than
> >> Jellyfish" is based on NOTHING.
>
> >> It's like saying "I once went swimming and I got bitten by this
> >> jellyfish which looked just like a slug but when I went hiking the slugs
> >> didn't bite me at all so the slugs must be a lot smarter."
>
> >You have to explain that analogy to me... :-)
>
> >In your opinion, what's wrong with the numbers produced by Michael
> >Depreli? -- and why can't you conclude that Jellyfish is weaker than
> >the other bots based on these numbers?
>
> Well, maybe can can, but I got kinda irritated by the long ramblings
> about expert opinion, so I sorta assumed the data must be gibberish too.

I agree that expert opinions are not objective measures of the bots'
strengths.



> The claim just begs to be settled by letting them (GNUBG and JF) play
> each other till the issue is settled.

I guess you know that this requires a gazillion matches to be played!?

Michael Depreli's method is more interesting, since there is no (or
very little) luck involved: he has simply sampled 400+ positions, done
rollouts, and found the average error made by the bots.

Luck enters this experiment twice: in the positions sampled and in the
rollouts. I don't think we should worry about luck in the rollouts,
since they're performed with variance reduction (I assume). Also, it
seems unlikely that any bot would be favoured by luck in rollouts.

Bias in the sampled positions is more likely; perhaps random sampling
would have been better?

Jørn

Mike Howard

Jun 25, 2003, 4:57:58 AM
> Could you post some positions that you feel gnu
> gets wrong on those settings and Jellyfish gets right?
>
> Douglas Zare

How would you verify what was right and what was wrong? You'd need
everyone to agree on the method of measurement. Rollouts? By whom?
With what settings? How many etc etc. You can hardly expect the Fish
to accept rollouts by GNU and vice versa!!

All the rollouts by different bots I've seen differ hugely in resulting
equity numbers. The 'industry' needs a universally accepted rollout
methodology.

amni

Jun 25, 2003, 5:23:56 AM
Douglas Zare <za...@math.columbia.edu> wrote in message news:<3EF8E76F...@math.columbia.edu>...

> amni wrote:
>
> > they make questionable assumptions about measurments.
>
> Exactly which assumptions do you find questionable?
> I think there are flaws in Depreli's procedure, but they
> seem rather subtle. I don't see anything that should
> produce a huge bias against Jellyfish.
>


ANY assumption is questionable. Anyway, I think that
discussion about the validity of any assumption is
a DIGRESSION. Let the direct results speak for
themselves, because it is hard to refute FACTS
(FACTS = the direct results of a very large sample session).

>
> Could you post some positions that you feel gnu
> gets wrong on those settings and Jellyfish gets right?
>

It is not any particular position that I rely on,
but the overall results.

With GNU I win every second session when its level
is GRANDMASTER checkers and cube (3-ply, no noise, for checker
play and cube decisions). I run TUTOR mode only for warnings
(I don't look at the hints, and in the majority of cases I ignore its warnings).
And I'm definitely not a GRANDMASTER.

With Jelly Fish, it beats me in at least 3 of 4 money-game sessions
(each session with JF is 100 games), and the results are even
worse if I'm not highly concentrated.



> Douglas Zare

amni

Jun 25, 2003, 7:39:22 AM
Dueller's README.TXT says that GNUBG works only under
WIN_NT (including WIN2000 and WIN_XP).

amni


kvan...@xs2.xs4all.nl (Kees van den Doel) wrote in message news:<3ef933cf$0$153$e4fe...@dreader8.news.xs4all.nl>...

Brad Davis

Jun 25, 2003, 8:33:46 AM
Douglas Zare <za...@math.columbia.edu> wrote in message >

> I think there are flaws in Depreli's procedure, but they
> seem rather subtle.

Could you elaborate on this point please?


Thanks


Brad

Douglas Zare

Jun 25, 2003, 8:37:54 AM

amni wrote:

> Douglas Zare <za...@math.columbia.edu> wrote in message news:<3EF8E76F...@math.columbia.edu>...
> > amni wrote:
> >
> > > they make questionable assumptions about measurments.
> >
> > Exactly which assumptions do you find questionable?
> > I think there are flaws in Depreli's procedure, but they
> > seem rather subtle. I don't see anything that should
> > produce a huge bias against Jellyfish.
>
> ANY assumption is questionable. Anyway, I think that
> discussion about validity of any assumption is
> a DIGRESSION.

In other words, you don't want to try to back up
your statement that makes it sound like you have
found real flaws with Depreli's methodology.
There is a time to ask, "What is truth? How can
we know anything?" That time is a freshman
philosophy class, not a graduate course on
numerical methods in chemistry.

> Let's the direct results speak for
> themselves, because it is hard to refute FACTS
> (FACTS= the direct results of very large sample session).

You haven't given any such results yet.

> > Could you post some positions that you feel gnu
> > gets wrong on those settings and Jellyfish gets right?
>
> It is not a special position which I rely on
> but the overall results.

There are many positions where Jellyfish and
gnu disagree. If Jellyfish is much stronger, then
gnu must be giving up a lot of equity. It should
not be hard to find positions of this sort. If you
have played a lot against both, you should
have no trouble seeing the difference in styles.

> With GNU I win every second session when his level
> is GRANDMASTER checkers and cube (3 ply no noise for checkers
> play and cube decisions), I run TUTOR mode only for warning
> (I don't look at the hints and majority of the cases I ignore its warnings.
> And definitely I'm not GRAND MASTER.
>
> With Jelly Fish he'll beat me at least 3 of 4 money game session
> (each session with JF is 100 games) and the results are even
> worse if I'm not highly concetrated.

Oh, anecdotal evidence. Why reject systematically
collected data, and accept session results? That
seems silly, particularly if you don't know what a
significant result is.

I've often seen sessions in which one player wins 1
point per game from another player. Do you know
how long such streaks can remain common if the
players are equal? It can also happen that a weak
player, who expects to lose a half point per game,
instead wins a half point per game.

If you can consistently beat gnu 3-ply, turn pro.
But if half of the time you lose 80 points and
half of the time you win 40 points, well, that's
the way backgammon goes.

Given the chance, would you back
Jellyfish level 7, 1000, against gnu 13 3-ply?
I, and a lot of other people, would take the
other side of that bet for serious stakes.
Do you believe your impressions enough
to bet on them?

Douglas Zare

Douglas Zare

Jun 25, 2003, 9:00:08 AM

Mike Howard wrote:

> > Could you post some positions that you feel gnu
> > gets wrong on those settings and Jellyfish gets right?
>

> How would you verify what was right and what was wrong? You'd need
> everyone to agree on the method of measurement. Rollouts? By whom?
> With what settings? How many etc etc. You can hardly expect the Fish
> to accept rollouts by GNU and vice versa!!

I disagree with your statement that rollouts by different bots
differ from each other by so much as to be useless. It has
been further confirmed during this series that while bots may
have radically different evaluations, their rollouts are
usually very similar.

There are many situations in which one can predict that
this will not be the case, and I described some of them in
my column on rollouts in GammonVillage.com in
October 2002. Most of the time, if long rollouts say
a play is better by 0.050 on one bot, that play will roll
out better on all of the other bots, typically by roughly
the same amount, even though the evaluations often differ
by more than 0.100 EMG.

I've posted a series of problems on Gammonline.com
based on positions that have arisen in real life at the
New England Backgammon Club. On most of the
problems, I have run multiple rollouts, sometimes on up
to 4 different bots. It is very surprising when the high-level
rollouts disagree.

> All the rollouts by differnt bots I've seen differ hugely in resulting
> equity numbers. The 'industry' needs a universally accepted rollout
> methodology.

Different rollouts are appropriate for different positions.
Some are simply a waste of time in a simple position, or
are predictably too weak to trust in another type of position.
There are some types of positions where every rollout
is untrustworthy.

Douglas Zare

Douglas Zare

Jun 25, 2003, 9:31:09 AM

Brad Davis wrote:

Well, I'm not the one claiming that the series is nonsense.

However, one issue is that the positions are drawn from
a session between gnu 12 and Snowie 3. These
might tend to hit positions that a nonparticipating bot
would rarely encounter. For example, Jellyfish often
seems to prefer to have spares on the 5, 4, and 3 rather
than on the 6, 5, and 1 after closing out a checker. Snowie
prefers the reverse, and given spares on the 654 it would
play a 3 4/1 rather than 6/3. Jellyfish may then be docked for
misplaying in a position it would not have created.
Maybe it would play significantly better in positions
that would actually arise if Jellyfish is one of the players,
or from the side that Jellyfish plays. This can be a serious
problem if one hits extremely flexible positions in
which there is a lot of choice, such as when there is only
midpoint versus midpoint contact, or both sides have
many checkers sent back.

Another issue is that some bots may have been trained
toward the rollouts of earlier bots. Rollouts tend to agree
with each other, but if they don't, the differences may favor
the bots trained to have evaluations that look like rollouts
of the bot performing the rollouts. Even if the rollouts
agree, bots trained toward rollout results may resemble
rollout results more than they resemble the actual equity
in the position. It may be like showing some students
the exam with the answer key the day before the exam,
given that the answer key is wrong on some problems.
Not only do weaker students have a better chance of
appearing to get the correct answers right, the students who
see the wrong answers on the key have an unfair
advantage over someone who simply knows the material,
since they have a chance to get the flawed questions
graded correct.

Another issue is that it is unclear how the positions
were selected for rollouts. Some were chosen because
a bot disagreed with another by a large amount
(in EMG). However, some were chosen because
the decision was interesting, or there was an unchosen
play that Michael Depreli wanted to compare to the
bots' plays. Sometimes that play was right, and it is
personally interesting for me, but it is better if an
experimental method does not involve judgement calls.
There is also the fact that Jellyfish does not use EMG,
while the other bots do, so if Jellyfish disagrees with
a play made by another bot, I think the other bots have
to believe there is a difference for the play to be chosen.

This is not intended to be a complete list, but I don't
believe these issues account for the difference between the errors
of Snowie 4 or gnu 14 and those of Jellyfish.

Douglas Zare

Albert Silver

Jun 25, 2003, 11:37:55 AM
kvan...@xs2.xs4all.nl (Kees van den Doel) wrote in message news:<3ef933cf$0$153$e4fe...@dreader8.news.xs4all.nl>...

> In article <3EF8D4C7...@ETEbkgm.com>, Tom Keith <t...@bkgm.com> wrote:
> >Kees van den Doel wrote:
>
> >Go for it, Kees! If you have a Windows computer, the tools you need are
> >all easily available:
>
> >Dueller (a program that acts as an intermediary, automatically playing
> >Jellyfish, GnuBG, and/or Snowie against one another) is at
> > http://www.jobstream.com/~tony/backgammon/
>
> For some bizarre reason it does not work on Windows ME which is wehat I
> have.

Dueller only runs on Windows NT/2000/XP

Albert Silver

Albert Silver

Jun 25, 2003, 11:47:49 AM
j...@chem.sdu.dk (jthyssen) wrote in message news:<36775ed0.03062...@posting.google.com>...

Why? The bots chose the positions, not him. Just so that the
methodology is clear: a 500-game session was played between
Snowie 3.2 and GNU 0.12. The entire session was then analyzed by each
and every bot and every 0.040 error or larger highlighted by the bot
was separated as one of the positions. A large number of the positions
were specifically chosen by Jellyfish as it declared the chosen move
to be a significant error. The positions chosen by the bots were then
rolled out, and each bot was allowed a chance to see if it got it
right. He rolled out the top candidates, and in some cases the
rollouts revealed a move to be best that was chosen by none of the
bots. In any case, I think that having all the bots (and only the
bots) highlight moves and test themselves is a fairly objective
methodology.

Albert Silver

>
> Jørn
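
For what it's worth, here is a rough Python sketch of the tallying procedure as
described above. It is only an illustration of the described method, not
Depreli's actual code: the analyze/rollout_equity/chosen_move interfaces and
the candidates attribute are hypothetical placeholders, and the 0.040
threshold is the figure quoted above.

ERROR_THRESHOLD = 0.040  # equity loss that flags a decision as an error

def tally_errors(decisions, bots, analyze, rollout_equity, chosen_move):
    """Hypothetical interfaces (placeholders, not real bot APIs):
    decisions: positions from the session, each with its candidate moves
    analyze(bot, pos): equity error the bot assigns to the move played
    rollout_equity(pos, move): rolled-out equity of a candidate move
    chosen_move(bot, pos): the move the bot itself would play
    """
    totals = {bot: 0.0 for bot in bots}  # summed equity given up
    counts = {bot: 0 for bot in bots}    # number of positions gotten wrong
    for pos in decisions:
        # a position enters the sample only if SOME bot flags the played move
        if not any(analyze(bot, pos) >= ERROR_THRESHOLD for bot in bots):
            continue
        best = max(rollout_equity(pos, m) for m in pos.candidates)
        for bot in bots:
            loss = best - rollout_equity(pos, chosen_move(bot, pos))
            if loss >= ERROR_THRESHOLD:
                totals[bot] += loss
                counts[bot] += 1
    return totals, counts

The published table is essentially these (total, count) pairs per bot, split
into checker and cube decisions.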

Brad Davis

Jun 25, 2003, 11:48:04 AM
j...@chem.sdu.dk (jthyssen) wrote in message

> Bias in the sampled positions are more likely, perhaps random sampling
> would have been better?


>
> Jørn

I believe the positions were chosen as follows: if *any* of the 6 bots found *any* of
the top plays of any other bot to be worse than 0.03 in equity, the position was
rolled out.
So no single bot was choosing the positions.
In what way can this be biased?


Brad

Brad Davis

Jun 25, 2003, 11:52:01 AM

> All the rollouts by differnt bots I've seen differ hugely in resulting
> equity numbers.


*All* and *hugely*? Really???
Please define "hugely" and give some examples using the latest BOTS and
the number of trials, ply, etc.

Brad

Mike Howard

Jun 25, 2003, 1:06:19 PM
> >
> > How would you verify what was right and what was wrong? You'd need
> > everyone to agree on the method of measurement. Rollouts? By whom?
> > With what settings? How many etc etc. You can hardly expect the Fish
> > to accept rollouts by GNU and vice versa!!
>
........

>
> > All the rollouts by differnt bots I've seen differ hugely in resulting
> > equity numbers. The 'industry' needs a universally accepted rollout
> > methodology.
> ************************************************************

> Different rollouts are appropriate for different positions.
> Some are simply a waste of time in a simple position, or
> are predictably too weak to trust in another type of position.
> There are some types of positions where every rollout
> is untrustworthy.
>
> Douglas Zare

I think that last para makes my point for me, Douglas. There is no one
method, and you seem to be saying an 'expert' is needed on hand to
decide what measurement technique to use. Once humans get involved
the tests fall into disrepute. I will reiterate that my rollout
experience has shown that two bots often differ enough to make the
numbers themselves of limited value, sometimes even by as much as .070
equity. The order of preferred moves is, I agree, 'usually' the same
though.
A bot will roll out based on its own evaluation algorithms, and each one
will therefore be different from any other bot's rollout.

Jørn Thyssen

Jun 25, 2003, 1:21:01 PM
Albert Silver wrote:
> j...@chem.sdu.dk (jthyssen) wrote in message news:<36775ed0.03062...@posting.google.com>...
>
>>Bias in the sampled positions are more likely, perhaps random sampling
>>would have been better?
>
>
> Why? The bots chose the positions, not him. Just so that the
> methodology is clear: a 500-game session was played between between
> Snowie 3.2 and GNU 0.12.

Douglas Zare touches the subject elsewhere in this thread.

This (very) minor flaw can probably be corrected by adding positions
where Jellyfish plays either snowie or gnubg (or both).

Jørn


Jørn Thyssen

Jun 25, 2003, 1:22:27 PM

Yes, but the session was generated by Snowie and gnubg, hence the
positions are likely to be of a kind that Snowie and gnubg understand
better than Jellyfish.

Douglas Zare touched the subject elsewhere in this thread.

Jørn

Brad Davis

Jun 25, 2003, 4:43:08 PM
Douglas Zare <za...@math.columbia.edu> wrote in message news:<3EF9A543...@math.columbia.edu>...

I couldn't find any reference to this point. I think the selection
process is the same in that if JellyFish disagrees, it simply gets
chosen, except that a lower cubeless equity number is used (if I
remember rightly 0.025).

>
> This is not intended to be a complete list, but I don't
> believe they cause the difference between the errors
> of Snowie 4 or gnu 14 versus Jellyfish.

> Douglas Zare

Thank you Douglas. You bring up some interesting points.
How valid do you feel this series is in terms of comparing the
strengths of the BOTS, based on the observations you mentioned and those
you didn't?


Brad

amni

Jun 25, 2003, 6:12:44 PM
I don't want to discuss whether Depreli is right or wrong;
for me that discussion is a DIGRESSION.

I don't want to discuss whether my personal experience is
right or wrong. For me that discussion is a DIGRESSION.

I don't want to discuss whether your evaluations are right or wrong.
For me such discussion is a DIGRESSION.

I want the two programs to play 1000 full games,
or 5000 full games, or 500 matches to 25 points.
Let these plain results speak for themselves.
Until I see such straightforward results,
the question of which bot is stronger remains
undecided.

Douglas Zare <za...@math.columbia.edu> wrote in message news:<3EF998CA...@math.columbia.edu>...

Douglas Zare

Jun 26, 2003, 3:43:07 PM
Brad Davis wrote:

> > There is also the fact that Jellyfish does not use EMG,
> > while the other bots do, so if Jellyfish disagrees with
> > a play made by another bot, I think the other bots have
> > to believe there is a difference for the play to be chosen.
>
> I couldn't find any reference to this point. I think the selection
> process is the same in that if JellyFish disagrees, it simply gets
> chosen, except that a lower cubeless equity number is used (if I
> remember rightly 0.025).

For cube actions, Jellyfish reports a decision, not a
numerical expression of how wrong it thinks the alternative
is. Although you can come up with an expression for the
take/pass error in terms of cubeless equity (complicated
by the fact that Jellyfish recognizes the gammon price is
less than 0.5), I doubt this was used, and errors in doubling
or not are harder to quantify using Jellyfish evaluations. I
think this tends to favor Jellyfish, since Jellyfish has a
simpler cube algorithm.

For checker play, I didn't remember seeing that 0.025.
However, 0.025 cubeless equity is typically a bit larger
than 0.040 EMG. If 0.030 EMG was used for the other
bots, then Jellyfish didn't have the same chance to point
out the errors in checker plays made by the other bots.

> > This is not intended to be a complete list, but I don't
> > believe they cause the difference between the errors
> > of Snowie 4 or gnu 14 versus Jellyfish.
>

> Thank you Douglas. You bring up some interesting points.
> How valid do you feel this series is in terms of comparing the
> strenghths of the BOTS based on your observations mentioned and those
> not?

I don't know. I think it contributes some very interesting
data, such as that the bots seem to make more checker
play errors than cube errors (as do all top human players),
but on cube decisions perhaps it is more likely for a
decision to be ignored because all bots were wrong.
I think it is clear that Jellyfish has some very serious
weaknesses. However, without confidence intervals on
the data, and an understanding of how much equity is
given up in large errors versus small errors, it is hard
to estimate how much weight should be given to the
results.

Douglas Zare

Brad Davis

Jun 27, 2003, 8:12:31 AM
Douglas Zare <za...@math.columbia.edu> wrote in message news:<3EFB4EE5...@math.columbia.edu>...
> For cube actions, Jellyfish reports a decision, not a

> numerical expression of how wrong it thinks the alternative
> is. Although you can come up with an expression for the
> take/pass error in terms of cubeless equity (complicated
> by the fact that Jellyfish recognizes the gammon price is
> less than 0.5), I doubt this was used, and errors in doubling
> or not are harder to quantify using Jellyfish evaluations. I
> think this tends to favor Jellyfish, since Jellyfish has a
> simpler cube algorithm.
>
> For checker play, I didn't remember seeing that 0.025.
> However, 0.025 cubeless equity is typically a bit larger
> than 0.040 EMG. If 0.030 EMG was used for the other
> bots, then Jellyfish didn't have the same chance to point
> out the errors in a checker plays made by the other bots.

Which, if we believe the results published so far from the series,
is probably a good thing for JellyFish, as pointing out what it
believes are errors by other bots will more often than not backfire.


>
> > > This is not intended to be a complete list, but I don't
> > > believe they cause the difference between the errors
> > > of Snowie 4 or gnu 14 versus Jellyfish.
> >
> > Thank you Douglas. You bring up some interesting points.
> > How valid do you feel this series is in terms of comparing the
> > strenghths of the BOTS based on your observations mentioned and those
> > not?
>
> I don't know. I think it contributes some very interesting
> data, such as that the bots seem to make more checker
> play errors than cube errors (as do all top human players),
> but on cube decisions perhaps it is more likely for a
> decision to be ignored because all bots were wrong.

Equally so if all the bots get the checker play wrong.

Missing these and the cube actions wouldn't skew the data in terms of
relative strength though would it?

> I think it is clear that Jellyfish has some very serious
> weaknesses. However, without confidence intervals on
> the data, and an understanding of how much equity is
> given up in large errors versus small errors, it is hard
> to estimate how much weight should be given to the
> results.
>
> Douglas Zare

Brad

Mike Howard

Jun 28, 2003, 11:49:57 AM
> > I would believe only to direct measurments of substantial samples like:
> > * a sessin 0f 1000 money games where one bot is better than the other by
> > average 0.1 point per game,
>
> That's not enough. The variance of a money game
> is about 10 square points per game, so after
> 1000 games, 100 points is 1 standard deviation.
>
> > Again, as I said above, my personal experience is that Jelly Fish
> > is (Level 7\ 1000 units) is substantially stronger that
> > GNUBG 13.1 GRANDMASTER
> > (3 ply checker play no noise, 3 ply cube decision no noise).
>
> Douglas Zare

In response to the above idea I started a session of 1000 games
between JellyFish and GNU. After 189 games GNU is ahead by 75 points.
It is easy to see from this tiny sample alone why 1000 games simply
isn't enough. In one very volatile game the cube got to 64 and GNU
shaded it. Take out that one game or reverse the result and it makes
all the difference. My impression watching the two fight it out is
that in human terms there is little difference between them and the
scores flow back and forth pretty evenly. I do wonder about some of
the Take decisions from the Fish though, which to me looked very close
and probably a drop for most of us. Might it have been better for me to
select 'cautious' in the JellyFish settings?

Douglas, how many games are necessary to make it mathematically
useful? 1,000,000?? - It could take me a while.
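
As a back-of-the-envelope answer to that question - a sketch only, reusing
Zare's assumed figure of roughly 10 square points of variance per money game
and a ~2 standard deviation criterion:

import math

SD_PER_GAME = math.sqrt(10.0)  # assumed: ~10 points^2 of variance per game

def games_needed(edge_ppg, z=2.0):
    # games before an average edge of edge_ppg points per game
    # stands out from the noise by about z standard deviations
    return round((z * SD_PER_GAME / edge_ppg) ** 2)

for edge in (0.2, 0.1, 0.05):
    print("edge of", edge, "ppg -> roughly", games_needed(edge), "games")

# edge of 0.2 ppg -> roughly 1000 games
# edge of 0.1 ppg -> roughly 4000 games
# edge of 0.05 ppg -> roughly 16000 games

On the same assumption, 75 points after 189 games is about 1.7 standard
deviations (sqrt(10 * 189) is roughly 43 points): suggestive, but far from
conclusive.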

amni

Jun 28, 2003, 6:59:07 PM
Run 5000 money games with the cube limited to 8. Most games do not
double the cube beyond 8, therefore this is a fair limitation
which still indicates the strength of the bots. If one bot is better
by at least 0.05 point per game (250 points in total), it means
that it is the better bot. If neither bot is better than the other
by more than an average of 0.05 point per game -- it means that
the two bots are of the same strength.

Definitely, you don't have to run a 1000000-game session.

The claim that a statistical formula applies to this case seems
rubbish. The distribution is not a known statistical distribution
(like the "normal distribution"), therefore no statistical
formula is relevant.

Please specify the settings of each bot. Otherwise, your test is meaningless.
Jelly fish level 7\1000 units is substantially stronger than level 7\10 units.
I play with it a lot and I can feel that difference.

For GNU I suggest 3-ply checker play (no noise)
and 4-ply cube decisions (no noise). Set the cache
to huge to avoid repeated calculation (speed up).

This session may take much more than 10 hours,
depending on your computer speed.


amni
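
A quick check of that 5000-game criterion, again treating Zare's earlier
figure of about 10 square points of variance per money game as an assumption
(capping the cube at 8 will shrink it somewhat, but probably not by an order
of magnitude):

import math

VAR_PER_GAME = 10.0          # assumed variance, points^2 per money game
GAMES = 5000
margin = 0.05 * GAMES        # 250 points in total
sd_total = math.sqrt(VAR_PER_GAME * GAMES)
print("250-point margin =", round(margin / sd_total, 1), "standard deviations")

# 250-point margin = 1.1 standard deviations

So a 0.05 point-per-game margin over 5000 games would be suggestive but still
only a little more than one standard deviation.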

Michael Howard

Jun 28, 2003, 7:07:00 PM
Thanks Amni,

I'll get back with the results in due course.
M
"amni" <am...@hotmail.com> wrote in message
news:e3504803.03062...@posting.google.com...

Albert Silver

Jun 28, 2003, 7:15:40 PM

I would think you should want to play with the best standard settings
possible. In GNU's case that would simply be Supremo for checker play
and any 2-ply setting for the cube decisions. As to JellyFish, level
7, speed 1000, and using the bearoff database would seem normal.

Albert Silver

Albert Silver

Jun 28, 2003, 8:53:00 PM
am...@hotmail.com (amni) wrote in message news:<e3504803.03062...@posting.google.com>...

> Run 5000 money games where cube limited to 8. Most games do not
> double cube more than 8, therefore this is a fair limitation
> which indicates the strength of the bots.

How is limiting the cube going to prove the strength of the bots??

> If one bot is better
> by at least 0.05 poimt per game (totally 250 points), it means
> that it is a better bot. If no bot is better than the other
> by more than average 0.05 point per game -- it means that
> the two bots are of the same strength.
>
> Definitely, you don't have to run 1000000 long session.
>
>
> The claim that a statistical formula applies to that case seems
> rubbish. The distribution is not a known stastical distribution
> (like "normal distribution"), therefore no statistical
> formula is relevant.

Yes, math is overrated.

> Please specify the settings of each bot. Otherwise, your test is meaningless.
> JellyFish level 7\ 1000 units is substantially stronger than level 7\ 10 units.
> I play with it a lot and I can feel that difference.
>
> For GNU I suggest 3 ply checker play (no noise)
> and 4 ply cube decisions (no noise). Set the cache
> to huge to avoid repeated calculations (speed up).

This advice is insane. 3-ply checker and 4-ply cube????? It will take
him a day to finish a single game if not more. 3-ply checker is
already very slow, and I'm assuming an extremely shallow 3-ply level
like Grandmaster where only 2 moves at most are analyzed at 3-ply, but
4-ply cube will take absolutely ages. On my computer, albeit not
terribly fast (1 GHz Athlon + 256 MB PC2100 RAM), a single move past
the first one (where there is no cube) takes GNU no less than 8-9
minutes with your proposed settings, and in more complex situations I
would expect that to increase considerably.

I'm a bit curious now about the settings you played on. To propose
such a setting seems to imply you never had it playing at such
settings, and that you therefore do not know how to configure GNU.



> This session may take much more than 10 hours,
> depending on your computer speed.

More like 10-20 *days* depending on his computer speed. Running day
and night of course.

Albert Silver

amni

unread,
Jun 29, 2003, 2:42:59 AM6/29/03
to
Why limit the cube to 8? Because otherwise two or three games in the whole
session with a high cube may dramatically change the whole results.
This means that the results depend more on luck than on skill.
My experience is that with players of "equal strength" the cube
is rarely doubled beyond 8 (say, less than 1 percent of games).

The settings 3 ply checker, 4 ply cube do not take so much time.
On my Celeron 1200 with 128 MB RAM a game takes on average
4 minutes. On a Pentium 4 it may take half that time.
I don't know why it takes so much time on your PC;
maybe the compiler was optimized for Intel processors
rather than for AMD.


Albert Silver

unread,
Jun 29, 2003, 9:53:26 AM6/29/03
to
> Why limit the cube to 8? Because otherwise two or three games in the whole
> session with a high cube may dramatically change the whole results.
> This means that the results depend more on luck than on skill.
> My experience is that with players of "equal strength" the cube
> is rarely doubled beyond 8 (say, less than 1 percent of games).
>
> The settings 3 ply checker, 4 ply cube do not take so much time.
> On my Celeron 1200 with 128 MB RAM a game takes on average
> 4 minutes.

A whole game with those settings on that hardware would *never* take 4
minutes. I can assure you that you aren't playing it at 3-ply checker,
4-ply cube. You most likely configured the wrong part and thought you
were playing against those settings. This is quite understandable, as
there are several places where similar settings can be made and one can
easily confuse them with the correct one.

As an example, you could go to the Settings menu and choose Evaluation
and see a menu giving all those choices. However, it would not change
your playing settings, since the evaluation is only for the Hint
window (or the Tutor, if you have the tutor follow the Evaluation
settings). You could also go to the Settings menu and change the
Analysis settings, and see a similar window for setting the plies, but
again this would be wrong. You need to do two things:

- Go to the Settings menu and select Players. Only in the Players menu
can you set the strength of GNU's play. There you can set the cube to
4-ply and set the checker play to Grandmaster (where only 2 moves at
most are analyzed at 3-ply) or a stronger 3-ply setting.
- Once you do this, go to the Settings menu and select Save Settings
at the bottom otherwise it will change back to the previous default
settings the next time you load GNU.

If you do this, I think you'll find those 4 minutes aren't even enough
for 3 moves at 4-ply cube.

Albert Silver

amni

unread,
Jun 29, 2003, 10:34:53 AM6/29/03
to
I measured the time of a JellyFish game against itself (DEMO MODE)
and it is on average 1.5 minutes per game if the animation is
fast (the strength setting was level 7, 1000 units, with the bearoff database).

I cannot measure the time of GNU at the settings I suggested,
but from my experience it is not slower than JellyFish.
I suggested strength settings: 3 ply checker play no noise,
4 ply cube decisions no noise, move filter "huge", default bearoff
database activated (if any). For a fast game the animation should be
as fast as possible (this may save 10 seconds per game).

These time estimates are for my computer system:
1200 MHz Celeron, 128 MB SDRAM, Windows 98.
For a newer PC (Pentium 4 at 2.4 GHz and 128 MB RAM) the time may be halved.
I don't know if Windows XP slows down
the game; Windows NT/2000 wouldn't slow down the game. Note that
the program Dueller requires Windows NT to run GNU
(Windows XP is a version of Windows NT).

If the average time is indeed 1.5 minutes per game,
it means that 5000 games take 7500 minutes (125 hours).
I guess this is the reason why nobody has tested
directly which bot is better.

Douglas Zare

unread,
Jun 29, 2003, 7:23:13 PM6/29/03
to

Mike Howard wrote:

> > > I would believe only direct measurements of substantial samples like:
> > > * a session of 1000 money games where one bot is better than the other by
> > > average 0.1 point per game,
> >
> > That's not enough. The variance of a money game
> > is about 10 square points per game, so after
> > 1000 games, 100 points is 1 standard deviation.
> >

> In response to the above idea I started a session of 1000 games
> between JellyFish and GNU. After 189 games GNU is ahead by 75 points.
> It is easy to see from this tiny sample alone why 1000 games simply
> isn't enough. In one very volatile game the cube got to 64 and GNU
> shaded it. Take out that one game or reverse the result and it makes
> all the difference. My impression watching the two fight it out is
> that in human terms there is little difference between them and the
> scores flow back and forth pretty evenly. I do wonder about some of
> the Take decisions though from the Fish which to me looked very close
> and probably a drop for most of us. It may have been better for me to
> select 'cautious' in the JellyFish settings?
>
> Douglas, how many games are necessary to make it mathematically
> useful? 1,000,000 ?? - Could take me a while.

I don't think it is necessary to turn caution on. Gnu is not
likely to exploit the known classes of positions in which
Jellyfish goes crazy with the cube.

Games with 64-cubes are quite rare, even though you have
seen one. A 64-cube contributes more than 4000 to the
variance. Most estimates of the variance in backgammon
have been at most 10, meaning that a single 64-cube
contributes as much of a fluctuation as more than 400
normal games. That doesn't happen more than one game
in 400, and in reality it is much less frequent.

The standard deviation is about 0.1 ppg after 1000 games,
and 0.01 ppg after 100,000 games. Obviously if there is a
larger difference between the playing strengths then it will
take fewer games for this to be apparent. You can't set a
constant number of games to determine which player is
stronger, although you can estimate the advantage with a
particular level of accuracy. Rather than use the raw
results, I recommend using unbiased variance reduction on
the games, as I suggested in my article "Hedging Toward
Skill" in GammonVillage. This may give the same level
of accuracy for 1/10 as many games.

Douglas Zare
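
The arithmetic behind those figures is simple enough to check directly; here is a
minimal Python sketch, taking the per-game variance of roughly 10 square points
quoted above as an assumption rather than a measured value.

    import math

    # Rough standard error of a points-per-game estimate after n money games,
    # assuming a per-game variance of about 10 square points (the figure quoted
    # in the thread); the results are estimates, not measurements.
    VARIANCE_PER_GAME = 10.0

    def ppg_standard_error(n_games):
        return math.sqrt(VARIANCE_PER_GAME / n_games)

    for n in (1000, 5000, 100000):
        print(f"{n:>7} games: standard error ~ {ppg_standard_error(n):.3f} ppg")
    # Prints roughly 0.100, 0.045 and 0.010 ppg, matching the figures above.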

Douglas Zare

unread,
Jun 29, 2003, 7:33:42 PM6/29/03
to

amni wrote:

> Run 5000 money games with the cube limited to 8. Most games do not
> see the cube go beyond 8, therefore this is a fair limitation
> which still indicates the strength of the bots. If one bot is better
> by at least 0.05 points per game (250 points in total), it means
> that it is the better bot. If neither bot is better than the other
> by more than 0.05 points per game on average -- it means that
> the two bots are of the same strength.
>
> You definitely don't have to run a 1,000,000-game session.
>
> The claim that a statistical formula applies to this case seems
> rubbish. The distribution is not a known statistical distribution
> (like the "normal distribution"), therefore no statistical
> formula is relevant.

You don't need to know the exact distribution, or even the
exact mean and standard deviation, to use a normal
approximation. At least, I don't, since I know enough relevant
theorems.

What sort of confidence interval do you think you get after
5000 games, even with the cube limited to 8? How likely
do you think it is that the results will be wrong by more
than 0.05 ppg? By more than 0.1 ppg?

Douglas Zare
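
A back-of-the-envelope answer to those questions, assuming (this is an assumption,
not something measured in the thread) that the per-game variance stays near 10
square points even with the cube capped at 8, might look like this in Python:

    import math

    n_games = 5000
    sigma_ppg = math.sqrt(10.0 / n_games)   # standard error, roughly 0.045 ppg

    def prob_error_exceeds(delta_ppg):
        # Two-sided normal tail: chance the observed ppg is off by more than delta.
        z = delta_ppg / sigma_ppg
        return 1.0 - math.erf(z / math.sqrt(2.0))

    print(f"standard error after {n_games} games: {sigma_ppg:.3f} ppg")
    print(f"P(off by more than 0.05 ppg) ~ {prob_error_exceeds(0.05):.0%}")  # roughly 25-30%
    print(f"P(off by more than 0.10 ppg) ~ {prob_error_exceeds(0.10):.1%}")  # roughly 2-3%

On these assumptions the chance of being off by more than the proposed 0.05 ppg
threshold is roughly one in four, which is presumably the point of the question.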

Kees van den Doel

unread,
Jun 30, 2003, 12:28:19 AM6/30/03
to
In article <e3504803.03062...@posting.google.com>,
amni <am...@hotmail.com> wrote:

>Run 5000 money games where cube limited to 8. Most games do not
>double cube more than 8, therefore this is a fair limitation
>which indicates the strength of the bots. If one bot is better
>by at least 0.05 poimt per game (totally 250 points), it means
>that it is a better bot. If no bot is better than the other
>by more than average 0.05 point per game -- it means that
>the two bots are of the same strength.

Do those numbers have any justification besides "<am...@hotmail.com> sez so"?

For what it's worth I ran GNUBG 0-ply against GNUBG 1-ply for a while
in a money session and ended up with 3994 (0-ply) - 4159 (1-ply).

That's a difference of about 0.04 ppg, so do you conclude 1-ply is no better than 0-ply?


Kees (The ABC of cow's swollen bag of Ouroubourous is DOM en spontaan
last two countries.)
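
As a rough sanity check on that score, here is a small Python sketch. The number
of games is not stated in the post and is inferred here from the 165-point margin
and the 0.04 ppg figure, and the per-game variance of about 10 is Zare's earlier
estimate; both are assumptions.

    import math

    points_diff = 4159 - 3994                # 165 points in favour of 1-ply
    n_games = round(points_diff / 0.04)      # ~4125 games, inferred rather than stated
    sd_of_total = math.sqrt(10.0 * n_games)  # ~200 points for the session total

    print(f"lead of {points_diff} points vs. a one-standard-deviation swing of ~{sd_of_total:.0f} points")
    # The lead is smaller than one standard deviation, so this session by itself
    # says very little about whether 1-ply really is stronger than 0-ply.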

amni

unread,
Jun 30, 2003, 2:51:43 AM6/30/03
to
You are right!!
I confused the "evaluation" settings with the "players"
settings. I'll test the timing of weaker settings today or tomorrow.
I am considering a "players" setting of 2 ply checkers, 2 ply cube, move filtering
"narrow". This weaker setting takes approximately the same time as
JellyFish level 7, 1000 units.

silver...@hotmail.com (Albert Silver) wrote in message news:<f9846eb9.03062...@posting.google.com>...

Frank Berger

unread,
Jun 30, 2003, 3:50:48 AM6/30/03
to

> The claim that a statistical formula applies to this case seems
> rubbish. The distribution is not a known statistical distribution
> (like the "normal distribution"), therefore no statistical
> formula is relevant.

The game between two bots of similar strength is a random variable
with a value near 0.5.

By the central limit theorem the average of *every* random sample
converges to the normal distribution. Usually 40 random samples are
sufficient to justify the assumption of a normal distribution. So
with the required thousands of games to get a small enough confidence
interval, you can be pretty sure it is distributed like a normal
distribution.

ciao
Frank
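
For the bounded-outcome case (which, as Douglas Zare's reply below notes, is the
case the theorem actually covers), the effect is easy to see in a toy simulation;
the outcome values and weights below are invented purely for illustration, not
measured backgammon frequencies.

    import random
    import statistics

    # Toy illustration of the central limit theorem for a bounded game-outcome
    # distribution; the outcomes and weights are made up.
    outcomes = [-4, -2, -1, 1, 2, 4]
    weights = [0.05, 0.15, 0.30, 0.30, 0.15, 0.05]

    def session_mean(n_games):
        return statistics.fmean(random.choices(outcomes, weights=weights, k=n_games))

    # Averages of many independent 1000-game sessions cluster tightly and look
    # roughly bell-shaped, even though a single game's outcome is far from normal.
    means = [session_mean(1000) for _ in range(2000)]
    print(f"mean of session averages:   {statistics.fmean(means):+.4f}")
    print(f"spread of session averages: {statistics.stdev(means):.4f}")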

amni

unread,
Jun 30, 2003, 7:53:28 AM6/30/03
to
There is no "scientific justification" for any figures.
I chose figures that majority of "professional" players
might agree that they are "good enough" for comparisions,
and that the session wouldn't takes ages.

I don't know the details of your tests. Maybe it shows that
even 1 ply depend more on luck than on skill, therefore
0 ply and 1 ply have same strength.

I suspect that even GNU 2 ply depend too much on luck,
especially in cube decisions. How can 2 ply cube decision
evaluate strategic position (matreialized, say, after 5 or 6 plies ?).


kvan...@xs2.xs4all.nl (Kees van den Doel) wrote in message news:<3effbc63$0$135$e4fe...@dreader4.news.xs4all.nl>...

Albert Silver

unread,
Jun 30, 2003, 9:43:27 AM6/30/03
to
am...@hotmail.com (amni) wrote in message news:<e3504803.0306...@posting.google.com>...

> You are right!!
> I confused the "evaluation" settings with the "players"
> settings. I'll test the timing of weaker settings today or tomorrow.
> I am considering a "players" setting of 2 ply checkers, 2 ply cube, move filtering
> "narrow". This weaker setting takes approximately the same time as
> JellyFish level 7, 1000 units.

No, this makes no sense. The reason is that no one wanting to play GNU
at full strength will read up on the instructions to reduce the move
filter. It's as if you decided that Snowie 3-ply was too slow (it's a
bit slower than GNU at Supremo) and that Snowie 1-ply should be used.
The fact is that Snowie at full strength is at 3-ply, so if you want
to see it at its best then that is the setting you have to use. The same
goes for GNU. If you deliberately weaken it, you not only
won't be testing it at its normal full strength, but it won't
represent the results users will get, since they will play it at its
best, and not at the dumbed-down settings.

Here is a solution to improve the speed of Supremo by 20-25% with no
loss in strength:

- Set the Supremo level in the checker play
- Click on the Modify button next to the Move Filters
- Go to the 2-ply tab
- In the second line where it says "Add extra..." reduce the 16 to 12.

That's it. I have compared the results of over 20,000 moves analyzed
by the normal Supremo and my modification, and only two times was a
difference found. In both cases it was a very minor difference in a
non-contact bear-in, so I consider the strength to be the same. Other
settings have shown greater loss of ability compared to Supremo.

Albert Silver

Douglas Zare

unread,
Jun 30, 2003, 1:59:17 PM6/30/03
to

Frank Berger wrote:

> am...@hotmail.com (amni) wrote in message news:<e3504803.03062...@posting.google.com>...
>
> > > The claim that a statistical formula applies to this case seems
> > > rubbish. The distribution is not a known statistical distribution
> > > (like the "normal distribution"), therefore no statistical
> > > formula is relevant.
> > The game between two bots of similar strength is a random variable
> > with a value near 0.5.

Perhaps you mean the number of games won, but a more
relevant variable is the outcome of the game, expressed in
points.

> By the central limit theorem the average of *every* random sample
> converges to the normal distribution.

No. The central limit theorem refers to many things, but
the most basic result requires that you are adding
independent identically distributed random variables
such that the mean and standard deviation exist. You can
generalize this, but the sum of independent Cauchy
random variables is a Cauchy random variable, not
approximately normal.

> Usually 40 random samples are
> sufficient to use the assumption of having a normal distribution. So
> with the required thousands of games to get a small enough confidence
> interval, you can be pretty sure it is distributed like a normal
> distribution.

Yes.

Douglas Zare


Michael Howard

unread,
Jun 30, 2003, 3:59:49 PM6/30/03
to
Have you guys agreed on how to compare the bots yet? The above stuff has
left me behind, even if I wanted to understand it.
I don't want to limit the bots in any way unless my machine really can't
cope. I don't think limiting the cube is a great idea, although I can see
why you might want to.
I expected to play JF at level 7/1000 with the bearoff db and GNU at Supremo with
2-ply cube. This was how I started them off, and the first 189 games took
around 10 hours in a couple of sessions. I can set it up to play about 5 hours
daily and another 5 hours nightly forever. Someone just tell me how many
games to play and what technical data is required from the results to make a
comparison that is meaningful.
If we can't agree on this simple test, then all the arguments about which
bot is best become so much hot air.
Rgds,
M

"Douglas Zare" <za...@math.columbia.edu> wrote in message
news:3F007C98...@math.columbia.edu...

amni

unread,
Jul 1, 2003, 12:19:13 AM7/1/03
to
Hi Michael,

I'll give some suggestions about _direct measurements_ in a day or two.
I'm running some time tests, which are absolutely needed
because GNUBG seems _very slow_ compared to JellyFish.
I'll give very precise instructions, not theoretical debates.

Please tell me what computer system you are working with
(mainly the processor and its speed, and the RAM).

"Michael Howard" <mi...@howard666.freeserve.co.uk> wrote in message news:<bdq4rp$sjd$1...@news7.svr.pol.co.uk>...

Kees van den Doel

unread,
Jul 1, 2003, 3:58:57 AM7/1/03
to
I was wondering: if you run a money game session for a while and the
score is n-m, what's the relative FIBS rating of the participants?

Maybe someone can ask Zare who probably knows but I pissed him off and
he doesn't want to talk to me anymore.


Kees (But continue Control: version of Holland that falls outside you
piss your mummy doesn't agree with F alto.)

Frank Berger

unread,
Jul 1, 2003, 4:23:39 AM7/1/03
to
Douglas Zare <za...@math.columbia.edu> wrote in message news:<3F007C98...@math.columbia.edu>...
> Frank Berger wrote:
>
> > am...@hotmail.com (amni) wrote in message news:<e3504803.03062...@posting.google.com>...
> >
> > > The claim that a statistical formula applies to that case seems
> > > rubbish. The distribution is not a known stastical distribution
> > > (like "normal distribution"), therefore no statistical
> > > formula is relevant.
> > The game between two bots of similiar strength is a random variable
> > with a value near 0,5.
>
> Perhaps you mean the number of games won, but a more
> relevant variable is the outcome of the game, expressed in
> points.
Naturally you are right. I usually use cubeless rollouts, hence my
error :-)

> > By the central limit theorem the average of *every* random sample
> > converges to the normal distribution.
>
> No. The central limit theorem refers to many things, but
> the most basic result requires that you are adding
> independent identically distributed random variables
> such that the mean and standard deviation exist. You can
> generalize this, but the sum of independent Cauchy
> random variables is a Cauchy random variable, not
> approximately normal.

Oh well, my lessons in statistics are about 20 years gone; this was
something I scratched from the very dark edges of my brain.
But isn't it true that if you add up random samples of *any*
distribution the resulting average is normally distributed? I had never
heard of the Cauchy distribution, but that might be related to the fact
that I'm not a mathematician but an economist...

> > Usually 40 random samples are
> > sufficient to use the assumption of having a normal distribution. So
> > with the required thousands of games to get a small enough confidence
> > interval, you can be pretty sure it is distributed like a normal
> > distribution.
>
> Yes.

*sigh* at least here I haven't failed :-)

So to conclude: the method of playing some thousands of games is a valid
approach, and a variance analysis provides reasonable numbers.

ciao
Frank

Michael Howard

unread,
Jul 1, 2003, 4:34:35 AM7/1/03
to
The machine I have available for the test is only a 750 MHz P3, I think, with
200 MB RAM. However, it is free 10 hours out of 24, so within reason it
doesn't really matter too much if it takes a while. I reckon 100 games
every day would be OK.
I will run a test for my own benefit in any case, even if you guys can't
agree how valid the results are. Personally I think I will probably be
persuaded after some 5000 games. Almost every similar test (though much
smaller) I made between JellyFish 3.5 and Snowie 2 showed a very tiny
advantage to Snowie, which is what most 'experts' thought at the time.
Rgds,

M
"amni" <am...@hotmail.com> wrote in message
news:e3504803.0306...@posting.google.com...

Albert Silver

unread,
Jul 1, 2003, 9:14:01 AM7/1/03
to
"Michael Howard" <mi...@howard666.freeserve.co.uk> wrote in message news:<bdrh33$rnd$1...@news7.svr.pol.co.uk>...

> The machine I have available for the test is only a 750 MHz P3, I think, with
> 200 MB RAM. However, it is free 10 hours out of 24, so within reason it
> doesn't really matter too much if it takes a while. I reckon 100 games
> every day would be OK.
> I will run a test for my own benefit in any case, even if you guys can't
> agree how valid the results are. Personally I think I will probably be
> persuaded after some 5000 games.

I think your use of JF at level 7/1000 units and GNU at Supremo is
perfect. Also, if you send the matches (zip them all, please)
generated by Dueller to Joern Thyssen, he might analyze them as he did
my series against Snowie 4 to give the result (and proper relative
strength) factoring in the luck.

One question: are you using GNU 0.14, released 3 weeks ago, or are you
using the older weights? If not the 0.14, I'd beg of you to restart
using the 0.14 weights as they are what users will be downloading and
using.

In any case, thanks for the testing, I know how time-consuming this
can be.

Albert Silver

Michael Howard

unread,
Jul 1, 2003, 1:10:22 PM7/1/03
to
I'll be using the latest build currently available to download from the site.
Do I have Joern's e-mail address?
Thanks,
Michael
"Albert Silver" <silver...@hotmail.com> wrote in message
news:f9846eb9.03070...@posting.google.com...

amni

unread,
Jul 1, 2003, 4:50:07 PM7/1/03
to
I'll give detailed suggestions tomorrow.
Essentially, checker "supremo" and cube decision "world class"
are the first choice. The problem is that it might take
a hundred hours.

amni


"Michael Howard" <mi...@howard666.freeserve.co.uk> wrote in message news:<bdrh33$rnd$1...@news7.svr.pol.co.uk>...

amni

unread,
Jul 2, 2003, 6:09:35 PM7/2/03
to
I have finished the details, but I want to
proofread them tomorrow before posting.

amni


am...@hotmail.com (amni) wrote in message news:<e3504803.03070...@posting.google.com>...

David Brotherton

unread,
Jul 3, 2003, 12:03:53 PM7/3/03
to
> Oh well, my lessons in statistics are about 20 years gone; this was
> something I scratched from the very dark edges of my brain.
> But isn't it true that if you add up random samples of *any*
> distribution the resulting average is normally distributed? I had never
> heard of the Cauchy distribution, but that might be related to the fact
> that I'm not a mathematician but an economist...

Not quite *any* distribution, and that's why the Cauchy distribution
is usually a good counter-example for things like this - the Cauchy
distribution is probably the best-known distribution for which the
expected value (i.e. mean, or average) does not exist. In other
words, the integral that defines the expected value of a Cauchy
distribution diverges. The central limit theorem requires that the
distribution in question have a finite mean (and variance), and the
Cauchy counterexample shows why that requirement is needed.

BTW, if you've never heard of the Cauchy distribution but you have heard
of Student's t-distribution, a Cauchy distribution is the usual
Student's t-distribution with 1 degree of freedom - just a bit of
statistical trivia there.
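
For anyone who wants to see how badly the sample mean of a Cauchy distribution
behaves, here is a small Python sketch; the sample sizes are arbitrary, and any
single run is of course itself random.

    import math
    import random
    import statistics

    # Sample means of a standard Cauchy distribution do not settle down as the
    # sample grows, in contrast to any finite-variance distribution.
    random.seed(1)

    def cauchy_variate():
        # Inverse-CDF method: tan(pi*(U - 1/2)) is standard Cauchy for U ~ Uniform(0, 1).
        return math.tan(math.pi * (random.random() - 0.5))

    for n in (100, 10_000, 1_000_000):
        mean = statistics.fmean([cauchy_variate() for _ in range(n)])
        print(f"n = {n:>9}: sample mean = {mean:+10.3f}")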

amni

unread,
Jul 3, 2003, 6:14:20 PM7/3/03
to
I posted my detailed suggestions about the test in a _new thread_.


"Michael Howard" <mi...@howard666.freeserve.co.uk> wrote in message news:<bdrh33$rnd$1...@news7.svr.pol.co.uk>...

Douglas Zare

unread,
Jul 5, 2003, 12:31:33 AM7/5/03
to

David Brotherton wrote:

The Cauchy distribution isn't esoteric, though, and there
is a physical consequence of the fact that the Cauchy
distribution does not follow the Central Limit Theorem's
conclusions. If you look up from your computer, you may
see a 2-dimensional Cauchy distribution, since it is the
distribution of light from a point source onto a flat wall.
A basic principle of optics is that you can imagine that
on the way to the wall, the light is absorbed and re-
emitted, and you get the same distribution. You can
repeat this process hundreds of times, and the
distribution will still be Cauchy, and will still look like
it was emitted from the same point source.

Is this relevant to backgammon? Is there a chance that the
expected value does not exist, or more likely that the
standard deviation is infinite? Perhaps. There are
contrived positions with both players on the bar against
a 5 point board that many people have argued shows that
the expected value does not exist without a bound on the
cube, and I constructed more natural positions with both
players on the bar against a 3 or 4 point board that suggest
that the standard deviation should be infinite. However,
the chance of the cube escalating appears to be quite small,
and the distribution after a few thousand games is very
well approximated by a normal distribution.

Douglas Zare


Joseph Heled

unread,
Jul 7, 2003, 1:54:00 AM7/7/03
to

Albert Silver wrote:
> mi...@howard666.freeserve.co.uk (Mike Howard) wrote in message news:<c40114a8.03062...@posting.google.com>...
>

>>>The results after 406 positions:
>>>
>>>              Overall        Checker Errors (>0.039)   Total Cube Errors
>>> GNU 0.14     6.963 (85)     4.991 (61)                1.972 (24)
>>> GNU 0.13     7.212 (88)     5.210 (64)                2.002 (24)
>>> Snowie 4     7.872 (85)     5.407 (63)                2.465 (22)
>>> GNU 0.12     9.874 (126)    6.825 (88)                3.049 (38)
>>> Snowie 3     14.200 (167)   9.888 (123)               4.312 (45)
>>> Jellyfish    18.842 (225)   13.835 (175)              5.007 (50)
>>>
>>>(results and analysis courtesy of Michael Depreli)
>>>
>>> Albert Silver
>>>

I think this is a nice method of comparing different bots, and it would
be nice to extend it by adding matches played between JF/Snowie and
JF/GNU, and matches played by strong players. That may provide a wider
range of position types.

When I made my own benchmark, I used positions obtained from matches
played by humans against GNUBG (on FIBS) to get a better mix than just
taking computer generated positions.

-Joseph

Frank Berger

unread,
Jul 8, 2003, 4:29:54 PM7/8/03
to
Joseph Heled <pep...@sf.net> wrote in message
> I think this is a nice method of comparing different bots, and it would
> be nice to extended it by adding matches played between JF/Snowie and
> JF/GNU, and matches played by strong players. That may provide a wider
> range of position types.
Full Ack!


> When I made my own benchmark, I used positions obtained from matches
> played by humans against GNUBG (on FIBS) to get a better mix than just
> taking computer generated positions.
>

Hm, are you making your positions available? I feel a benchmark for bots is
overdue!

ciao
Frank

Albert Silver

unread,
Jul 9, 2003, 9:21:44 AM7/9/03
to
fr...@bgblitz.com (Frank Berger) wrote in message news:<aab8a7e5.03070...@posting.google.com>...

You don't know what you're asking. He doesn't have a benchmark with
1000 positions or even 10,000, his goes in the hundreds of thousands.

Albert Silver

>
> ciao
> Frank

Frank Berger

unread,
Jul 9, 2003, 3:40:26 PM7/9/03
to
silver...@hotmail.com (Albert Silver) wrote in message news:<f9846eb9.03070...@posting.google.com>...

> > > When I made my own benchmark, I used positions obtained from matches
> > > played by humans against GNUBG (on FIBS) to get a better mix than just
> > > taking computer generated positions.
> > >
> >
> > Hm, are you making your positions available? I feel a benchmark for bots is
> > overdue!
>
> You don't know what you're asking. He doesn't have a benchmark with
> 1000 positions or even 10,000, his goes in the hundreds of thousands.
Well, this might be the case :-)
But I think when you have a set of positions, in an open format, with
the rollouts already included, it doesn't matter if you have 100 or
100,000, except that the latter is more significant. The only bad thing
about a benchmark with a limited number of positions is that it would be
too easy to train a bot for exactly that set.