
Dueller & GNU 1.4


Michael Howard

Jul 4, 2003, 9:15:39 AM
Has anyone managed to get GNU 1.4 to work under Dueller yet???

Was working like a train with 1.3, but with 1.4, after the Dueller Command
Window comes up it blinks a couple of times and then just sits there.

I know that the command line has been changed in some way with 1.4 and
it may be that Dueller needs an upgrade to handle it???

Tony Lezzard is in America at present I think---- are you reading this
Tony???

Rgds,
M


Albert Silver

Jul 4, 2003, 2:27:45 PM
"Michael Howard" <mi...@howard666.freeserve.co.uk> wrote in message news:<be3um0$f4c$1...@newsg1.svr.pol.co.uk>...

> Has anyone managed to get GNU 1.4 to work under Dueller yet???
>
> Was working like a train with 1.3, but with 1.4, after the Dueller Command
> Window comes up it blinks a couple of times and then just sits there.

Make sure the name of the executable is gnubg-no-gui.exe

Albert Silver

Michael Howard

Jul 4, 2003, 2:59:22 PM
Yep, all seems to be right. I get through the JellyFish setup fine and then
the window pops up to communicate with GNU......
Its heading is :- C:\WINDOWS\system32\cmd.exe.

This window blinks a few times as if it's about to start. Then it stops.

Has anybody seen Dueller working on 0.14 ???

Rgds,

M
"Albert Silver" <silver...@hotmail.com> wrote in message
news:f9846eb9.03070...@posting.google.com...

Albert Silver

Jul 4, 2003, 9:16:19 PM
"Michael Howard" <mi...@howard666.freeserve.co.uk> wrote in message news:<be4iqd$j1v$1...@news7.svr.pol.co.uk>...

> Yep, all seems to be right. I get through the JellyFish setup fine and then
> the window pops up to communicate with GNU......
> Its heading is :- C:\WINDOWS\system32\cmd.exe.
>
> This window blinks a few times as if it's about to start. Then it stops.
>
> Has anybody seen Dueller working on 0.14 ???
>

I haven't had any problems hence the question. Try installing the new
Installation Archive (uninstall the old one first). Follow the
instructions in my tutorial (under the header Windows 95/98 users) if
you don't know how to preserve your old settings/Player Records. The
new Archive comes with some new DLLs, so that could be the issue.

Albert Silver

Michael Howard

Jul 5, 2003, 6:48:26 AM
OK Albert I'll have a look at that. I have only just downloaded the new
file from the GNU site though so it should be pukka.
I am using XP - will that make a difference?

Albert Silver

Jul 5, 2003, 9:51:30 AM
"Michael Howard" <mi...@howard666.freeserve.co.uk> wrote in message news:<be6ae2$t9s$1...@news8.svr.pol.co.uk>...

> OK Albert I'll have a look at that. I have only just downloaded the new
> file from the GNU site though so it should be pukka.
> I am using XP - will that make a difference?
>
> Rgds,
> M

I use Win XP myself, so no. Just make sure the name of the commandline
file is indeed exactly gnubg-no-gui.exe. The reason I say this is
because I noticed some of the newer builds being made at Nardy's site
or even at Oystein's site (as a separate download) either added the
date or had a slightly different name. If the name isn't exactly that
then Dueller won't be able to start GNU.
I had to make this change myself as I ran into the same problem you
mentioned, hence my insistence on this issue.

Albert Silver
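
A quick way to check the executable name is sketched below in Python; the install directory comes from later in this thread and may differ on your machine, and the dated build name in the comment is only a hypothetical example.

import os

GNUBG_DIR = r"C:\Program Files\gnubg"   # adjust to your install location
EXPECTED = "gnubg-no-gui.exe"           # the exact name Dueller looks for

def check_gnubg_name(directory=GNUBG_DIR):
    # Warn if the command-line binary is not named exactly as Dueller expects.
    if not os.path.isdir(directory):
        print("Directory not found:", directory)
        return False
    if os.path.exists(os.path.join(directory, EXPECTED)):
        print("OK:", os.path.join(directory, EXPECTED))
        return True
    # Some builds ship under a dated name, e.g. "gnubg-no-gui-030704.exe" (hypothetical).
    candidates = [f for f in os.listdir(directory)
                  if f.lower().endswith(".exe") and "no-gui" in f.lower()]
    print("Not found. Rename one of these to", EXPECTED, ":", candidates)
    return False

if __name__ == "__main__":
    check_gnubg_name()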

Michael Howard

Jul 5, 2003, 11:01:06 AM
Albert,
This is very annoying as I was so keen to get the project rolling. The file
is :- c:\Program Files\gnubg\gnubg-no-gui.exe, exactly as you said it should
be. The commands just don't seem to be sent by Dueller. Maybe there's
something more subtle that's wrong with the setup?
This is a brand new fresh install of XP so there are no remnants of the 0.13
version lying around.
I have tried pre-starting JF and the GNU Cmd Line but it makes no
difference.
All the file name/locations seem to match those in the Tony Lezzard registry
entries.

M
"Albert Silver" <silver...@hotmail.com> wrote in message
news:f9846eb9.03070...@posting.google.com...

Albert Silver

Jul 5, 2003, 4:14:29 PM
"Michael Howard" <mi...@howard666.freeserve.co.uk> wrote in message news:<be6p7p$mm4$1...@news6.svr.pol.co.uk>...

> Albert,
> This is very annoying as I was so keen to get the project rolling. The file
> is :-c:\Program Files\gnubg\gnubg-no-gui.exe exactly as you said it should
> be. The commands just don't seem to be sent by Dueller. Maybe there's
> something more subtle that's wrong with the setup?
> This is a brand new fresh install of XP so there are no remnants of the 0.13
> version lying around.
> I have tried pre-starting JF and the GNU Cmd Line but it makes no
> difference.
> All the file name/locations seem to match those in the Tony Lezzard registry
> entries.
> M

I really can't say, but I would ask which Jellyfish you are using. I
had the Jellyfish 3.0 Tutor on my machine and no luck with Dueller.
However, when I installed the free Jellyfish 3.5 demo, it ran with no
glitches. If this doesn't work, my last suggestion would be to try
changing the resolution. When I ran the Snowie 4 vs. GNU 0.13 series,
I was unable to get it to work with 1024x768. It had to either be
800x600 or 1280x1024 due to some problem with the way it read the
screens. Above all, make absolutely sure no other program with a
possibility of a pop-up warning or whatnot is running. Any pop-up or
action on the screen will cause Dueller to stop.

Albert Silver



amni

Jul 5, 2003, 5:14:43 PM
Run the version of GNU which works without problems.

amni

"Michael Howard" <mi...@howard666.freeserve.co.uk> wrote in message news:<be6p7p$mm4$1...@news6.svr.pol.co.uk>...

Brad Davis

Jul 7, 2003, 3:21:02 AM
Dueller doesn't work with JF 3.0.
You need to use the free 3.5 Light version. (Same strength.)

Brad

Michael Howard

Jul 8, 2003, 9:56:46 AM
Dueller now working fine with JF 3.5 Analyzer and GNU 0.14. Duel will
commence soon and I will keep you posted periodically. It has to be pointed out
that whilst JF moves virtually instantaneously on lvl 7 - 1000, GNU takes
several seconds on Supremo, which probably means the results would be
different if they both had the same (shorter) time per move.
Rgds and thanks to all for help and advice.
M
"amni" <am...@hotmail.com> wrote in message
news:e3504803.03070...@posting.google.com...

amni

Jul 9, 2003, 12:37:11 AM
About timing and how to keep it down:
GNU takes at least 5 times more than GNU in its
setting "checker SUPREMO, cube WORLD CLASS",
apart from graphic effects and sound effects,
which should be disabled.


"Michael Howard" <mi...@howard666.freeserve.co.uk> wrote in message news:<beeij2$vug$1...@news8.svr.pol.co.uk>...

Michael Howard

Jul 9, 2003, 8:02:52 AM
Yeah, it took 7 hours to complete the 1st 50 games with these settings, up
to 20 secs for some GNU moves. Albert gave me some mods to speed it up a
bit... we'll see.
GNU under Dueller is driven by a command interface and thus graphics/sound
etc are not relevant as the normal GUI does not appear.
Rgds,

Albert Silver

Jul 9, 2003, 9:25:12 AM
am...@hotmail.com (amni) wrote in message news:<e3504803.03070...@posting.google.com>...

> about timing and how to keep it down.
> GNU takes at least 5 timrs more than GNU in its
> setting "checker SUPREMO, cube WORLD CLASS"
> appart from graphic effects and sound effects
> which should be disabled.

There are no graphics in GNU when run in Dueller since it is run from
the commandline version. One can disable the sound, but must do so
from the GUI, and then save the settings first. In any case, I
honestly don't believe playing back some 15k .wav file slows the
playing down any.

Albert Silver

Jørn Thyssen

Jul 9, 2003, 12:21:43 PM
Michael Howard wrote:
> Yeah, it took 7 hours to complete the 1st 50 games with these settings. up
> to 20 secs for some GNU moves. Albert gave me some mods to speed it up a
> bit... we'll see.
> GNU under Dueller is driven by a command interface and thus graphics/sound
> etc are not relevant as the normal GUI does not appear.

Sound is still working in the non-gui version of gnubg. The sound is
played asynchronously, so gnubg does not wait for the sound to be
played, but the load on the system will be slightly higher -- I don't
know by how much!

Drawing the ASCII graphics for the board is probably not very expensive
for this kind of experiment since there are not that many updates per
minute, but for gnubg 0-ply self-play it may actually slow gnubg down
considerably since there will be many updates per second, so here it's
advisable to turn it off with "set display off".

Jørn

amni

Jul 9, 2003, 1:25:41 PM
Maybe you don't believe it because you haven't tested the timing
for several hours with and without graphics and
sounds, like I have.

GNU playing against itself with graphics takes
at least double the time it does without these.

amni


silver...@hotmail.com (Albert Silver) wrote in message news:<f9846eb9.03070...@posting.google.com>...

Albert Silver

Jul 9, 2003, 5:35:31 PM
> You don't believe maybe because you didn't tested like me
> several hours the timing with and without graphics and
> sounds.

I don't think you understand. There are NO graphics. Dueller only uses
the no-gui version.

Albert Silver

amni

Jul 10, 2003, 1:13:09 AM
Sounds also work in the non-gui version of GNU.

---amni

Michael Howard

Jul 10, 2003, 6:09:42 AM
Amni,
please don't worry too much about this detail. The test is going well. I
am doing 99 pt sessions overnight (the maximum Dueller will allow for some
reason).
So far about 400 games completed with GNU ahead by around 30 pts I think. I
intend to post all the games for download on a site somewhere so anybody can
look at them and argue about how lucky the Fish was to beat GNU...teehee.
8-)
IMHO the results will stand for themselves and no amount of analysis by 3rd
party bots will change them. Only further head to head duels can refine the
results more.
Trying to take out the luck to say one play was actually better (even if it
lost) is a bit like saying David Beckham is a poor player because he can't
kick with his left foot!!!
Cheers to all,

M
"amni" <am...@hotmail.com> wrote in message
news:e3504803.03070...@posting.google.com...

Jørn Thyssen

Jul 10, 2003, 8:38:00 AM
Michael Howard wrote:
> Amni,
> please don't worry too much about this detail. The test is going well. I
> am doing 99 pt sessions overnight (the maximun Dueller will allow for some
> reason).
> So far about 400 games completed with GNU ahead by around 30 pts I think. I
> intend to post all the games for download on a site somewhere so anybody can
> look at them and argue about how lucky the Fish was to beat GNU...teehee.
> 8-)
> IMHO the results will stand for themselves and no amount of analysis by 3rd
> party bots will change them. Only further head to head duels can refine the
> results more.
> Trying to take out the luck to say one play was actually better (even if it
> lost) is a bit like saying David Beckham is a poor player because he can't
> kick with his left foot!!!

I'm sorry I don't understand that analogy, and I certainly don't agree
that we should not care about luck!

By calculating the luck adjusted result I don't say if any particular
move was good or bad. If I wanted to do that, I would do a normal match
analysis or perform an experiment like Michael Depreli's.

I simply let a bot calculate the luck for each roll. I don't care what
the actual moves were. Although a 0-ply luck calculation by gnubg is not
perfect, but it's way better than not including it. Note that this is
not biased since we're merely calculating luck -- not errors by either bot!

Consider this example: I let gnubg 0-ply play gnubg 0-ply. The result
after 14 games was 11-23. Not a very good result, considering we expect
a net result of zero. However, when I calculate the luck adjusted
results I get:

Actual result -12.000 12.000

Luck adjusted result 0.039 -0.039


Going further I can calculate the 95% confidence interval from the
results of the individual games:

No luck adjustment: -12 +/- 1.6
With luck adjustment: -0.039 +/- 0.2

Not only is the luck adjusted result closer to the correct one, but the
std.err is reduced by a factor of 8! So with luck adjustment my small
experiment is equivalent to approximately 900 (8*8*14) games instead of
the actual 14 performed!
(I can send you the SGF file from this session if you want it)

In short: without luck adjustment I hope that luck averages to zero.
With luck adjustment I hope that any errors in calculating the luck
averages to zero, i.e., that there is no bias.

Jørn
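
The bookkeeping behind these numbers can be sketched in a few lines of Python, assuming the per-game results and per-game net luck have already been exported from an analysis; the sample figures below are made up for illustration and are not the actual 14-game session.

import math

# (actual result, net luck) per game -- illustrative numbers only.
games = [(+1, +0.8), (-2, -1.7), (+1, +1.2), (-1, -0.9), (-4, -3.8), (+2, +1.9)]

def mean_and_stderr(xs):
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / (n - 1)   # sample variance
    return m, math.sqrt(var / n)                    # standard error of the mean

actual = [result for result, _ in games]
adjusted = [result - luck for result, luck in games]   # luck-adjusted result per game

for label, data in (("actual", actual), ("luck-adjusted", adjusted)):
    m, se = mean_and_stderr(data)
    print("%-13s %+.3f ppg, 95%% CI +/- %.3f" % (label, m, 1.96 * se))

Squaring the reduction in the standard error gives the equivalent increase in sample size, which is where the 8*8*14, roughly 900 games, figure above comes from.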

Michael Howard

Jul 10, 2003, 9:11:03 AM
Jorn,
All this is very impressive but of no real use. The luck measurement is a
fictional figure created by a bot. How can you put a figure on luck without
a dubious value judgement based on an intrinsically flawed evaluation.
Only the results of head-to-head are of real interest to the community
outside you 'boffins'. You simply confuse everyone with numbers to 3
decimal places and kid us they mean something. Have a good time playing
with it (and I look forward to seeing the analysis) but I would say that if
two players go for 5,000 games (or more if possible) the winner is the
better player - end of story. You'll have a hard time proving otherwise to
me with fancy computations.
No offence intended and I accept you think I'm stupid too 8-).
Kind rgds,
M

"Jørn Thyssen" <j...@nospam.com> wrote in message
news:3F0D5E28...@nospam.com...

Albert Silver

Jul 10, 2003, 12:58:59 PM
"Michael Howard" <mi...@howard666.freeserve.co.uk> wrote in message news:<bejolg$q82$1...@newsg4.svr.pol.co.uk>...

> Jorn,
> All this is very impressive but of no real use. The luck measurement is a
> fictional figure created by a bot. How can you put a figure on luck without
> a dubious value judgement based on an intrinsically flawed evaluation.
> Only the results of head-to-head are of real interest to the community
> outside you 'boffins'. You simply confuse everyone with numbers to 3
> decimal places and kid us they mean something. Have a good time playing
> with it (and I look forward to seeing the analysis) but I would say that if
> two players go for 5,000 games (or more if possible) the winner is the
> better player - end of story. You'll have a hard time proving otherwise to
> me with fancy computations.

Just because you don't understand them, doesn't make them wrong. His
same math stated that in spite of GNU's 56-44 result against Snowie 4
in 100 7-point matches, that they are in fact the same strength.
Still, your line of argument is similar to that of Creationists who
claim that since no one saw us evolve from primitive creatures similar
to apes, that evolution can't be 'proved' to exist. Of course this is
completely ludicrous and based on ignorance, but it's the same idea.
Ask ANY true scientist, no matter how fervent a christian, which is
correct and Darwin will win the day. Why? Because they understand, and
faith doesn't change a thing. The fact is that there will still be a
significant standard deviation due to luck even after 5000 games, and
this would be enormously negated by factoring in the luck each player
benefitted from. You constantly state that GNU's lead is due to a
lucky large cube, implying its lead is not skill based. Well, it goes
without saying that if JF hands over tons of points due to poor cube
decisions it is simply a worse player. Still, a luck analysis would
better shed light on this, no?

Albert Silver

Nis Jorgensen

Jul 10, 2003, 1:24:53 PM
Michael Howard wrote:

> Jorn,
> All this is very impressive but of no real use. The luck measurement is a
> fictional figure created by a bot. How can you put a figure on luck without
> a dubious value judgement based on an intrinsically flawed evaluation.
> Only the results of head-to-head are of real interest to the community
> outside you 'boffins'. You simply confuse everyone with numbers to 3
> decimal places and kid us they mean something. Have a good time playing
> with it (and I look forward to seeing the analysis) but I would say that if
> two players go for 5,000 games (or more if possible) the winner is the
> better player - end of story. You'll have a hard time proving otherwise to
> me with fancy computations.
> No offence intended and I accept you think I'm stupid too 8-).

Not stupid, just stubborn :-)

Can we agree to disagree? That is, you will refer to the un-adjusted
results after the outcome, and we 'boffins' will refer to the luck-adjusted one?

You make one very strong point though - it WILL be hard to convince the
masses. That doesn't mean we (The International Conspiracy of Boffins)
won't try.

--
Nis Jorgensen
Your man in Amsterdam

Michael Howard

Jul 10, 2003, 1:46:31 PM
And your line of argument insinuates that the Wimbledon championship should
be decided by the IBM statistics and not the result. I don't need to be a
mathmo to know that an evaluation of luck based on equity is only a
subjective 'guess'. Even rollouts are only approximations. There is no
substitute for real results. Just 'cos you like to pretend cleverness with
figures doesn't mean I have to think you're clever.

I fully expect GNU to win because it is a better player. But I don't need a
computer to tell me how much luck or how many bad moves each made to see the
result. The game is played for results, not to be analysed to death. Why can't
you just say how many games are necessary to prove which is better? You
seem happy with a puny 1296 game rollout to verify the strongest move. If
1,000,000 games are needed, so be it. Asking one bot to say how other bots
perform move by move is itself ludicrous.
We are going to have to agree to disagree on this. I will only concede that
your analysis could help show which is making the stronger moves. The
result however, is the result.
Rgds,


M
"Albert Silver" <silver...@hotmail.com> wrote in message

news:f9846eb9.03071...@posting.google.com...

Michael Howard

Jul 10, 2003, 1:59:55 PM
Thanks Jorn, that sounds a good approach.
Not really stubborn, I have no axe to grind except all these damn stupid
figures piss me off. The game is being reduced to computer analysed numbers
and nobody has creative thoughts about backgammon anymore, they just run it
past their favourite bot and hey presto, that MUST be the right answer.

I really am looking forward to reading the analysis though - probably be
ready in 2 months. Question: I just saw GNU take a game which was totally
lost. late in bearoff with two men on the bar. Had maybe 10% winning
chances. He threw 6-6 which brought him on and maybe gave him 20% chances.
He then threw another 6-6 which probably increased his chance to 30%.
Finally ( there may have been one more roll I can't remember) he rolled 4-4
to win the game. How do you calculate luck like that? On a roll by roll
basis it was lucky. As a sequence of rolls it was phenomenal.

Cheers,
M


"Nis Jorgensen" <n...@dkik.dk> wrote in message
news:3f0da164$0$45378$edd6...@news.versatel.net...

John R MacLeod

Jul 10, 2003, 1:57:15 PM

"Albert Silver" <silver...@hotmail.com> wrote in message
news:f9846eb9.03071...@posting.google.com...

> Still, your line of argument is similar to that of Creationists who
> claim that since no one saw us evolve from primitive creatures similar
> to apes, that evolution can't be 'proved' to exist.

I think you are being a little over-generous to the Creationists here.
There is an argument in the backgammon case with some logic behind it which
is extremely doubtful in the Creationist case.
If the purpose of these matches is to prove which is the 'strongest' bot, it
does seem questionable to use a luck rating calculated by the bots
themselves. I don't know the full details of the calculation but I assume
the 'luck' associated with an individual roll is something like the
difference between the MWC after best playing of the actual roll less the
average MWC chances that would have resulted from best play of the other 35
rolls that didn't occur? Presumably too, MWC chances derive from the bots
evaluation of the positions which could occur after n rolls assuming perfect
play on both sides. Therefore it seems to me that the bots evaluation of
'luck' is based on their own evaluation of the best move. It would seem
theoretically that one should evaluate their playing strength without any
assumptions about their playing strength which is negated if you use the
luck factor.
However, I think in practice the bots have proved their playing strength
sufficiently to use their estimates of luck to get better evaluations than
any human could do without running many, many, more test games.


Michael Howard

Jul 10, 2003, 2:07:17 PM
Yes John, agree entirely with this articulate reasoning.... but how many
games is enough. Our clever mathmos won't tell me. I am ready to do it if
5,000 isn't sufficient. Unless I die of boredom of course!
M

"John R MacLeod" <jrma...@consultant.com> wrote in message
news:bek9l3$63lru$1...@ID-73584.news.uni-berlin.de...

Albert Silver

Jul 10, 2003, 5:19:07 PM
"Michael Howard" <mi...@howard666.freeserve.co.uk> wrote in message news:<bek9iu$6hd$1...@newsg4.svr.pol.co.uk>...

> Thanks Jorn, that sounds a good approach.
> Not really stubborn, I have no axe to grind except all these damn stupid
> figures piss me off. The game is being reduced to computer analysed numers
> and nobody has creative thoughts about backgammon anymore, they just run it
> past their favourite bot and hey presto, that MUST be the right answer.
>
> I really am looking forward to reading the analysis though - probably be
> ready in 2 months. Question: I just saw GNU take a game which was totally
> lost. late in bearoff with two men on the bar. Had maybe 10% winning
> chances. He threw 6-6 which brought him on and maybe gave him 20% chances.
> He then threw another 6-6 which probably increased his chance to 30%.
> Finally ( there may have been one more roll I can't remember) he rolled 4-4
> to win the game. How do you calculate luck like that? On a roll by roll
> basis it was lucky. As a sequence of rolls it was phenomenal.
>
> Cheers,
> M

I had a match a couple of days ago where I got hit loose by my
opponent on the 2-point (he had the 8, 7, and 6 points made) and I
danced 2 times rolling a 6-6 against his single-point board (he only
had his 6-point), and got gammoned unable to do more than enter and
get hit again after this.

Albert Silver

Albert Silver

Jul 10, 2003, 5:24:17 PM
"John R MacLeod" <jrma...@consultant.com> wrote in message news:<bek9l3$63lru$1...@ID-73584.news.uni-berlin.de>...
> "Albert Silver" <silver...@hotmail.com> wrote in message
> news:f9846eb9.03071...@posting.google.com...
> > Still, your line of argument is similar to that of Creationists who
> > claim that since no one saw us evolve from primitive creatures similar
> > to apes, that evolution can't be 'proved' to exist.
>
> I think you are being a little over-generous to the Creationists here.
> There is an argument in the backgammon case with some logic behind it which
> is extremely doubtful in the Creationist case.
> If the purpose of these matches is to prove which is the 'strongest' bot, it
> does seem questionable to use a luck rating calculated by the bots
> themselves. I don't know the full details of the calculation but I assume
> the 'luck' associated with an individual roll is something like the
> difference between the MWC after best playing of the actual roll less the
> average MWC chances that would have resulted from best play of the other 35
> rolls that didn't occur? Presumably too, MWC chances derive from the bots
> evaluation of the positions which could occur after n rolls assuming perfect
> play on both sides. Therefore it seems to me that the bots evaluation of
> 'luck' is based on their own evaluation of the best move.

No, it's much simpler than that. If the equity before rolling was
0.000 and after the roll (playing the best move) it was 0.200, I was
0.200 'lucky'. The quality of the other rolls is irrelevant. Suppose I
played the second best move and my current evaluation is only 0.150
(so I made a 0.050 mistake), the luck I gained is still 0.200 as that
is what the roll would have allowed.

Albert Silver
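
In code form, the per-roll luck described above depends only on the equity before the roll and the equity after the best play of the roll that was actually thrown; the move the player then chooses is scored separately as an error. A minimal sketch using the numbers from the example:

def roll_luck(equity_before_roll, equity_after_best_play):
    # Luck of one roll: what the dice allowed, regardless of the move actually played.
    return equity_after_best_play - equity_before_roll

def move_error(equity_after_best_play, equity_after_played_move):
    # Error of the move actually played; kept separate from the luck.
    return equity_after_best_play - equity_after_played_move

# Example above: 0.000 before the roll, 0.200 after the best play,
# 0.150 after the second-best move that was actually played.
print(roll_luck(0.000, 0.200))    # 0.200 of 'luck'
print(move_error(0.200, 0.150))   # 0.050 error, which does not change the luck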

Albert Silver

Jul 10, 2003, 5:27:58 PM
"Michael Howard" <mi...@howard666.freeserve.co.uk> wrote in message news:<bek8pt$1o1$1...@news7.svr.pol.co.uk>...

> And your line of argument insinuates that the Wimbledon champioship should
> be decided by the IBM statistics and not the result. I don't need to be a
> mathmo to know that an evaluation of luck based on equity is only a
> subjective 'guess'. Even rollouts are only approximations. There is no
> substitute for real results. Just 'cos you like to pretend cleverness with
> figures doesn't mean I have to think you're clever.
>
> I fully expect GNU to win because it is a better player. But I don't need a
> computer to tell me how much luck or how many bad moves each made to see the
> result. The game is played for results not to analyse to death. Why can't
> you just say how many games are necessary to prove which is better? You
> seem happy with a puny 1296 game rollout to verify the strongest move.

Do you know why a puny 1296 game rollout is accepted? It is only due
to Variance Reduction, which.... factors in the luck factor into the
results! The same outrageous idea being presented here to analyze the
game results.

Albert Silver

Michael Howard

Jul 11, 2003, 4:54:04 AM
But how do you 'measure' lucky sequences fairly where maybe 3 rolls in
succession turn the game round.... each move only increasing equity by a
small amount. Adding up the individual changes probably won't be good
enough to express the overall effect.
M

Michael Howard

Jul 11, 2003, 6:36:07 AM
Exactly, and that's why the results of rollouts vary from bot to bot and
change when you add a few more thousand games to the rollout or tinker with
the settings.
I understand the principle but it's all an approximation.

From what you said below it's also clear that the 'measurement' of luck in
games is simplistic and primitive. It takes no account of lucky sequences
for a start.

In any case I could envisage a situation where a technically inferior bot
could play badly but in such a way that it consistently beats 'stronger
bots. What would it prove to show by maths that it was actually a useless
player. Surely winning is the object of the game not pleasing the analysts.

We are told that by playing the strongest moves we will win in the
'long-run'. My question to you is:- in money games - What is the long-run??
Will you go on record to say that the result of the 5,000 games I am
running will prove nothing?

Albert Silver

Jul 11, 2003, 9:28:53 AM
"Michael Howard" <mi...@howard666.freeserve.co.uk> wrote in message news:<bem3ur$dsc$1...@newsg4.svr.pol.co.uk>...

> Exactly, and that's why the results of rollouts vary from bot to bot and
> change when you add a few more thousand games to the rollout or tinker with
> the settings.
> I understand the principal but it's all an approximation.
>
> From what you said below it's also clear that the 'measurement' of luck in
> games is simplistic and primitive. It takes no account of lucky sequences
> for a start.

It includes all moves unless you are speaking of something else.

> In any case I could envisage a situation where a technically inferior bot
> could play badly but in such a way that it consistently beats 'stronger
> bots.

Short of being luckier, how?

> What would it prove to show by maths that it was actually a useless
> player. Surely winning is the object of the game not pleasing the analysts.

Not quite. The object is to establish who the stronger player is,
which in this case may or may not be the winner of 5000 games.
Furthermore, suppose the best man wins, you still have two questions
unanswered:

1) Did the best man win? Only analysis factoring in the luck could
say.

2) Does the score represent the difference in strength? In other words
if JellyFish won 3750 to 1250 is it really 3x stronger or was that
score further enhanced by better luck too?



> We are told that by playing the strongest moves we will win in the
> 'long-run'. My question to you is:- in money games - What is the long-run??
> Will you going on record to say that the result of the 5,000 games I am
> running will prove nothing?

Absolutely not. I think it will certainly be representative of their
playing strengths... AFTER the luck has been factored in. That is the
result I will look at personally. Just because you will not, doesn't
mean I or others will not, so by all means the series will not have
been for nothing.

Albert Silver

Jørn Thyssen

Jul 11, 2003, 9:40:39 AM
Michael Howard wrote:

> Question: I just saw GNU take a game which was totally
> lost. late in bearoff with two men on the bar. Had maybe 10% winning
> chances. He threw 6-6 which brought him on and maybe gave him 20% chances.
> He then threw another 6-6 which probably increased his chance to 30%.
> Finally ( there may have been one more roll I can't remember) he rolled 4-4
> to win the game. How do you calculate luck like that? On a roll by roll
> basis it was lucky.

Yes, that is the luck for each roll

> As a sequence of rolls it was phenomenal.

Yes, that is the total luck of that game, i.e., the sum of the luck for
each move.


OK, let's start by assuming this is a cubeless game.

The equity from the starting position is 0. At 10% winning chance your
equity is -0.8, so in order to get in this situation you need a net
(bad) luck of -0.8 (or make errors during play, but let's ignore that).

Very simplistically, the luck calculation would go like this:

On-roll equity     After-roll equity    Luck
-0.8 (10%)         -0.6 (20%)           +0.2
-0.6 (20%)         -0.4 (30%)           +0.2
-0.4 (30%)         +1.0 (100%)          +1.4
--------------------------------------------
Total luck for the sequence 66,66,44     +1.8

That is, the total luck for this game is -0.8 + 1.8 = +1.0

So, do you agree that gnubg had -0.8 bad luck until it started rolling
good numbers? Do you agree that the sequence 66, 66, 44 improved its
equity by 1.8 due to good luck? And finally, do you agree that the total
luck for that particular game is +1?

The actual result of this game is +1 (in gnubg's favour). Since gnubg
had good luck of +1, the luck adjusted result is 0.

Jørn
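
The same arithmetic as a short Python sketch, using the cubeless mapping from winning chances to equity implied in the example (equity = 2p - 1, so 10% is -0.8 and 100% is +1.0) and the three rolls above:

def cubeless_equity(win_prob):
    # Ignoring gammons, cubeless equity is 2p - 1.
    return 2 * win_prob - 1

# Winning chances before and after each roll, per the example above.
sequence = [("66", 0.10, 0.20), ("66", 0.20, 0.30), ("44", 0.30, 1.00)]

luck_before = cubeless_equity(0.10) - 0.0        # -0.8: how gnubg got into trouble
sequence_luck = sum(cubeless_equity(after) - cubeless_equity(before)
                    for _, before, after in sequence)   # +1.8 for 66, 66, 44

total_luck = luck_before + sequence_luck          # +1.0 for the whole game
actual_result = +1.0                              # gnubg won a single game
print(round(total_luck, 3), round(actual_result - total_luck, 3))   # luck-adjusted result: 0.0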

Jørn Thyssen

Jul 11, 2003, 9:47:26 AM
Michael Howard wrote:
> But how do you 'measure' lucky sequences fairly where maybe 3 rolls in
> succession turn the game round.... each move only increasing equity by a
> small amount. Adding up the individual changes probably won't be good
> enough to express the overall effect.

Why is that? You're actually stating that the total luck of a sequence
(or game) is *NOT* the sum of the luck for each roll!?!

Jørn

Jørn Thyssen

Jul 11, 2003, 10:08:11 AM
Michael Howard wrote:
[snip]

>
> From what you said below it's also clear that the 'measurement' of luck in
> games is simplistic and primitive.

Simplistic: yes; primitive: no.


> It takes no account of lucky sequences
> for a start.

Again, I do not understand that. You really have to explain why you
think that the total luck of a sequence or game is *NOT* equal to the
sum of the luck for each roll in the sequence or game.

>
> In any case I could envisage a situation where a technically inferior bot
> could play badly but in such a way that it consistently beats 'stronger
> bots.

I guess you mean that the inferior bot can seek out positions that the
stronger bot plays badly, say backgammons. Aha, but since the 'stronger'
bot plays badly, the luck adjusted result will eventually show the inferior
bot to be the stronger one -- even though the luck analysis is carried
out by the stronger bot!


> What would it prove to show by maths that it was actually a useless
> player.

No, I think you're still confusing an analysis of the moves with the
luck analysis. The luck analysis doesn't care what was played in the
actual position.


> Surely winning is the object of the game not pleasing the analysts.
>
> We are told that by playing the strongest moves we will win in the
> 'long-run'. My question to you is:- in money games - What is the long-run??
> Will you going on record to say that the result of the 5,000 games I am
> running will prove nothing?

We don't know that yet! If one of the games in your 5,000 game money
session was determined by gnubg winning a gammon on a 128 cube then:
yes, your experiment will prove nothing! If the net luck of the session
is close to zero then we can actually use the non-adjusted result for
something.

Jørn

Jørn Thyssen

Jul 11, 2003, 10:19:37 AM
Michael Howard wrote:
> Jorn,
> All this is very impressive but of no real use. The luck measurement is a
> fictional figure created by a bot. How can you put a figure on luck without
> a dubious value judgement based on an intrinsically flawed evaluation.

Do you agree that if we had a perfect bot and calculated the luck
analysis with this bot, then it would be a great improvement, since it
dramatically reduces the number of matches needed to be played?

> Only the results of head-to-head are of real interest to the community
> outside you 'boffins'.

No, the community would ask: what is the 95% confidence interval of your
head-to-head session? For example, you could easily obtain a result
saying that Jellyfish is better than gnubg by, say, +0.02 ppg. However,
if the 95% CI is 0.03, then the result is worthless!

By running a luck analysis on the session I can reduce the 95% CI
interval. In my original post this factor was around 8, so in my example
above I could tell the community: "by applying luck adjustment to the
result I obtain that Jellyfish is better than gnubg by +0.02 ppg with a
95% CI interval of 0.004", which is much better than the 0.02 ppg +/-
0.03 obtained without luck adjustment.

Also, as Albert Silver points out: the community uses rollouts with
variance reduction every day, so they do know it's important to factor
in luck. I'm just applying the same technique to an entire session
instead of a single game.

> You simply confuse everyone with numbers to 3
> decimal places and kid us they mean something. Have a good time playing
> with it (and I look forward to seeing the analysis) but I would say that if
> two players go for 5,000 games (or more if possible) the winner is the
> better player - end of story.

Dream on! A single crazy game may destroy the entire experiment!

> You'll have a hard time proving otherwise to
> me with fancy computations.

Yes, I understand that now.


> No offence intended

None taken.

Jørn

Jørn Thyssen

Jul 11, 2003, 10:31:08 AM
John R MacLeod wrote:
> "Albert Silver" <silver...@hotmail.com> wrote in message
> news:f9846eb9.03071...@posting.google.com...
>
>>Still, your line of argument is similar to that of Creationists who
>>claim that since no one saw us evolve from primitive creatures similar
>>to apes, that evolution can't be 'proved' to exist.
>
>
> I think you are being a little over-generous to the Creationists here.
> There is an argument in the backgammon case with some logic behind it which
> is extremely doubtful in the Creationist case.
> If the purpose of these matches is to prove which is the 'strongest' bot, it
> does seem questionable to use a luck rating calculated by the bots
> themselves.

Note that the luck calculation is not biased towards any of the
contestants. I don't think you can argue that gnubg will consistently
think that gnubg is luckier than jellyfish and vice versa.

> I don't know the full details of the calculation but I assume
> the 'luck' associated with an individual roll is something like the
> difference between the MWC after best playing of the actual roll less the
> average MWC chances that would have resulted from best play of the other 35
> rolls that didn't occur?

Yes

> Presumably too, MWC chances derive from the bots
> evaluation of the positions which could occur after n rolls assuming perfect
> play on both sides. Therefore it seems to me that the bots evaluation of
> 'luck' is based on their own evaluation of the best move.

Yes, but the calculation is not biased.

> It would seem
> theoretically that one should evaluate their playing strength without any
> assumptions about their playing strength which is negated if you use the
> luck factor.

The calculation of luck basically relies on the bot's ability to produce
_relative_ equities. Fortunately for us this is specifically what the
bots are good at!


> However, I think in practice the bots have proved their playing strength
> sufficiently to use their estimates of luck to get better evaluations than
> any human could do without running many, many, more test games.

Yes, exactly! The calculation of luck is not perfect, but it's so good
that it's worthwhile to include!

Jørn


s.w.a....@hccnet.nl

Jul 11, 2003, 2:21:38 PM
On Fri, 11 Jul 2003 16:31:08 +0200, Jørn Thyssen <j...@nospam.com>
wrote:

John McLeod wrote:
> Presumably too, MWC chances derive from the bots
> evaluation of the positions which could occur after n rolls assuming perfect
> play on both sides. Therefore it seems to me that the bots evaluation of
> 'luck' is based on their own evaluation of the best move.

Jorn answered:
> Yes, but the calculation is not biased.

The calculation may not be biased, but the input certainly is.
**


John McLeod wrote:
> However, I think in practice the bots have proved their playing strength
> sufficiently to use their estimates of luck to get better evaluations than
> any human could do without running many, many, more test games.

Jorn answered:
> Yes, exactly! The calculation of luck is not perfect, but it's so good
> that it's worthwhile to include!

It is only as 'good' as the quality of the equities that are used.
If you let the October 2002 version of GNU play against the July 2003
version, and let the 2002 version calculate the 'luck', you are off by
the same amount as the equity improvements that have been added to the
2003 version.
In other words and in the same vein.....if, for example, Snowie is indeed a
better player than GNU at this moment, GNU will not be able to figure
that out in those cases where Snowie's equities are superior....GNU
will just see Snowie's actually 'better moves' as 'inferior moves',
and it will rate Snowie's gained equity as 'luck'.

Jorn, your set up only works when a bot with an overall superior
equity table is used to 'judge' an inferior one.....sort of Catch
22.....you will first have to prove that the bot that does the
calculation is indeed using an >overall< better equity table.
If you don't, you have "anecdote" at best , and maybe not even
anecdote. The old motto: "bullshit in - bullshit out" comes to mind.

Peter.

Jørn Thyssen

Jul 11, 2003, 3:28:49 PM
s.w.a....@hccnet.nl wrote:
[snip]

> It is only as 'good' as the quality of the equities that are used.
> If you let the october 2002 version of GNU play against the july 2003
> version, and let the 2002 version calculate the 'luck', you are off by
> the same amount as the equity improvements that have been added to the
> 2003 version.

No, that's not correct.

As I don't have the 2002 version lying around I did something else: I
analysed the luck in the 14 game session from my original posting with
the "intermediate" setting:

non-adjusted: -0.9 ppg +/- 1.5
luck-adjusted with "expert": +0.0 ppg +/- 0.2
luck-adjusted with "intermediate": +0.2 ppg +/- 0.5

So with a worse luck calculation I only reduce the standard error by a
factor of 3 compared to 8 with the "expert" setting.

Do you agree that even though the luck calculation is now worse, it's
still better than not doing it?


> In other words and in the same vein.....if f.e. Snowie is indeed a
> better player than GNU at this moment , GNU will not be able to figure
> that out in those cases where Snowie's equities are superior

Why not? Remember I'm merely calculating luck: how much was the equity
improvement by luck in each position. Note that it doesn't matter if
gnubg's equities are off by 0.1 because I'm only looking at equity
differences so systematic errors will be cancelled out.

> ....GNU
> will just see Snowie's actually 'better moves' as 'inferior moves',
> and it will rate Snowie's gained equity as 'luck'.
>
> Jorn, your set up only works when a bot with an overall superior
> equity table is used to 'judge' an inferior one

I hope my example with luck analysis on "intermediate" setting indicates
that this is certainly not true.

Jørn

John R MacLeod

Jul 11, 2003, 3:41:01 PM

"Michael Howard" <mi...@howard666.freeserve.co.uk> wrote in message
news:beka0o$t9l$1...@newsg3.svr.pol.co.uk...

> .... but how many
> games is enough. Our clever mathmos won't tell me. I am ready to do it if
> 5,000 isn't sufficient. Unless I die of boredom of course!

Unfortunately, Michael, I think you are asking a question that can't be
answered unless you allow the 'boffins' to measure luck in some way. I
think everyone would agree that a one game match between two players of
about the same strength would give you almost no confidence that the winner
was the strongest player. On the other hand a match of a hundred thousand
million million games would certainly engender a lot of confidence that the
winner was stronger unless the margin was only a few million points or so.
But to say how many games before you are say 99% sure that the winner is
stronger you need to know how luck distributes itself in backgammon
matches - or even more specifically how the luck distribution is distributed
in games between only these two players as style may affect luck
distribution. Unless you are willing to use them or have some better way of
measuring luck distribution I'm afraid your question cannot be answered.


John R MacLeod

Jul 11, 2003, 4:02:10 PM

"Jørn Thyssen" <j...@nospam.com> wrote in message
news:3F0ECA2C...@nospam.com...

>
> Note that the luck calculation is not biased towards any of the
> contestants. I don't think you can argue that gnubg will consistently
> think that gnubg is luckier than jellyfish and vice versa.
>
I'm having a bit of fun trying to work out if I agree with this or not. The
example I'm thinking of is BOT A and BOT B which are so alike that there is
only one possible position where their evaluations disagree but on that
particular position their evaluations disagree strongly. I imagine you
could create such a situation by cloning GNUBG and putting a deliberate
'bug' in at some low level. Now this would come into play whenever that
position was one of the possible positions within the horizon of search. Set
them playing each other continuously and in the fullness of time one would
expect the one with the correct (or better) evaluation to creep slowly
ahead. Let us say BOT A turns out to be better than BOT B.
Now how would the two bots analyze the games? For BOTA it's easy - its own
moves are perfect while BOTB makes the occasional error. As luck gradually
evens out over time everything is as expected - the stronger player wins.
But for BOTB it is more difficult. Clearly, its own moves are perfect but
BOTA makes the occasional mistake. Nevertheless, BOTA is winning. The only
explanation is that BOTA is slightly luckier than BOTB.
On this basis I would expect the weaker bot to slightly overestimate the
luck of the stronger bot.
I suspect that the only real conclusion one can draw from these luck
adjusted matches may be:
Snowie evaluates GNUBG as stronger than JF (or vice versa)
Gnubg evaluates JF as stronger than Snowie (or vice versa)
JF evaluates snowie as stronger than Gnubg (or vice versa)
and so on.


Kees van den Doel

Jul 11, 2003, 6:05:23 PM
In article <ben4up$6p95t$1...@ID-73584.news.uni-berlin.de>,

John R MacLeod <jrma...@consultant.com> wrote:

>> games is enough. Our clever mathmos won't tell me. I am ready to do
>> it if 5,000 isn't sufficient. Unless I die of boredom of course!

[impossible to figure out]

What's wrong with the method described below?:

First of all, instead of dealing with matches, just play a money game
session of N games as it's easier to analyse.

Let x_i denote the point gain of player A on the i'th game (negative if
a loss). Let M be the expectation value of x (which we don't know) and V
the variance (which we don't know). Assuming the probability
distribution of x is not pathological (i.e., M and V are non infinite,
which sounds very reasonable to me) the central limit theorem applies,
and the probability distribution for p = SUM(x_i)/n, i.e., the
points-per-game edge of player A, is normal, with average M and variance
V/n.

So run a money session for N games, and estimate the average p and the
variance V from the data. Then it is trivial to compute the confidence
in the measured value of p, which is the points per game.

The only iffy part I can think of is the possible error in the measured
variance, but all we need is an upper limit so this may not be so bad
for large enough N.


Kees (Gotta sit on FIBS, and teach you dense, or so, but here that dont
know was living conditions there Mr nerd dog genoemd is S.)
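
A minimal sketch of the estimate Kees outlines, assuming the per-game point results of the money session are available as a list; the session data below is randomly generated for illustration, not real play.

import math
import random

def ppg_with_confidence(results, z=1.96):
    # Points-per-game edge of player A and a 95% confidence half-width,
    # using the sample variance and the central limit theorem.
    n = len(results)
    p = sum(results) / n
    var = sum((x - p) ** 2 for x in results) / (n - 1)
    return p, z * math.sqrt(var / n)

# Hypothetical per-game point swings for a money session.
random.seed(1)
session = [random.choice([-4, -2, -1, +1, +2, +4]) for _ in range(1000)]
edge, half_width = ppg_with_confidence(session)
print("edge: %+.3f ppg, 95%% CI +/- %.3f" % (edge, half_width))

Because the per-game variance of an unadjusted money game is large, the interval only shrinks as 1/sqrt(N), which is why nobody can promise in advance that a fixed number like 5,000 games will settle the question.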

Jørn Thyssen

Jul 11, 2003, 6:20:47 PM
John R MacLeod wrote:
> "Jørn Thyssen" <j...@nospam.com> wrote in message
> news:3F0ECA2C...@nospam.com...
>
>>Note that the luck calculation is not biased towards any of the
>>contestants. I don't think you can argue that gnubg will consistently
>>think that gnubg is luckier than jellyfish and vice versa.
>>
>
> I'm having a bit of fun trying to work out if I agree with this or not. The
> example I'm thinking of is BOT A and BOT B which are so alike that there is
> only one possible position where their evaluations disagree but on that
> particular position their evaluations disagree strongly. I imagine you
> could create such a situation by cloning GNUBG and putting a deliberate
> 'bug' in at some low level. Now this would come into play whenever that
> position was one of the possible positions within the horizon of search. Set
> them playing each other continuously and in the fullness of time one would
> expect the one with the correct (or better) evaluation to creep slowly
> ahead. Let us say BOT A turns out to be better than BOT B.
> Now how would the two bots analyze the games?

Well, I'm *not* analysing the moves of the match: I'm analysing the luck
which is something totally different!

> For BOTA its easy - its own
> moves are perfect while BOTB makes the occasional error. As luck gradually
> evens out over time everything is as expected - the stronger player wins.
> But for BOTB it is more difficult. Clearly, its own moves are perfect but
> BOTA makes the occasional mistake. Nevertheless, BOTA is winning. The only
> explanation is that BOTA is slightly luckier than BOTB.

No, the only explanation is that Bot A is better!

Say, you played 5,000 7pt matches. Assuming both Bot A and Bot B
produce relatively good estimates of luck, both bots will deduce that
Bot A is better than Bot B because it on average only needs, say, 49% MWC
luck to win a match. If you only need 49% MWC luck to beat Bot B, then
you must be a 51-49 favourite!

Both bots will arrive at exactly the same result -- the 95% CI interval
will differ, though.

If you analyse the matches with Bot A it will say that Bot B makes
mistakes. Similarly, Bot B will say that Bot A makes mistakes, and will
think that it's a favourite against Bot A. The beauty of the luck
adjustment is that we merely look at luck -- not errors made by any bot.
They may disagree on the moves made during the match but this doesn't
enter the luck calculation. Nis Jørgensen posted a nice analysis of this
on the gnubg mailing list:

http://mail.gnu.org/archive/html/bug-gnubg/2003-06/msg00186.html

In short: if, for some reason, gnubg misevaluates a certain roll in some
position, this introduces errors in the luck analysis. However, the
average error is zero!

> On this basis I would expect the weaker bot to slightly overestimate the
> luck of the stronger bot.

No, I don't see why.

Jørn

amni

Jul 12, 2003, 2:51:26 AM
> Dream on! A single crazy game may destroy the entire experiment!

That's why I asked for the "raw results" to be uploaded, so
everyone can draw his own conclusions.

I personally will ignore all games where the cube was raised to more than
8 points, so that crazy games will be excluded.

Kees van den Doel

Jul 12, 2003, 3:09:05 AM
In article <e3504803.03071...@posting.google.com>,
amni <am...@hotmail.com> wrote:

You might as well ignore all games which JellyFish won because you think
they are "crazy".

Or ignore all games with more than 2 doubles in a row.

Or just play tic-tac-toe.


Kees (Not strawberries, unless it's Oeroeboeroe wrote: Pardon?)

Douglas Zare

Jul 12, 2003, 8:03:55 AM

Michael Howard wrote:

> Yes John, agree entirely with this articulate reasoning.... but how many
> games is enough. Our clever mathmos won't tell me. I am ready to do it if
> 5,000 isn't sufficient. Unless I die of boredom of course!

Someone said that if you can't explain yourself to the man on
the street in 5 minutes, then you don't really know what you
are doing. My response is that I don't feel like limiting
myself to studying what can be explained to the man on the
street in 5 minutes. (To a related question, a reporter asked
Feynman, who said, "If I could explain it in 5 minutes, it
wouldn't be worth a Nobel Prize, would it?") I'll add
that more can be explained in 5 minutes to someone paying
attention than to a man plugging his ears while screaming,
"I'm not listening!" However, whether you are better
described by the former or the latter does not affect whether
I know what I am doing.

I'm sorry that you don't like or haven't followed the
explanation for why a set number of games will not work.
However, that explanation is the same one given by many
people within backgammon years ago, statisticians
decades ago, and common sense centuries ago. Let me try
to make it more intuitive: You need only a few examples
to be pretty sure that adults tend to be taller than toddlers,
more examples to be confident that men tend to be taller
than women, and hundreds or thousands of examples to
see that children at 101 months are taller than children
at 100 months. Is that intuitive for you? If not, I recommend
revising your intuition rather than insulting people trying to
explain things to you.

Douglas Zare

John R MacLeod

Jul 12, 2003, 10:34:50 AM

"Kees van den Doel" <kvan...@xs1.xs4all.nl> wrote in message
news:3f0f34a3$0$142$e4fe...@dreader5.news.xs4all.nl...

>
> What's wrong with the method described below?:
>
I don't know enough about either the central limit theorem or backgammon to
guess if the variance in results between roughly equal players is covered by
the theorem but it might be a good hypothesis. However it still doesn't
answer Michael's question I don't think. He wants to know how many games he
has to play to get a certain confidence. If I'm not wrong you are
suggesting playing 'n' games to work out the variance and then calculating
the confidence. But he wants to know the confidence first.


John R MacLeod

Jul 12, 2003, 10:49:01 AM

"Jørn Thyssen" <j...@nospam.com> wrote in message
news:3F0F383F...@nospam.com...

> Nis Jørgensen posted a nice analysis of this
> on the gnubg mailing list:
>
> http://mail.gnu.org/archive/html/bug-gnubg/2003-06/msg00186.html
>

I've had a read of this and also Douglas Zare's paper on a similar subject
cited in another post. I'm still not quite convinced though I recognise the
force of the arguments. The example quoted in this one seems based on the
idea that the -.200 * 1/18 error in evaluation on position P(n-1) will not
result in any different move whereas if it did then position P(n) would not
be reached and the self-correcting actual play of 4/3 to reach P1(n+1)
wouldn't happen. Am I missing something?


s.w.a....@hccnet.nl

Jul 12, 2003, 1:06:36 PM
On Fri, 11 Jul 2003 21:28:49 +0200, Jørn Thyssen <j...@nospam.com>
wrote:

> Do you agree that even though the luck calculation is now worse, it's
> still better than not doing it?

Yes, agree.
My point is not so much that I think it doesn't give a better
indication of game outcomes between top bots and lower ranking
opponents.....I only think that this method is not reliable when it is
used to measure the luck of opponents of practically equal ranking,
like maybe GNU, Snowie and BgBlitz.
**


> In other words and in the same vein.....if f.e. Snowie is indeed a
> better player than GNU at this moment , GNU will not be able to figure
> that out in those cases where Snowie's equities are superior

Jorn answered:
> Remember I'm merely calculating luck: how much was the equity
> improvement by luck in each position. Note that it doesn't matter if
> gnubg's equities are off by 0.1 because I'm only looking at equity
> differences so systematic errors will be cancelled out.

Yes, you may be right, I think I am mixing up mwc and luck in the
above statement.
But let's look at the following position...just an example I stumbled
over recently:
---------------------------------------------------------------------------
Move number 5: X to play 26

GNU Backgammon Position ID: fAEAALjvhgEAAA
Match ID : cAn5AFAAAAAA
+13-14-15-16-17-18------19-20-21-22-23-24-+ O: GNU
| | | O O | OO 5 points
| | | O | OO
| | | O | OO
| | | O | OO
| | | O | O
v| |BAR| | 7 point match (Cube: 1)
| | | X |
| | | X |
| X | | X X |
| X X X | | X X | Rolled 26
| X X X | | X X | 0 points
+12-11-10--9--8--7-------6--5--4--3--2--1-+ X: user
Pip counts: O 19, X 106

Output generated Sat Jul 12 12:46:18 2003
by GNU Backgammon 0.14-devel 1.1136 030706 (Text Export version 1.39)

Hint:
==================================================
1. Cubeless 0-ply 12/6 7/5 Eq.: -1,790
0,000 0,000 0,000 - 1,000 0,477 0,000
0-ply cubeless
2. Cubeless 0-ply 12/6 8/6 Eq.: -1,882 (-0,092)
0,000 0,000 0,000 - 1,000 0,533 0,000
0-ply cubeless
==================================================

I checked this equity out with another program (not because of the
luck factor, but because this move lowered my rating from world class
to intermediate...grin) and it seems that GNU's estimate of [ -0,092 ]
is off-track by [ 0,080 ].
Assuming for the moment that this is indeed correct.....
Do you agree that if player X takes the >second< choice, that GNU's
rating of the resulting position is off by 80 too ?
And that -if- in the next few rolls a backgammon is prevented, and the
match saved for the moment, that the luck needed by X for this to
happen is going to be estimated 80 too high too ?
I think it was Albert -maybe someone else- who recently wrote that
bots' moves diverge too often to enable duplicate games.....if this is
so, every divergence must lead to a different position and a different
evaluation of the equities of those positions, although they of course
do not necessarily lead to differences of this magnitude.
Every time this happens, the estimation of the amount of luck needed
from that point to the final move of a game must be affected too ?
I understand from the text on the URL you posted that it is assumed
that an error in equity evaluation 1 ply deep will be corrected via
examination 2 ply deep....but is this always so ? What if, as in the
case as described at the end of this email, the 'wrong' evaluation
leads to a 'double->drop' action that was not necessary ?

But maybe I am totally off-track, it would not be the first time....if
so, sorry to have wasted your time...:-))
**


Jorn wrote earlier in this message:
> As I don't have the 2002 version lying around I did something else: I
> analysed the luck in the 14 game session from my original posting with
> the "intermediate" setting:


But that is not what I think I am addressing....your scenario
represents a different instance of the same bot that uses the same
equity values.
Just for fun I installed the oct.2002/GNU-12 version and played a
duplicate game with GNU-14. They both acknowledge each other's error
free supernatural checker play, nevertheless after the 14th ply, the
following scenario occurred:
=======================================
GNU 14
Evaluator: CONTACT
snip!
Win W(g) W(bg) L(g) L(bg) Equity (cubeful)
static: 0,691 0,362 0,031 0,089 0,005 (+0,706 (+1,000))


Redouble, pass : +1,000
Redouble, take : +1,075 (+0,075)
No redouble : +0,897 (-0,103)

Correct cube action: Redouble, pass

=======================================
GNU 12
Evaluator: CONTACT
snip!
Win W(g) W(bg) L(g) L(bg) Equity (cubeful)
static: 0,652 0,346 0,024 0,090 0,006 (+0,601 (+0,843))


Redouble, take : +0,843
Redouble, pass : +1,000 (+0,157)
No redouble : +0,779 (-0,064)

Correct cube action: Redouble, take
====================================================

So for whatever this very simple example (in the very first game) is
worth...
GNU-12 estimates the positional equity as 105 lower than
GNU-14....which is maybe an acceptable difference for your sort of
calculation ?, but.....in this example it also marks the difference
between a take and a drop....does that count for anything ?

If GNU-12 were able to, it seems obvious that it would rate its own
luck as less than what GNU-14 thinks it is.

I have saved the above game in sgf format if you want to see it, but I
think that if you get the old 12 version of GNU, you will be able to
perform as many tests as you like yourself.

Peter

amni

unread,
Jul 12, 2003, 2:45:26 PM7/12/03
to
Or better, I would ignore Kees van den duel
because of the nonsense he talks.

I wrote in another place where this magic number 8 comes
from. For someone who demands more, I would offer
10000 money games, ignoring games with a cube higher than 16.
Games with a very high cube do not indicate much about strength,
because very high cube games are rare.
Ignoring crazy games reduces the factor of luck.

kvan...@xs1.xs4all.nl (Kees van den Doel) wrote in message news:<3f0fb411$0$2241$e4fe...@dreader6.news.xs4all.nl>...

Jørn Thyssen

unread,
Jul 12, 2003, 3:30:37 PM7/12/03
to
s.w.a....@hccnet.nl wrote:
> On Fri, 11 Jul 2003 21:28:49 +0200, Jørn Thyssen <j...@nospam.com>
> wrote:
>
>
>>Do you agree that even though the luck calculation is now worse, it's
>>still better than not doing it?
>>
> Yes, agree.
> My point is not so much that I think it doesn't give a better
> indication of game outcomes between top bots and lower ranking
> opponents.....I only think that this method is not reliable when it is
> used to measure the luck of opponents of practically equal ranking,
> like maybe GNU, Snowie and BgBlitz.

You have to explain why you think this is the case?


[snip]


> But let's look at the following position...just an example I stumbled
> over recently:
> ---------------------------------------------------------------------------
> Move number 5: X to play 26
>
> GNU Backgammon Position ID: fAEAALjvhgEAAA
> Match ID : cAn5AFAAAAAA

[snip]

> Hint:
> ==================================================
> 1. Cubeless 0-ply 12/6 7/5 Eq.: -1,790
> 0,000 0,000 0,000 - 1,000 0,477 0,000
> 0-ply cubeless
> 2. Cubeless 0-ply 12/6 8/6 Eq.: -1,882 (-0,092)
> 0,000 0,000 0,000 - 1,000 0,533 0,000
> 0-ply cubeless
> ==================================================
>
> I checked this equity out with another program (not because of the
> luck factor, but because this move lowered my rating from world class
> to intermediate...grin) and it seems that GNU's estimate of [ -0,092 ]
> is off-track by [ 0,080 ].


You should also be very careful comparing normalised money equities from
one program with another, since they'll most likely use different match
equity tables which may affect the normalised money equities by quite a bit.

Another thing, I get some different equities than you:

1. Cubeful 0-ply 12/6 8/6 Eq.: -1.8003
0.0000 0.0000 0.0000 - 1.0000 0.4830 0.0000
0-ply cubeful [expert]
2. Cubeful 0-ply 12/6 7/5 Eq.: -1.8249 (-0.0246)
0.0000 0.0000 0.0000 - 1.0000 0.4978 0.0000
0-ply cubeful [expert]

Furthermore, the equity difference between 12/6 7/5 and 12/6 8/6 doesn't
enter the luck calculation at all! What enters the luck calculation is
the difference between the 1-ply equity for the position and the 0-ply
equity after 62.

> Assuming for the moment that this is indeed correct.....
> Do you agree that if player X takes the >second< choice, that GNU's
> rating of the resulting position is off by 80 too ?

Yes, but it doesn't matter, because errors made by either player do
not -- and I repeat -- do not enter the luck calculation. The luck
calculation is independent of whether you moved 12/6 7/5 or 12/6 8/6!!!

I hope you have understood that now. Perhaps you mean that the luck
calculation is off by 80 -- and not the chequerplay rating?

Assuming that: yes, the luck calculation will be off by -80/18 when you
don't roll 62, and off by +80 when you do roll 62. However, this only
happens with probability 1/18, so the net effect is zero.
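
To make the "net effect is zero" point concrete, here is a rough Python
sketch of the calculation as described above. It is not gnubg's code and
every equity number in it is invented: the pre-roll 1-ply equity is the
probability-weighted average of the post-roll best-move equities, so a
fixed evaluation error on one roll's position shifts both terms and drops
out of the average luck.

# A rough sketch, not gnubg's code; all equity values below are invented.
# luck(roll) = best 0-ply equity after the roll - pre-roll 1-ply equity,
# where the 1-ply equity is the probability-weighted average of the
# post-roll equities over the 21 distinct rolls.

rolls = [(a, b) for a in range(1, 7) for b in range(a, 7)]       # 21 rolls
prob = {r: (1/36 if r[0] == r[1] else 2/36) for r in rolls}      # roll weights

post_roll_equity = {r: 0.0 for r in rolls}   # toy values: every roll neutral...
post_roll_equity[(2, 6)] = 0.30              # ...except 62, which is a joker

biased = dict(post_roll_equity)
biased[(2, 6)] += 0.080                      # pretend the evaluator misjudges 62

def expected_luck(equities):
    preroll = sum(prob[r] * equities[r] for r in rolls)           # "1-ply" equity
    return sum(prob[r] * (equities[r] - preroll) for r in rolls)  # average luck

print(expected_luck(post_roll_equity))   # 0.0 (up to rounding)
print(expected_luck(biased))             # still 0.0: the error cancels on average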

> And that -if- in the next few rolls a backgammon is prevented, and the
> match saved for the moment, that the luck needed by X for this to
> happen is going to be estimated 80 too high too ?
> I think it was Albert -maybe someone else- who recently wrote that
> bots' moves diverge too often to enable duplicate games.....if this is
> so, every divergence must lead to a different position and a different
> evaluation of the equities of those positions, although of course they
> do not necessarily lead to differences of this magnitude.
> Every time this happens, the estimation of the amount of luck needed
> from that point to the final move of a game must be affected too ?

Yes, this is because we do not have a perfect bot. If I return to my
original 14 game money session:

Game Actual Luck adj.

0 +1.000 -0.040
1 -2.000 -0.671
2 +2.000 +0.243
3 -8.000 +0.600
4 -2.000 +0.027
5 +2.000 -0.087
6 +4.000 +0.050
7 -2.000 +0.200
8 -1.000 -0.223
9 -4.000 +0.780
10 -1.000 -0.220
11 -1.000 -0.395
12 -2.000 +0.289
13 +2.000 -0.519
Sum -12.000 +0.034
Ave. -0.857 +0.002
95%CI 1.519 0.205

If gnubg estimated the luck perfectly, then the luck adjusted result
would be 0 for all games. Note that for all games the luck adjusted
result is much closer to the correct value of zero, hence the 95%
confidence interval is much smaller.

As I stated elsewhere: even though gnubg's luck calculation is not
perfect (which is evident from the example above), it's still so good
that it greatly reduces the std. errors -- in this case by a factor of 8,
which translates into a factor of 64 in the number of games, e.g., to
obtain the same std. error without luck adjustment I should sample
approximately 900 (14*64) games.

Also, as I stated elsewhere, even if I do the luck adjustment with a
worse bot (in my example it was gnubg "intermediate") I still get a
reduction of the std. error by a factor of 3. The conclusion is that
it makes perfect sense to use gnubg's luck rating to do the luck
adjustment of any head-to-head experiment! I think this is the crucial
point to understand!!!
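
For anyone who wants to check the arithmetic, the summary line of the table
above can be reproduced with a short Python sketch like the one below. To
reproduce the quoted 1.519 and 0.205 the sketch uses the population standard
deviation (dividing by n) and a 1.96 z-value; that is an assumption about
the 95%CI line, not something stated in the output.

from math import sqrt

actual   = [+1, -2, +2, -8, -2, +2, +4, -2, -1, -4, -1, -1, -2, +2]
luck_adj = [-0.040, -0.671, +0.243, +0.600, +0.027, -0.087, +0.050,
            +0.200, -0.223, +0.780, -0.220, -0.395, +0.289, -0.519]

def ci95(xs):
    n = len(xs)
    mean = sum(xs) / n
    sd = sqrt(sum((x - mean) ** 2 for x in xs) / n)   # population std. dev.
    return mean, 1.96 * sd / sqrt(n)                  # mean and 95% half-width

print(ci95(actual))     # approx (-0.857, 1.519)
print(ci95(luck_adj))   # approx (+0.002, 0.205)
# 1.519 / 0.205 is roughly 7.4, i.e. close to the factor-of-8 reduction in
# the standard error, or about 64 times fewer games for the same precision.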

> I understand from the text on the URL you posted that it is assumed
> that an error in equity evaluation 1 ply deep will be corrected via
> examination 2 ply deep....but is this always so ? What if, as in the
> case as described at the end of this email, the 'wrong' evaluation
> leads to a 'double->drop' action that was not necessary ?

This just means that the luck calculation for that particular move will
be off by 0.157 (1.000 - 0.843). The following moves are not directly
affected by this, since each move is handled independently. Whatever
happened earlier in the match does not directly affect the luck
calculation; it's a function solely of the position, cube value and
ownership, and match score.

>
> But maybe I am totally off-track, it would not be the first time....if
> so, sorry to have wasted your time...:-))
> **
>
>
> Jorn wrote earlier in this message:
>
>>As I don't have the 2002 version lying around I did something else: I
>>analysed the luck in the 14 game session from my original posting with
>>the "intermediate" setting:
>>
>>>
> But that is not what I think I am addressing....your scenario
> represents a different instance of the same bot that uses the same
> equity values.

No, that was exactly why I chose this experiment. Perhaps I should have
indicated more clearly that "expert" will generate different equities
and moves than "intermediate".

The intermediate setting introduces deterministic noise into the
evaluations, so there will be equity differences compared to the
"expert" setting. Try playing a game against gnubg "intermidiate" and
analyse it on "expert". You'll see that gnubg "intermediate" makes
errors! GamesGrid is using this technique for GGotter, GGweasal, and
GGchipmunk (see http://www.gamesgrid.com/faq-GGrobots.html). This is why
the aforementioned bots have lower ratings than GGraccoon, which plays on
"expert".

> Just for fun I installed the oct.2002/GNU-12 version and played a
> duplicate game with GNU-14. They both acknowledge each other's error
> free supernatural checker play,

Yes, but again: whatever gnubg thinks about the contestants' actual play
does *NOT* enter the luck calculation. This is why Douglas Zare writes
that this method is *NOT* biased!

Besides the luck adjusted result, gnubg also outputs "mwc against
current opponent" in the match statistics. However, since the
calculation of this one is based on the error rates, this number is
biased towards the analysing bot. I hope you understand the difference
between "mwc against current opponent" and "luck adjusted result": the
former is biased towards the analysing bot, whereas the latter is not!

Yes, no problems there. We do still obtain good improvements by applying
luck adjustment compared to not applying it.

> but.....in this example it also marks the difference
> between a take and a drop....does that count for anything ?

No, this just means the luck calculation is wrong for this particular
position. The following positions in the match may, or may not be
affected. Assuming that the luck calculation was perfect for all other
moves then gnubg 0.12 will calculate that the luck adjusted result for
this game was, say, 0.150 but gnubg 0.14 will calculate that the luck
adjusted result was 0.005. In both cases this is a way better result
than +1 or -1 (assuming the true result is 0).

You'll often see that these errors cancel out, e.g., if gnubg
overestimates the equity of this position, it probably underestimates
the equity after the move where the opponent is on roll. Hence, errors
are likely to cancel out.

>
> If GNU-12 were able to, it seems obvious that it would rate its own
> luck as less than what GNU-14 thinks it is.

No, not necessarily.

Jørn


Jørn Thyssen

unread,
Jul 12, 2003, 3:38:36 PM7/12/03
to
amni wrote:
> Or better, I would ignore Kees van den duel
> because of the nonsense he talks.
>
> I wrote in another place where this magic number 8 comes
> from. For someone who demands more, I would offer
> 10000 money games, ignoring games with a cube higher than 16.
> Games with a very high cube do not indicate much about strength,
> because very high cube games are rare.
> Ignoring crazy games reduces the factor of luck.

Assuming the equity varies continuously and that each player doubles at
exactly the take point, you'd expect to see 1.6% of all games ending in
16 or higher cubes. That's an astonishing 160 games in a 10,000-game
sample. In the real world the equity doesn't vary continuously and the
doubles are not perfect, so you'd expect to see fewer (or many more if
you participate in the chouette in my local bg club -- yes, I do
actually play real life backgammon!).

In fact, if you page through this newsgroup (go back to July 1) you'll
find a jellyfish rollout that indicates that 0.1% of all games end in a
16 cube, so with 10,000 you would expect to see 10 (or so) games with a
16 cube. Removing those will certainly introduce bias into the result
(unless of course you do the luck adjustment....)

Jørn

Jørn Thyssen

unread,
Jul 12, 2003, 4:01:04 PM7/12/03
to

Perhaps there is a small typo:

"[...] This affects the value of P(n-1) [...]"

should be

"[...] This affects the value of P(n) [...]"?

Nis, can you confirm?

The luck at P(n) is calculated as

P'(n+1) - P(n)

where P'(n+1) is the equity of the luck analyser's best move for the
roll 43. Errors at P'(n+1) should not affect P(n-1) as far as I can see.

Jørn

Kees van den Doel

unread,
Jul 12, 2003, 5:15:06 PM7/12/03
to
In article <bep6rv$7l779$1...@ID-73584.news.uni-berlin.de>,

John R MacLeod <jrma...@consultant.com> wrote:

>> What's wrong with the method described below?:

>I don't know enough about either the central limit theorem or backgammon to
>guess if the variance in results between roughly equal players is covered by
>the theorem but it might be a good hypothesis. However it still doesn't
>answer Michael's question I don't think. He wants to know how many games he
>has to play to get a certain confidence. If I'm not wrong you are
>suggesting playing 'n' games to work out the variance and then calculating
>the confidence. But he wants to know the confidence first.

That is obviously impossible.

Best you can do is try (say) 1000 games, work out the confidence, if too
low, do 5000 games, work out the confidence again, etc. till you either
get a good result or you give up. Obviously the closer in skill the
players are the more games are needed. If they are equal you will go on
forever, never finding an edge of the one or the other.
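
In code, the procedure described above might look roughly like the toy
sketch below. The simulated "games" are just Gaussian noise around an
assumed 0.02 ppg edge with a 3-point per-game standard deviation -- both
figures are made up for illustration -- and the loop simply reports whether
the 95% interval has excluded zero yet.

import random
from math import sqrt

random.seed(1)
true_edge, sigma = 0.02, 3.0          # assumed edge and per-game std. dev.
results = []

for target in (1000, 5000, 25000, 125000):
    while len(results) < target:
        results.append(random.gauss(true_edge, sigma))   # one simulated game
    n = len(results)
    mean = sum(results) / n
    half = 1.96 * sigma / sqrt(n)                        # 95% half-width
    verdict = "edge detected" if abs(mean) > half else "still inconclusive"
    print(f"{n:6d} games: {mean:+.3f} +/- {half:.3f}  ({verdict})")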


Kees (Hoe ging toch grappig bij is obviously you canrelate to
Bleurvissetskade, then explain nick name; does to say?)

Kees van den Doel

unread,
Jul 12, 2003, 5:18:16 PM7/12/03
to

>Or better, I would ignore Kees van den duel
>because of the nonsense he talks.

>I wrote in another place where this magic number 8 comes
>from. For someone who demands more, I would offer
>10000 money games, ignoring games with a cube higher than 16.
>Games with a very high cube do not indicate much about strength,
>because very high cube games are rare.
>Ignoring crazy games reduces the factor of luck.

Sure, let's also ignore all games in which 6-6 was thrown more than
once.


Kees (Ze rijden vele nonsense answers almost surprisingly, Van
Cleemput.)

Nis Jorgensen

unread,
Jul 12, 2003, 9:27:40 PM7/12/03
to
Jørn Thyssen <j...@nospam.com> wrote in message news:<3F106900...@nospam.com>...

> > I've had a read of this and also Douglas Zare's paper on a similar subject
> > cited in another post. I'm still not quite convinced though I recognise the
> > force of the arguments. The example quoted in this one seems based on the
> > idea that the -.200 * 1/18 error in evaluation on position P(n-1) will not
> > result in any different move
> > whereas if it did then position P(n) would not
> > be reached and the self-correcting actual play of 4/3 to reach P1(n+1)
> > wouldn't happen. Am I missing something?
>
> Perhaps there is a small typo:
>
> "[...] This affects the value of P(n-1) [...]"
>
> should be
>
> "[...] This affects the value of P(n) [...]"?
>
> Nis, can you confirm?

Yes. I remember that I changed the indexes of the examples several
times during the writing of that article.



> The luck at P(n) is calculated as
>
> P'(n+1) - P(n)
>
> where P'(n+1) is the equity of the luck analyser's best move for the
> roll 43. Errors at P'(n+1) should not affect P(n-1) as far as I can see.

Correct.

--
Nis Jorgensen
Your man in Hoofddorp

Kees van den Doel

unread,
Jul 13, 2003, 4:19:27 AM7/13/03
to
In article <3F0FFB66...@math.columbia.edu>,

Nobel prize for Backgammon theory contender Douglas Zare <za...@math.columbia.edu> wrote:

>> Yes John, agree entirely with this articulate reasoning.... but how many
>> games is enough. Our clever mathmos won't tell me. I am ready to do it if
>> 5,000 isn't sufficient. Unless I die of boredom of course!

>Someone said that if you can't explain yourself to the man on
>the street in 5 minutes, then you don't really know what you
>are doing. My response is that I don't feel like limiting
>myself to studying what can be explained to the man on the
>street in 5 minutes. (To a related question, a reporter asked
>Feynman to say, "If I could explain it in 5 minutes, it
>wouldn't be worth a Nobel Prize, would it?") I'll add

Actually, Wittgenstein said

"Everything that can be thought at all can be thought clearly.
Everything that can be said can be said clearly".


Kees (BE TRAVELLING BETWEEN INFINITY AND YES YES I burned my Lord hath
power to download v2.30.007 of pompous ass.)

Nis Jorgensen

unread,
Jul 13, 2003, 9:23:45 AM7/13/03
to
"John R MacLeod" <jrma...@consultant.com> wrote in message news:<ben5bg$75qni$1...@ID-73584.news.uni-berlin.de>...

> I'm having a bit of fun trying to work out if I agree with this or not. The
> example I'm thinking of is BOT A and BOT B which are so alike that there is
> only one possible position where their evaluations disagree but on that
> particular position their evaluations disagree strongly. I imagine you
> could create such a situation by cloning GNUBG and putting a deliberate
> 'bug' in at some low level. Now this would come into play whenever that
> position was one of the possible positions within the horizon of search. Set
> them playing each other continuously and in the fullness of time one would
> expect the one with the correct (or better) evaluation to creep slowly
> ahead. Let us say BOT A turns out to be better than BOT B.
> Now how would the two bots analyze the games? For BOTA it's easy - its own
> moves are perfect while BOTB makes the occasional error. As luck gradually
> evens out over time everything is as expected - the stronger player wins.
> But for BOTB it is more difficult. Clearly, its own moves are perfect but
> BOTA makes the occasional mistake. Nevertheless, BOTA is winning. The only
> explanation is that BOTA is slightly luckier than BOTB.
> On this basis I would expect the weaker bot to slightly overestimate the
> luck of the stronger bot.

Let's call the position where the bots disagree "P". For simplicity,
let's only look at the cases where this position is reached by playing
4-3 from another position, "Q". The only alternative way of playing
4-3 we will denote "R". Say we have the following evaluations by the
two bots:


Position BOTA BOTB
P 0.68 0.32
R 0.50 0.50

Combining this, we get these best-move equities for the roll:

4-3 0.68 0.50

If we assume that all other rolls in position Q have a best-move
equity of 0.50 (agreed by both bots), we get the 1-ply equity for Q:

Q 0.51 0.50

If we reach position Q 18 times during our session, and roll 4-3 one
of those times, the sum of the luck for these positions calculated by
BOTA is:

17 * (0.50 - 0.51) + 1 * (0.68 - 0.51) = 0

(BOTB has 0 luck no matter the roll)

Both bots will think the other bot errs on the position, if asked to
evaluate skill. But this does not change the fact that they will have
average luck of zero in the long run.

Again: The actual move played has NO effect on the calculated luck of
the roll. We could calculate the luck even if no move was played (for
instance a resignation) or if an illegal move was made. Since the luck
of a game/match/session is just the sum of the luck for the individual
rolls, it cannot be biased unless the luck of an individual roll is
biased. And the luck of the single roll is constructed so that it
CANNOT be biased for or against anything.
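
The arithmetic above is easy to check; the short Python sketch below just
restates the numbers from the example (nothing here comes from gnubg
itself).

p_43 = 1.0 / 18                      # probability of rolling 4-3

best_43 = {"BOTA": 0.68,             # BOT A: 4-3 reaches P, worth 0.68
           "BOTB": 0.50}             # BOT B: prefers R, worth 0.50
other = 0.50                         # the 17 other rolls: 0.50 for both bots

q = {bot: p_43 * best_43[bot] + (1 - p_43) * other for bot in best_43}
print(q)                             # roughly {'BOTA': 0.51, 'BOTB': 0.5}

# BOT A's luck summed over 18 visits to Q, one of them with a 4-3 rolled:
luck_sum_A = 17 * (0.50 - q["BOTA"]) + 1 * (0.68 - q["BOTA"])
print(round(luck_sum_A, 10))         # 0.0 (up to rounding)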


Back4U2 BBL

unread,
Jul 13, 2003, 11:25:10 AM7/13/03
to

"Murat Kalinyaprak" wrote in message
news:2831c30c.0307...@posting.google.com...
> Albert Silver wrote f9846eb9.03070...@posting.google.com
>
> > In any case, I honestly don't believe playing back some
> > 15k .wav file slows the playing down any.
>
> Talking about speed, what is that square moving back and
> forth (in the bottom left corner of the screen) good for
> anyway...??
>
> Displaying graphics, especially constantly moving ones,
> used to be among the slowest and most resource consuming
> operations in computing. Is it not the case anymore...??
>
> I would guess that it is still a considerable waste of
> CPU time but even if it wasn't, what's the fucking use
> for it...? Trying to get the human opponent cross-eyed
> and distracted...??

It's part of the magician's trick.
Check your wallet when it stops moving.

Nardy


John R MacLeod

unread,
Jul 13, 2003, 11:26:40 AM7/13/03
to

"Nis Jorgensen" <n...@dkik.dk> wrote in message >

I am getting more and more impressed by the thinking behind this and
certainly for what I use GNUBG for (evaluating my own play) I think it's
great. Any errors in GNUBG's evaluation will be trivial compared to the
errors in my play.
Obviously for evaluating equal strength players though the luck calculation
must be neutral even when wrong evaluations are around.
I think I'm convinced now that in the example I quoted (which you expanded)
it works. I still have a niggling doubt in a very similar situation where
the wrong evaluation is sufficient to cause a different move. I have
slightly changed your example below. I feel I'm probably missing something
but it seems to me that in this example B's error causes its real luck to be
missed just because it didn't play the best move.
Totally unrealistic example of course.

> Let's call the position where the bots disagree "P". For simplicity,
> let's only look at the cases where this position is reached by playing
> 4-3 from another position, "Q". The only alternative way of playing
> 4-3 we will denote "R". Say we have the following evaluations by the
> two bots:
>
>
> Position BOTA BOTB

> P 0.68 0.30
> R 0.30 0.30
Note that R is now also 'bad luck' so position P is undesirable from BOT
B's point of view.

>
> Combining this, we get these best-move equities for the roll:
>

> 4-3 0.68 0.30
>
> <snipped a sound calculation >
Let's assume the potential for position P turns up twice in the match - once
for BOT A and once for BOT B and that by chance the rolls are identical from
position Q-1.
Let's also assume that all other rolls for several turns before and after Q
are totally neutral so that the luck remains static. That is all positions
derived from P are correctly evaluated by both bots at 0.68, all from R as
.30 and all that don't go through either position as 0.50. Q and all
positions prior to Q are evaluated as 0.50.
Now at Q-1, BOT B will see Q as a bad position to get into while actually
making the play because looking 1 ply further ahead it sees position P on
the horizon so it plays something that doesn't get it to position Q.
However BOT A sees position P as desirable and takes steps to increase its
chances of getting there. Let's assume it gets the rolls it needs to achieve
position Q and then P. And remember that we have said in the same situation
BOT B got exactly the same rolls.

Looking at the situation from BOT B's point of view:
As we have defined all other moves as resulting in 0.50 its own luck over
this sequence is going to be zero.
BOT B though sees BOT A's luck has decreased (on move Q to P) (and it has
made an error on move Q-1 to Q which is irrelevant to the calculation).
However, as all positions derivable from P are .68 its luck will be
corrected on the next roll.
So that should work from BOT B's calculation.

Looking at the situation from BOT A's point of view:
As we have defined all other moves as resulting in 0.50 BOT B's luck over
this sequence is going to be zero. (It never gets to Q where the two BOTs
disagree about luck). (It has made an error but that's irrelevant).
BOT A though sees its own luck has increased (on move Q to P). Which it has.

So both bots agree that A got lucky - which it did. However neither spotted
that BOT B was equally lucky and didn't take advantage of it.


Derek Ray

unread,
Jul 13, 2003, 11:34:11 AM7/13/03
to
In message <2831c30c.0307...@posting.google.com>,
mu...@compuplus.net (Murat Kalinyaprak) mumbled something about:

>Albert Silver wrote f9846eb9.03070...@posting.google.com
>
>> In any case, I honestly don't believe playing back some
>> 15k .wav file slows the playing down any.
>
>Talking about speed, what is that square moving back and
>forth (in the bottom left corner of the screen) good for
>anyway...??

It lets you know the program hasn't died.

This is called "good user interface design"; displaying an indicator to
let you know the program is busy conducting operations, instead of
frozen. It is considered to be a basic tenet of such.

>Displaying graphics, especially constantly moving ones,
>used to be among the slowest and most resource consuming
>operations in computing. Is it not the case anymore...??

An operation such as sliding a box back and forth in a small area is a
trivial use of resources, given today's computing speeds AND given the
amount of "work" already done underneath the scenes for you with regard
to graphics display.

For such an accomplished programmer as yourself, you really don't seem
to be up to speed on even the basics. I guess that's why you can't go
read gnubg's source code yourself to discover that it doesn't cheat.

Oh, wait. I'm wrong. You're scared to do it -- you'd rather hide
behind your meaningless religious rhetoric and other pointless blather,
because you know you'd be shown up. Just like you know if you ever
showed up anywhere in person with these theories, you'd leave with your
wallet empty... that is if you were brave enough to bet, which you
aren't.

Again, until you read the source code, Mr. Programmer, and can point out
to me exactly where gnubg is cheating, then you need to pipe down or
break out your wallet and show up somewhere.

I think you're just too much of a pussy.

>I would guess that it is still a considerable waste of
>CPU time but even if it wasn't, what's the fucking use

You would guess incorrectly. Have you ever done ANY sort of work on
modern equipment at all, or with modern languages and operating systems?

>for it...? Trying to get the human opponent cross-eyed
>and distracted...??

See above. If you're weak-minded enough to be distracted by a small
sliding box, well, let's just say I've known a lot of people like you...
who would complain about ANYTHING to hide their lack of skill.

The answers are available for you in gnubg's source code. As always,
please feel free to browse it and identify where it's cheating before
you respond. Again, you're too much of a pussy to do this, I think.

-- Derek

"Ignorance more frequently begets confidence than does knowledge."
- C. Darwin, 1871

s.w.a....@hccnet.nl

unread,
Jul 13, 2003, 12:20:00 PM7/13/03
to
On Sat, 12 Jul 2003 21:30:37 +0200, Jørn Thyssen <j...@nospam.com>
wrote:


>> My point is not so much that I think it doesn't give a better
>> indication of game outcomes between top bots and lower ranking
>> opponents.....I only think that this method is not reliable when it is
>> used to measure the luck of opponents of practically equal ranking,
>> like maybe GNU, Snowie and BgBlitz.
>>>>>>>>>>>>>>>
>You have to explain why you think this is the case?
>>
>>

I am afraid I can't anymore after having read this last answer of
yours, plus the entire conversation on the subject on
http://mail.gnu.org/....
Everything that vaguely bothered me is covered. In short: I
surrender...:-)
The only consolation is that I saw I was not the only one who walked
this path...grin...
**


Jorn wrote:
>Another thing, I get some different equities than you:
>
> 1. Cubeful 0-ply 12/6 8/6 Eq.: -1.8003
> 0.0000 0.0000 0.0000 - 1.0000 0.4830 0.0000
> 0-ply cubeful [expert]
> 2. Cubeful 0-ply 12/6 7/5 Eq.: -1.8249 (-0.0246)
> 0.0000 0.0000 0.0000 - 1.0000 0.4978 0.0000
> 0-ply cubeful [expert]
>>
>>

I have tried to replicate your equities with different settings, but I
still get the same.....??...
...this here is a compilation of the 'lite' download of the build july
6th executable plus libraries, and the june 20th full download.
I guess I better get a full july 6th build ?
**


Jorn wrote:
>Besides the luck adjusted result, gnubg also outputs "mwc against
>current opponent" in the match statistics. However, since the
>calculation of this one is based on the error rates, this number is
>biased towards the analysing bot. I hope you understand the difference
>between "mwc against current opponent" and "luck adjusted result": the
>former is biased towards the analysing bot, whereas the latter is not!

8><CUT--------


>You'll often see that these errors cancel out, e.g., if gnubg
>overestimates the equity of this position, it probably underestimates
>the equity after the move where the opponent is on roll. Hence, errors
>are likely to cancel out.
>>
>>

Yes, I see the light now....thanks for the explanations and your
patience.

One more question, is it possible to add an (optional) display of the
rounded-off percentages of luck for each individual move in the game
record window ? I would like to be able to see the entire 'luck
development' during a game via refreshing the game record window
occasionally.

Peter

Jørn Thyssen

unread,
Jul 13, 2003, 12:48:41 PM7/13/03
to
s.w.a....@hccnet.nl wrote:

[snip]


> One more question, is it possible to add an (optional) display of the
> rounded-off percentages of luck for each individual move in the game
> record window ? I would like to be able to see the entire 'luck
> development' during a game via refreshing the game record window
> occasionally.

Something like this has been on the wishlist for a while, but nobody is
working on it right now.

Jørn


Albert Silver

unread,
Jul 13, 2003, 1:54:58 PM7/13/03
to
mu...@compuplus.net (Murat Kalinyaprak) wrote in message news:<2831c30c.0307...@posting.google.com>...

> Albert Silver wrote f9846eb9.03070...@posting.google.com
>
> > In any case, I honestly don't believe playing back some
> > 15k .wav file slows the playing down any.
>
> Talking about speed, what is that square moving back and
> forth (in the bottom left corner of the screen) good for
> anyway...??

A number of functions are deactivated when the program is thinking;
however, if you don't realize it is thinking, you could end up thinking
the program froze or something. This assures the user it didn't
suddenly stop working.



> Displaying graphics, especially constantly moving ones,
> used to be among the slowest and most resource consuming
> operations in computing. Is it not the case anymore...??

No, thank goodness, at least not for such minor things. You might wish
to open a computer magazine to see the types of graphics that are
common nowadays. That little thing is no challenge to the machine.

> I would guess that it is still a considerable waste of
> CPU time but even if it wasn't, what's the fucking use

> for it...? Trying to get the human opponent cross-eyed
> and distracted...??

See above.

>
> To ice your cake, let me say that I think it fits the
> irrelevant/useless garbage loaded gnudung perfectly...!
>
> So please don't get rid of it... :))

I happen to agree that it is distracting. An advancing or increasing
bar might be better.

Albert Silver


>
> MK

Albert Silver

unread,
Jul 13, 2003, 1:58:28 PM7/13/03
to
mu...@compuplus.net (Murat Kalinyaprak) wrote in message news:<2831c30c.03071...@posting.google.com>...
> Michael Howard wrote bek9iu$6hd$1...@newsg4.svr.pol.co.uk
>
> > ..... The game is being reduced to computer analysed numbers
> > and nobody has creative thoughts about backgammon anymore,
>
> Michael, I have tried very hard to defend this for a very long
> time and I can immediately relate to your comments...
>
> Unfortunately, the world of backgammon has been inundated by
> "sick gamblers" who would be just as well off betting money on
> just rolling the dice alone... :((
>
> And even more unfortunately, nobody from out of their "little
> incestuous sick gamblers circle" has the time, means and/or the
> motivation to counter them... :(((
>
> > How do you calculate luck like that? On a roll by roll basis
> > it was lucky. As a sequence of rolls it was phenomenal.
>
> Questions like this one have been raised by myself and others
> for a long time... As a result, it looks like they implemented
> some joke of a multi-ply luck analysis... :))

Actually, that isn't what the Temperature Map (which I know amazes
you) was designed for, although it can serve for that purpose as well.
It's really meant to help a player see in a graphic manner the
potentially dangerous rolls a move might allow. Sometimes when you see
what may lie ahead you can better understand what is going on. Imagine
considering a doubling decision and worrying about the number of market
losers. Here you'd know offhand.

Albert Silver

>
> What if a similar "phenomenal" sequence of rolls were to come
> just shy of winning the game...??
>
> You better bet that the sick scum can cheat at 3 decimal place
> accuracy...!! :)) But, luckily, people like you and I are here
> to shove it up their sick asses... Hallelujah and praise the
> father of the faggot...
>
> MK

s.w.a....@hccnet.nl

unread,
Jul 13, 2003, 3:16:16 PM7/13/03
to
On Sun, 13 Jul 2003 18:48:41 +0200, Jørn Thyssen <j...@nospam.com>
wrote:

>> One more question, is it possible to add an (optional) display of the

Oh well...we'll wait.....after all: 'Luck is for Losers', isn't it ?
:-)

Peter

Michael Sullivan

unread,
Jul 14, 2003, 1:12:15 PM7/14/03
to
> It is only as 'good' as the quality of the equities that are used.
> If you let the october 2002 version of GNU play against the july 2003
> version, and let the 2002 version calculate the 'luck', you are off by
> the same amount as the equity improvements that have been added to the
> 2003 version.
> In other words and in the same vein.....if f.e. Snowie is indeed a
> better player than GNU at this moment , GNU will not be able to figure
> that out in those cases where Snowie's equities are superior....GNU
> will just see Snowie's actually 'better moves' as 'inferior moves',
> and it will rate Snowie's gained equity as 'luck'.

But the whole point is that it *doesn't* look at the moves, it only
looks at its estimated equity swings from the roll in the position. Any
errors in this calculation are *much* less likely to be biased in favor of
a particular bot's play.

> Jorn, your set up only works when a bot with an overall superior
> equity table is used to 'judge' an inferior one.....sort of Catch
> 22.....you will first have to prove that the bot that does the
> calculation is indeed using an >overall< better equity table.

That's true if the bot doing the analysis is analyzing the moves for
errors. But any bot with a reasonable equity table can make a luck
analysis that will bring the result closer to the truth than the raw
match. It doesn't have to be as strong as either of the participants,
for the variance reduction to be helpful.

You can demonstrate this fairly easily by running a dueller match
between bot A and B for, say 100 games and then letting a bunch of
different bots analyze the results. If each bot looks only at errors
(and doesn't roll out), then you find that bot A will rate bot A as
having perfect play and bot B as having x errors, even though bot B may
have won the match. Same with bot B doing the analysis. But if you ask
them to calculate the luck adjusted result, they will probably come up
with very similar results. Not only that, but other bots calculating
the luck adjusted result will *also* have similar results. At the very
least these luck adjusted results will be much closer together than the
total error analysis results.

Try it and see!

In fact, I wonder if you can judge a bot's strength by looking at the
difference in the calculation it makes for luck adjusted results, and
expected zero-luck result based on the error rates. The smaller the
difference, the stronger the bot. If the difference is near zero over a
lot of analyzed matches, you may have a bot that approaches perfect
play.

Does that make sense?

Michael


Jørn Thyssen

unread,
Jul 14, 2003, 5:13:00 PM7/14/03
to
Michael Sullivan wrote:

[snip]

> In fact, I wonder if you can judge a bot's strength by looking at the
> difference in the calculation it makes for luck adjusted results, and
> expected zero-luck result based on the error rates. The smaller the
> difference, the stronger the bot. If the difference is near zero over a
> lot of analyzed matches, you may have a bot that approaches perfect
> play.
>
> Does that make sense?

We do have that "a perfect bot" produces "luck adjusted result = error
rate result". That is, A => B.

The question is if we do have B => A? That is, if the "luck adjusted
result" is equal to the "error rate result", do we then have a perfect bot?

A simple counter example is a bot that simply says that the luck
adjusted result is zero and so is the error rate result. This bot is
clearly not perfect, hence A does not follow from B in general.

Jørn

Ron Barry

unread,
Jul 14, 2003, 5:30:21 PM7/14/03
to
Murat,

That's the answer! When you stuck that 5' copper tube up your ass, it
went all the way up into your brain! That's what is wrong with you
today! And all along, I thought it was those cactus thorns that you got
caught in you, riding the range up there in Montana! That explains why
you think you can find bugs in GnuBG without being able to read the
source code! My advice is not to try to remove it without professional
assistance, because if the tube is more than about 1/2" in diameter, it
could take most of your brain with it when it comes out! Good luck!

Best regards, Ron Barry.

Albert Silver

unread,
Jul 14, 2003, 10:03:25 PM7/14/03
to
mu...@compuplus.net (Murat Kalinyaprak) wrote in message news:<2831c30c.03071...@posting.google.com>...
> Albert Silver wrote f9846eb9.03071...@posting.google.com
>
> > Michael Howard wrote bejolg$q82$1...@newsg4.svr.pol.co.uk
>
> >> All this is very impressive but of no real use..... Only the
> >> results of head-to-head are of real interest to the community
> >> outside you 'boffins'. You simply confuse everyone with numbers
> >> to 3 decimal places and kid us they mean something..... if two
> >> players go for 5,000 games (or more if possible) the winner is
> >> the better player - end of story.
>
> > Just because you don't understand them, doesn't make them wrong.
>
> He didn't say "wrong", he said "useless"...!! Obviously you are
> not smart enough to differentiate the difference just because it
> wasn't presented to you in 3 decimal place accuracy... :)))

Actually, the only way the data would be useless is if they were
wrong, unless you don't understand it. If you did understand it then
you'd *know* that, if correct, it could never be useless.

Albert Silver

>
> > Still, your line of argument is similar to that of Creationists
>
> Prime example that irrelevant bullshit has no limits... :( Puke...!
>
> > ... Still, a luck analysis would better shed light on this, no?
>
> If I walked into a hardware store and asked for 5' of copper
> tubing and if the Moronian at the counter started bullshitting
> me about how that 5' of copper tubing would measure 4.926' based
> on current temperature, atmospheric pressure, wind speed and the
> moisture in his mother's cunt, I would punch the "mother fucker",
> oops make that "father fucker", and I would walk out... :((
>
> MK

Kees van den Doel

unread,
Jul 15, 2003, 3:15:22 AM7/15/03
to
In article <2831c30c.03071...@posting.google.com>,
Murat Kalinyaprak <mu...@compuplus.net> wrote:

>>> I personally will ignore all games with the cube raised to more than
>>> 8 points, so that crazy games will be ignored.

>> You might as well ignore all games which JellyFish won because
>> you think they are "crazy".
>> Or ignore all games with more than 2 doubles in a row.
>> Or just play tic-tac-toe.

>Way to go "Oeroeboeroe"... :)) For a change, I like what you
>wrote enough that I spelled "Oeroeboeroe" properly... :))

Oh yeah that's a major flaw of the bot's "luck adjusted equity
evaluation" because they should really check if the equity was dropper
by someone who spells Oeroeboeroe correctly, or not as it has major
rammifications.


Kees (Borumand radif, part there such things?)


Kees van den Doel

unread,
Jul 15, 2003, 5:23:20 AM7/15/03
to
In article <25ebfa84.03071...@posting.google.com>,
Nis Jorgensen <n...@dkik.dk> wrote:

>Again: The actual move played has NO effect on the calculated luck of
>the roll. We could calculate the luck even if no move was played (for
>instance a resignation) or if an illegal move was made. Since the luck
>of a game/match/session is just the sum of the luck for the individual
>rolls, it cannot be biased unless the luck of an individual roll is
>biased. And the luck of the single roll is constructed so that it
>CANNOT be biased for or against anything.

For some reason it seems unintuitive that an imperfect bot can help to
decide who's the better player, even if they are both better than the
bot.

It may help to look at a human analogy to correct this failure of common
sense.

Suppose Murat and Zare play a 61 point match and Murat wins by 62-11.

Who's the better player?

An ignoramus will be able to contribute nothing beyond the obvious that
Murat is probably better but what the confidence is God only Knows.

An average player can do better by analysing the match, noticing all
those 66's Murat miraculously got, and all the bad obvious blunders he
made which went unpunished and this player will conclude Murat had a lot
of luck so the result could be just lucky.

An expert player will be able to refine this analysis and point out
conceptual errors in Murat's play and provide an even more substantiated
estimate of the players' relative strengths based on his imperfect
understanding of the game.

God (The Lord, Creator of Heaven and Earth, you know Who) with His
Perfect Play Will simply just Analyse the games for errors and Tally Up
Murat's and Zare's equity drops and Conclude the player with the least
equity drop was the stronger.

So you see that an average player can do better than an ignoramus in
correcting for luck, even though the Perfect Player will do even better.

Similarly a bot can help estimate the relative strength between players
better than just looking at the raw score even if the bot itself is
imperfect.

I hope this analogy helps sell the variance reduction to the ignorant
masses (the plebs, the commons, the footfolk, cannonfodder, lower
classes, servants, toiletcleaners, the silent majority, the G. W. Bush
voters).


Kees (If everyone for May what is FORBIDDEN on drugs, good thing or
whatever, It hat zum bedeutendsten Industriezweig der Doel, you
talked, yet you've always try scanning in evangelische site is
gespannen, waardoor xs uitspreekt doet kun ook helpen als lo en
antwoorden komen Door crossposten genaamd, verwijderen maar eerst
mijn werk, lekker zelfs, maar alcoholisten die richting, zodat ie
zwart gaatje wegens Uw FAQspecialist writes: Poost alleen dingen
gedaan, of Portuguese/Indian parents but chaffing you fucking
tedious argument.)

Albert Silver

unread,
Jul 15, 2003, 9:41:35 AM7/15/03
to
mu...@compuplus.net (Murat Kalinyaprak) wrote in message news:<2831c30c.03071...@posting.google.com>...
> Jørn Thyssen wrote 3F0EBFEE...@nospam.com
>
> > Michael Howard wrote:
>
> >> ... Adding up the individual changes probably won't
> >> be good enough to express the overall effect.
>
> > Why is that? You're actually stating that the total
> > luck of a sequence (or game) is *NOT* the sum of the
> > luck for each roll!?!
>
> What kind of bullshit question is this Jorn...? The
> guy seems to be saying that a "1-ply" luck analysis
> isn't enough and you are asking him back "why so"...
>
> Let me then ask you why the gnudung makes an attempt
> at "multi-ply" luck analysis...?
>
> I will get more satisfaction if you just shove your
> answer up your dumb ass instead of cluttering this
> newsgroup any more...

A reading disability is no doubt the source of your reply since it was
answered yesterday in this very same thread. Here is the quoted
relevant part (the first part being your own self):

> Questions like this one have been raised by myself and others
> for a long time... As a result, it looks like they implemented
> some joke of a multi-ply luck analysis... :))

Actually, that isn't what the Temperature Map (which I know amazes
you) was designed for, although it can serve for that purpose as well.
It's really meant to help a player see in a graphic manner the
potentially dangerous rolls a move might allow. Sometimes when you see
what may lie ahead you can better understand what is going on. Imagine
considering a doubling decision and worrying about the number of market
losers. Here you'd know offhand.

Albert Silver


>
> MK

Steve Bortnyck

unread,
Jul 15, 2003, 9:55:19 AM7/15/03
to
Jorn,

First I would like to thank you for all your time with this, the program as
well as the discussion. It is very nice of you to offer a free program (I
believe you are (the/one of the) person behind GNUBG.)

I think perhaps Michael has a point though. I use Snowie, sorry, I know a
competitor so to speak. And I also get to speak to Malcolm Davis often
about the program and moves. He is constantly reminding me that I need to
roll out every single error if I want to get the very best analysis of my
play. My computer is unfortunately slow, and so, I find that as an
intermediate, it is better to perhaps get the second best answer. (I
realize the fallacy of this, but will continue)

Michael was stating what for many of us is a valid point. Every time a 'bot
performs a shortcut for us, it is in fact introducing a bias to the
analysis. An evaluation by Snowie is not 100% accurate. Hence the many
rollouts discussed online. Variance reduction is also a shortcut.
Instituted to provide a usable program, not to be MORE accurate, correct?
Bearoff databases on Snowie are not accurate either right? (Well known
flaws hence Sconyers database which is not compatible with Snowie. Well
compatible, but not interchangeable.)

So.... why not just let the bots play. Whoever wins, wins. Surely
letting them play for an infinite number of games would be no less accurate
than using a luck evaluation, variance reduction, etc.? Right?

I mean the use of all the shortcuts is just that. To simulate a larger
number of games with a smaller sample size? So, if one had the CPU
available, and the time, and was willing wouldn't a 50 million match "duel"
be enough to tell which is the stronger version of any bot? And if not,
would it then matter? If an inordinate number of games could not "safely"
declare a victor then call them "close enough for government work" (an
American joke) and let people know the results and decide what they would
like to do.

The reason I say this is that many times I see writers on Gammonvillage do
rollouts on 2ply with snowie. Huh? Why not 3 ply? I mean it doesn't take
that much longer right? Not for a measly 1296 games? I realize that this
is a separate argument, but many people feel the need to get the second best
answer for speed's sake, and perhaps y'all are doing this also?

So, to make an overly long post somewhat simpler...

1. With no luck factor, no bearoff databases, no truncation, no modifying
cubes down (after all big cubes ARE a function of BG, no matter how rarely),
no shortcuts of any kind, won't someone be able to play enough games to get
a reasonable degree of accuracy about the play of two bots?

2. If you want to get an answer quicker with a more intuitive feel for the
numbers, use whatever settings you like, and report them for all who enjoy
the knowledge and effort you put in.

3. Make an evaluation of what the bots are in fact doing. If number 2 is
more accurate (I doubt it, I would admit quicker however, much quicker) then
make a case for it. If 1 is more accurate, but slower admit that also.

4. Make an estimation at the very beginning about how many games would be
reasonable to draw a conclusion from. If I play my entire life against my
brother, 8 hours a day, every day, and then my sons do the same against his
sons, and so on and so on, surely if there is no "winner" after 500
generations, then maybe we are equal? Just a thought :)

Thanks again for all your time on behalf of the BG community, I really
appreciate it.

steve


Douglas Zare

unread,
Jul 15, 2003, 12:49:33 PM7/15/03
to

Steve Bortnyck wrote:

> Michael was stating what for many of us is a valid point. Every time a 'bot
> performs a shortcut for us, it is in fact introducing a bias to the
> analysis. An evaluation by Snowie is not 100% accurate. Hence the many
> rollouts discussed online. Variance reduction is also a shortcut.
> Instituted to provide a usable program, not to be MORE accurate, correct?

No, the point is to be more accurate for any fixed amount of
computing time. If you look at a rollout with variance
reduction with 1296 trials, then it might be equivalent
(in terms of accuracy) to a rollout with 100,000 trials and
no variance reduction. Another way of saying it is that
with variance reduction, you may find that play A was better
than play B by 0.050 +- 0.010. Without variance reduction,
you might get 0.050 +- 0.100.

You make it sound like a shortcut is evil. However, you may
wish to distinguish between the shortcuts that decrease the
quality of the results and those that are simply faster ways to
arrive at the same answer. Is image compression bad or good
for image quality? It depends. Some techniques are reversible.
All of the information is still there. Other techniques throw
away some of the information.

Error analysis introduces biases. Variance reduction does not.
This is counterintuitive to many people, which is why it is
emphasized in every introduction to variance reduction.

> So.... why not just let the bots play. Whoever wins, wins. Surely
> letting them play for an infinite number of games would be no less accurate
> than using a luck evaluation, variance reduction, etc.? Right?

Sure, that would be ok. Since no one has an infinite amount
of computing power no one does that.

The level of accuracy needed to distinguish two players
obviously depends on how close their playing strengths are.
Others have simply asserted that an incorrect number of games
is sufficient.

If you don't want to use variance reduction, then the standard
deviation per game is about 3 points. To get a result that is
within 0.1 points of the correct value about 95% of the time,
you need to play until the standard deviation of the average is
1/60 of the 3 points, 0.05 points, so that being off by +- 0.1 points
is +-2 standard deviations. That requires about 60^2 = 3600
games.

A tenth of a point per game is huge. It could be that one
program is that much stronger than the other, but keep in
mind that Snowie 4 is supposed to be better than Snowie 3
by less than 0.04 ppg. In order to determine that it is 0.04
rather than 0.03 or 0.05, you can use 360,000 games. There
are 525,600 minutes in the year, and the games may take
more than a minute, depending on the settings. It makes
sense to use variance reduction to cut the number of games
needed by a factor of 100, so that it takes days or weeks
rather than years to get results accurate to within 0.01 ppg.
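
The back-of-the-envelope counts in the last two paragraphs follow from the
usual standard-error formula. A minimal sketch, using only the
3-points-per-game figure and the 2-standard-deviation criterion quoted
above:

sigma = 3.0                               # points per game, no variance reduction

def games_needed(half_width, z=2.0):
    # games n such that z * sigma / sqrt(n) equals the desired half-width
    return (z * sigma / half_width) ** 2

print(round(games_needed(0.1)))    # 3600   -> "about 60^2 = 3600 games"
print(round(games_needed(0.01)))   # 360000 -> the 360,000-game figure
# A variance reduction that cuts the effective per-game spread by a factor
# of 10 cuts these counts by a factor of 100.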

> The reason I say this is that many times I see writers on Gammonvillage do
> rollouts on 2ply with snowie. Huh? Why not 3 ply? I mean it doesn't take
> that much longer right? Not for a measly 1296 games? I realize that this
> is a separate argument, but many people feel the need to get the second best
> answer for speed's sake, and perhaps y'all are doing this also?

First, 3-ply rollouts are much slower than 2-ply rollouts,
particularly for bots that are slow to begin with, such as
Snowie 4. It is a serious drawback of gnu that its analogue
of 2-ply is weak, making its analogue of 2-ply rollouts
unreliable in many positions. (Gnu calls this 1-ply.)

I think some of the rollouts there have been too short. Part
of the problem is that someone may wish to roll out many
positions, perhaps every blunder and cube action in a long
match. You might not care about whether a single rollout
takes 45 minutes or 12 hours, but if you have to perform 20
such rollouts, the difference in time is substantial.

I performed rollouts of more than 100 positions for my last
column, and even though the positions were of a type that
rolled out very quickly I didn't use 3-ply rollouts for most of
them. I tested that 2-ply (and 3-ply) handles most of the
decisions correctly. In some situations, there is little reason
to use 3-ply rather than 2-ply.

Douglas Zare

Albert Silver

unread,
Jul 15, 2003, 1:08:48 PM7/15/03
to
"Steve Bortnyck" <bort...@comcast.net> wrote in message news:<bJTQa.58166$wk6....@rwcrnsc52.ops.asp.att.net>...

> Jorn,
>
> First I would like to thank you for all your time with this, the program as
> well as the discussion. It is very nice of you to offer a free program (I
> believe you are (the/one of the) person behind GNUBG.)
>
> I think perhaps Michael has a point though. I use Snowie, sorry, I know a
> competitor so to speak. And I also get to speak to Malcolm Davis often
> about the program and moves. He is constantly reminding me that I need to
> roll out every single error if I want to get the very best analysis of my
> play. My computer is unfortunately slow, and so, I find that as an
> intermediate, it is better to perhaps get the second best answer. (I
> realize the fallacy of this, but will continue)
>
> Michael was stating what for many of us is a valid point. Every time a 'bot
> performs a shortcut for us, it is in fact introducing a bias to the
> analysis. An evaluation by Snowie is not 100% accurate. Hence the many
> rollouts discussed online. Variance reduction is also a shortcut.
> Instituted to provide a usable program, not to be MORE accurate, correct?

No, variance reduction makes it more accurate. It isn't a shortcut, it
is a mathematical improvement.

> Bearoff databases on Snowie are not accurate either right? (Well known
> flaws hence Sconyers database which is not compatible with Snowie. Well
> compatible, but not interchangeable.)

Is there a connection? BTW, one can now enjoy the 12 DVD set by Hugh
Sconyers from within GNU.

> So.... why not just let the bots play. Whoever wins, wins. Surely
> letting them play for an infinite number of games would be no less accurate
> than using a luck evaluation, variance reduction, etc.? Right?

Variance reduction *is* factoring in the luck; it isn't something
separate. But yes, one can very well get the same results with more
games and no variance reduction. For example, if you wish to get the
same precision as 5000 games analyzed with Variance Reduction, you can
play 500,000 without. Both will yield the same results more or less.

> I mean the use of all the shortcuts is just that. To simulate a larger
> number of games with a smaller sample size? So, if one had the CPU
> available, and the time, and was willing wouldn't a 50 million match "duel"
> be enough to tell which is the stronger version of any bot? And if not,
> would it then matter? If an inordinate number of games could not "safely"
> declare a victor then call them "close enough for government work" (an
> American joke) and let people know the results and decide what they would
> like to do.

Perhaps the results would be the same, but why would one want to play
100,000 games or more for example when only 5000 were necessary?

>
> The reason I say this is that many times I see writers on Gammonvillage do
> rollouts on 2ply with snowie. Huh? Why not 3 ply? I mean it doesn't take
> that much longer right? Not for a measly 1296 games?

Snowie 4's 3-ply is a LOT slower than its 2-ply. Maybe 3-5 times
slower.

Albert Silver

s.w.a....@hccnet.nl

unread,
Jul 15, 2003, 1:30:42 PM7/15/03
to
On 14 Jul 2003 12:33:30 -0700, mu...@compuplus.net (Murat Kalinyaprak)
wrote:

>s.w.a.l.l.o.w wrote eoutgvoir66a81sqe...@4ax.com


>> Jorn, your set up only works when a bot with an overall superior
>> equity table is used to 'judge' an inferior one.....sort of Catch
>> 22.....you will first have to prove that the bot that does the
>> calculation is indeed using an >overall< better equity table.

>> If you don't, you have "anecdote" at best, and maybe not even
>> anecdote. The old motto: "bullshit in - bullshit out" comes to mind.
>>>>>>>>>>>>>

>Music to my ears... :)) Truth will eventually set even Morons
>free and the sick bastards will attain climax as they will get
>pissed on...
>>
>>
You're wrong. You are quoting a single passage (mine) taken from the
middle of a longer conversation, and you present it here out of
context.
I already stated that I (occasionally) confused mwc with luck.
After having read all the arguments, both here as well as in the GNUBG
list archive, I now accept the idea that even a bot with imperfect
equity data can reasonably estimate the statistical luck in a game or
match, no matter who the opponent is.
I can only advise you to follow Jorn's earlier suggestion and read the
messages on this webpage:
http://mail.gnu.org/archive/html/bug-gnubg/2003-06/msg00164.html
Maybe you'll learn something too.

Peter

Steve Bortnyck

unread,
Jul 15, 2003, 9:46:03 PM7/15/03
to

"Douglas Zare" <za...@math.columbia.edu> wrote in message
news:3F1432CF...@math.columbia.edu...

>
> Steve Bortnyck wrote:
>
> > Michael was stating what for many of us is a valid point. Every time a
> > 'bot performs a shortcut for us, it is in fact introducing a bias to the
> > analysis. An evaluation by Snowie is not 100% accurate. Hence the many
> > rollouts discussed online. Variance reduction is also a shortcut.
> > Instituted to provide a usable program, not to be MORE accurate, correct?
>
> No, the point is to be more accurate for any fixed amount of
> computing time. If you look at a rollout with variance
> reduction with 1296 trials, then it might be equivalent
> (in terms of accuracy) to a rollout with 100,000 trials and
> no variance reduction. Another way of saying it is that
> with variance reduction, you may find that play A was better
> than play B by 0.050 +- 0.010. Without variance reduction,
> you might get 0.050 +- 0.100.
>
> Error analysis introduces biases. Variance reduction does not.
> This is counterintuitive to many people, which is why it is
> emphasized in every introduction to variance reduction.
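
The two illustrations in that quote hang together. A minimal sanity
check in Python, treating the quoted error bars and trial counts as
given (they are illustrative figures, not measurements):

# With variance reduction: 1296 trials gave roughly +/- 0.010.
# Without it, the same 1296 trials gave roughly +/- 0.100.
se_vr, se_plain = 0.010, 0.100
n_vr = 1296
# Error shrinks as 1/sqrt(N), so matching the VR error bar the hard way
# takes (se_plain / se_vr)**2 times more trials.
n_equiv = n_vr * (se_plain / se_vr) ** 2
print("1296 VR trials ~ %.0f plain trials" % n_equiv)   # ~130,000, i.e. order 100,000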

Douglas, I hate to disagree with you, but the link below goes to a
discussion Jorn had about one of the earlier versions of GNUBG:

http://mail.gnu.org/archive/html/bug-gnubg/2003-07/msg00055.html


"This is most likely due to the wrong variance reduction, now fixed.

I've seen examples where the variance reduction results in negative
gammon rates or other artifacts. gnubg will correct this on the
fly (e.g., applying a sanity check to the result of the rollout so far).
However, the calculated mwc is not corrected, hence you may see examples
where a manual calculation of the MWC differs from the MWC calculated by
gnubg.

For example,

The first trial returns: 51% -1% 0% - 49% 0% 0%. The mwc calculated from
this is: 25%. gnubg applies the sanity check to the gwc and arrives at
51% 0% 0% - 49% 0% 0%. A "manual" mwc calculation gives 25.5%.

A negative number of gammons are possible due to imperfections in the
luck analysis.

Jørn"

So, I will stand by my original thought. If a human being programs
something, a bias may be introduced. If you let the dice roll, without
variance reduction, you MAY (I emphasize this to admit that there may be
a "best" case with no loss in accuracy) eventually get the "correct"
answer. It would depend on the programs themselves, of course!

Steve

PS: By the way, I like the idea of a faster, MORE ACCURATE result. My
comment about 2-ply rollouts had more to do with how little extra time
it would sometimes take to do a 3-ply rollout instead of a 2-ply one.
Even if 2-ply were exactly the same, there is only one way to tell: roll
it out all over again, right?

Your comment was, I believe:

> You make it sound like a shortcut is evil. However, you may
> wish to distinguish between the shortcuts that decrease the
> quality of the results and those that are simply faster ways to
> arrive at the same answer. .......


>I tested that 2-ply (and 3-ply) handles MOST (emphasis added by Steve) of the
>decisions correctly. In some situations, there is little reason
>to use 3-ply rather than 2-ply.

Most doesn't connote "exactly the same," does it?



Jørn Thyssen

unread,
Jul 16, 2003, 5:58:41 AM7/16/03
to

I can't see how my mail supports your arguments.

Anyway, I'll answer myself below.

> "This is most likely due to the wrong variance reduction, now fixed.

I'm referring to a bug in the implementation of variance reduction in
gnubg for certain rollouts (rolled out as an initial position, i.e., no
doubles allowed). Luck analysis and "normal" rollouts were not affected,
though.

Please note that I'm *NOT* referring to variance reduction as a theory,
which could be the impression the reader gets from the snippet above,
taken out of context.

>
> I've seen examples where the variance reduction results in negative
> gammon rates or other artifacts. gnubg will correct this on the
> fly (e.g., applying a sanity check to the result of the rollout so far).
> However, the calculated mwc is not corrected, hence you may see examples
> where a manual calculation of the MWC differs from the MWC calculated by
> gnubg.
>
> For example,
>
> The first trial returns: 51% -1% 0% - 49% 0% 0%. The mwc calculated from
> this is: 25%. gnubg applies the sanity check to the gwc and arrives at
> 51% 0% 0% - 49% 0% 0%. A "manual" mwc calculation gives 25.5%.
>
> A negative number of gammons are possible due to imperfections in the
> luck analysis.
>
> Jørn"

Yes, of course there are imperfections in the luck analysis, but it
doesn't matter because the variance reduction is unbiased!

Maybe I should have expressed myself more clearly: for a *single* trial
in a certain rollout there may be artifacts such as negative gammon
rates.

I can add that I've never seen variance reduction result in negative
gammon rates *except* for a *single* trial in a rollout. It's very
unlikely that every trial in a rollout gives negative gammon rates.

>
> So, I will stand by my original thought. If a human being programs
> something, there is a possible bias introduced. If you let the die roll,
> without the variance reduction, you MAY (I emphasize this to admit that
> there may be a "best" circumstance of no loss in accuracy) eventually get
> the "correct" answer. It would depend on the programs themselves of course!

Are you referring to bias or bugs?

But I don't see any relevance to the discussion, unless of course your
point is that we should not use a bot for the luck analysis or variance
reduction simply because there are likely to be bugs in the
implementation!?

Jørn

Albert Silver

unread,
Jul 16, 2003, 8:49:25 AM7/16/03
to
"Steve Bortnyck" <bort...@comcast.net> wrote in message news:<v72Ra.73218$N7.8481@sccrnsc03>...

You are taking the above quote out of context (or misunderstanding
it). It wasn't Variance Reduction per se that was faulty; the bug
applied *only* to rollouts of the initial opening roll, where doubles
aren't possible. Other rollouts using Variance Reduction were not
affected by this, thus their results still stand.

This came up because Neil Kazaross and Ian Shaw are developing a new
Match Equity Table using GNU rollouts ... with Variance Reduction. No
one had been rolling out the first play of the game with GNU's rollouts
before, so the problem hadn't presented itself.

Albert Silver

Derek Ray

unread,
Jul 16, 2003, 8:51:03 AM7/16/03
to
In message <2831c30c.03071...@posting.google.com>,
mu...@compuplus.net (Murat Kalinyaprak) mumbled something about:

>Derek Ray wrote vdu2hvojh82n66512...@4ax.com
>
>> displaying an indicator to let you know the program is
>> busy conducting operations, instead of frozen. It is
>> considered to be a basic tenet of such.
>
>What's wrong with more generally used and less taxing
>methods like the "hour-glass icon"...??

The hourglass icon lets you know that Windows isn't frozen, not that the
program isn't frozen. It is independent of the program itself.

The form of the indicator is not critical, except to distinguish it from
the Windows hourglass icon... in other words, anything BUT.

>> An operation such as sliding a box back and forth in a
>> small area is a trivial use of resources, given today's
>> computing speeds AND given the amount of "work" already
>> done underneath the scenes for you with regard to
>> graphics display.
>> For such an accomplished programmer as yourself, you
>> really don't seem to be up to speed on even the basics.
>
>Some of the "basics" of programming that you can shove
>up your dumb arrogant ass are:

You forgot "right" between "dumb" and "arrogant".

>1) Waste is waste... The amount and/or the ratio doesn't
>matter. In the long, a drop here and a drop there will
>add up to a bucketfull...

It's not waste. Next!

>2) You use widely used and easily recognized interface
>elements, appropriate for the purpose... For example,
>progress bars are good to let people know how much of
>a process of known or estimated length is completed...
>Otherwise an operating system supplied and very widely
>used and recognized interface like the "hour-glass icon"
>may be best. Heck, even one of them "das blinken lichts
>wurst be besser"...

No, the hourglass icon would actually be WORST, for the reasons stated
above. I would not personally have chosen the bouncing bar. However, it
is not a critical design decision, and I am not the programmer; the
sliding box serves its purpose quite effectively.

>Anyway, why do I waste my time with this bullshit. Make
>monkeys dance on the screen to indicate that the gnudung
>hasn't frozen. What do I care...

Because you're too scared to read the source code, pussy.

That's why you have to care about bouncing boxes... you have to do
SOMETHING to draw attention away from the real issue, which is that you
are avoiding the one true way to prove your claims beyond a shadow of a
doubt... or disprove them completely, showing yourself up as a net.kook
for all eternity.

-- Derek

"Ignorance more frequently begets confidence than does knowledge."
- C. Darwin, 1871

Steve Bortnyck

unread,
Jul 16, 2003, 9:16:14 AM7/16/03
to
Jorn,

Sorry I took it out of context, but that was how I found it on Google,
incomplete.

My point really is just a basic idea that mistakes can and do get made. No
big deal, we all know that. But when a machine comes up with an "answer"
many people take it at face value, as absolutely correct. Even when they
are told there are mistakes being made by the machine, people still have a
tendency to forget, and take the result at face value. (How many Snowie
users "know" that evaluations are not exactly perfect, yet they don't roll
any/many positions out?)

Now, I believed (incorrectly perhaps) that variance reduction was put in to
make GNUBG and Snowie faster, not MORE accurate. Douglas Z told us that
variance reduction actually INCREASED accuracy, as well as speed.

I pulled your quote up (out of context, I admit) to show that variance
reduction does not ALWAYS increase accuracy. Now you may want to argue that
variance redux works, but was implemented incorrectly. I would respond that
PERHAPS there is a still undiscovered flaw in this version? (not saying
there is, just suggesting) I would also respond with a question.

"Would GNUBG play rolls differently without variance reduction in its
program?"

In other words, if the play is the same without it, then variance
reduction cannot, by definition, increase accuracy! If the play were
different, how can we be sure (and not that you should feel any need to
prove it to us, by the way) that the now-changed move (post redux) IS
more accurate? Wouldn't you need to test the move out? And how would
that happen, with or without variance redux?

This seems like a Catch-22 sort of issue. And in some ways the whole bot
versus bot duel seems the same. Snowie 3 says Snowie 2 makes mistakes. But
Snowie 4 says Snowie 3 makes mistakes. And on and on and on. Until we find
the perfect bot, and how would we even know?

"Jørn Thyssen" <j...@nospam.com> wrote in message
news:3F1521D1...@nospam.com...

Douglas Zare

unread,
Jul 16, 2003, 10:24:00 AM7/16/03
to

Steve Bortnyck wrote:

> "Would GNUBG play rolls differently without variance reduction in its
> program?"

Of course not.

> In other words, if the play is the same without it, then variance reduction
> can not, by definition, increase accuracy!

No, you are simply wrong. The rollout limit is the same.
You get closer to the rollout limit for any fixed length of
rollout (or duel). This means it is more accurate.

Or do you have access to an infinite amount of computing
power? If so, I'd like half, please.

> If the play would be different,
> how can we be sure (and not that you should feel any need to prove it to us,
> by the way) that the now changed move (post redux) IS more accurate?
> Wouldn't you need to test the move out? And how would that happen with/ or
> without variance redux?
>
> This seems like a Catch-22 sort of issue. And in some ways the whole bot
> versus bot duel seems the same. Snowie 3 says Snowie 2 makes mistakes. But
> Snowie 4 says Snowie 3 makes mistakes. And on and on and on. Until we find
> the perfect bot, and how would we even know?

Are you serious? You can use rollouts rather than relying
on the evaluations, as Michael Depreli has done. You can
let the bots play each other, or better yet, use variance
reduction on that.

Is your contention that any programming is fundamentally
flawed? If so, why would you comment on a thread about
testing two programs against each other, connected by a
third program (and use programs to post your comment)?

There is a time to ask, "How can we really know anything?"
That time is in a classroom full of pseudo-solipsists, not
now.

Douglas Zare

Michael Sullivan

unread,
Jul 16, 2003, 10:27:55 AM7/16/03
to
Jørn Thyssen <j...@nospam.com> wrote:

Okay, so that's an obvious counterexample to my first pass hypothesis.
But it seems to be "cheating" if you know what I mean. My fuzzy
intuition is that if the bot is "trying to analyze correctly" at a
reasonable standard of play and achieves this (luck adjusted result
matches error rate result), that perfect play has been found. So an
interesting question becomes "What kind of formal constraint is
necessary for it to be true that B => A?" Is there any that will work?
Can it be proved that there isn't any? Can it be proven that one exists
even if we don't know it?

I'm quite new to backgammon game theory, so forgive me if this has been
fully hashed to all the experts' satisfaction. It seems like this line
of thought could conceivably lead to a test for perfect play, which
would be a pretty spectacular result. Has this already been shown
conclusively to be a dead end, or might it be something worth pondering
if/when I get up to speed on the state of the art?


Michael

Albert Silver

unread,
Jul 16, 2003, 11:02:18 AM7/16/03
to
mu...@compuplus.net (Murat Kalinyaprak) wrote in message news:<2831c30c.0307...@posting.google.com>...
> Albert Silver wrote f9846eb9.03071...@posting.google.com
>
> > Murat wrote 2831c30c.03071...@posting.google.com

>
> >> He didn't say "wrong", he said "useless"...!! Obviously you are
> >> not smart enough to differentiate the difference just because it
> >> wasn't presented to you in 3 decimal place accuracy... :)))
>
> > Actually, the only way the data would be useless is if they were
> > wrong, unless you don't understand it. If you did understand it
> > then you'd *know* that correct it could never be useless.
>
> This isn't true at all... What about relevancy...?? Irrelevant
> data may be correct but, err, irrelevant and thus useless...

I think my explanation was not properly understood. Let me put it
differently. Suppose you ask me about a car you're thinking about
buying to drive to work, etc. You tell me what it costs, how fast it
goes, and how many miles it has on the odometer. It sounds like a
great deal to you. I tell you there are other factors to consider as
well, such as miles per gallon. I note that the car in question can
only do 1 mile/gallon so it is probably a very bad deal. Just for the
sake of argument, let's presume you have no idea what that is nor how
that fits into the equation, so you claim it is irrelevant or
unimportant. If you understood this information you would know,
without a doubt, that it is entirely relevant. The only way anyone
would claim it was irrelevant was because they didn't understand it,
or planned to transform the car into garden furniture.

Albert Silver

>
> Yet, in the land of "zero-sum bullshit", I do actually agree
> with you and in the past I had made arguments like "if every
> player doubled after rolling a 2-1 in Thursday afternoons and
> after rolling 5-2 on Sunday mornings, the errors would eventually
> cancel out each other" and that all the dumb fuckers would live
> happily ever after in the "zero-sum heaven"... :))
>
> If you don't like my days of the week approach and would like
> to use your local weather *data* or whatever else in doubling,
> dropping, please be my guest. Don't worry, summer heats will
> even out winter colds, darkness of the night will even out the
> light of the day, etc... Just bullshit and be happy... :))
>
> MK

Jørn Thyssen

unread,
Jul 16, 2003, 11:07:48 AM7/16/03
to
Steve Bortnyck wrote:
> Jorn,
>
> Sorry I took it out of context, but that was how I found it on Google,
> incomplete.

This is because the original post by Ian Shaw appeared the previous month:

http://mail.gnu.org/archive/html/bug-gnubg/2003-06/msg00440.html

Apparently the archiving software for the mailing list does not
recognise threads that span more than one month.

>
> My point really is just a basic idea that mistakes can and do get made. No
> big deal, we all know that. But when a machine comes up with an "answer"
> many people take it at face value, as absolutely correct. Even when they
> are told there are mistakes being made by the machine, people still have a
> tendency to forget, and take the result at face value. (How many Snowie
> users "know" that evaluations are not exactly perfect, yet they don't roll
> any/many positions out?)
>
> Now, I believed (incorrectly perhaps) that variance reduction was put in to
> make GNUBG and Snowie faster, not MORE accurate. Douglas Z told us that
> variance reduction actually INCREASED accuracy, as well as speed.

Variance reduction does cost something, but the cost is much smaller
than the gain.

For example (fictional numbers): a 1296-game rollout without variance
reduction costs "x". The same 1296-game rollout costs, say, "10x" with
variance reduction, but it gives the same accuracy as a 100,000-game
rollout, that is, you've saved a factor of 8. It's now up to you to
choose whether to use the saving to perform an equally accurate rollout
in less time or to perform a more accurate rollout in the same time (or
a combination of both).
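
Working through those fictional numbers, a two-line check in Python:

# A plain 1296-game rollout costs "x"; with variance reduction it costs "10x"
# but matches the accuracy of a 100,000-game plain rollout.
cost_to_match_plain = 100000 / 1296   # ~77x: the price of matching it the hard way
print("net saving: about %.0fx" % (cost_to_match_plain / 10))   # ~8x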

>
> I pulled your quote up (out of context, I admit) to show that variance
> reduction does not ALWAYS increase accuracy.

I don't agree with that.

The only conclusion you can draw from my posting is that applying the
wrong formulae for variance reduction gives wrong results!

> Now you may want to argue that
> variance redux works, but was implemented incorrectly. I would respond that
> PERHAPS there is a still undiscovered flaw in this version? (not saying
> there is, just suggesting) I would also respond with a question.

I still don't see the relevance of this.

Do you or do you not believe that variance reduction improves rollouts?

If your only point is that we should not use variance reduction because
there may be bugs in the programs, then there is really nothing to
discuss. Why use rollouts at all? If the programs are buggy, then there
is no value in using them.

>
> "Would GNUBG play rolls differently without variance reduction in its
> program?"

Do you mean: if we roll out all moves of a match (or series of matches)
would the same rollout with and without variance reduction result in
gnubg changing its mind for any move?

>
> In other words, if the play is the same without it, then variance reduction
> can not, by definition, increase accuracy!

Yes, it may.

Suppose you have two moves A and B. A rollout without variance reduction
produces

A: 0.1 +/- 0.05
B: 0.0 +/- 0.06

and a rollout with variance reduction produces

A: 0.1 +/- 0.01
B: 0.0 +/- 0.01

The second rollout is obviously more accurate due to the smaller
confidence intervals, but both rollouts indicate that move A is the
better one. However, for the rollout without variance reduction the
confidence intervals overlap and you can really not say whether move A
is better than move B. The rollout just happens (by chance) to select
move A over move B.
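
A minimal sketch of that comparison in Python, using the illustrative
numbers above. It simply tests whether the quoted intervals overlap;
this is a rough check, not a formal significance test:

def intervals_overlap(eq_a, err_a, eq_b, err_b):
    """Do the intervals [eq - err, eq + err] for the two plays overlap?"""
    lo_a, hi_a = eq_a - err_a, eq_a + err_a
    lo_b, hi_b = eq_b - err_b, eq_b + err_b
    return max(lo_a, lo_b) <= min(hi_a, hi_b)

# Without variance reduction: A = 0.1 +/- 0.05, B = 0.0 +/- 0.06
print(intervals_overlap(0.1, 0.05, 0.0, 0.06))   # True  -> can't separate A from B
# With variance reduction: A = 0.1 +/- 0.01, B = 0.0 +/- 0.01
print(intervals_overlap(0.1, 0.01, 0.0, 0.01))   # False -> A is clearly better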

> If the play would be different,
> how can we be sure (and not that you should feel any need to prove it to us,
> by the way) that the now changed move (post redux) IS more accurate?

Because it has a tighter confidence interval!

Which of the two numbers, 0.1 +/- 0.05 and 0.1 +/- 0.01, is more
accurate?

> Wouldn't you need to test the move out? And how would that happen with/ or
> without variance redux?
>
> This seems like a Catch-22 sort of issue. And in some ways the whole bot
> versus bot duel seems the same. Snowie 3 says Snowie 2 makes mistakes. But
> Snowie 4 says Snowie 3 makes mistakes. And on and on and on. Until we find
> the perfect bot, and how would we even know?

No, you've got that totally wrong! We are *NOT* analysing the moves --
we're applying variance reduction!!!!

I don't care if Snowie3 says that Snowie2 is making mistakes because I
know that this analysis is biased, but I would be happy to use Snowie2
for the variance reduction because I know that Snowie2 will *NOT*
produce luck adjusted results biased towards itself.

You can try using a bot that produces random equities! In that case you
won't see any reduction of the variance, but it won't destroy the result
either.
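
To make the point about being unbiased even with a weak or useless
evaluator concrete, here is a minimal control-variate sketch in Python.
It is a toy stand-in for the luck adjustment, with a made-up "game" (the
larger of two dice) and a made-up luck signal; it is not gnubg's actual
code:

import random
import statistics

random.seed(1)

def trial():
    """One toy 'game': roll two dice; the result we average is max(d1, d2)."""
    d1, d2 = random.randint(1, 6), random.randint(1, 6)
    luck = (d1 + d2) - 7.0          # zero-mean 'luck' of the roll
    return max(d1, d2), luck

N = 20000
results, luck = zip(*(trial() for _ in range(N)))

beta = 0.5   # fixed coefficient; any fixed value leaves the mean unbiased
adjusted = [r - beta * l for r, l in zip(results, luck)]
useless = [r - beta * random.gauss(0, 1) for r in results]   # 'luck' from a random evaluator

for name, xs in (("plain", results), ("luck-adjusted", adjusted), ("random luck", useless)):
    print("%-14s mean %.3f  std err %.4f"
          % (name, statistics.mean(xs), statistics.stdev(xs) / N ** 0.5))

# All three means land close to the true value (161/36 = 4.47); only the
# genuine luck adjustment shrinks the standard error. The random "luck"
# merely fails to help; it does not destroy the result.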

Jørn

Jørn Thyssen

unread,
Jul 16, 2003, 11:29:10 AM7/16/03
to
Michael Sullivan wrote:
>
>
> Okay, so that's an obvious counterexample to my first pass hypothesis.
> But it seems to be "cheating" if you know what I mean. My fuzzy
> intuition is that if the bot is "trying to analyze correctly" at a
> reasonable standard of play and acheives this (luck adjusted result
> matches error rate result), that perfect play has been found.

Yes, it seems intuitively correct.

> So an
> interesting question becomes "What kind of formal constraint is
> necessary for it to be true that B => A?" Is there any that will work?
> Can it be proved that there isn't any? Can it be proven that one exists
> even we don't know it?

I don't think so, although I'm not sure.

We know that the luck adjusted result must be equal to the error rate
result for a perfect bot, but I don't think we can say anything about
the difference.

For example, I imagine that a bot producing random equities will give
error rate results equal to luck adjusted results. If this is true, this
may cause problems, since many bots seem to produce random equities for
certain positions that they just don't understand.

>
> I'm quite new to backgammon game theory, so forgive me if this has been
> fully hashed to all the experts' satisfaction. It seems like this line
> of thought could conceivably lead to a test for perfect play, which
> would be a pretty spectacular result. Has this already been shown
> conclusively to be a dead end, or might it be something worth pondering
> if/when I get up to speed on the state of the art?

I haven't seen this discussed before, and yes, I agree that it would be
great to have a test to determine a bot's strength without access to
another bot of known strength (e.g., a perfect bot).

Jørn
