
BGBlitz 1.7 vs. GNUbg 0.14: GNUBG analyses


Zorba

Dec 10, 2003, 6:07:13 PM
I analysed matches 1, 2, 3 and 5 (match 4 seems incomplete) using GNUbg
0.14, Jacobs MET, 2-ply 100%. It seems to me that BGBlitz was regularly
blundering with incorrect passes; something must be wrong there (GNUBG
thinks some are 0.7 takes, probably correctly). The checker play verdict
is much better; gnubg disagrees quite strongly on only a few decisions.

Roughly speaking, GNUBG rates BGBlitz around 40 rating points lower
than itself at my settings.

Of course GNUBG's 2-ply verdict isn't perfect...some rollouts in
progress.

But I think there's no doubt BGBlitz made a few really huge errors
with wrong passes.

                                 GNU_Backgammon        BGBlitz
Chequer Play Statistics:
Total moves                      140                   143
Unforced moves                   130                   124
Moves marked doubtful            0                     1
Error rate (total)               +0.000                -0.231 ( -1.228%)
Error rate (per move)            -0.0                  -1.9 ( -0.010%)
Chequerplay rating               Supernatural          Supernatural

Cube Statistics:
Total cube decisions             143                   105
Close or actual cube decisions   49                    35
Missed doubles (below CP)        2 (-0.022 ( -0.118    1 (-0.079 ( -0.364
Wrong doubles (below DP)         2 (-0.035 ( -0.179    1 (-0.051 ( -0.229
Wrong passes                     0                     3 (-1.203 ( -6.286
Error rate (total)               -0.057 ( -0.297%)     -1.333 ( -6.879%)
Error rate (per cube decision)   -1.2 ( -0.006%)       -38.1 ( -0.197%)
Cube decision rating             Supernatural          Awful!

Overall Statistics:
Error rate (total)               -0.057 ( -0.297%)     -1.564 ( -8.107%)
Error rate (per decision)        -0.3 ( -0.002%)       -9.8 ( -0.051%)
Equiv. Snowie error rate         -0.2                  -5.5
Overall rating                   Supernatural          Advanced
Actual result                    +50.00%               -50.00%
Luck adjusted result             +8.09%                -8.09%
Luck based FIBS rating diff.     +78.60
Error based abs. FIBS rating     2049.0                1998.6
Chequerplay errors rating loss   0.0                   20.0
Cube errors rating loss          1.0                   31.4
===============================================================================

                                 GNU_Backgammon        BGBlitz
Chequer Play Statistics:
Total moves                      207                   208
Unforced moves                   167                   173
Moves marked doubtful            0                     2
Moves marked bad                 0                     1
Error rate (total)               +0.000                -0.484 ( -4.270%)
Error rate (per move)            -0.0                  -2.8 ( -0.025%)
Chequerplay rating               Supernatural          World class

Cube Statistics:
Total cube decisions             136                   86
Close or actual cube decisions   41                    32
Missed doubles (below CP)        1 (-0.016 ( -0.120    2 (-0.157 ( -0.863
Wrong doubles (below DP)         0                     2 (-0.185 ( -1.560
Wrong passes                     1 (-0.062 ( -0.963    2 (-0.297 ( -2.564
Error rate (total)               -0.077 ( -1.083%)     -0.639 ( -4.987%)
Error rate (per cube decision)   -1.9 ( -0.026%)       -20.0 ( -0.156%)
Cube decision rating             Supernatural          Casual player

Overall Statistics:
Error rate (total)               -0.077 ( -1.083%)     -1.122 ( -9.257%)
Error rate (per decision)        -0.4 ( -0.005%)       -5.5 ( -0.045%)
Equiv. Snowie error rate         -0.2                  -2.7
Overall rating                   Supernatural          Expert
Actual result                    +50.00%               -50.00%
Luck adjusted result             -4.44%                +4.44%
Luck based FIBS rating diff.     -42.91
Error based abs. FIBS rating     2048.4                2003.5
Chequerplay errors rating loss   0.0                   30.1
Cube errors rating loss          1.6                   16.4
================================================================================
                                 GNU_Backgammon        BGBlitz
Chequer Play Statistics:
Total moves                      179                   178
Unforced moves                   151                   154
Moves marked doubtful            0                     3
Moves marked bad                 0                     1
Error rate (total)               -0.001 ( -0.006%)     -0.488 ( -3.330%)
Error rate (per move)            -0.0 ( -0.000%)       -3.2 ( -0.022%)
Chequerplay rating               Supernatural          World class

Cube Statistics:
Total cube decisions             175                   125
Close or actual cube decisions   68                    34
Missed doubles (below CP)        0                     2 (-0.045 ( -0.262
Wrong doubles (below DP)         0                     2 (-0.211 ( -1.123
Wrong passes                     0                     3 (-0.356 ( -1.783
Error rate (total)               +0.000                -0.612 ( -3.167%)
Error rate (per cube decision)   -0.0                  -18.0 ( -0.093%)
Cube decision rating             Supernatural          Intermediate

Overall Statistics:
Error rate (total)               -0.001 ( -0.006%)     -1.100 ( -6.498%)
Error rate (per decision)        -0.0 ( -0.000%)       -5.8 ( -0.035%)
Equiv. Snowie error rate         -0.0                  -3.1
Overall rating                   Supernatural          Expert
Actual result                    -50.00%               +50.00%
Luck adjusted result             +0.97%                -0.97%
Luck based FIBS rating diff.     +9.34
Error based abs. FIBS rating     2049.9                2001.1
Chequerplay errors rating loss   0.1                   34.1
Cube errors rating loss          0.0                   14.8
================================================================================
NO MATCH 4
================================================================================

                                 GNU_Backgammon        BGBlitz
Chequer Play Statistics:
Total moves                      222                   225
Unforced moves                   190                   188
Moves marked doubtful            0                     4
Moves marked bad                 0                     1
Error rate (total)               +0.000                -0.536 ( -3.778%)
Error rate (per move)            -0.0                  -2.8 ( -0.020%)
Chequerplay rating               Supernatural          World class

Cube Statistics:
Total cube decisions             147                   163
Close or actual cube decisions   29                    58
Missed doubles (below CP)        0                     1 (-0.004 ( -0.015
Wrong doubles (below DP)         2 (-0.007 ( -0.033    1 (-0.029 ( -0.145
Wrong doubles (above TG)         0                     1 (-0.079 ( -0.435
Wrong passes                     1 (-0.060 ( -0.272    1 (-0.145 ( -0.667
Error rate (total)               -0.068 ( -0.305%)     -0.257 ( -1.262%)
Error rate (per cube decision)   -2.3 ( -0.011%)       -4.4 ( -0.022%)
Cube decision rating             World class           World class

Overall Statistics:
Error rate (total)               -0.068 ( -0.305%)     -0.792 ( -5.040%)
Error rate (per decision)        -0.3 ( -0.001%)       -3.2 ( -0.020%)
Equiv. Snowie error rate         -0.2                  -1.8
Overall rating                   Supernatural          World class
Actual result                    -50.00%               +50.00%
Luck adjusted result             +7.67%                -7.67%
Luck based FIBS rating diff.     +74.51
Error based abs. FIBS rating     2048.1                2015.7
Chequerplay errors rating loss   0.0                   30.7
Cube errors rating loss          1.9                   3.6

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Unfortunately I don't know how to add up these statistics from four
matches to get an "overall" statistics output (would be a nice feature
to be able to let GNUbg add several matches into some "match session",
maybe?)

Achim Mueller

Dec 10, 2003, 10:41:29 PM
Zorba wrote:

> I analysed matches 1,2,3 and 5 (match 4 seems incomplete) using GNUbg

Match 4 is indeed incomplete. I blundered by failing to save the last
game when we continued the match the next morning.

Ciao

Achim

Frank Berger

Dec 11, 2003, 3:18:28 AM
zo...@chello.nl (Zorba) wrote in message news:<8e504b59.03121...@posting.google.com>...

> I analysed matches 1,2,3 and 5 (match 4 seems incomplete) using GNUbg
> 0.14, Jacobs MET, 2-ply 100%. It seems to me like BGBlitz was somehow
> blundering with incorrect passes regulary;
Yes indeed.

>something must be wrong
> there (some are 0.7 takes, GNUBG thinks probably correctly). The
> checker play verdict is much better; gnubg disagrees quite strongly on
> a few decisions only.

In match play it currently uses neither variance nor cube vigorish.
Obviously this is the biggest flaw right now. In preparation for Graz I
concentrated very much on improving the checker play, and I guess I
was pretty successful with that.
I have always known there was a problem, but I delayed fixing it. Now
it is priority no. 1.



> Roughly speaking, GNUBG rates BGBlitz around 40 rating points lower
> than itself at my settings.

Keep in mind that GNU-BG isn't very well suited to finding its own
checker play errors. E.g. this is a serious checker play error
(according to Snowie and BGBlitz):

+12-11-10--9--8--7-------6--5--4--3--2--1-+ O: GNU_Backgammon 137
| X X O | | O X O |
| X O | | O O |
| X O | | O |
| X O | | O |
| X | | O |
^| |BAR| |
| | | |
| | | X |
| | | X |
| O X | | X X |
| O X | | X X O O |
+13-14-15-16-17-18------19-20-21-22-23-24-+ X: BGBlitz 147

GNU_Backgammon - 0 BGBlitz - 5 in a 13 point match.
Position-ID: g8+DASiMZ/AJBA
GNU_Backgammon to move 1-6

Here GNU chose 8-2, 6-5. It was in the 4th game of the 2nd match,
where GNU attested itself no checker error ;)


> Of course GNUBG's 2-ply verdict isn't perfect...some rollouts in
> progress.

Especially when you ask a bot about its own errors. Even if you used
one additional ply it would probably be biased. Snowie as a "neutral"
bot is more appropriate to judge in this case... (Naturally only
because it's different, not because I believe it is better (which I
actually don't)).



> But I think there's no doubt BGBlitz made a few really huge errors
> with wrong passes.

Yes indeed

> Unfortunately I don't know how to add up these statistics from four
> matches to get an "overall" statistics output (would be a nice feature
> to be able to let GNUbg add several matches into some "match session",
> maybe?)

I would suggest simply taking the average, weighting each match's
figures by its number of relevant decisions (i.e. unforced moves or
cube decisions):
(rating_1 * number_unforced_moves_1 + ... + rating_n *
number_unforced_moves_n) / (number_unforced_moves_1 + ... +
number_unforced_moves_n)
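
As a rough illustration of this weighting, here is a minimal Python sketch. The function name `pooled_error_rate` is made up for the example, and the input pairs are BGBlitz's "Error rate (per move)" and "Unforced moves" figures read off the four chequer-play tables earlier in the thread. It is only one plausible way to pool the numbers, not something GNUbg does itself.

```python
def pooled_error_rate(per_match):
    """Weighted mean of per-move error rates.

    per_match: list of (error_rate_per_move, unforced_moves) pairs,
    one pair per match. The weights are the unforced-move counts.
    """
    total_weight = sum(moves for _, moves in per_match)
    if total_weight == 0:
        raise ValueError("no unforced moves to weight by")
    return sum(rate * moves for rate, moves in per_match) / total_weight


# BGBlitz chequer play, matches 1, 2, 3 and 5 (values from the tables above).
bgblitz_chequer = [(-1.9, 124), (-2.8, 173), (-3.2, 154), (-2.8, 188)]

if __name__ == "__main__":
    print(round(pooled_error_rate(bgblitz_chequer), 2))
```

The same function would work for cube decisions if the weights are the "Close or actual cube decisions" counts instead.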

BTW even if the last game of the 4th match is lost :( I see no reason
not to use the numbers.

Zorba

Dec 11, 2003, 10:52:05 PM
Achim Mueller <in...@acepoint.de> wrote in message news:<br8p3g$7ij$02$2...@news.t-online.com>...

Too bad, but at least it's just 1 game missing here so overall
statistics still figure to be meaningful.

What settings did GNUBG use? 2-ply 100% I assume, but what move
filters and which MET did you use?

--
_
/
_ orba

Zorba

Dec 11, 2003, 11:10:25 PM
fr...@bgblitz.com (Frank Berger) wrote in message news:<aab8a7e5.0312...@posting.google.com>...

> zo...@chello.nl (Zorba) wrote in message news:<8e504b59.03121...@posting.google.com>...
> > Roughly speaking, GNUBG rates BGBlitz around 40 rating points lower
> > than itself at my settings.

> Keep in mind, that GNU-BG isn't very well suited to find it's own
> checker play errors.

Very true. In fact using the same settings for analysis and for play
it would never find any error at all!

Any errors on GNUBG's side that my analysis found are probably mostly
due to me using a different MET (Jacobs) and probably some slightly
different search filters.

So GNUBG's verdict about itself here is really meaningless.

Letting GNUBG judge BGBlitz' play is meaningful I think, but GNUBG
2-ply is no God. I think however that a full rollout with GNUBG would
be able to judge both BGBlitz' and GNUBG's skills quite well. Maybe
there'd still be a bias towards GNUBG but in general I don't think it
would be very significant.

> [position] GNU_Backgammon to move 1-6


>
> where GNU has choosen 8-2,6-5 It was in the 4th game of the 2nd match
> where GNU attested itself no checker error ;)

Will be interesting to see how a GNUBG rollout judges GNUBG's play on
evaluation here!

> > Of course GNUBG's 2-ply verdict isn't perfect...some rollouts in
> > progress.
> Especially to ask a bot about it's own errors. Even if you would use
> one additional ply it's probably biased. Snowie as a "neutral" bot is
> more appropriate to judge in this case... (Naturally only because its
> different, not because I would believe it is better (what I actually
> don't do))

Absolutely. Maybe I shouldn't have included the statistics for GNUBG,
as judging evaluation by evaluation is pretty much meaningless. I was
interested however in any errors my GNUBG analysis would show because
of me using a different MET (and probably slightly different filters
too). As could be expected, this attributes only very few and very
tiny errors to GNUBG overall.

Using SW to judge both BGBlitz and GNUBG is better for determining
relative skill, however I think what we really need is rollouts. SW
could easily make a similar error on evaluation as either bot,
incorrectly not calling something an error, etc.

I don't have SW so I can only do rollouts with GNUBG. However I think
GNUBG as a rollout engine is strong enough not to show much bias to
GNUBG. I'm not completely sure about that though. There are probably
some exceptions where even a GNUBG rollout with good settings would
favor GNUBG's eval too much or punish BGBlitz' choice too much.

> BTW even if the last game of the 4th match is lost :( I see no reason
> to don't use the numbers.

I left it out initially since I don't trust GNUBG's statistics on
incomplete matches. The luck adjusted result f.i. seems to be wrong
(often?).

Achim Mueller

Dec 12, 2003, 3:57:14 AM
Zorba wrote:

> Achim Mueller <in...@acepoint.de> wrote in message
>
> What settings did GNUBG use? 2-ply 100% I assume, but what move
> filters and which MET did you use?

Move filter "large", Woolseys met.

At the moment I'm running a large simulation of different METs. I let
gnubg play 5-point matches against itself via socket. 3000 matches
between Woolsey and Snowie are nearly finished, with no significant
difference.

Ciao

Achim

jthyssen

Dec 12, 2003, 6:31:44 AM
zo...@chello.nl (Zorba) wrote in message news:<8e504b59.03121...@posting.google.com>...
> fr...@bgblitz.com (Frank Berger) wrote in message news:<aab8a7e5.0312...@posting.google.com>...
> > zo...@chello.nl (Zorba) wrote in message news:<8e504b59.03121...@posting.google.com>...
> > > Roughly speaking, GNUBG rates BGBlitz around 40 rating points lower
> > > than itself at my settings.
>
> > Keep in mind, that GNU-BG isn't very well suited to find it's own
> > checker play errors.
>
> Very true. In fact using the same settings for analysis and for play
> it would never find any error at all!
>
> Any errors on GNUBG's side my analysis found are probbably mostly due
> to me using a different MET (Jacobs) and probably some slightly
> different search filters.
>
> So GNUBG's verdict about itself here is really meaningless.
>
> Letting GNUBG judge BGBlitz' play is meaningful I think, but GNUBG
> 2-ply is no God. I think however that a full rollout with GNUBG would
> be able to judge both BGBlitz' and GNUBG's skills quite good. Maybe
> there'd still be a bias towards GNUBG but in general I don't think it
> would be very significant.

Since both programs are really strong, you need to invoke the luck
analysis.

I did the luck adjustment (reported in gnubg's favour):

Unadjusted : 40% +/- 42%
Luck adj 0-ply : 51% +/- 6% (Equiv. number of matches: ~250)
Luck adj 1-ply : 55% +/- 12% (Equiv. number of matches: ~60)

Weird that 1-ply is worse than 0-ply.
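
One way to read the "Equiv. number of matches" figures is as a variance ratio. The Python sketch below assumes that the +/- values scale like 1/sqrt(n) and that 5 matches were played; the function name `equivalent_matches` is invented for the example, and the formula is an interpretation of the figures, not something stated in the post.

```python
def equivalent_matches(n_matches, spread_unadjusted, spread_adjusted):
    """Raw matches needed to match the luck-adjusted spread.

    Assumes the spread shrinks like 1/sqrt(n), so cutting the spread by
    a factor f is worth f squared times as many unadjusted matches.
    """
    return n_matches * (spread_unadjusted / spread_adjusted) ** 2


if __name__ == "__main__":
    # +/- figures (in percent) taken from the post above; 5 matches played.
    print(round(equivalent_matches(5, 42, 6)))    # 0-ply luck adjustment
    print(round(equivalent_matches(5, 42, 12)))   # 1-ply luck adjustment
```

Under these assumptions the results come out close to the quoted ~250 and ~60.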

[snip]


> Using SW to judge both BGBlitz and GNUBG is better for determining
> relative skill,

I don't think so. We know from Michael Depreli's series that gnubg and
Snowie4 are equally strong, so it's rather meaningless to have snowie
evaluate gnubg. The error rate produced by snowie is likely to be
wrong.

In the absence of a perfect bot we need to use luck analysis.

Jørn

MuffinHead

Dec 12, 2003, 1:07:15 PM
In article <36775ed0.0312...@posting.google.com>,

j...@chem.sdu.dk (jthyssen) wrote:
> I did the luck adjustment (reported in gnubg's favour):
>
> Unadjusted : 40% +/- 42%
> Luck adj 0-ply : 51% +/- 6% (Equiv. number of matches: ~250)
> Luck adj 1-ply : 55% +/- 12% (Equiv. number of matches: ~60)
>
> Weird that 1-ply is worse than 0-ply.

Actually, that's not that weird. The Gnubg evaluation function may be
extremely consistent, but it does exhibit strong oscillations between
plies. This is called the odd-even effect: odd-ply searches are
optimistic because they end with the player on turn, and even-ply
searches are pessimistic because the opponent is moving at the leaf.

Of course in Gnubg's case, this should be the other way around, since
you guys count ply differently than the rest of the universe. 8)

One of my supervisors asked me to run some experiments to see how strong
this effect is between ply. In my small runs I've seen that d=3 versus
d=1 searches (that's gnu1-ply vs gnu0-ply) have quite a large absolute
average difference and std. deviation, and d=5 versus d=1 has a
relatively small absolute average difference and std. deviation. The
experiments are still running... I thought they would be done by now...
so nothing to report yet.

Zorba

Dec 12, 2003, 4:13:18 PM
j...@chem.sdu.dk (jthyssen) wrote in message news:<36775ed0.0312...@posting.google.com>...

> zo...@chello.nl (Zorba) wrote in message news:<8e504b59.03121...@posting.google.com>...

> Since both programs are really strong, you need to invoke the luck
> analysis.
>
> I did the luck adjustment (reported in gnubg's favour):
>
> Unadjusted : 40% +/- 42%
> Luck adj 0-ply : 51% +/- 6% (Equiv. number of matches: ~250)
> Luck adj 1-ply : 55% +/- 12% (Equiv. number of matches: ~60)
>
> Weird that 1-ply is worse than 0-ply.

I don't see any real value in these figures. Even apart from the big
difference between 0- and 1-ply, given the confidence interval it's
basically meaningless isn't it? It doesn't really give any
significance as to which bot was better...

I think rollouts of errors are the only way to really get a grip on
how both bots compare. Isn't luck analysis basically just a form of
evaluation too?

> [snip]
> > Using SW to judge both BGBlitz and GNUBG is better for determining
> > relative skill,
>
> I don't think so. We know from Michael Depreli's series that gnubg and
> Snowie4 are equally strong, so it's rather meaningless to have snowie
> evaluate gnubg. The error rate produced by snowie is likely to be
> wrong.

True, but for determining the RELATIVE skills of both programs I'd
think it is clearly more suited than just using GNUBG, also if you use
its luck analysis.

> In the absence of a perfect bot we need to use luck analysis.

I don't understand this. Isn't the luck analysis dependent on which
bot you use?

Zorba

Dec 12, 2003, 5:07:41 PM
Achim Mueller <in...@acepoint.de> wrote in message news:<brc00v$j89$01$1...@news.t-online.com>...

Interesting! I think Joseph Heled did something similar once and found
some MET to win ~50.03% (?) against some other MET. For overall match
winning chances, there's hardly anything to be gained by using a
better ("perfect" even) MET than f.i. the Woolsey MET (which, at least
for GNUBG 0-ply playing itself, seems to be worse than the SW 2.1,
Jacobs or mec26 MET. Zadeh is probably worst).

So a better MET seems to increase match winning chances by a very tiny
amount only. The number of decisions in f.i. a 5pt match where using a
different MET will lead to a different decision is very low, and even
when it happens the supposedly inferior MET would often give up only a
tiny amount of equity.

For individual positions, mainly cube decisions, it can occasionally
make a significant difference though which MET you use.

I'm almost done (recursively) building a custom "GNUBG" MET up to 7
points, that is solely based on GNUBG full rollouts of the opening
position (0-ply play, 2-ply 100%/25% cube). It's closest to the Jacobs
MET it seems, and pretty close to SW 2.1 and mec26, but with a few
small interesting differences. It's in very close agreement with the
Kazaross-Shaw MET (which is finished up to 4-away 4-away now and uses a
similar but more rigorous method).

I'll post it here when it's done (only 6-away 7-away left to be rolled
out).

Louis Nardy Pillards

Dec 13, 2003, 9:04:23 AM
Frank Berger wrote:
> > Roughly speaking, GNUBG rates BGBlitz around 40 rating points lower
> > than itself at my settings.
> Keep in mind, that GNU-BG isn't very well suited to find it's own
> checker play errors. E.g. there this is an serious chekerplay error
> (according to Snowie and BGBlitz
>
> +12-11-10--9--8--7-------6--5--4--3--2--1-+ O: GNU_Backgammon 137
> | X X O | | O X O |
> | X O | | O O |
> | X O | | O |
> | X O | | O |
> | X | | O |
> ^| |BAR| |
> | | | |
> | | | X |
> | | | X |
> | O X | | X X |
> | O X | | X X O O |
> +13-14-15-16-17-18------19-20-21-22-23-24-+ X: BGBlitz 147
>
> GNU_Backgammon - 0 BGBlitz - 5 in a 13 point match. Position-ID:
> g8+DASiMZ/AJBA
> GNU_Backgammon to move 1-6
>
> where GNU has choosen 8-2,6-5 It was in the 4th game of the 2nd match
> where GNU attested itself no checker error ;)

Can you post the 'best move', and the equity loss according to Snowie
and BgBlitz?
Rollouts of gnubg show 8/2 6/5 within -0.015

ty


--
Louis Nardy Pillards

Jørn Thyssen

Dec 13, 2003, 9:12:14 AM
Zorba wrote:
> j...@chem.sdu.dk (jthyssen) wrote in message news:<36775ed0.0312...@posting.google.com>...
>
>>zo...@chello.nl (Zorba) wrote in message news:<8e504b59.03121...@posting.google.com>...
>
>
>>Since both programs are really strong, you need to invoke the luck
>>analysis.
>>
>>I did the luck adjustment (reported in gnubg's favour):
>>
>>Unadjusted : 40% +/- 42%
>>Luck adj 0-ply : 51% +/- 6% (Equiv. number of matches: ~250)
>>Luck adj 1-ply : 55% +/- 12% (Equiv. number of matches: ~60)
>>
>>Weird that 1-ply is worse than 0-ply.
>
>
> I don't see any real value in these figures. Even apart from the big
> difference between 0- and 1-ply, given the confidence interval it's
> basically meaningless isn't it? It doesn't really give any
> significance as to which bot was better...

No, and I guess this is exactly the reason behind this very looooong
thread :-)

Anyway, you'll have precisely that same problem if you do an
error-analysis with a perfect bot (or in the absence of such a bot: long
rollouts). You cannot conclude anything from 5 matches. The confidence
intervals will be large due to the low number of matches and the
interval will almost certainly include the 50%-mark.

Suppose the skill difference between gnubg and bgblitz is around 1% in
either bot's favour. To get a meaningful result you need approximately
10,000 matches. Variance reduction using luck analysis may reduce this
requirement to 200-1,000 matches.
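
The order of magnitude can be checked with a back-of-the-envelope calculation. The Python sketch below treats each match as a fair coin flip (variance 0.25) and asks how many matches make the skill edge equal to two standard errors; the choice of two standard errors and the variance-reduction factors are assumptions for illustration, and `matches_needed` is a made-up name.

```python
def matches_needed(edge, z=2.0, variance_reduction=1.0):
    """Matches needed so that `edge` equals z standard errors of the win rate.

    edge: skill difference as a fraction (0.01 means 1% MWC).
    variance_reduction: hypothetical factor by which luck analysis
    shrinks the per-match variance (1.0 means no luck adjustment).
    """
    per_match_variance = 0.25 / variance_reduction
    return round(per_match_variance * (z / edge) ** 2)


if __name__ == "__main__":
    print(matches_needed(0.01))                         # raw results
    print(matches_needed(0.01, variance_reduction=12))  # modest reduction
    print(matches_needed(0.01, variance_reduction=50))  # strong reduction
```

With reduction factors of roughly 12 to 50, which is about what the "equivalent number of matches" figures earlier in the thread imply for 5 matches, the requirement drops into the range of a few hundred to about a thousand matches.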

>>In the absence of a perfect bot we need to use luck analysis.
>
>
> I don't understand this. Isn't the luck analysis dependent on which
> bot you use?

Yes, it depends on the strength of the bot, but the beauty about the
luck analysis is that it can be shown mathematically that it's not
biased, so there is no problem using gnubg to analyse luck in a match
where gnubg is one of the players.

You can do the luck analysis with Snowie if you want to. In fact, it
would be interesting to know Snowie's opinion. Be careful with game 4
since it's incomplete.

Jørn


Zorba

Dec 14, 2003, 3:13:58 AM
"Louis Nardy Pillards" <nardy dot pillards at skynet dot be> wrote in message news:<3fdb1c66$0$295$ba62...@reader5.news.skynet.be>...
> Frank Berger wrote:

> > Keep in mind, that GNU-BG isn't very well suited to find it's own
> > checker play errors. E.g. there this is an serious chekerplay error
> > (according to Snowie and BGBlitz
> >

[GNUBG position:]

GNU Backgammon Position ID: jGfwCQSDz4MBKA
Match ID : MAGnAQAAKAAA
+-1--2--3--4--5--6-------7--8--9-10-11-12-+ O: GNU_Backgammon
| O X O | | O X X | 0 points
| O O | | O X | Rolled 61
| O | | O X |
| O | | O X |
| O | | X |
| |BAR| |^ 13 point match (Cube: 1)
| | | |
| X | | |
| X | | |
| X X | | X O |
| O O X X | | X O | 5 points
+24-23-22-21-20-19------18-17-16-15-14-13-+ X: BGBlitz


> > GNU_Backgammon to move 1-6
> >
> > where GNU has choosen 8-2,6-5 It was in the 4th game of the 2nd match
> > where GNU attested itself no checker error ;)
>
> Can you post the 'best move', and equity loss according Snowie and
> BgBlitz?
> Rollouts of gnubg show 8/2 6/5 within -0.015

I just finished mine. Too close to call with these standard errors,
but very unlikely that 8/2 6/5 is more than a 0.02 EMG error.

It looks like some reduced cube efficiency for opponent after GNUbg's
bold move might be a factor here.

1. Rollout  8/2 6/5      Eq.: -0.1499
     43.80% 15.67%  0.42% - 56.20% 16.25%  1.08%  CL -0.1218  CF -0.1499
   [  0.12%  0.10%  0.04% -  0.12%  0.12%  0.07%  CL  0.0036  CF  0.0084]
   Full cubeful rollout with var.redn.
   2592 games, Mersenne Twister dice gen. with seed 1190206464
   and quasi-random dice
   Play: 0-ply cubeful [expert]
   Cube: 2-ply cubeful 25% speed
2. Rollout  24/23 8/2    Eq.: -0.1537 ( -0.0037)
     43.92% 13.82%  0.28% - 56.08% 13.22%  0.64%  CL -0.1074  CF -0.1537
   [  0.12%  0.10%  0.03% -  0.12%  0.11%  0.04%  CL  0.0033  CF  0.0078]
   Full cubeful rollout with var.redn.
   2592 games, Mersenne Twister dice gen. with seed 1190206464
   and quasi-random dice
   Play: 0-ply cubeful [expert]
   Cube: 2-ply cubeful 25% speed
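
To put "too close to call" into numbers, the Python sketch below expresses the cubeful equity gap in units of its combined standard error. It assumes the bracketed CF values are standard errors and treats the two rollouts as independent, which is only an approximation since they share a seed; `difference_in_standard_errors` is a name made up for this example.

```python
import math


def difference_in_standard_errors(eq_a, se_a, eq_b, se_b):
    """Equity gap between two plays divided by its combined standard error.

    Assumes the two estimates are independent, so their variances add.
    """
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return (eq_a - eq_b) / combined_se


# Cubeful equities and bracketed CF values from the rollout above.
z = difference_in_standard_errors(-0.1499, 0.0084, -0.1537, 0.0078)

if __name__ == "__main__":
    print(f"{z:.2f}")
```

A gap well below one standard error, as here, gives no statistical basis for preferring either play.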

Zorba

Dec 14, 2003, 4:27:38 AM
Jørn Thyssen <j...@nospam.com> wrote in message news:<3FDB1E3E...@nospam.com>...

Agreed, but combining strict statistical evidence with some human bg
know-how, I think we might still be able to conclude something from
f.i. an error analysis done by a perfect bot, and hopefully also from
rollouts.

Which bot makes more errors? Which big errors are made? In which
positions do they happen? Etc.

> >>In the absence of a perfect bot we need to use luck analysis.
> >
> >
> > I don't understand this. Isn't the luck analysis dependent on which
> > bot you use?
>
> Yes, it depends on the strength of the bot, but the beauty about the
> luck analysis is that it can be shown mathematically that it's not
> biased, so there is no problem using gnubg to analyse luck in a match
> where gnubg is one of the players.

Hmm. It depends on the strength of the bot (=GNUBG). Even if it's not
biased, isn't that pretty inaccurate then to judge both GNUBG and
BGBlitz?

Aren't rollouts with variance reduction much more accurate?

> You can do the luck analysis with Snowie if you want to. In fact, it
> would be interesting to know Snowie's opinion. Be careful with game 4
> since it's incomplete.

Any volunteers? I'll stick with my error-based approach for now; I
don't fully understand this luck analysis. It seems to me that since
it uses evaluations it has to be quite unreliable, even if it's not
biased?

Jørn Thyssen

Dec 14, 2003, 6:04:08 AM
Zorba wrote:

[snip]


> Agreed, but combining strict statistical evidence with some human bg
> know-how, I think we might still be able to conclude something from
> f.i. an error analysis done by a perfect bot, and hopefully also from
> rollouts.
>
> Which bot makes more errors? Which big errors are made? In which
> positions do they happen? Etc.

That's true.

>
>
>>>>In the absence of a perfect bot we need to use luck analysis.
>>>
>>>
>>>I don't understand this. Isn't the luck analysis dependent on which
>>>bot you use?
>>
>>Yes, it depends on the strength of the bot, but the beauty about the
>>luck analysis is that it can be shown mathematically that it's not
>>biased, so there is no problem using gnubg to analyse luck in a match
>>where gnubg is one of the players.
>
>
> Hmm. It depends on the strength of the bot (=GNUBG). Even if it's not
> biased, isn't that pretty inaccurate then to judge both GNUBG and
> BGBlitz?

No, the strength of the bot is reflected in the reduction of the
variance. We could use Snowie1 or gnubg 0.01 for the luck analysis if we
wanted. We would most likely not see the same reduction of the variance,
but the result would still be unbiased and valid. In the limit where a
random neural net is used, there will be no reduction at all, so the
luck adj result will be equal to the unadjusted result (I've actually
tried this!).

> Aren't rollouts with variance reduction much more accurate?

Possibly, but they are generally biased towards the bot. If you want we
can use rollouts to estimate the luck!


>>You can do the luck analysis with Snowie if you want to. In fact, it
>>would be interesting to know Snowie's opinion. Be careful with game 4
>>since it's incomplete.
>
>
> Any volunteers? I'll stick with my error based appraoch for now, I
> don't fully understand this luck analysis.

It's quite simple (http://math.columbia.edu/~zare/vrskill.html):

final - initial = net luck + net skill,

or

net skill = (final - initial) - net luck.

Final - initial is typically 50% MWC (except for match 4, since it's
incomplete, hence my warning), so

net skill = 50% - net luck

Both snowie and gnubg can calculate the unnormalised luck. In the first
match gnubg estimates the luck at 40.12% MWC, so the net skill is 9.88% MWC.
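
The arithmetic for the first match can be written out as a short Python sketch. The function name `net_skill` is made up for the example; the values are in percent MWC, with the winner going from 50% at the start to 100% at the end of a completed match.

```python
def net_skill(final_mwc, initial_mwc, net_luck):
    """Net skill = (final - initial) - net luck, all in percent MWC."""
    return (final_mwc - initial_mwc) - net_luck


if __name__ == "__main__":
    # Match 1: gnubg won, and gnubg's luck estimate is 40.12% MWC.
    print(round(net_skill(100.0, 50.0, 40.12), 2))
```

For an incomplete match, `final_mwc` would have to be the match winning chance at the point where play stopped, which is why the result for match 4 needs care.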

Jørn


David Montgomery

Dec 14, 2003, 1:05:25 PM
On Sun, 14 Dec 2003 12:04:08 +0100, Jørn Thyssen <j...@nospam.com>
wrote:

>No, the strength of the bot is reflected in the reduction of the
>variance. We could use Snowie1 or gnubg 0.01 for the luck analysis if we
>wanted. We would most likely not see the same reduction of the variance,
>but the result would still be unbiased and valid. In the limit where a
>random neural net is used, there will be no reduction at all, so the
>luck adj result will be equal to the unadjusted result (I've actually
>tried this!).

A vr evaluator that I've wanted to experiment with is this:

equity-delta = K*(pips-rolled - average-pips-rolled)

where K might be set to make a pip worth something like
1.5-2% gwc.

This would be extremely easy to calculate, and so could
be used in 0-ply rollouts with essentially no cost.
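
A minimal Python sketch of the pips-rolled version follows. It counts doubles as four times the die value, computes the average over all 36 rolls, and uses a hypothetical K of 0.0175 (the midpoint of the 1.5-2% range, expressed as a fraction of game winning chances). The names `pips_rolled`, `AVERAGE_PIPS` and `equity_delta` are invented here, and nothing in this sketch is taken from GNUbg's code.

```python
from itertools import product


def pips_rolled(d1, d2):
    """Pips available from a roll; doubles are played four times."""
    return 4 * d1 if d1 == d2 else d1 + d2


# Average over the 36 equally likely ordered rolls (294 / 36).
AVERAGE_PIPS = sum(pips_rolled(a, b) for a, b in product(range(1, 7), repeat=2)) / 36


def equity_delta(d1, d2, k=0.0175):
    """Luck estimate for one roll: K * (pips rolled - average pips rolled).

    k is a hypothetical value; the post only suggests a range per pip.
    """
    return k * (pips_rolled(d1, d2) - AVERAGE_PIPS)


if __name__ == "__main__":
    print(round(AVERAGE_PIPS, 4))
    print(round(equity_delta(6, 6), 4))
    print(round(equity_delta(2, 1), 4))
```

Because the estimate averages to zero over all 36 rolls, subtracting it from a rollout result should not bias the outcome; how much variance it removes is the open question raised in the post.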

Slightly better would be:

equity-delta = K*(pips-moved - average-pips-moved)

To do this generally would require generating a list
of moves for each roll, but that might still be
negligible compared to the 0-ply evaluation cost.
OTOH, it might not.

But what certainly could be done very cheaply is to
adjust for dances, since that only requires a scan
of the bar and the opponent's home board. And that
would probably get 95% of the benefit of using
pips-moved rather than pips-rolled.

Since this can be done so cheaply, I think 0-ply
rollouts should do this by default. Assuming that
it's effective, of course, but I can't see why
it wouldn't be.

If anyone tries this idea, I would love to hear
how effective it is (say, what kind of effective
game multiplier one gets for untruncated rollouts
of opening positions, and how that compares with
1-ply lookahead vr).

David Montgomery

Frank Berger

Dec 14, 2003, 4:24:58 PM
"Louis Nardy Pillards" <nardy dot pillards at skynet dot be> wrote in message

Hi Nardy,

> Can you post the 'best move', and equity loss according Snowie and
> BgBlitz?
> Rollouts of gnubg show 8/2 6/5 within -0.015

The best move should have been 24-23, 8-2..... Should have been? Yes, a
rollout with variance reduction led to the result that both moves are
pretty much equal, as Zorba found out too. The next presumed error
also turned out to be a draw after the rollout. I will collect the
errors assumed by BGBlitz and Snowie and we'll see whether a rollout
shows something.

I will probably need some time...

ciao
Frank

Frank Berger

Dec 14, 2003, 4:31:41 PM
j...@chem.sdu.dk (jthyssen) wrote in message

Hi Jørn,

> [snip]
> > Using SW to judge both BGBlitz and GNUBG is better for determining
> > relative skill,
>
> I don't think so. We know from Michael Depreli's series that gnubg and
> Snowie4 are equally strong, so it's rather meaningless to have snowie
> evaluate gnubg. The error rate produced by snowie is likely to be
> wrong.

Well, I think you are right that judging BGBlitz's moves with GNU-BG
gives meaningful results, although there may be a little bias if a bot
handles a situation differently, but this is surely not the rule.

The only difficult thing is to use a bot to find its own errors.
Therefore, since BGBlitz does not yet ;) have an analysis feature,
Snowie is a good choice. O.K., if one spends a lot of time one could
overcome this too, but under practical time constraints it's difficult.

>
> In the absence of a perfect bot we need to use luck analysis.

A perfect bot... wouldn't it be boring if we were already there ;)
(At least for you and me and the rest of the GNU-Team and some others
...)

jthyssen

Dec 15, 2003, 2:58:08 AM
fr...@bgblitz.com (Frank Berger) wrote in message news:<aab8a7e5.0312...@posting.google.com>...
> j...@chem.sdu.dk (jthyssen) wrote in message
>
> Hi Jørn,
>
> > [snip]
> > > Using SW to judge both BGBlitz and GNUBG is better for determining
> > > relative skill,
> >
> > I don't think so. We know from Michael Depreli's series that gnubg and
> > Snowie4 are equally strong, so it's rather meaningless to have snowie
> > evaluate gnubg. The error rate produced by snowie is likely to be
> > wrong.
> We'll I think you are right, that the results judgeing BGBlitz moves
> with GNU-BG is meaningful, although a little bias maybe if a bot
> handels a situation different, but this is surely not the rule.

Doing an error analysis with gnubg on BGBlitz' moves is also close to
meaningless. We really need to do the luck analysis....

> > In the absence of a perfect bot we need to use luck analysis.
> A perfect bot... wouldn't it be booring if we were already there ;)
> (At least for you and me and the rest of the GNU-Team and somne others
> ...)

Well, we're close: I don't know about BGBlitz but in gnubg you can (in
principle) set an arbitrary number of plies. Set it to 100 and you'll
get something very close to the true result [*]. Unfortunately the
answer will generally not arrive in the lifetime of this galaxy.

Jørn

[*] To get the exact result we need to abandon the forward-pruning

Frank Berger

Dec 15, 2003, 1:42:43 PM
j...@chem.sdu.dk (jthyssen) wrote in message news:<36775ed0.03121...@posting.google.com>...

Hi Jørn,

> Doing an error analysis with gnubg on BGBlitz' moves is also close to
> meaningless. We really need to do the luck analysis....

Agreed, that's better...

>
> > > In the absence of a perfect bot we need to use luck analysis.
> > A perfect bot... wouldn't it be booring if we were already there ;)
> > (At least for you and me and the rest of the GNU-Team and somne others
> > ...)
>
> Well, we're close: I don't know about BGBlitz but in gnubg you can (in
> principle) set an arbitrary number of plies. Set it to 100 and you'll
> get something very close to the true result [*]. Unfortunately the
> answer will generally not arrive in the lifetime of this galaxy.

LOL.... Yes, with an average cubeless game taking 55 moves, 100 ply
is sure ;))

> [*] To get the exact result we need to abandon the forward-pruning

What will end first, our interest in the result or the galaxy? LOL

Programmable HW might give a little speedup. I saw Brutus in Graz (a
programmable FPGA(?)) playing chess. I planned to talk with Dollinger
about it to get an idea whether programmable HW is possible for BG
too. Does anyone know something about it? (The idea would naturally
be 4-ply.)

Further, I had the impression that 3-ply plays a lot stronger than
2-ply (taking care of some tactical things BGBlitz likes to overlook
and seeing the effects of diversification). Has anyone measured it for
GNU (or 1-ply against 2-ply)? (I mean rest-of-the-world plies ;))

Has anyone ever looked at 4-ply systematically? Is there a diminishing
effect like in chess?
