Some statistics on the BigBrother Matches (till November)

Peter Fankhauser

Jan 19, 1996

Due to a problem at our local site, none of my postings from roughly the
last two months seem to have left our site, so I am reposting these statistics.
If they already showed up in December, I apologize for the extra load.

Peter Fankhauser

---------------------------------------------------------------------

Hi all,

Ever since Mark Damish's bot Big_Brother started recording
matches between the top 150 players on fibs, I have wanted to extract
some statistics from them. Now that the database at Patti Beadle's
ftp site contains over 1000 matches from August 95 to October 95,
there is enough data to derive something statistically significant.

So I wrote a small awk script to derive match equities, initial cube
actions, and the influence of the fibs rating difference on the outcome of
matches of various lengths. As a fairly homogeneous control set for the
script I used the 100 five-point matches between mloner and idiot from their
recent battle on fibs (mloner won 55 to 45).

Some of the results are surprising. My first explanation for the
deviations from the "official" data was, of course, bugs in my
script. But even after extensive debugging (though no software is
bug-free), the data don't behave as they should. The remaining explanations
may be a combination of too few examples, undetected noise in the
matches (I did some consistency checks and the data seem to be of good
quality; I had to rule out just 2 of 1035 matches),
fundamental problems with the methodology, or a significant
difference between real-life backgammon and the virtual world of
fibs. I don't know. Well, maybe we really are test bunnies after all,
and the data are completely skewed by the real Big Brothers :-)

Anyway, here are the data:

(A) Match Equities

To compute the match equities I simply counted, for each score
(m-away, n-away), the number of matches won and the number of matches
lost from that score. The equity[m-away,n-away], that is, the average
net number of matches won from that score (+1 for each win, -1 for each
loss), is then

equity[m-away,n-away] = (win[m-away,n-away] - loss[m-away,n-away]) / (win[m-away,n-away] + loss[m-away,n-away])

I did not count post-Crawford scores. This leads to the following table
(Table 1), in which the rows give the player's score (m-away) and the
columns the opponent's score (n-away).
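The equity formula and its conversion into the percentages of Table 2 can be sketched in Python as follows. This is only an illustration, not the author's awk script; the function names are hypothetical, and the 38/19 win/loss split in the usage line is inferred from the 1-away/2-away entries (57 examples at 66.7%) rather than taken from the raw data.

```python
def match_equity(wins: int, losses: int) -> float:
    """Net matches won per match from one score: +1 per win, -1 per loss."""
    total = wins + losses
    if total == 0:
        raise ValueError("no matches recorded from this score")
    return (wins - losses) / total


def win_percentage(equity: float) -> float:
    """Map an equity in [-1, 1] to the percentage of matches won."""
    return 50.0 * (1.0 + equity)


# 38 wins and 19 losses would reproduce the 1-away/2-away entries.
eq = match_equity(38, 19)
print(f"equity {eq:.3f}, won {win_percentage(eq):.1f}%")
```

Since every match from a given score ends in either a win or a loss, the percentage is just the equity shifted and rescaled, which is why Tables 1 and 2 carry the same information.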

Match Equities (Table 1)

        1      2      3      4      5      6      7
1:  0.000  0.333  0.526  0.590  0.765  0.826  0.824
2: -0.333  0.000  0.143  0.222  0.526  0.421  0.622
3: -0.526 -0.143  0.000  0.116  0.269  0.565  0.535
4: -0.590 -0.222 -0.116  0.000  0.170  0.282  0.309
5: -0.765 -0.526 -0.269 -0.170  0.000  0.296  0.267
6: -0.826 -0.421 -0.565 -0.282 -0.296  0.000  0.062
7: -0.824 -0.622 -0.535 -0.309 -0.267 -0.062  0.000

Or, in more familiar terms, as the percentage of matches won, 50 * (1 + equity) (Table 2):

Match Percentages (Table 2)

       1     2     3     4     5     6     7
1:  50.0  66.7  76.3  79.5  88.2  91.3  91.2
2:  33.3  50.0  57.1  61.1  76.3  71.1  81.1
3:  23.7  42.9  50.0  55.8  63.4  78.3  76.7
4:  20.5  38.9  44.2  50.0  58.5  64.1  65.5
5:  11.8  23.7  36.6  41.5  50.0  64.8  63.3
6:   8.7  28.9  21.7  35.9  35.2  50.0  53.1
7:   8.8  18.9  23.3  34.5  36.7  46.9  50.0

While the table is fairly consistent in itself (the chance of winning
a match generally increases with the lead in the match), some of the
numbers differ significantly from Kit Woolsey's table derived from
Hal Heinrich's database of real-life experts (to my knowledge first
published in Inside Backgammon, Vol. 2, No. 2, March-April 1992). For
example, for (1-away,2-away) the fibs table gives just a 67% chance,
while Woolsey's table gives 70%. Even more striking, the percentage for
(2-away,4-away) differs by a whopping 9% from the figure in Woolsey's
table. Mark Damish, to whom I sent the table a few days ago, suggested
that this indicates that the fibs experts double far too late when
trailing 4-away/2-away. Admittedly, I did not know the proper cube
action at this score (but then I wouldn't consider myself an expert),
so this may well be the case (see also Table 4 below).

The other possible reason for the deviations is, of course, a lack of
examples. However, at least up to 7-away/7-away the number of examples
almost reaches the numbers in Heinrich's database (see Table 3; divide
the numbers for n-away/n-away by 2). (I started determining the margin
of error for a 95% confidence interval, but I still have to brush up my
poor background in statistics before I can give any hard numbers here.)
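For the 95% confidence interval mentioned above, a common first approximation is the normal (Wald) interval for a binomial proportion. The sketch below assumes that each example counts as an independent trial, which the data may not fully satisfy. Applied to the (2-away,4-away) entry of Table 2 (61.1% from 90 examples in Table 3), it gives roughly 51% to 71%, a margin of about 10 percentage points either way.

```python
import math


def wald_interval(p_hat: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a binomial proportion.

    p_hat: observed fraction of matches won from a score (0..1)
    n:     number of examples for that score
    z:     1.96 gives an approximate 95% interval
    """
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


low, high = wald_interval(0.611, 90)
print(f"95% CI: {100 * low:.1f}% .. {100 * high:.1f}%")
```

The Wald interval is the simplest choice; for scores with few examples or percentages near 0 or 100, a Wilson interval would be the more careful option.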

Number of examples for a particular score (Table 3)

      1    2    3    4    5    6    7
1:  240   57  152   39  119   23   34
2:   57  212  203   90  114   38   37
3:  152  203  550  138  227   46   86
4:   39   90  138  118  212   39   55
5:  119  114  227  212  872   54  120
6:   23   38   46   39   54   62  128
7:   34   37   86   55  120  128  554

For comparison, Table 2a gives the percentages from the mloner vs. idiot matches:

Match Percentages from mloner vs. idiot (Table 2a)

       1     2     3     4     5
1:     -  50.0  83.3  90.0  96.3
2:  50.0  50.0  68.8  88.9  66.7
3:  16.7  31.2  50.0  48.1  70.5
4:  10.0  11.1  51.9  50.0  63.3
5:   3.7  33.3  29.5  36.7  50.0

These figures amplify the observed deviation even further: the leader
appears to have much better chances than usual. However, 100 matches
do not provide that many examples (see Table 3a):

Number of examples for a particular score from mloner vs. idiot (Table 3a)

      1    2    3    4    5
1:    0    4    6   10   27
2:    4   12   16   18   12
3:    6   16   24   27   44
4:   10   18   27   16   49
5:   27   12   44   49  200

(B) Cube action

(B.A) Initial Cubes depending on the Match Score

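Next, I compared the number of initial cubes issued by the leader with
the number issued by the trailer (see Table 4; each entry is the
percentage of the initial cubes at that score that were issued by the
player whose score labels the row, so the entries above the diagonal
belong to the leader). Again I excluded post-Crawford games, so there
are no figures for 1-away scores.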

Initial Cubes by Leader depending on the match-score (Table 4)

       2     3     4     5     6     7
2:     -  26.3  22.6  22.2  19.4  22.9
3:  73.7     -  48.1  41.8  45.5  43.4
4:  77.4  51.9     -  48.1  43.2  50.0
5:  77.8  58.2  51.9     -  50.0  41.2
6:  80.6  54.5  56.8  50.0     -  56.3
7:  77.1  56.6  50.0  58.8  43.7     -

Again the table is fairly consistent in itself: the larger the lead,
the fewer initial cubes the leader issues. If Mark Damish's hypothesis
is true, the percentage for 4-away/2-away should probably be larger.
The inconsistencies at some 6-away and 7-away scores may be due to a
lack of examples (see Table 5; again divide the n-away/n-away entries by 2).

Overall Number of Initial Cubes (Table 5)

      2    3    4    5    6    7
2:  208  190   84  108   36   35
3:  190  534  133  225   44   83
4:   84  133  114  212   37   54
5:  108  225  212  848   54  119
6:   36   44   37   54   62  126
7:   35   83   54  119  126  546

And here are the data from the mloner vs. idiot matches.

Initial Cubes by Leader from mloner vs. idiot (Table 4a)

       2     3     4     5
2:     -  31.2  25.0  16.7
3:  68.8     -  48.1  39.5
4:  75.0  51.9     -  58.3
5:  83.3  60.5  41.7     -

Overall Number of Initial Cubes (Table 5a)

      2    3    4    5
2:   12   16   16   12
3:   16   24   27   43
4:   16   27   16   48
5:   12   43   48  196

(B.B) Individual Cube Action

The mere number of initial cubes tells only part of the story.
Following a method developed by Kit Woolsey (see again Inside
Backgammon, Vol. 2, No. 2, March-April 1992, and Inside
Backgammon, Vol. 2, No. 3, May-June 1992), I counted the number of takes
vs. the number of passes for all initial doubles at scores where both
players were at least 4-away. In his evaluation of Heinrich's database
Woolsey considered only scores of 15-away/15-away or more, plus some
close scores of 7-away or more, but since the longest matches in
Mark's fibs collection are 11 points, I had to lower this threshold.

Furthermore, to get an impression of the effectiveness of
the cube action, I determined the equity on takes as follows
(again following Woolsey).

For every game doubled by Player 1, taken by Player 2, and played
to conclusion, I counted the number of points won or lost by Player 2.
Because of the limited length of the matches, I counted backgammons
as gammons. For every taken recube I credited Player 2 with a settlement
of 1.6 points, and for every recube/pass Player 2 got 2 points.
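The take-equity bookkeeping just described can be sketched as follows. The outcome labels and function names are hypothetical (the original awk script is not shown); the point values are the ones stated in the text: 2 points with the cube on 2, 4 for a gammon (backgammons included), a 1.6-point settlement for a taken recube, and 2 points for a recube/pass.

```python
from statistics import mean

# Points credited to the taker (Player 2) for each way a taken initial double
# can end. The cube is on 2 after the take; backgammons count as gammons.
TAKE_OUTCOME_POINTS = {
    "single_win": 2.0,
    "single_loss": -2.0,
    "gammon_win": 4.0,
    "gammon_loss": -4.0,
    "recube_taken": 1.6,   # settlement instead of playing on at 4
    "recube_passed": 2.0,  # the doubler passes Player 2's redouble
}


def take_equity(outcomes: list[str]) -> float:
    """Average points per take for the taker; 0.0 when there were no takes."""
    if not outcomes:
        return 0.0
    return mean(TAKE_OUTCOME_POINTS[o] for o in outcomes)


def take_share(takes: int, passes: int) -> float:
    """Percentage of initial doubles that were taken (0.0 if none were issued)."""
    if takes + passes == 0:
        return 0.0
    return 100.0 * takes / (takes + passes)
```

For example, three takes ending in a single loss, a single win, and a gammon loss average (-2 + 2 - 4) / 3 = -1.33 points per take, which is worse than the -1.00 that passing would have cost each time.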

For each player who issued at least one initial cube, this procedure
gave entries like the following:

Examples of individual cube actions (Table 6)

          Player doubles                            Opponent doubles
          Take     %  Eq-opp  Pass     %  Total     Take     % Eq-play  Pass     %  Total
mloner :    94 ( 69%, -0.55)    42 ( 30%)   136       52 ( 48%, -0.49)    55 ( 51%)   107
idiot  :    11 ( 47%, -0.95)    12 ( 52%)    23       17 ( 77%, -0.80)     5 ( 22%)    22
funk   :    20 ( 60%, -1.56)    13 ( 39%)    33       23 ( 71%, -0.78)     9 ( 28%)    32
   .........
Sum    :   843 ( 55%, -0.71)   680 ( 44%)  1523      843 ( 55%, -0.71)   680 ( 44%)  1523

The first column gives the number of takes by the opponent, the second the
percentage of takes, the third the opponent's average equity on a take, the
fourth the number of passes, the fifth their percentage, and the sixth the
total number of initial doubles issued by the player (columns 7 through 12
give the same figures for the initial doubles issued by the opponent).
Take equities below -1.00 indicate that the taker takes too much, since
passing would have cost only one point; take equities close to 0 or even
positive indicate that the doubler tends to double prematurely. So I
certainly like my cube action here, but the bots aren't doing that badly
either :). For the full list see the Appendix.

To get a feeling for the influence of the fairly short match lengths on these
data, I also computed the summary data with the minimum score raised to
5-, 6-, and 7-away:

>=4 :   843 ( 55%, -0.71)   680 ( 44%)  1523   (see above)
>=5 :   649 ( 56%, -0.80)   498 ( 43%)  1147
>=6 :   297 ( 56%, -0.72)   233 ( 43%)   530
>=7 :   200 ( 54%, -0.67)   165 ( 45%)   365

Applying the same procedure to the mloner vs. idiot matches led to very
surprising data (see Table 6a):

Individual cube actions from mloner vs. idiot (Table 6a)

  Mloner doubles                            Idiot doubles
  Take    % Eq-idiot  Pass     %  Total     Take   % Eq-mloner  Pass     %  Total
    49 ( 60%, -0.78)    32 ( 39%)    81       28 ( 38%,  0.29)    45 ( 61%)    73

It certainly looks like idiot has found its master. mloner seems to take very conservatively
(only 38%) but manages to get a positive equity on these takes!

(C) The Fibs-rating or "How much of a favorite is the favorite?"

Finally, I took a look at the relationship between rating difference,
match length, and the outcome of the match. I find these data the hardest
to believe, so I'm still searching hard for a bug in my script.

First, for each match length (ml) I counted the number of matches (nm),
the number of points played (np = nm*ml), and the number of points won by
the favorite minus the number of points won by the lower-rated player (tp).
Each match counts as ml points for its winner; the margin of the win was
of course not considered. tp/np then gives the average net number of points
won by the favorite per point played. For comparison, I also computed the
total rating gain (tg) of the favorite according to the fibs rating formula.
To make the rating changes comparable across match lengths, I also
accumulated every rating change divided by sqrt(ml), giving an adjusted
total rating gain (tga), and divided this by the number of matches of
each length, giving tga/nm.
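The per-group figures of Tables 7 and 8 can be reproduced from per-match records along these lines. The `Match` record and its field names are hypothetical, and the favorite's rating change is taken as given rather than recomputed, since the fibs rating formula is not spelled out here.

```python
import math
from dataclasses import dataclass


@dataclass
class Match:
    length: int            # ml: match length in points
    favorite_won: bool     # True if the higher-rated player won
    favorite_delta: float  # favorite's fibs rating change for this match


def summarize(matches: list[Match]) -> tuple[int, int, int, float, float, float]:
    """Return (nm, np, tp, tp/np, tg, tga/nm) for one group of matches."""
    nm = len(matches)
    points_played = sum(m.length for m in matches)  # np
    # Each match counts ml points for the winner, regardless of the final score.
    tp = sum(m.length if m.favorite_won else -m.length for m in matches)
    tg = sum(m.favorite_delta for m in matches)
    # Scale each rating change by 1/sqrt(ml) before accumulating (tga).
    tga = sum(m.favorite_delta / math.sqrt(m.length) for m in matches)
    return nm, points_played, tp, tp / points_played, tg, tga / nm
```

As a cross-check of these definitions against the printed tables: the per-length rows match tp/np (for example 27/555 = 0.049 for 3-pointers), but for the pooled Sum row tp/np would be -106/5004, about -0.021, whereas the printed -0.103 equals -106/1033, that is, tp/nm.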

And here are the results:

Gain/Loss of favorite according to matchlength (Table 7)

 ml:    nm (   np)    tp   tp/np;       tg   tga/nm
  1:   120 (  120)    24   0.200;    36.76    0.306
  2:    10 (   20)    -4  -0.200;    -6.81   -0.482
  3:   185 (  555)    27   0.049;     3.42    0.011
  4:     3 (   12)   -12  -1.000;   -12.62   -2.104
  5:   405 ( 2025)   -35  -0.017;  -146.05   -0.161
  6:     1 (    6)    -6  -1.000;    -5.11   -2.086
  7:   270 ( 1890)   -14  -0.007;  -143.61   -0.201
  8:     1 (    8)    -8  -1.000;    -5.83   -2.062
  9:    25 (  225)   -45  -0.200;   -43.51   -0.580
 11:    13 (  143)   -33  -0.231;   -28.17   -0.653
Sum:  1033 ( 5004)  -106  -0.103;  -351.53   -0.115

It looks as if in all matches longer than 3 points the favorite is not the
favorite at all! (And the 1-point result may be mostly due to one_pointer :))
If this were true only for the rating changes, one might argue that
the rating formula is unfair, but it also holds for the total points won or
lost, which do not take the size of the rating difference into account at all.
I'm totally stuck here: is there a fundamental flaw in the evaluation?

Gain/Loss of favorite according to rating difference (Table 8)

 rd:    nm (   np)    tp   tp/np;       tg   tga/nm
  0:   142 (  709)    25   0.035;     9.93   -0.017
 10:   146 (  701)   -37  -0.053;   -32.84   -0.063
 20:   104 (  480)   -26  -0.054;   -26.80   -0.060
 30:   106 (  528)    32   0.061;     3.29    0.025
 40:    86 (  435)   -73  -0.168;   -78.72   -0.345
 50:    78 (  390)   -66  -0.169;   -86.64   -0.499
 60:    63 (  297)     1   0.003;   -16.10   -0.061
 70:    65 (  321)   -15  -0.047;   -31.62   -0.155
 80:    45 (  200)    -6  -0.030;   -22.21   -0.152
 90:    31 (  130)    36   0.277;    21.60    0.371
100:    40 (  210)     2   0.010;   -25.14   -0.261
110:    15 (   77)    15   0.195;     2.80    0.121
120:    13 (   67)   -17  -0.254;   -22.57   -0.758
130:    17 (  101)    53   0.525;    27.83    0.699
140:    11 (   57)   -25  -0.439;   -25.34   -0.890
150:    10 (   46)     4   0.087;    -5.50   -0.353
160:    15 (   73)   -27  -0.370;   -29.56   -0.508
170:     9 (   43)     5   0.116;    -0.52    0.267
180:     9 (   43)   -11  -0.256;   -15.71   -0.646
190:    10 (   32)     6   0.188;     1.91    0.449
200:     6 (   24)    12   0.500;     4.13    0.241
210:     3 (   13)    -3  -0.231;    -5.92   -1.135
220:     3 (   15)    -1  -0.067;    -5.71   -1.188
240:     2 (    2)     0   0.000;    -0.56   -0.278
250:     3 (    3)     3   1.000;     5.15    1.718
260:     1 (    7)     7   1.000;     3.29    1.242
Sum:  1033 ( 5004)  -106  -0.103;  -351.53   -0.115

Maybe Table 8 sheds some further light on this. It gives the same data broken
down by the size of the rating difference in steps of 10 points. (The fact that
tg = 9.93 > 0 whereas tga < 0 for rating differences between 0 and 10 points comes
from dividing the individual rating changes by sqrt(ml).)
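The binning of Table 8 and the sign difference between tg and tga can both be illustrated in a few lines. The record layout and the rating changes in the example are hypothetical and are not produced by the fibs rating formula; they only show how dividing by sqrt(ml) can flip the sign of a total when gains come from long matches and losses from short ones.

```python
import math
from collections import defaultdict


def rating_bin(diff: float, width: int = 10) -> int:
    """Lower edge of the rating-difference bucket, e.g. 37.5 -> 30."""
    return int(diff // width) * width


def group_by_bin(records, width: int = 10) -> dict[int, list[tuple[int, float]]]:
    """records: iterable of (rating_diff, match_length, favorite_change)."""
    bins = defaultdict(list)
    for diff, length, delta in records:
        bins[rating_bin(diff, width)].append((length, delta))
    return dict(bins)


def tg_and_tga(changes: list[tuple[int, float]]) -> tuple[float, float]:
    """Raw total rating gain and the sqrt(ml)-scaled total for one bucket."""
    tg = sum(delta for _, delta in changes)
    tga = sum(delta / math.sqrt(length) for length, delta in changes)
    return tg, tga


# Hypothetical bucket: the favorite loses 2.0 in a 1-pointer, gains 3.0 in a 9-pointer.
buckets = group_by_bin([(5, 1, -2.0), (7, 9, 3.0), (12, 5, 1.0)])
print(tg_and_tga(buckets[0]))  # tg = +1.0, but tga = -2.0 + 3.0/3 = -1.0
```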

Food for thought...

Further interpretations and suggestions for other kinds of
summaries are welcome. As soon as I have the time, I intend to do some analysis
of checker plays and gammon rates at certain match scores.

Enjoy,

Peter Fankhauser

APPENDIX: Individual cube actions for all experts :)

Player doubles Opponent doubles
Take Eq-opp Pass Total Take Eq-play Pass Total
Albatross : 2 ( 66%, 1.80) 1 ( 33%) 3 2 ( 40%, -1.20) 3 ( 60%) 5
AlexanderHess : 0 ( 0%, 0.00) 2 (100%) 2 4 ( 57%, -1.20) 3 ( 42%) 7
Antti : 0 ( 0%, 0.00) 2 (100%) 2 0 ( 0%, 0.00) 3 (100%) 3
BadDice : 1 ( 50%, 1.60) 1 ( 50%) 2 0 ( 0%, 0.00) 0 ( 0%) 0
BarNaked : 0 ( 0%, 0.00) 0 ( 0%) 0 1 (100%, -4.00) 0 ( 0%) 1
Blake_ : 27 ( 47%, -0.53) 30 ( 52%) 57 25 ( 43%, -1.07) 32 ( 56%) 57
Bob_P : 2 (100%, -1.00) 0 ( 0%) 2 1 ( 50%, -4.00) 1 ( 50%) 2
Bogey : 1 (100%, 4.00) 0 ( 0%) 1 2 (100%, -0.20) 0 ( 0%) 2
Conrad : 1 (100%, -4.00) 0 ( 0%) 1 0 ( 0%, 0.00) 0 ( 0%) 0
DENNYL : 2 ( 40%, -3.00) 3 ( 60%) 5 2 ( 50%, 0.00) 2 ( 50%) 4
ElDopo : 1 (100%, -4.00) 0 ( 0%) 1 1 (100%, 2.00) 0 ( 0%) 1
FME : 2 ( 66%, 2.00) 1 ( 33%) 3 0 ( 0%, 0.00) 2 (100%) 2
Frank_Mogler : 5 ( 55%, -0.56) 4 ( 44%) 9 7 ( 53%, -0.69) 6 ( 46%) 13
FvanBeek : 3 ( 75%, -1.33) 1 ( 25%) 4 2 ( 33%, -0.20) 4 ( 66%) 6
Gator : 1 ( 50%, 2.00) 1 ( 50%) 2 0 ( 0%, 0.00) 1 (100%) 1
Grimator : 5 ( 71%, -0.88) 2 ( 28%) 7 5 ( 62%, 1.76) 3 ( 37%) 8
GuenterBernard : 3 ( 37%, -0.27) 5 ( 62%) 8 3 ( 42%, 1.87) 4 ( 57%) 7
HaraldJohanni : 1 ( 50%, -4.00) 1 ( 50%) 2 0 ( 0%, 0.00) 0 ( 0%) 0
JoeMontana : 1 ( 25%, 1.60) 3 ( 75%) 4 2 ( 66%, 2.00) 1 ( 33%) 3
Johnny : 0 ( 0%, 0.00) 0 ( 0%) 0 0 ( 0%, 0.00) 2 (100%) 2
JoshP : 2 ( 25%, 1.80) 6 ( 75%) 8 4 ( 44%, -3.00) 5 ( 55%) 9
KG : 3 (100%, -1.33) 0 ( 0%) 3 2 (100%, -3.00) 0 ( 0%) 2
L_Bosen : 0 ( 0%, 0.00) 0 ( 0%) 0 1 (100%, -4.00) 0 ( 0%) 1
Mark : 5 ( 50%, 0.00) 5 ( 50%) 10 5 ( 62%, 0.40) 3 ( 37%) 8
Masahito : 14 ( 53%, -0.89) 12 ( 46%) 26 19 ( 57%, -0.80) 14 ( 42%) 33
Mika : 2 ( 50%, -1.20) 2 ( 50%) 4 2 ( 66%, -1.00) 1 ( 33%) 3
Nadoll : 9 ( 64%, -1.78) 5 ( 35%) 14 7 ( 50%, -1.14) 7 ( 50%) 14
Neil : 0 ( 0%, 0.00) 0 ( 0%) 0 1 (100%, 1.60) 0 ( 0%) 1
OllieNorth : 0 ( 0%, 0.00) 0 ( 0%) 0 1 (100%, 1.60) 0 ( 0%) 1
PeterSchneider : 3 ( 75%, 1.73) 1 ( 25%) 4 5 ( 41%, -0.08) 7 ( 58%) 12
Ricky : 0 ( 0%, 0.00) 1 (100%) 1 1 (100%, 2.00) 0 ( 0%) 1
Rtrout : 5 ( 50%, -1.20) 5 ( 50%) 10 4 ( 28%, -1.00) 10 ( 71%) 14
SAP : 1 (100%, 1.60) 0 ( 0%) 1 3 ( 75%, 1.73) 1 ( 25%) 4
SallySue : 1 (100%, 1.60) 0 ( 0%) 1 1 (100%, -4.00) 0 ( 0%) 1
Seldon : 1 (100%, 4.00) 0 ( 0%) 1 0 ( 0%, 0.00) 0 ( 0%) 0
ToddC : 9 ( 81%, -1.38) 2 ( 18%) 11 6 ( 46%, 1.47) 7 ( 53%) 13
Tom_Weaver : 25 ( 65%, -0.94) 13 ( 34%) 38 21 ( 56%, -0.21) 16 ( 43%) 37
UNLUCKY : 0 ( 0%, 0.00) 1 (100%) 1 0 ( 0%, 0.00) 2 (100%) 2
USRobots : 1 (100%, 1.60) 0 ( 0%) 1 0 ( 0%, 0.00) 0 ( 0%) 0
Wolfgang_Scherer : 5 ( 71%, -1.20) 2 ( 28%) 7 1 ( 50%, -4.00) 1 ( 50%) 2
Yaz : 2 ( 66%, 0.00) 1 ( 33%) 3 1 ( 33%, 2.00) 2 ( 66%) 3
aaron : 16 ( 36%, 0.53) 28 ( 63%) 44 20 ( 39%, -0.50) 31 ( 60%) 51
acey_deucey : 0 ( 0%, 0.00) 0 ( 0%) 0 2 (100%, 0.00) 0 ( 0%) 2
ajp : 0 ( 0%, 0.00) 2 (100%) 2 1 ( 50%, 1.60) 1 ( 50%) 2
aki : 1 ( 50%, -2.00) 1 ( 50%) 2 2 ( 40%, -1.20) 3 ( 60%) 5
aljones : 21 ( 77%, -0.46) 6 ( 22%) 27 16 ( 84%, -0.80) 3 ( 15%) 19
alte : 0 ( 0%, 0.00) 2 (100%) 2 1 ( 50%, 1.60) 1 ( 50%) 2
axelM : 9 ( 47%, -0.58) 10 ( 52%) 19 8 ( 57%, 1.00) 6 ( 42%) 14
baobab : 1 ( 33%, -2.00) 2 ( 66%) 3 1 ( 33%, -2.00) 2 ( 66%) 3
bazooka : 30 ( 60%, -0.51) 20 ( 40%) 50 27 ( 65%, -1.02) 14 ( 34%) 41
blondie : 0 ( 0%, 0.00) 0 ( 0%) 0 1 (100%, 2.00) 0 ( 0%) 1
boxcars : 0 ( 0%, 0.00) 1 (100%) 1 1 (100%, -4.00) 0 ( 0%) 1
brendanb : 0 ( 0%, 0.00) 0 ( 0%) 0 1 (100%, -4.00) 0 ( 0%) 1
ch_lh : 1 (100%, -2.00) 0 ( 0%) 1 1 (100%, -2.00) 0 ( 0%) 1
chacal : 2 ( 33%, 1.80) 4 ( 66%) 6 6 ( 75%, -1.13) 2 ( 25%) 8
champion : 1 ( 50%, 1.60) 1 ( 50%) 2 1 (100%, 2.00) 0 ( 0%) 1
char : 7 ( 53%, 1.26) 6 ( 46%) 13 10 ( 47%, 0.72) 11 ( 52%) 21
charles : 3 ( 75%, 1.33) 1 ( 25%) 4 0 ( 0%, 0.00) 0 ( 0%) 0
chasM : 10 ( 83%, -0.88) 2 ( 16%) 12 6 ( 54%, -0.13) 5 ( 45%) 11
chela : 1 (100%, -4.00) 0 ( 0%) 1 3 (100%, 0.00) 0 ( 0%) 3
colgan : 16 ( 72%, 0.67) 6 ( 27%) 22 15 ( 65%, -1.15) 8 ( 34%) 23
copy : 1 (100%, 1.60) 0 ( 0%) 1 0 ( 0%, 0.00) 0 ( 0%) 0
coyne : 1 ( 50%, 2.00) 1 ( 50%) 2 1 (100%, 2.00) 0 ( 0%) 1
crlo : 4 ( 50%, -0.60) 4 ( 50%) 8 4 ( 50%, -2.00) 4 ( 50%) 8
dekmark : 35 ( 61%, -0.80) 22 ( 38%) 57 16 ( 39%, -0.02) 25 ( 60%) 41
desmond : 3 ( 60%, -0.80) 2 ( 40%) 5 4 ( 44%, 0.50) 5 ( 55%) 9
dh : 4 ( 80%, -2.10) 1 ( 20%) 5 2 ( 50%, -4.00) 2 ( 50%) 4
dorn : 6 ( 42%, -1.67) 8 ( 57%) 14 8 ( 57%, -1.05) 6 ( 42%) 14
dougie : 19 ( 35%, -0.88) 34 ( 64%) 53 29 ( 59%, -0.86) 20 ( 40%) 49
eagle : 0 ( 0%, 0.00) 2 (100%) 2 1 ( 33%, -2.00) 2 ( 66%) 3
ekw : 1 ( 16%, -4.00) 5 ( 83%) 6 7 ( 63%, -0.86) 4 ( 36%) 11
elmroth : 3 ( 42%, 0.67) 4 ( 57%) 7 2 ( 50%, -1.00) 2 ( 50%) 4
enger : 0 ( 0%, 0.00) 1 (100%) 1 3 ( 75%, 0.67) 1 ( 25%) 4
erick : 1 (100%, 2.00) 0 ( 0%) 1 0 ( 0%, 0.00) 0 ( 0%) 0
ernestho : 9 ( 50%, -1.60) 9 ( 50%) 18 11 ( 61%, 0.22) 7 ( 38%) 18
fibster : 1 (100%, -4.00) 0 ( 0%) 1 2 ( 50%, -2.00) 2 ( 50%) 4
figgis : 8 ( 57%, 0.20) 6 ( 42%) 14 7 ( 70%, 0.29) 3 ( 30%) 10
fletcher : 0 ( 0%, 0.00) 0 ( 0%) 0 1 (100%, -2.00) 0 ( 0%) 1
fnurt : 3 ( 37%, -1.33) 5 ( 62%) 8 5 ( 50%, -1.60) 5 ( 50%) 10
fredrikd : 1 ( 50%, -4.00) 1 ( 50%) 2 1 (100%, 4.00) 0 ( 0%) 1
funk : 20 ( 60%, -1.56) 13 ( 39%) 33 23 ( 71%, -0.78) 9 ( 28%) 32
gamnman : 1 (100%, -4.00) 0 ( 0%) 1 1 ( 50%, 2.00) 1 ( 50%) 2
gap : 0 ( 0%, 0.00) 1 (100%) 1 2 (100%, 0.00) 0 ( 0%) 2
gking : 5 ( 62%, 0.64) 3 ( 37%) 8 3 ( 37%, -3.33) 5 ( 62%) 8
haarbo : 0 ( 0%, 0.00) 0 ( 0%) 0 1 (100%, -4.00) 0 ( 0%) 1
haddad : 10 ( 66%, -1.04) 5 ( 33%) 15 4 ( 33%, -1.10) 8 ( 66%) 12
hak : 2 ( 50%, 2.00) 2 ( 50%) 4 1 ( 50%, 2.00) 1 ( 50%) 2
hanse : 24 ( 48%, -0.58) 26 ( 52%) 50 32 ( 68%, -0.35) 15 ( 31%) 47
harryc : 1 ( 25%, -4.00) 3 ( 75%) 4 0 ( 0%, 0.00) 2 (100%) 2
heinrich : 10 ( 76%, -0.68) 3 ( 23%) 13 14 ( 77%, -0.71) 4 ( 22%) 18
hoegh : 0 ( 0%, 0.00) 2 (100%) 2 2 (100%, -3.00) 0 ( 0%) 2
hrafnhildur_valbjoernsdottir : 1 (100%, -4.00) 0 ( 0%) 1 1 (100%, -4.00) 0 ( 0%) 1
hugh : 4 ( 44%, -0.10) 5 ( 55%) 9 7 ( 70%, -0.11) 3 ( 30%) 10
hydra : 0 ( 0%, 0.00) 0 ( 0%) 0 1 (100%, -2.00) 0 ( 0%) 1
idiot : 11 ( 47%, -0.95) 12 ( 52%) 23 17 ( 77%, -0.80) 5 ( 22%) 22
igor : 0 ( 0%, 0.00) 1 (100%) 1 1 (100%, 1.60) 0 ( 0%) 1
indianajones : 0 ( 0%, 0.00) 0 ( 0%) 0 1 (100%, 1.60) 0 ( 0%) 1
jake_ : 1 (100%, 2.00) 0 ( 0%) 1 0 ( 0%, 0.00) 3 (100%) 3
jarma : 0 ( 0%, 0.00) 3 (100%) 3 0 ( 0%, 0.00) 3 (100%) 3
jeremy : 2 ( 25%, -4.00) 6 ( 75%) 8 3 ( 60%, 1.73) 2 ( 40%) 5
jerrygodsey : 1 (100%, -4.00) 0 ( 0%) 1 1 ( 33%, 2.00) 2 ( 66%) 3
jimmer : 1 ( 50%, 1.60) 1 ( 50%) 2 0 ( 0%, 0.00) 1 (100%) 1
jokrjo : 1 ( 50%, 2.00) 1 ( 50%) 2 0 ( 0%, 0.00) 1 (100%) 1
jruss : 1 (100%, -2.00) 0 ( 0%) 1 0 ( 0%, 0.00) 1 (100%) 1
jwallace : 5 ( 50%, -1.68) 5 ( 50%) 10 15 ( 68%, -1.73) 7 ( 31%) 22
kandha : 4 ( 44%, -1.10) 5 ( 55%) 9 2 ( 40%, -1.00) 3 ( 60%) 5
kanta : 1 (100%, 1.60) 0 ( 0%) 1 0 ( 0%, 0.00) 0 ( 0%) 0
karendavis : 0 ( 0%, 0.00) 2 (100%) 2 1 ( 33%, 1.60) 2 ( 66%) 3
kbw : 3 ( 42%, -2.00) 4 ( 57%) 7 5 ( 45%, -2.80) 6 ( 54%) 11
kenarnold : 7 ( 70%, -0.97) 3 ( 30%) 10 2 ( 20%, -1.00) 8 ( 80%) 10
kimf : 0 ( 0%, 0.00) 0 ( 0%) 0 1 (100%, -4.00) 0 ( 0%) 1
kitsalisbury : 2 ( 40%, -2.00) 3 ( 60%) 5 6 ( 50%, -1.40) 6 ( 50%) 12
kitwoolsey : 23 ( 62%, -1.10) 14 ( 37%) 37 12 ( 48%, -1.47) 13 ( 52%) 25
lax : 4 ( 50%, -2.50) 4 ( 50%) 8 6 ( 54%, -1.80) 5 ( 45%) 11
lloydr : 2 (100%, -0.20) 0 ( 0%) 2 1 (100%, -2.00) 0 ( 0%) 1
madkap : 0 ( 0%, 0.00) 2 (100%) 2 1 ( 33%, -2.00) 2 ( 66%) 3
masahiro : 1 ( 50%, -2.00) 1 ( 50%) 2 2 ( 66%, -1.00) 1 ( 33%) 3
mburns : 0 ( 0%, 0.00) 1 (100%) 1 0 ( 0%, 0.00) 0 ( 0%) 0
md : 12 ( 57%, -1.23) 9 ( 42%) 21 11 ( 68%, -0.87) 5 ( 31%) 16
melpaul : 0 ( 0%, 0.00) 1 (100%) 1 3 ( 60%, -0.13) 2 ( 40%) 5
michaelch : 1 ( 50%, -4.00) 1 ( 50%) 2 2 ( 66%, -1.20) 1 ( 33%) 3
mklein : 1 ( 20%, 1.60) 4 ( 80%) 5 3 ( 75%, 0.00) 1 ( 25%) 4
mloner : 94 ( 69%, -0.55) 42 ( 30%)136 52 ( 48%, -0.49) 55 ( 51%)107
mml : 2 (100%, -1.00) 0 ( 0%) 2 0 ( 0%, 0.00) 0 ( 0%) 0
mrbg : 3 ( 37%, -2.00) 5 ( 62%) 8 1 ( 20%, -2.00) 4 ( 80%) 5
mws : 7 ( 58%, -0.63) 5 ( 41%) 12 6 ( 54%, -1.07) 5 ( 45%) 11
mxi : 2 ( 40%, 1.60) 3 ( 60%) 5 2 ( 66%, -1.00) 1 ( 33%) 3
oegger : 1 (100%, -2.00) 0 ( 0%) 1 0 ( 0%, 0.00) 1 (100%) 1
pascuzzi : 16 ( 64%, -1.20) 9 ( 36%) 25 18 ( 66%, -1.16) 9 ( 33%) 27
pb : 7 ( 58%, -0.74) 5 ( 41%) 12 3 ( 75%, -4.00) 1 ( 25%) 4
perola : 3 ( 60%, 0.00) 2 ( 40%) 5 0 ( 0%, 0.00) 3 (100%) 3
pghsteve : 1 ( 50%, 2.00) 1 ( 50%) 2 4 ( 66%, -1.20) 2 ( 33%) 6
rainerbirkle : 2 ( 40%, 0.00) 3 ( 60%) 5 6 ( 75%, -0.47) 2 ( 25%) 8
ring : 2 ( 50%, -1.00) 2 ( 50%) 4 1 ( 33%, -4.00) 2 ( 66%) 3
rjohnson : 8 ( 50%, -0.35) 8 ( 50%) 16 6 ( 31%, -1.00) 13 ( 68%) 19
rob : 1 (100%, -4.00) 0 ( 0%) 1 0 ( 0%, 0.00) 0 ( 0%) 0
roman : 55 ( 69%, -0.67) 24 ( 30%) 79 58 ( 67%, -0.69) 28 ( 32%) 86
ronkarr : 3 ( 42%, 0.67) 4 ( 57%) 7 7 ( 77%, -1.14) 2 ( 22%) 9
rotwANG : 0 ( 0%, 0.00) 1 (100%) 1 3 ( 75%, 2.53) 1 ( 25%) 4
rwm : 6 ( 85%, -0.40) 1 ( 14%) 7 3 ( 30%, -0.13) 7 ( 70%) 10
rybak : 2 ( 50%, -3.00) 2 ( 50%) 4 0 ( 0%, 0.00) 2 (100%) 2
ryukyu : 1 ( 50%, 2.00) 1 ( 50%) 2 1 (100%, -4.00) 0 ( 0%) 1
scats : 10 ( 66%, -0.84) 5 ( 33%) 15 13 ( 59%, 0.25) 9 ( 40%) 22
shelley : 1 ( 33%, -4.00) 2 ( 66%) 3 1 ( 20%, -2.00) 4 ( 80%) 5
sos : 0 ( 0%, 0.00) 1 (100%) 1 0 ( 0%, 0.00) 2 (100%) 2
t_bremer : 2 (100%, 0.00) 0 ( 0%) 2 2 (100%, -4.00) 0 ( 0%) 2
tedr : 3 ( 50%, -1.33) 3 ( 50%) 6 8 (100%, 0.60) 0 ( 0%) 8
thekid : 0 ( 0%, 0.00) 2 (100%) 2 0 ( 0%, 0.00) 1 (100%) 1
thrilos : 0 ( 0%, 0.00) 0 ( 0%) 0 1 ( 33%, -4.00) 2 ( 66%) 3
titus : 1 ( 50%, 2.00) 1 ( 50%) 2 0 ( 0%, 0.00) 2 (100%) 2
tk : 15 ( 68%, -0.35) 7 ( 31%) 22 7 ( 46%, -1.43) 8 ( 53%) 15
toothy : 1 (100%, 4.00) 0 ( 0%) 1 0 ( 0%, 0.00) 0 ( 0%) 0
towanda : 18 ( 38%, -2.22) 29 ( 61%) 47 21 ( 48%, -1.54) 22 ( 51%) 43
trice : 15 ( 51%, -1.23) 14 ( 48%) 29 16 ( 47%, -0.78) 18 ( 52%) 34
vema : 5 ( 50%, -2.00) 5 ( 50%) 10 7 ( 41%, 0.51) 10 ( 58%) 17
watss : 0 ( 0%, 0.00) 5 (100%) 5 3 ( 50%, -0.67) 3 ( 50%) 6
wells : 12 ( 52%, -0.23) 11 ( 47%) 23 11 ( 57%, -2.44) 8 ( 42%) 19
wesM : 0 ( 0%, 0.00) 0 ( 0%) 0 1 (100%, -2.00) 0 ( 0%) 1
wittmann : 0 ( 0%, 0.00) 0 ( 0%) 0 1 (100%, 2.00) 0 ( 0%) 1
wking : 4 (100%, 0.30) 0 ( 0%) 4 3 ( 75%, 0.53) 1 ( 25%) 4
www : 0 ( 0%, 0.00) 2 (100%) 2 1 ( 50%, 2.00) 1 ( 50%) 2
yogi : 2 ( 40%, 2.00) 3 ( 60%) 5 2 ( 66%, -1.20) 1 ( 33%) 3
zam : 10 ( 43%, -1.64) 13 ( 56%) 23 12 ( 80%, -1.03) 3 ( 20%) 15
zap : 0 ( 0%, 0.00) 3 (100%) 3 5 ( 83%, -1.28) 1 ( 16%) 6
zoro : 1 (100%, -2.00) 0 ( 0%) 1 0 ( 0%, 0.00) 0 ( 0%) 0

bobk

Jan 22, 1996

to ko...@bobrae.bd.psu.edu, fankh...@darmstadt.gmd.de

Peter Fankhauser <fank...@darmstadt.gmd.de> wrote:

>...

>Ever since Mark Damish's bot Big_Brother has started to record
>matches between the top 150 players on fibs I wanted to extract
>some statistics from them. Now that the database at Patti Beadle's
>ftp site contains over a 1000 matches from August 95 to October 95
>there are enough data to derive something statistically significant.
>
>So I wrote a small awk-script to derive match equities, initial cube
>actions, and influence of the fibs rating difference on the outcome of
>matches at certain lengths.

>..

Thanks for the work, Peter. And thanks to Mark, Patti, Marvin, and
probably others as well.

>In part the results are surprising.

>...

>Anyway here are the data:

>...

>While the table seems to be fairly consistent in itself, that is, the
>chance of winning a match increases with the lead in the match,
>the actual numbers differ in part significantly from Kit
>Woolsey's table derived from Hal Heinrich's database of real life
>experts (to my knowledge first published in Inside
>Backgammon/Vol.2/No.2/March-April 1992). For example, for
>(1-away,2-away) the fibs table gives just 67% chance, while Woolsey's
>table give 70% chance. Even more striking, the percentage for
>(2-away,4-away) differs by a whopping 9% from the figure in Woolsey's
>table.

>...Mark Damish, whome I have sent the table a few days ago,

>suggested that this indicates that the fibs experts double way too late
>when being behind 4-away/2-away.

>...

>The other reason for the deviations may of course be a lack of examples.
>However, at least up to scores 7-away/7-away the number of examples
>almost reaches the numbers in Heinrich's database (see Table 3, divide
>numbers for n-away/n-away by 2). (I started with determining the degree
>of error for a 95% confidence interval - but still have to freshen up
>my poor background in statistics to give any hard data here).
>Number of examples for a particular score (Table 3)
>
>       1    2    3    4    5    6    7
> 1:  240   57  152   39  119   23   34
> 2:   57  212  203   90  114   38   37
> 3:  152  203  550  138  227   46   86
> 4:   39   90  138  118  212   39   55
> 5:  119  114  227  212  872   54  120
> 6:   23   38   46   39   54   62  128
> 7:   34   37   86   55  120  128  554

I think chance is the likely explanation for the difference
between the tables. For example, let's suppose that 2a-4a actually
is 68%, as Kit's table says. Your chart says that 90 instances came
up. To make the math easy, let's say that 100 instances come up of something
that happens 68% of the time. Then the actual number of occurrences follows
a binomial distribution with parameters n=100 and p=.68.
The variance of a binomial is n*p*(1-p), which here equals 100*.68*.32 = 21.76.
The standard deviation is the square root of this, approximately 4.7
occurrences, i.e. 4.7 percentage points out of 100.
Since a binomial with large n is close to a normal distribution, the
observed value will be within 1 s.d. of the mean about 68% of the time
and within 2 s.d. about 95% of the time. Your value and Kit's differ
by 7.8% (not 9% as the post stated), which is well within 2 standard
deviations (2 x 4.7 = 9.4), so chance alone is a decent explanation of
the difference.
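These back-of-the-envelope numbers can be checked in a few lines, using the same normal approximation to the binomial. The second call uses the actual 90 examples instead of the rounded 100, which widens the standard deviation to about 4.9 points and puts a 7.8-point gap at roughly 1.6 standard deviations. The function names are illustrative only.

```python
import math


def binomial_sd_percent(n: int, p: float) -> float:
    """Standard deviation, in percentage points, of a win rate observed over n trials."""
    return 100.0 * math.sqrt(p * (1.0 - p) / n)


def z_score(gap_pct: float, n: int, p: float) -> float:
    """Size of a gap between observed and expected percentages, in standard deviations."""
    return gap_pct / binomial_sd_percent(n, p)


print(f"sd with n=100: {binomial_sd_percent(100, 0.68):.2f} points")
print(f"sd with n=90:  {binomial_sd_percent(90, 0.68):.2f} points")
print(f"7.8-point gap at n=90: {z_score(7.8, 90, 0.68):.2f} standard deviations")
```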

I do believe, however, that the trailer does a little better than Kit's
table says. Since the table was based on data from 5 years ago, it might be
accurate for players of 5 years ago, but players have improved since then,
and I think more so on the 4-away side than on the 2-away side.

I don't understand Mark's explanation. In the Big Brother results the
trailer did better than Kit's table predicts, which suggests that the
trailer plays better than we might have expected. Did he perhaps mean that
players 5 years ago doubled too late when trailing 4a-2a?

Is the table supposed to represent winning percentages for expert players
of equal skill? If so, I think the methodology introduces a bias that
favors the leader. There is a difference of about 250 rating points
between the best player and the 150th best player. Suppose the match is
one in which the rating difference (and presumably the skill
difference) is significant. If a lopsided score is reached, it is
more likely that the better player is the one leading. Thus the statistics
will likely show the leader winning more often than would be the
case between equal opponents.

I am sure this is a source of bias, but not sure whether it is large
enough to worry about. One partial solution would be to consider only
matches in which the rating difference is below some threshold; the
obvious problem is that the data are not plentiful enough. Another solution
would be to do some modeling and try to estimate the size of such a bias.
The problems with this are that it could be difficult, that ratings and
skill do not correlate perfectly, and that getting too fancy can sometimes
make things worse.
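To get a feeling for the size of such a bias, here is a deliberately crude model, an assumption of this sketch rather than anything taken from the data: every game is worth exactly one point (no cube, no gammons), and the favorite wins each game with a fixed probability p. The leader's observed win rate from a score is then averaged over the two ways of getting there (favorite leading or underdog leading) and compared with the equal-skill value.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def p_win(a: int, b: int, p: float) -> float:
    """Chance of winning a points before the opponent wins b,
    if every game is worth one point and is won with probability p."""
    if a == 0:
        return 1.0
    if b == 0:
        return 0.0
    return p * p_win(a - 1, b, p) + (1 - p) * p_win(a, b - 1, p)


def reach_prob(n: int, a: int, b: int, p: float) -> float:
    """Chance that an n-point match passes through (a-away, b-away) for this player."""
    won, lost = n - a, n - b
    return comb(won + lost, won) * p**won * (1 - p) ** lost


def observed_leader_rate(n: int, lead: int, trail: int, p: float) -> float:
    """Leader's win rate from (lead, trail) averaged over who got there,
    when the favorite wins each game with probability p."""
    fav_leads = reach_prob(n, lead, trail, p)      # favorite is the leader
    dog_leads = reach_prob(n, lead, trail, 1 - p)  # underdog is the leader
    wins = fav_leads * p_win(lead, trail, p) + dog_leads * p_win(lead, trail, 1 - p)
    return wins / (fav_leads + dog_leads)


print(f"equal players:    {p_win(2, 4, 0.5):.4f}")
print(f"observed, p=0.55: {observed_leader_rate(5, 2, 4, 0.55):.4f}")
```

In a 5-point match the equal-skill chance from 2-away/4-away is 81.25% in this model, while the observed leader rate is about 81.9% for a hypothetical per-game edge of p = 0.55 and about 83.6% for p = 0.6. The bias points the way Bob expects and grows with the skill gap; in a real pool, where many pairings are close in strength, it would be diluted.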

>...

>(C) The Fibs-rating or "How much of a favorite is the favorite?"
>
>Finally, I took a look at the relationship between rating difference,
>match length, and outcome of the match. These data I find the hardest
>to believe, so I'm still searching hard for the bug in my script.
>

It might be interesting to look at these results again, but using everyone's
rating as of Jan 21 rather than the rating at the time the match was played.
This would be especially useful for players who had less than about 1000
experience when the match was played. I think the rating they have today
is a better indication of their skill one month ago than their rating one
month ago was. (I think the greater reliability of ratings based on more
experience matters more than any change in skill over that time.) Mark,
was there a lower bound on experience that Big_Brother required before
recording a game?

>
>Gain/Loss of favorite according to matchlength (Table 7)
>
> ml:    nm (   np)    tp   tp/np;       tg   tga/nm
>  1:   120 (  120)    24   0.200;    36.76    0.306
>  2:    10 (   20)    -4  -0.200;    -6.81   -0.482
>  3:   185 (  555)    27   0.049;     3.42    0.011
>  4:     3 (   12)   -12  -1.000;   -12.62   -2.104
>  5:   405 ( 2025)   -35  -0.017;  -146.05   -0.161
>  6:     1 (    6)    -6  -1.000;    -5.11   -2.086
>  7:   270 ( 1890)   -14  -0.007;  -143.61   -0.201
>  8:     1 (    8)    -8  -1.000;    -5.83   -2.062
>  9:    25 (  225)   -45  -0.200;   -43.51   -0.580
> 11:    13 (  143)   -33  -0.231;   -28.17   -0.653
>Sum:  1033 ( 5004)  -106  -0.103;  -351.53   -0.115
>
>It looks like for all matches longer than 3 pts the favorite is not favorite
>at all! (and the 1-pters may mostly be due to one_pointer :))
>If this was only true for the rating changes, one might argue that
>the rating formula is unjust,

There is another explanation. Remember that the results come from a
tail-end sample: the best players. How does a player get a very
high rating? There are 2 ways. One, obviously, is to be a very good
player. The other is to get more than one's share of luck. The higher
the rating, the more likely it is to be based partly on luck.
Thus the higher rated a player is, the more likely the player is to
be overrated, and an extremely highly rated player is likely
to lose more often than the rating system predicts.

One way to test the theory that the highest rated players are overrated
is to write down the ratings of the top 20 players one day, check the
ratings of these same players again one month later, and compare. On
average the change should be negative (enough, I think, to overcome even
the rating inflation rate).
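This regression-to-the-mean argument can be illustrated with a small simulation. All parameters are hypothetical (1000 players, a skill spread of 100, a luck spread of 50, top 20), and each rating snapshot is modeled as fixed true skill plus fresh, independent luck. That ignores rating inflation and the way real fibs ratings carry luck forward from match to match, so it exaggerates the effect; it only shows its direction.

```python
import random
import statistics


def top_k_change(n_players=1000, k=20, skill_sd=100.0, luck_sd=50.0, seed=1):
    """Average rating change of the top-k players between two snapshots.

    Toy model (all parameters hypothetical): each snapshot rating is a fixed
    true skill plus independent Gaussian luck, with no drift or inflation.
    """
    rng = random.Random(seed)
    players = []
    for _ in range(n_players):
        skill = rng.gauss(1500.0, skill_sd)
        first = skill + rng.gauss(0.0, luck_sd)  # rating on day one
        later = skill + rng.gauss(0.0, luck_sd)  # rating a month later
        players.append((first, later))
    top = sorted(players, reverse=True)[:k]      # top k by the first snapshot
    return statistics.mean(later - first for first, later in top)


print(f"average change of the top 20: {top_k_change():+.1f} rating points")
```

Setting `luck_sd=0.0` makes the average change exactly zero, which is a quick sanity check that the drop comes from luck being selected for rather than from the model itself.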

>but it also holds for the total points won or lost.
>And for those the size of the rating difference is not taken into account at all.
>I'm totally stuck here - is there any principle flaw in the evaluation?

Yes, it is strange that the total points won also go against the favorite.

> ...
>other data deleted

Bob Koca
ko...@bobrae.bd.psu.edu
bobk on FIBS

Mark Damish

Jan 23, 1996

>>table give 70% chance. Even more striking, the percentage for
>>(2-away,4-away) differs by a whopping 9% from the figure in Woolsey's
>>table.
>>...Mark Damish, whome I have sent the table a few days ago,
>>suggested that this indicates that the fibs experts double way too late
>>when being behind 4-away/2-away.

One of the purposes of the Big_Brother program was to look at matches
played by players of different strengths and to find areas where
play is inefficient, along with the results of such play.
Although the B_B data for 4a,2a is certainly within 2 standard
deviations of the table value, I believe the current bias towards 4a
winning more often than he/she (hereafter "he") theoretically should
is genuine despite theoretically inefficient doubles (doubling too
late), and that it is explainable.

I don't have a copy of the original post, but for the record, Kit's
table shows that the 4a player wins about 32% of the time at 4a,2a, while
observed matches between the top 150 players on FIBS through December
yield about 38% wins. Kit's table assumes a gammon rate in the low 20s.
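To put the "within 2 standard deviations" remark in numbers, one can treat each match that reached 4-away/2-away as an independent trial with the table's 32% win chance. The post does not say how many matches reached this score, so the sample size `n` below is a free, hypothetical input. The helper names are illustrative only.

```python
import math

def z_score(observed: float, expected: float, n: int) -> float:
    """Standard score of an observed win fraction against a predicted one,
    treating the n matches as independent Bernoulli trials (an assumption)."""
    sd = math.sqrt(expected * (1.0 - expected) / n)
    return (observed - expected) / sd

def max_n_within(observed: float, expected: float, k: float = 2.0) -> int:
    """Largest sample size for which |observed - expected| is still
    within k standard deviations under the same binomial model."""
    gap = observed - expected
    return math.floor(k * k * expected * (1.0 - expected) / (gap * gap))

# 32% from Kit's table vs. about 38% observed in the Big_Brother matches.
print(max_n_within(0.38, 0.32))            # 241
print(round(z_score(0.38, 0.32, 100), 2))  # 1.29
```

Under this simple model, a 6-point gap stays inside 2 standard deviations only if about 241 or fewer matches reached this score. With more data, the same gap would be hard to dismiss as noise.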

> I don't understand Mark's explanation. Trailer did better based on
>Big Brother results than Kit's table predicts, so this means trailer
>plays better than we might have expected. Did he perhaps mean that
>players 5 years ago doubled too late when losing 4a 2a?

I've never tried to explain this before. Here is a summary of what I believe:

1) At 4a,2a it seems that 4a doubles too late, which is technically
inefficient.
2) At 4a,2a it seems that 2a takes doubles which are technically
drops at this score. This is also technically inefficient.
3) The two technically incorrect plays DO NOT cancel out, and the
result is a beautiful equity finesse which favors the 4a
player.
4) This brings up a ramification: in some cases the technically correct
play might be the wrong play.

It is particularly important to note that the 4a player should be
the weaker player "on average", because the stronger player should
reach 2a more often than the weaker player. Given the wide range
of ability among the top 150 players on FIBS, it is, on average,
the weaker player who appears to be gaining from errors at this score.

1)
Why do I believe that 4a,2a doubles occur too late? Take a look at this
position. You (4a) roll a 31: 8/5 6/5. Your opponent (2a) rolls a
41 and slots with 13/9 6/5. Would you consider doubling this position?
Would you consider dropping this position? The built-in reflex
of money play would usually see the dice shaken without a second thought.
But let's evaluate the position. Some relative numbers, assuming that
4a is considering turning the cube: the take point for 2a is 0.2,
the gammon price for 4a is 1.0, the gammon price for 2a is 0.0,
and the recube vig for 2a is 0.0.
Since 4a's gammon price is 1, there is a cute shortcut for computing
the percentage of games that 2a needs to win in order to take:
min_%winning_games(2a) = 20% + %gammons(4a)
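The shortcut can be derived from the numbers above. This sketch makes assumptions beyond what the post states. 2a's match equity at 2-away/2-away is taken as 1/2. A 0.2 take point is read as 2a holding D = 0.6 at 2-away/3-away after a drop. Here w is 2a's game-winning chance and g is 4a's gammon chance, both as fractions of all games. With the cube on 2, any 2a win ends the match, so 2a's own gammons and recubes add nothing (gammon price and recube vig 0 for 2a).

```latex
\begin{aligned}
E_{\text{drop}} &= E(2\text{a},3\text{a}) = D \approx 0.6 \\
E_{\text{take}} &= w \cdot 1 + (1 - w - g)\cdot\tfrac{1}{2} + g \cdot 0
                 = \tfrac{1}{2} + \tfrac{1}{2}(w - g) \\
E_{\text{take}} \ge E_{\text{drop}}
  &\iff w \ge (2D - 1) + g = 0.2 + g \\
\text{gammon price for 4a} &= \frac{1 - \tfrac{1}{2}}{\tfrac{1}{2} - 0} = 1
\end{aligned}
```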
In the above example (from memory), after 4a has made the 5-pt and 2a
has exposed extra blots, 4a will gammon 2a more often than from the starting
position. How much? If I assume that the starting position yields
gammons for me in 11-12% of all games, then I can estimate that about
17-20% of all games are gammons for 4a given the above position.
2a then needs to win about 37-40% of the time (20% + 17-20%) to
have a take in this position. Does he? It's close, and it is probably
correct for a stronger player to drop a weaker player's double at this
score from that position after the first roll!!!
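The same take test can be applied to the example position. This is a minimal sketch. The default equities are assumptions, not figures from the post: 60% for 2a after dropping to 2-away/3-away (what a 0.2 take point implies) and 50% at 2-away/2-away. The function names are hypothetical.

```python
def take_equity(w: float, g: float, after_single_loss: float = 0.5) -> float:
    """2a's match equity after taking the cube to 2 (hypothetical model).
    w: 2a's game-winning chance; g: 4a's gammon chance (fractions of all games).
    A 2a win ends the match (equity 1), a single loss leaves 2a,2a
    (assumed 0.5), and a gammon loss ends the match (equity 0)."""
    single_losses = 1.0 - w - g
    return w * 1.0 + single_losses * after_single_loss + g * 0.0

def min_win_to_take(g: float, drop_equity: float = 0.6,
                    after_single_loss: float = 0.5) -> float:
    """Smallest w with take_equity(w, g) >= drop_equity.
    Solves w + (1 - w - g) * a = D for w."""
    a = after_single_loss
    return (drop_equity - a + a * g) / (1.0 - a)

def equity_given_up(w: float, g: float, drop_equity: float = 0.6) -> float:
    """Match equity 2a loses by taking instead of dropping
    (negative means the take is better)."""
    return drop_equity - take_equity(w, g)

# 17-20% gammons for 4a, as estimated for the example position.
for g in (0.17, 0.20):
    print(g, round(min_win_to_take(g), 2))  # prints 0.17 0.37, then 0.2 0.4
```

`equity_given_up` puts a number on taking a position that is really a drop. For example, with a hypothetical w = 0.33 and g = 0.18, 2a gives up about 2.5% match equity.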
The key to understanding 4a,2a double/takes is the ability to
assess 4a's gammon chances and 2a's winning chances. It is important
to look at the position every roll. There are other doubles at
4a,2a to be aware of. It truly is the most amazing score in backgammon.

2)
Once you start estimating the winning_game% for 2a, look at games
played at this score by players at all levels. I've heard
some world class players recently comment that they would never have
considered some of the doubles that JF/Mloner make at this score,
but after study, they concluded that, if those doubles are correct,
they have been doubling too late. Since quite a few strong players fail to
consider doubling at the technically optimal time, it is very
safe to assume that they (and weaker players) also fail to correctly
analyze this score and take positions that are technically passes.

3)
Assuming that the results are not some cruel statistical aberration
(we're still within 2 standard deviations here), the Big_Brother
data indicate that when 4a doubles late and 2a takes,
4a will win more than his share of games as dictated by match equity
tables. It makes sense: if one takes positions which are drops, then
he loses equity. This is true for money, as well as at the 4a,2a score,
where correct cube action is often misjudged.

4)
Consider another score that has had a lot more analysis: 2a,2a.
What is the proper doubling strategy at this score?
-- Double with ANY single market losing sequence, no matter how remote.
-- Double when you approach your opponent's take point (30% cubeless).
-- Double the first chance you get.
-- Double exactly the same as a money game.
If you chose ANY of the above, you are not getting maximum equity at the
2a,2a score, UNLESS you always play the same class of player.
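The 30% figure in the second option follows from a simple comparison. This sketch assumes the taker's match equity at 2-away/1-away Crawford is about C = 0.30, the value implied by the "30% cubeless" take point. With the cube on 2 at 2a,2a, any win (single or gammon) ends the match, so gammons drop out of the take decision. Here w is the taker's game-winning chance.

```latex
\begin{aligned}
E_{\text{take}} &= w \quad \text{(the game now decides the match)} \\
E_{\text{drop}} &= E(2\text{a},1\text{a}\ \text{Crawford}) = C \approx 0.30 \\
\text{take} &\iff w \ge C \approx 0.30
\end{aligned}
```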
If both players are following Kit Woolsey's advice and making the optimal
technical play of doubling at the first market loser, then the correct
play corresponds to the correct technical play.
What if your opponent has never heard of Kit Woolsey and you know it?
Doubling at the first market losing sequence will be the incorrect play,
as your opponent will ALWAYS (give or take 1 market-losing sequence)
take next roll. I won several extra matches last year because I knew
my opponent would not double at 2a,2a, and by waiting to approach his
personal drop point, I managed to lose the game, but not the match, which
I would have lost using the technically correct strategy.
The point of this is that optimum play might not always be the
technically correct play. If 2a is taking too much, then it looks
like it is correct to bypass the technically correct doubling point
for 4a,2a. If 2a is making optimal takes based on calculating his
winning chances vs. your gammons, then not doubling at the optimum
"early" time will end up costing 4a equity. It looks like the
decision to double at this score depends largely (as usual) on your
opponent. The optimal strategy at 2a is to take only positions
where game_winning% > 20 + gammons(4a). By "taking only the takes",
you avoid giving away equity the way the Big_Brother data suggest 2a
players do, and can gain equity if your opponent is doubling
theoretically late, which he probably is.

Peace...
..Mark
dam...@ll.mit.edu

0 new messages