
Aug 19, 1998, 3:00:00 AM

I believe there has been some speculation recently (by Ian Shaw and
Brian Sheppard) that JF actually plays as well at level-5 as it does at
level-7 if its opponent is of ability (and/or style) considerably different
from JF's. The only significant data which has been reported that I recall
(and even then, not presented) is Brian's record of play vs. JF
levels -5, -6, and -7. I quote from his post of 12 Aug 1998:

I have played thousands of games against JF on levels 5, 6, and 7. I keep

careful records of every game outcome. My statistics show that my results

do not depend on which level JF is playing at.

Brian, I for one would VERY MUCH like you to summarize the data in a

newsgroup post. Specifically, the various game outcomes as a function of the

level you were playing against. You have made some rather strong

statements about your own ability (compared to JF) and the relative

abilities of its different levels with respect to your game. I really

think you owe us (and Fredrik) the raw data so that we can reach our

own conclusions. You have kept careful records (which is good).

Would you PLEASE post them? I would like to see a compilation with

headings such as (WARNING: not real data!):

vs. JF level-5:

Brian wins: 1 2 4 6 8 12 16 ...

total # of games: 131 45 13 ....

JF wins: 1 2 4 6 8 12 16 ...

total # of games: 105 56 9 ....

vs. JF level-6: (etc.)

Now I am going to produce some evidence, analysis, and statistics

which provide some (but probably not conclusive) results backing the

side (my side, BTW) which says that JF-7 plays stronger than JF-5 vs.

typical human players (here the FIBS community). First the raw data

(and thanks to Jason Lee and Matt R.--aka Hacksaw--for doing the FIBS

rating reports over the time period captured below):

   date     jellyfish                JF_level_five           rating
            rating   exper   change  rating   exper   change difference

30-Jan-98            29548                    63516
15-Dec-97  2037.68   29548       0   2004.14  63516       0
02-Dec-97  2037.68   29548       0   2004.14  63516       0
19-Nov-97  2037.68   29548       0   2004.14  63516       0
06-Jun-97  2048.75   29485      63   2004.14  63516       0
26-May-97  2043.47   29270     215   2004.14  63516       0
09-May-97  2067.89   26725    2545   2004.14  63516       0
25-Apr-97  2052.86   26197     528   2004.14  63516       0     48.72
11-Apr-97  2033.78   26122      75   1959.12  58052    5464     74.66
21-Mar-97  1975.93   25094    1028   1923.87  52683    5369
08-Mar-97  1975.93   25094       0   1870.71  47345    5338
15-Feb-97  1975.93   25094       0   1893.67  41437    5908
01-Feb-97  1975.93   25094       0   1918.44  36870    4567
17-Jan-97  1975.93   25094       0   1887.97  34447    2423
31-Dec-96  1974.96   25079      15   1853.69  30422    4025
14-Dec-96  1972.03   25077       2   1830.79  24432    5990    141.24
18-Nov-96  2006.16   24927     150   1908.44  17173    7259
17-Sep-96  1903.77   19046    5881   1908.44  17173       0
26-Aug-96  1921.30   17759    1287   1908.44  17173       0     12.86
29-Jul-96  1933.81   17209     550   1894.14  17013     160
15-Jul-96  1933.81   17209       0   1825.27  15982    1031    108.54
03-Jul-96  1919.74   16789     420   1824.66  14093    1889
17-Jun-96  1971.99   15376    1413   1824.66  14093       0
02-Jun-96  1943.78   13372    2004   1824.66  14093       0    119.12
11-Mar-96  1946.17    7559    5813   1816.86   9117    4976
22-Feb-96  1946.17    7559       0   1828.68   5246    3871    117.49
21-Jan-96  1984.16    5661    1898   1901.31   3007    2239

totals               23887            60509
games missed          5661             3007

average rating difference       88.95
std dev of rating difference    45.56

Now the analysis (some of which is actually in the table):

In order to compare the two bot versions, I required that each

played a minimum of 100 games since the last time I did a comparison.

This cut into the raw data considerably, since often at least one of

them was idle. The last column shows seven rating periods when this

requirement was met. (BTW, I came up with this plan BEFORE looking

at the data, just in case someone was wondering...).

The average difference in FIBS ratings over these seven rating

periods was 88.95 rating points. The standard deviation was 45.56

rating points. The result is almost a two standard deviation result.

On the surface, the data shows that it is quite likely that JF level-7

(jellyfish on FIBS) plays better against the typical FIBS opponent

than does JF level-5.
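For anyone who wants to check the arithmetic, the mean and sample standard
deviation of the seven tabulated differences can be reproduced in a few lines
of Python (my sketch, not part of Chuck's original post):

```python
import statistics

# The seven rating differences that met the 100-match threshold (from the table)
diffs = [48.72, 74.66, 141.24, 12.86, 108.54, 119.12, 117.49]

mean = statistics.mean(diffs)  # arithmetic mean of the differences
sd = statistics.stdev(diffs)   # sample standard deviation (n - 1 denominator)
print(round(mean, 2), round(sd, 2))  # → 88.95 45.56
```

Note that 45.56 is the *sample* standard deviation (dividing by n - 1 = 6); the population estimate would be slightly smaller.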

There are some assumptions, and I will try to list them all (but

almost certainly leave some out).

1) The quality of opponent was typically the same for each bot. (Probably

a decent assumption, but I could imagine that SOME strong players would

only go after level-7 while SOME weak players might steer towards level-5.

Of course if the ratings system is robust--meaning insensitive to level

of opponent--then even this bias, if present, wouldn't matter.)

2) Fredrik was not changing the neural net during this time period, or,

if he in fact was, that he was changing BOTH robots and inserting the

SAME neural net. (Fredrik, could you please comment.)

Note my method is set up so as to be insensitive to variations in

overall ability of the FIBS community as a function of time. (It has been

speculated that "ratings inflation" on FIBS could be due to an overall

change in player abilities.) I only compare the ratings over the

same time period.

I'm sure many have noticed that the FINAL ratings of the two JF versions

differed by a much smaller amount: just over 33 rating points instead of

the 89 of my study. I don't see this as anything more than a single data

point compared to the seven data points I used (which varied from 13 to 141

in difference). Of course if Fredrik was changing the NN's over the time

period above then this conclusion could be way off.

OK, now it's your turn, advocates of the opposite opinion. Please

present some data.

Chuck

bo...@bigbang.astro.indiana.edu

c_ray on FIBS

Aug 19, 1998, 3:00:00 AM

In article <6rf507$9cd$1...@jetsam.uits.indiana.edu>,

Chuck Bower <bo...@bigbang.astro.indiana.edu> wrote:

(snip)

> Now I am going to produce some evidence, analysis, and statistics

>which provides some (but probably not conclusive) results backing the

>side (my side, BTW) which says that JF-7 plays stronger then JF-5 vs.

>typical human players (here the FIBS community). First the raw data

>(and thanks to Jason Lee and Matt R.--aka Hacksaw--for doing the FIBS

>rating reports over the time period captured below):

>

> date jellyfish JF_level_five rating

> rating exper change rating exper change difference

>

>30-Jan-98 29548 63516

>15-Dec-97 2037.68 29548 0 2004.14 63516 0

>02-Dec-97 2037.68 29548 0 2004.14 63516 0

>19-Nov-97 2037.68 29548 0 2004.14 63516 0

>06-Jun-97 2048.75 29485 63 2004.14 63516 0

>26-May-97 2043.47 29270 215 2004.14 63516 0

>09-May-97 2067.89 26725 2545 2004.14 63516 0

>25-Apr-97 2052.86 26197 528 2004.14 63516 0 (48.72) remove

(the above difference doesn't count since jellyfish only played

75 matches in the period)

>11-Apr-97 2033.78 26122 75 1959.12 58052 5464 74.66

>21-Mar-97 1975.93 25094 1028 1923.87 52683 5369

>08-Mar-97 1975.93 25094 0 1870.71 47345 5338

>15-Feb-97 1975.93 25094 0 1893.67 41437 5908

>01-Feb-97 1975.93 25094 0 1918.44 36870 4567

>17-Jan-97 1975.93 25094 0 1887.97 34447 2423

>31-Dec-96 1974.96 25079 15 1853.69 30422 4025

>14-Dec-96 1972.03 25077 2 1830.79 24432 5990 141.24

>18-Nov-96 2006.16 24927 150 1908.44 17173 7259

>17-Sep-96 1903.77 19046 5881 1908.44 17173 0

>26-Aug-96 1921.30 17759 1287 1908.44 17173 0 12.86

>29-Jul-96 1933.81 17209 550 1894.14 17013 160

>15-Jul-96 1933.81 17209 0 1825.27 15982 1031 108.54

>03-Jul-96 1919.74 16789 420 1824.66 14093 1889

>17-Jun-96 1971.99 15376 1413 1824.66 14093 0

>02-Jun-96 1943.78 13372 2004 1824.66 14093 0 119.12

>11-Mar-96 1946.17 7559 5813 1816.86 9117 4976

>22-Feb-96 1946.17 7559 0 1828.68 5246 3871 117.49

>21-Jan-96 1984.16 5661 1898 1901.31 3007 2239 [ 82.85] (include)

(this last--first chronologically--rating period SHOULD have

been included. jellyfish played 433 matches that period and

JF_level_five began during that time, thus playing 3007 matches)

>

> totals 23887 60509

> games missed 5661 3007

> average rating difference 88.95

> std dev of rating difference 45.56

LAST TWO LINES WITH CORRECTIONS DETAILED ABOVE:

average rating difference 93.82

std dev of rating difference 42.24

The changes aren't very large. Also, requiring a minimum of 100

matches was an arbitrary threshold, but it was decided upon before

looking at the data so I should stick with that (and thus I've thrown out

the 25-Apr-97 rating report where jellyfish had only played 75 matches

since the previous report).
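The corrected figures are easy to verify: dropping 48.72 and adding 82.85 to
the list of differences gives (a quick Python check of my own, not part of the
original post):

```python
import statistics

# Differences after the corrections: 48.72 removed, 82.85 (21-Jan-96) added
diffs = [74.66, 141.24, 12.86, 108.54, 119.12, 117.49, 82.85]

mean = statistics.mean(diffs)  # → 93.82
sd = statistics.stdev(diffs)   # → 42.24 (sample standard deviation)
print(round(mean, 2), round(sd, 2))
```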

BTW, the "games missed" line is just the total experience for

each player minus the sum of the third and sixth columns respectively

("change" columns) and can be seen to be the total number of matches

played by the bots through 21-Jan-96, the first rating period to have

jf_level_five. This is just a "check sum" to ensure I didn't miss

something.

(snip)

> In order to compare the two bot versions, I required that each

>played a minimum of 100 games since the last time I did a comparison.

(snip)

This should have said "100 MATCHES", not "100 games".

Aug 20, 1998, 3:00:00 AM

> I have played thousands of games against JF on levels 5, 6, and 7. I

keep

> careful records of every game outcome. My statistics show that my

results

> do not depend on which level JF is playing at.

>

The higher levels do play better, measured by JF rollouts.

You could verify this quite easily by rolling out positions

that the levels disagree on.

Brian's results are probably quite easy to explain.

First of all he has hardly played enough games to eliminate the

randomness of his average results (I don't know this for sure,

as I don't know just how many games he has played).

Secondly, the level 5 really does play a very solid game.

When I lost my university account, it had a rating > 2000 on FIBS.

That was lucky, I think, but still.

Thirdly, people tend to adjust their playing speed to their opponent.

(This is part of the reason why level 5 did so well.)

Human expert play deteriorates more than the expert thinks if

he plays even slightly faster than his usual speed.

(My opinion, based on analysis of my own game; I'm not claiming to

be a world class expert, though.)

Fredrik Dahl.

Aug 20, 1998, 3:00:00 AM

bo...@bigbang.astro.indiana.edu (Chuck Bower) writes (well... he didn't
write this exactly, but this is my paraphrase of his original article
and amendments):

[quoting Brian Sheppard]

> I have played thousands of games against JF on levels 5, 6, and 7. I keep

> careful records of every game outcome. My statistics show that my results

> do not depend on which level JF is playing at.

>

>Brian, I for one would VERY MUCH like you to summarize the data in a

>newsgroup post.

I would be keen to see that too. In the meantime though, I believe the

total number of games played was around 3000 and that Brian was ahead by

an "insignificant amount" against L7 and behind by an "insignificant amount"

against L5. For lack of real data, I'll assume he played 1000 games against

each of levels 5, 6 and 7, and came out even against each one -- with a lot

of hand waving, you can see that this data set would lead to his conclusion.

> Now I am going to produce some evidence, analysis, and statistics

>which provides some (but probably not conclusive) results backing the

>side (my side, BTW) which says that JF-7 plays stronger then JF-5 vs.

>typical human players (here the FIBS community). First the raw data

>(and thanks to Jason Lee and Matt R.--aka Hacksaw--for doing the FIBS

>rating reports over the time period captured below):

>

> date jellyfish JF_level_five rating

> rating exper change rating exper change difference

>

>30-Jan-98 29548 63516

>15-Dec-97 2037.68 29548 0 2004.14 63516 0

>02-Dec-97 2037.68 29548 0 2004.14 63516 0

>19-Nov-97 2037.68 29548 0 2004.14 63516 0

>06-Jun-97 2048.75 29485 63 2004.14 63516 0

>26-May-97 2043.47 29270 215 2004.14 63516 0

>09-May-97 2067.89 26725 2545 2004.14 63516 0

>25-Apr-97 2052.86 26197 528 2004.14 63516 0

>11-Apr-97 2033.78 26122 75 1959.12 58052 5464 74.66

>21-Mar-97 1975.93 25094 1028 1923.87 52683 5369

>08-Mar-97 1975.93 25094 0 1870.71 47345 5338

>15-Feb-97 1975.93 25094 0 1893.67 41437 5908

>01-Feb-97 1975.93 25094 0 1918.44 36870 4567

>17-Jan-97 1975.93 25094 0 1887.97 34447 2423

>31-Dec-96 1974.96 25079 15 1853.69 30422 4025

>14-Dec-96 1972.03 25077 2 1830.79 24432 5990 141.24

>18-Nov-96 2006.16 24927 150 1908.44 17173 7259

>17-Sep-96 1903.77 19046 5881 1908.44 17173 0

>26-Aug-96 1921.30 17759 1287 1908.44 17173 0 12.86

>29-Jul-96 1933.81 17209 550 1894.14 17013 160

>15-Jul-96 1933.81 17209 0 1825.27 15982 1031 108.54

>03-Jul-96 1919.74 16789 420 1824.66 14093 1889

>17-Jun-96 1971.99 15376 1413 1824.66 14093 0

>02-Jun-96 1943.78 13372 2004 1824.66 14093 0 119.12

>11-Mar-96 1946.17 7559 5813 1816.86 9117 4976

>22-Feb-96 1946.17 7559 0 1828.68 5246 3871 117.49

>21-Jan-96 1984.16 5661 1898 1901.31 3007 2239 82.85

>

> totals 23887 60509

> games missed 5661 3007

> average rating difference 93.82

> std dev of rating difference 42.24

>

> The average difference in FIBS ratings over these seven rating

>periods was 93.82 rating points. The standard deviation was 42.24

>rating points. The result is almost a two standard deviation result.

>On the surface, the data shows that it is quite likely that JF level-7

>(jellyfish on FIBS) plays better against the typical FIBS opponent

>than does JF level-5.

I believe you can make a stronger conclusion than this. That standard

deviation is _your estimate of the population standard deviation_,

but we want to know _the error you expect in your measurement of

the population mean_. Since you made several "independent" (not quite

independent; see below) samples to arrive at your mean, you have been

able to use 6 degrees of freedom in your result and hence reduce the

variance by a factor of 6 (ie. the standard error by a factor of

sqrt(6)). Your data show JF 7 to be nearly 6 sds stronger than JF 5,

and are overwhelming support of your hypothesis. ("Quite" and "very"

are resonable adjectives for results significant to one and two

standard deviations... I run out of hyperboles before I get to six :-)
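Gary's point can be made concrete with a quick calculation (my sketch; the
conventional standard error of the mean of n samples is s/sqrt(n), here n = 7):

```python
import math
import statistics

# Corrected rating differences over the seven qualifying periods
diffs = [74.66, 141.24, 12.86, 108.54, 119.12, 117.49, 82.85]

mean = statistics.mean(diffs)                         # ≈ 93.82
se = statistics.stdev(diffs) / math.sqrt(len(diffs))  # standard error ≈ 16.0
z = mean / se
print(round(z, 1))  # → 5.9, i.e. nearly 6 standard errors above zero
```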

Personally, I would be reluctant to assume that samples that may be as

close as 100 experience points apart are independent. (Refer to an

earlier article of mine arguing that the "half life" of FIBS rating

points is of the order of 200 experience.) I made another analysis

measuring the two populations (L7 and L5) separately at intervals of

at least 400 experience and then compared the results (ie. I calculated

the difference of the means; Chuck computed the mean difference) but

the eventual conclusion was very similar to Chuck's so I won't bother

repeating it here. My computation shows the (individual) population

standard deviations to be considerably larger than Chuck's measurement,

but my results used greater degrees of freedom so overall our standard

errors were about the same.

Now comes the hard part: reconciling the fact that Chuck and Brian's

experiments yield (apparently) incompatible conclusions. Were either

or both experiments performed incorrectly? Are the data that have been

presented honest and accurate? Let me add that personally I have every

faith in Chuck and Brian's ability and honesty (I know you find judging

articles by the reputation of the author rather than the quality of the

reasoning to be distasteful, Chuck, but bear with me for a while :-) -- I

am willing to accept both sets of results and conclusions as they stand.

Allow me to reword Chuck and Brian's conclusions to see if they really

are incompatible. Chuck finds that JF7 is stronger than JF5 by 94 +/-

34 FIBS rating points (2 sd); Brian finds that JF7 is equal to JF5 (my

interpretation); assuming 1000 money games against each level, the 2 sd

confidence interval is 0.0 +/- 0.27 points per game. (The justification

for this result: the standard deviation of a single money game is

approximately 3 points; therefore the standard deviation after 1000

games is 95 points, or 0.095 points per game. The standard deviation

in the difference between two of these quantities is 0.134 points per

game; therefore the 2 sd confidence interval is +/- 0.27.)
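The arithmetic in that justification, under the stated assumptions (a 3-point
standard deviation per money game, 1000 games per level), works out as
follows; this is my restatement, not Gary's code:

```python
import math

SD_GAME = 3.0   # assumed sd of a single money game, in points
N_GAMES = 1000  # assumed number of games against each level

sd_mean = SD_GAME / math.sqrt(N_GAMES)  # sd of a 1000-game average ppg
sd_diff = math.sqrt(2) * sd_mean        # sd of the difference of two such averages
print(round(sd_mean, 3), round(sd_diff, 3), round(2 * sd_diff, 2))
# → 0.095 0.134 0.27
```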

How do we convert between FIBS ratings and expected points per game?

To the best of my knowledge this is an open question. However, here's

a simple model. Assume 1-point matches are being played on FIBS.

FIBS expects that between players ranked 94 +/- 34 points apart (as

Chuck found JF 7 to be above JF 5), the favourite will win 52.7% +/-

1.0 of the games. If we assume this constant factor is also correct

in money games, and assume a win in a money game is worth 2 points on

average (see my other article for justification), then this 2.7% +/-

1.0 CPW is worth 0.108 +/- 0.04 points per game. So, the results of

the two experiments are:

Chuck: 0.108 +/- 0.04

Brian: 0.000 +/- 0.27
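Gary's conversion follows directly from the FIBS rating formula; here is a
hedged sketch of it (the 2-points-per-win figure is his modelling assumption,
not an established constant):

```python
import math

def fibs_win_prob(diff, match_len=1):
    """Favourite's winning chance under the FIBS formula, for a match of
    the given length between players `diff` rating points apart."""
    return 1 / (1 + 10 ** (-diff * math.sqrt(match_len) / 2000))

p = fibs_win_prob(94)   # 94-point gap, 1-point match → ≈ 0.527
ppg = (2 * p - 1) * 2   # each win assumed worth 2 points on average
print(round(p, 3), round(ppg, 3))  # → 0.527 0.108
```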

Note that Brian's confidence interval INCLUDES Chuck's! The conclusions

do not disagree after all! My interpretation is that JF 7 is a little

stronger than JF 5 (by about 2.7% CPW, or 0.11 money ppg) -- this is

only a slight advantage, and NOT significant enough to be detected by

even 1,000 money games (as Brian found).

> OK, now its your turn, advocates of the opposite opinion. Please

>present some data.

Well, I'm not advocating any opinion, and I don't have any new data, but

I hope both kinds of advocates will accept this sort of statistical olive

branch, and agree that JF 7 appears slightly stronger than JF 5, by about

0.11ppg.

Cheers,

Gary (GaryW on FIBS).

--

Gary Wong, Department of Computer Science, University of Arizona

ga...@cs.arizona.edu http://www.cs.arizona.edu/~gary/

Aug 21, 1998, 3:00:00â€¯AM8/21/98

to

In article <wt90kjf...@brigantine.CS.Arizona.EDU>,

Gary Wong <ga...@cs.arizona.edu> wrote:

(snip)

Well, there Gary goes again. Instead of just snowing us with his

opinion, he's got to blow us away with statistics! I've read this twice,

and must admit I don't understand it completely, but what I do understand

I can't find fault with. Nice work, Gary. (But I reserve the right to

rescind this compliment if someone shows that it is all a bunch of smoke....

Now, if you had contradicted me, I'm SURE I could have found all kinds of

errors. ;)

There are a couple of numbers which surprise me just a bit: a 100 point

rating difference only gives a 53-47 edge in a one-point match. I looked

at Kevin Bastian's nice writeup on the FIBS rating formula at:

http://www.northcoast.com/~mccool/fibsrate.html

and, sure enough, that's what the formula says. Thought I had you there,

Gary....
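Plugging a 100-point difference into the same formula (as described in Kevin
Bastian's writeup) does give roughly 53-47; a quick check of my own:

```python
# FIBS formula: favourite's chance in a 1-point match between players
# 100 rating points apart is 1 / (1 + 10^(-100/2000)).
p = 1 / (1 + 10 ** (-100 / 2000))
print(round(100 * p, 1))  # → 52.9, i.e. roughly a 53-47 edge
```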

The other surprise (to me) is related: 93 point ratings difference is

only worth about 0.1 ppg at money play. I remember talking about this kind

of thing with David Montgomery a while back. I seem to recall that he had

a different correlation between ratings difference and money play advantage,

so I'm cc'ing him in hopes that he will elaborate.

Also, I suspect that Fredrik has pitted JF-5 vs. JF-7 at one time or

another. I asked him in an e-mail if he would post some results on this

if he has it handy. I'm still hopeful he will comment. It would be

interesting to see how well Gary's prediction (~0.1 ppg at money play)

compares to real-life numbers. It may say something about the ratings

formula (but then again, maybe not...).

Aug 24, 1998, 3:00:00 AM

In article <6rkflr$p6q$1...@flotsam.uits.indiana.edu>,

bo...@bigbang.astro.indiana.edu (Chuck Bower) wrote:

> In article <wt90kjf...@brigantine.CS.Arizona.EDU>,

> Gary Wong <ga...@cs.arizona.edu> wrote:

> >How do we convert between FIBS ratings and expected points per game?

> >To the best of my knowledge this is an open question. However, here's

> >a simple model. Assume 1-point matches are being played on FIBS.

> >FIBS expects that between players ranked 94 +/- 34 points apart (as

> >Chuck found JF 7 to be above JF 5), the favourite will win 52.7% +/-

> >1.0 of the games. If we assume this constant factor is also correct

> >in money games, and assume a win in a money game is worth 2 points on

> >average (see my other article for justification), then this 2.7% +/-

> >1.0 CPW is worth 0.108 +/- 0.04 points per game.


Chuck wrote:

> The other surprise (to me) is related: 93 point ratings difference is

> only worth about 0.1 ppg at money play. I remember talking about this kind

> of thing with David Montgomery a while back. I seem to recall that he had

> a different correlation between ratings difference and money play advantage,

> so I'm cc'ing him in hopes that he will elaborate.

I looked at this a few months ago, but I used a much more complicated

model than Gary's. Instead of just going with the one point match

win percent, here is what I did:

- set a probability distribution for the points won for each player,

when they win. (for example: each player might win 1 point 38%,

2 points 38%, 4 points 20%, 6 points .5%, 8 points 2.5%,

and 16 points 1%)

- set a probability for how likely it is one player will beat the other.

- play a "long" match between these two players, assuming that the

results for each game will follow the money distribution until the

players get "close" to the end of the match.

- at this point, settle the match using a match equity table.

(An important refinement: use a skill-adjusted match equity table,

like those in _Can a Fish Taste Twice as Good_.)

- repeat this many times and determine overall match winning chances

for the two players

Based on the match winning chances, it is easy to get the rating

difference. Based on the probabilities you set, you have the money

points per game.
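A stripped-down version of that procedure can be sketched in Python. All the
numbers here (the points distribution, the per-game win probability, the match
length) are hypothetical placeholders, and the match-equity settlement step is
omitted, so this only illustrates the shape of the computation, not David's
actual model:

```python
import math
import random

# Hypothetical inputs: distribution of points won by the winner of a single
# money-style game, and the stronger player's per-game win probability.
POINTS = [1, 2, 4, 6, 8, 16]
PROBS = [0.38, 0.38, 0.20, 0.005, 0.025, 0.01]
P_WIN = 0.52     # stronger player's chance of winning each game
MATCH_LEN = 15   # a "long" match

def play_match(rng):
    """Play money-style games until one side reaches MATCH_LEN points.
    (Settling near the end with a match equity table is omitted.)"""
    a = b = 0
    while a < MATCH_LEN and b < MATCH_LEN:
        pts = rng.choices(POINTS, weights=PROBS)[0]
        if rng.random() < P_WIN:
            a += pts
        else:
            b += pts
    return a >= MATCH_LEN

rng = random.Random(1)
trials = 20000
p_match = sum(play_match(rng) for _ in range(trials)) / trials

# Invert the FIBS formula p = 1 / (1 + 10^(-D*sqrt(L)/2000)) for the
# rating difference D implied by the observed match-winning chances:
rating_diff = 2000 * math.log10(p_match / (1 - p_match)) / math.sqrt(MATCH_LEN)

# The money edge is deterministic under these assumptions:
mean_pts = sum(v * w for v, w in zip(POINTS, PROBS))  # ≈ 2.33 points per game
ppg = (2 * P_WIN - 1) * mean_pts                      # ≈ 0.093 ppg
print(p_match, rating_diff, ppg)
```

With these made-up inputs, a ~0.09 ppg money edge maps to a rating difference in the rough neighbourhood David describes, but the exact figure depends entirely on the assumed distributions.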

I did this for a lot of different probability distributions and

edges in probability of winning, along with a lot of different

definitions of "long" and "close" above.

The overall results are:

A rating difference of 40-50 points corresponds to about a .10ppg

money edge.

The key assumption is that play is like money until you get "close"

to the end of the match. This is pretty true most of the time.

When the score gets real lopsided, it's not. Also, there might be

some changes in the low frequency distributions (8-point and 16-point

wins) even fairly early in a match. I don't think this swings

much.

*Much* more important are many other factors. Some money players

don't play matches. And vice versa. Certain styles of play are

better suited to money or matches. And so forth. So this

result, even if valid, is *only an approximate rule of thumb*.

Data from my own real-life money play conforms to this rule. I think

I have about a 150 point rating edge over my average local opponent,

based on watching FIBS ratings go up and down, and my long-term money

result is about +.30ppg.

David Montgomery

mo...@cs.umd.edu

monty on FIBS

