
Guesstimated FIBS rating gone


az-willie

Jul 19, 2003, 11:05:01 AM
When I play a match now with the latest version I have downloaded
instead of a guesstimated FIBS rating I get this:

gnubg Bill

Relative FIBS rating -1.#] -1.#]


which, stupid me, means nothing to me.

How can I fix this to get something resembling a normal FIBS rating of
1500 or along those lines? I looked through the options and couldn't
find a setting for this.


Kees van den Doel

Jul 19, 2003, 7:21:11 PM
In article <x6dSa.45109$R92....@news2.central.cox.net>,

I would also like the previous "abs rating" back. It seemed a very good
way to present the equity drop due to errors, and it seemed to reflect
my actual rating quite accurately on average.

This "relative rating" is not interesting unless you play against a bot
with a known rating; against a human you never know how well (s)he
played so it doesn't say anything about your play really.


Kees (Now my innocent children about 3/4 van FL5,65.- of players you
move rating difference.)

az-willie

Jul 19, 2003, 7:44:43 PM
I just downloaded today's build and played a short match.

This is what the FIBS rating is showing now...

gnubg Bill

Relative FIBS rating 261.25 -261.25

This still doesn't look like normal FIBS ratings. What's up with this stuff?

Relative FIBS rating .. relative to what? Looks like zero .. it rates
itself up and me down by exactly the same amount.

Very very strange.

Can we please have the old guesstimated FIBS rating back?

And while you're at it, could you have it save our FIBS rating when we
save the analysis of a game, match, or session and then figure our
AVERAGE FIBS rating also? It would help us see whether we are getting
any better if we could watch our rating progress (hopefully).

Thanks

jthyssen

Jul 21, 2003, 4:51:18 AM
az-willie <scl...@npole.com> wrote in message news:<x6dSa.45109$R92....@news2.central.cox.net>...

I'm sorry, but there has been too much noise about the FIBS ratings. The
problem is that I chose some arbitrary linear interpolation based on the
normalised error rate per move, which led to all sorts of strange
behaviour. Obviously my own fault, I should have done proper research
before I implemented it. I've removed the absolute fibs ratings until
someone (not me!) does a scientific investigation of how to relate
absolute fibs ratings to any of the numbers output by gnubg.

Following the recent discussion on rec.games.backgammon regarding luck
adjusted results, you should also know that using the normalised error
rate per move (or any other error rate) as basis for the fibs rating
leads to biased ratings. The best we can do is to input the luck
adjusted result of the match into the FIBS rating formula. This gives
unbiased rating estimates. The problem is that it's only possible to
provide relative fibs ratings.
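
Roughly, that amounts to inverting the published FIBS win-probability
formula to find the rating difference that would make the luck adjusted
result the expected one; a minimal sketch in Python under that reading
(the function names are illustrative, not gnubg's actual code):

import math

def fibs_win_prob(rating_diff, match_length):
    # Published FIBS formula: probability that the LOWER-rated player
    # wins a match of the given length, where rating_diff is the
    # (positive) rating difference between the two players.
    return 1.0 / (10.0 ** (rating_diff * math.sqrt(match_length) / 2000.0) + 1.0)

def relative_rating(p, match_length):
    # Invert the formula above: find the rating difference D such that
    # a player with luck adjusted match-winning chance p would be
    # expected to win with exactly that probability.  Positive D means
    # the player comes out as the stronger one.
    return 2000.0 / math.sqrt(match_length) * math.log10(p / (1.0 - p))

# Example: a luck adjusted result of 70% in a 7-point match
print(relative_rating(0.70, 7))   # roughly +278 rating points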

Jørn

az-willie

Jul 21, 2003, 9:31:23 AM
jthyssen wrote:

============
Relative to what .. each other? Weird.

How does FIBS calculate its FIBS rating?

Kees van den Doel

Jul 21, 2003, 12:59:10 PM
In article <36775ed0.03072...@posting.google.com>,
jthyssen <j...@chem.sdu.dk> wrote:

>> Relative FIBS rating -1.#] -1.#]

>> How can I fix this to get something resembling a normal FIBS rating of

>> 1500 or along those lines? I looked through the options and couldn't
>> find a setting for this.

>I'm sorry, but there has been too much noise about the FIBS ratings. The
>problem is that I chose some arbitrary linear interpolation based on the
>normalised error rate per move, which led to all sorts of strange
>behaviour. Obviously my own fault, I should have done proper research
>before I implemented it. I've removed the absolute fibs ratings until
>someone (not me!) does a scientific investigation of how to relate
>absolute fibs ratings to any of the numbers output by gnubg.

I've always looked at it as a numerical version of the "Player rating"
{Moron, ..., Extraterrestrial} or whatever it was. I always forget whether
expert is better than advanced, and a rating-style number is a much easier
way to present the normalized error rate per move. A user interface which
calls the user a number is also politer than one which calls the user
bad names :-)

Instead of throwing out the baby with the bathwater I think it would be
wiser to put back the absolute rating as it was but rename it "GNU
rating", to avoid obsessive nitpicking noise.

As I posted earlier, I've found the absolute ratings very consistent
with the actual ratings of players I've played with, not on FIBS, but on
Gamesite2000, which uses a rating system like FIBS but +100. Its
GNUBG-based bots usually play around 2100-2200 (which is their actual
rating on gs2000), the top players there are usually in the range
1900-2100 (which is their rating), and similarly for other groups.

>Following the recent discussion on rec.games.backgammon regarding luck
>adjusted results, you should also know that using the normalised error
>rate per move (or any other error rate) as basis for the fibs rating
>leads to biased ratings. The best we can do is to input the luck
>adjusted result of the match into the FIBS rating formula. This gives
>unbiased rating estimates. The problem is that it's only possible to
>provide relative fibs ratings.

OK, but then you might as well discard the error rates too as they are
obviously biased too. In fact the whole analysis mode is biased so why
not remove the whole "Analyse" menu except the luck adjusted analysis?

This "unbiased variance reduction" is cute but for most practical
purposes we can assume GNUBG plays perfectly and there is no bias and
its word is gospel. It's like complaining after getting a "C" for your
physics exam graded by Albert Einstein that this "C" is biased as Albert
Einstein is not perfect.


Kees (Ah must know many moo's in zitten???)

Jørn Thyssen

Jul 21, 2003, 3:14:19 PM
Kees van den Doel wrote:
> In article <36775ed0.03072...@posting.google.com>,
> jthyssen <j...@chem.sdu.dk> wrote:
>
>
>>>Relative FIBS rating -1.#] -1.#]
>>
>
>>>How can I fix this to get something resembling a normal FIBS rating of
>>>1500 or along those lines? I looked through the options and couldn't
>>>find a setting for this.
>>
>
>>I'm sorry, but there has been too much noise about the FIBS ratings. The
>>problem is that I chose some arbitrary linear interpolation based on the
>>normalised error rate per move, which led to all sorts of strange
>>behaviour. Obviously my own fault, I should have done proper research
>>before I implemented it. I've removed the absolute fibs ratings until
>>someone (not me!) does a scientific investigation of how to relate
>>absolute fibs ratings to any of the numbers output by gnubg.
>
>
> I've always looked at it as a numerical version of the "Player rating"
> {Moron, ..., Extraterrestrial} or whatever it was. I always forget whether
> expert is better than advanced, and a rating-style number is a much easier
> way to present the normalized error rate per move. A user interface which
> calls the user a number is also politer than one which calls the user
> bad names :-)
>
> Instead of throwing out the baby with the bathwater I think it would be
> wiser to put back the absolute rating as it was but rename it "GNU
> rating", to avoid obsessive nitpicking noise.

Could be an idea. What I don't like is that there is no rating formula
behind the "GNU Rating". You cannot calculate how many rating points
you'd win by winning a 7-point match against an opponent rated at GNU
Rating 1900 if you're at GNU Rating 1800 yourself!

>
> As I posted earlier, I've found the absolute ratings very consistent
> with the actual ratings of players I've played with, not on FIBS, but on
> Gamesite2000, which uses a rating system like FIBS but +100. Its
> GNUBG-based bots usually play around 2100-2200 (which is their actual
> rating on gs2000), the top players there are usually in the range
> 1900-2100 (which is their rating), and similarly for other groups.

Yes, but I would really like to see some statistical evidence.

>>Following the recent discussion on rec.games.backgammon regarding luck
>>adjusted results, you should also know that using the normalised error
>>rate per move (or any other error rate) as basis for the fibs rating
>>leads to biased ratings. The best we can do is to input the luck
>>adjusted result of the match into the FIBS rating formula. This gives
>>unbiased rating estimates. The problem is that it's only possible to
>>provide relative fibs ratings.
>
>
> OK, but then you might as well discard the error rates too as they are
> obviously biased too. In fact the whole analysis mode is biased so why
> not remove the whole "Analyse" menu except the luck adjusted analysis?

Yes and no!

I actually think that the overall rating (which you apparently dislike)
should be based on the luck adjusted analysis rather than the error
analysis.

The error analysis is still useful for finding moves where the bot disagrees,
and as you indicate below, for most people this is sufficient since they
play much worse than gnubg (including myself).

>
> This "unbiased variance reduction" is cute but for most practical
> purposes we can assume GNUBG plays perfectly and there is no bias and
> its word is gospel. It's like complaining after getting a "C" for your
> physics exam graded by Albert Einstein that this "C" is biased as Albert
> Einstein is not perfect.

Yes, but what if he was grading Niels Bohr?

Jørn

Massimiliano Maini

Jul 22, 2003, 3:51:17 AM
> > This "unbiased variance reduction" is cute but for most practical
> > purposes we can assume GNUBG plays perfectly and there is no bias and
> > its word is gospel. It's like complaining after getting a "C" for your
> > physics exam graded by Albert Einstein that this "C" is biased as Albert
> > Einstein is not perfect.
>
> Yes, but what if he was grading Niels Bohr?

Well, I guess Niels Bohr would not care too much about being evaluated
by Albert Einstein (assuming he would agree to be evaluated at all).

Personally, I'd like to have the old (and funny) guestimated FIBS
rating back in the output (along with the recent and
probably-theoretically-more-correct relative fibs rating based on luck
adjusted result).

A very strong player will just ignore it, especially when it rates him
below 1900 :))

MaX.

jthyssen

Jul 23, 2003, 3:21:21 AM
Massimili...@space.alcatel.fr (Massimiliano Maini) wrote in message news:<97e59e7a.03072...@posting.google.com>...

> > > This "unbiased variance reduction" is cute but for most practical
> > > purposes we can assume GNUBG plays perfectly and there is no bias and
> > > its word is gospel. It's like complaining after getting a "C" for your
> > > physics exam graded by Albert Einstein that this "C" is biased as Albert
> > > Einstein is not perfect.
> >
> > Yes, but what if he was grading Niels Bohr?
>
> Well, I guess Niels Bohr would not care too much about being evaluated
> by Albert Einstein (assuming he would agree to be evaluated at all).
>
> Personally, I'd like to have the old (and funny) guestimated FIBS
> rating back in the output (along with the recent and
> probably-theoretically-more-correct relative fibs rating based on luck
> adjusted result).

The problem is that the two rating estimates will be inconsistent,
e.g., the two absolute ratings could be 1800 and 1900, but the
difference is estimated to be 200.... Note that it's not possible to
"synchronise" the two, since they're based on different numbers (error
rate and luck adjusted result).

A long time ago gnubg also produced two rating estimates which were
inconsistent, and I got bombarded with mails asking why there was a
difference, so I'm not opening that can of worms again!

I'm happy to reinstate a "gnubg rating" if someone produces some
material that relates normalised error rates per move and match length
with some estimate of winning the match.

Jørn

Albert Silver

Jul 23, 2003, 9:13:50 AM
j...@chem.sdu.dk (jthyssen) wrote in message news:<36775ed0.03072...@posting.google.com>...

Although anyone can keep a record of their FIBS matches and send them
in, this might be a slow way to collect enough material. Why not ask
Heled for his log of GNU 0-ply games? That way GNU 2-ply would be able
to analyze the errors of its slightly more error-prone sibling and give
some info on its relative rating. Watch out for rating manipulators
such as Nihilist, though, as they will skew the results.

Albert Silver

az-willie

Jul 23, 2003, 9:43:10 AM
jthyssen wrote:

================
Dumb question ... how does FIBS calculate the FIBS rating?

Couldn't you do the same?

Albert Silver

Jul 23, 2003, 1:14:13 PM
az-willie <scl...@npole.com> wrote in message news:<OhwTa.61030$R92....@news2.central.cox.net>...

Remember that FIBS ratings aren't even remotely accurate until a few
hundred games have been played, and everyone starts with 1500. GNU is
estimating its rating performance based on the error rates of the
players, which is very different. One concern is that it is entirely
based on *its* perception of your play. So it will always give itself
a perfect rating, in this case 2200, and thus bias the quality of your
play according to how much you played like GNU. I truly understand his
concern but still must add that even though I realize the caveats, I
enjoyed the feature. Shooting for a high rating performance was a
motivational factor even if I knew it was far from perfect. If I saw
that I had played over 1950 in its eyes, I felt I had done a decent
job. That score no doubt changes from person to person, but it was
still fun. My only problem was that I felt that its 2200 "perfect"
rating is too high. I believe that it would average about 2050 on
FIBS, even if it could peak at a higher rating, so I asked that its
FIBS ratings be adjusted downwards by 150 points.

I don't feel it's much different from its grades, to be honest. It used
to be more generous in its grades, and achieving an Expert or World
Class rating was not too hard. This was tightened up though, and it is
now tougher in its ratings (Expert, World Class, etc.) than Snowie 4.
I liked the FIBS guesstimate just fine, but wanted it, too, to be a bit
tighter in the ratings.
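
As for how FIBS itself computes the rating: roughly like this, going by
the published FIBS description (a sketch only; K is the extra multiplier
FIBS applies to players with little experience, left as a parameter here):

import math

def fibs_rating_change(winner_rating, loser_rating, match_length, k=1.0):
    # Published FIBS update: the winner gains 4 * K * sqrt(N) * (1 - P),
    # where P is the winner's expected probability of winning the match,
    # and the loser drops by the same amount.  K is 1 for experienced
    # players; FIBS uses a larger multiplier for new players.
    diff = winner_rating - loser_rating
    p_winner = 1.0 / (10.0 ** (-diff * math.sqrt(match_length) / 2000.0) + 1.0)
    return 4.0 * k * math.sqrt(match_length) * (1.0 - p_winner)

# Example: a 1500-rated player beats a 1700-rated player in a 5-point match.
print(fibs_rating_change(1500, 1700, 5))   # about 5.6 points gained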

Albert Silver

s.w.a....@hccnet.nl

Jul 23, 2003, 1:15:07 PM
On 23 Jul 2003 00:21:21 -0700, j...@chem.sdu.dk (jthyssen) wrote:

>I'm happy to reinstate a "gnubg rating" if someone produces some
>material that relates normalised error rates per move and match length
>with some estimate of winning the match.
>>
>>

But, forgive me if I am wrong, I think you provide that already as
"mwc against current opponent"....only in a different format....or ?

What could be a fun addition (I am not sure if it is anything more
than that) is a sort of personalized rating a la Fibs (or ABS, or
whatever) that a user can build up during playing against GNU over
longer periods of time...not based on the quality of the user's play
per se, but on the outcome of games/matches....isn't that what this
sort of rating is about ?

peter

Kees van den Doel

Jul 23, 2003, 1:32:32 PM

>The problem is that the two rating estimates will be inconsistent,
>e.g., the two absolute ratings could be 1800 and 1900, but the
>difference is estimated to be 200.... Note that it's not possible to
>"synchronise" the two, since they're based on different numbers (error
>rate and luck adjusted result).

I thought this would be possible by picking a monotonically increasing
function f() (which used to be your linear guess) and defining absolute
rating as f("norm. err. rate. per move") with the consistency constraint.

However, I often get a funny result: the luck adjusted rating difference
rates A lower than B, yet the error rate of A was higher than the error
rate of B. This can't be right.

>I'm happy to reinstate a "gnubg rating" if someone produces some
>material that relates normalised error rates per move and match length
>with some estimate of winning the match.

IMAO the best way to do it would be to let gnubg play itself on a number
of settings, one of which is considered "perfect", the others simulate
lower rated players (i.e. using noise). For example

2ply - 0 ply
0ply - 0 ply noise .02
0ply - 0 ply noise .04
0ply - 0 ply noise .06
0ply - 0 ply noise .08

For each pair we would have to play a large number of games and
calculate the resulting ratings in the FIBS manner (perhaps even luck
adjusted). GNUBG 2 ply we know is roughly 2100 on FIBS so let's define
its rating to be 2100. We can just fix it and just adjust the right
column's rating after every match in the set. After the first set we
compute 0ply rating and for speed reasons then pair the noisy bots with
0ply instead of 2 ply. For each pair we also need to measure the
average norm. error rate per move as measured by gnu 2ply.

This will give us 6 points of the map norm err. rate p move --> rating,
which you can interpolate for a good rating output.

If all of this could somehow be automated and scripted I'd be happy to
run some processes on my machine to generate the data.
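
The last step (turning the measured points into a rating output) is just
interpolation; a sketch in Python, with made-up placeholder numbers
standing in for the measurements the runs above would produce:

# Map from normalised error rate per move (as judged by 2-ply) to an
# absolute rating, anchored at "2-ply = 2100".  The pairs below are
# placeholders, not measurements.
CALIBRATION = [
    (0.000, 2100.0),   # 2-ply itself, by definition
    (0.005, 2040.0),
    (0.010, 1970.0),
    (0.015, 1890.0),
    (0.020, 1800.0),
    (0.030, 1650.0),
]

def estimated_rating(error_rate):
    # Piecewise-linear interpolation, clamped outside the measured range.
    pts = CALIBRATION
    if error_rate <= pts[0][0]:
        return pts[0][1]
    if error_rate >= pts[-1][0]:
        return pts[-1][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= error_rate <= x1:
            t = (error_rate - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)

print(estimated_rating(0.012))   # about 1938 with these placeholder points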


Kees (Peter R to compute ratingepm except according unto you like!)

az-willie

Jul 23, 2003, 2:01:42 PM
Not being a mathematical genius and not being one who memorizes METs
(match equity tables) and tables of probabilities etc. etc. (how do
people ever finish a game if they have to keep track of all that?), I
just play for the fun of it
and some competition.

I have been using gnubg in tutor mode with analysis and evaluation on
supremo and having it warn me only about very bad moves.

I have gradually gone from being warned about every other move to maybe
once or twice in a 7-point match. If I ever get to where I get no more
warnings for a while, I will set it to warn me about bad moves. I figure
this is a fairly good way to improve over time without getting too
discouraged by being called stupid by the machine all the time <smile>.

But I like having some form of numerical evaluation rating for each game
or match, and would love to have an average of that rating; I find it
helpful in seeing how I'm doing.

I suspect 90% of gnubg users are in somewhat the same category. I doubt
very many of us are calculating our error moves to .00014 etc. etc.

That's for the computer to do; it can just tell us when we are screwing up.

jthyssen

Jul 24, 2003, 3:11:25 AM
s.w.a....@hccnet.nl wrote in message news:<qigthv0dtp6lt6dmb...@4ax.com>...

> On 23 Jul 2003 00:21:21 -0700, j...@chem.sdu.dk (jthyssen) wrote:
>
> >I'm happy to reinstate a "gnubg rating" if someone produces some
> >material that relates normalised error rates per move and match length
> >with some estimate of winning the match.
> >>
> >>
> But, forgive me if I am wrong, I think you provide that already as
> "mwc against current opponent"....only in a different format....or ?

The "mwc against current opponent" is based on the _unnormalised_
error rate whereas the FIBS rating is based on the _normalised_ error
rate.

You'll easily find cases where you're declared the favourite based on
the _unnormalised_ error rate but your opponent has a lower
_normalised_ error rate per move.

The _normalised_ error rate per move has no simple relationship with
the _unnormalised_ error rate, so we can't use it...

Jørn

jthyssen

Jul 24, 2003, 3:23:02 AM
kvan...@xs1.xs4all.nl (Kees van den Doel) wrote in message news:<3f1ec6b0$0$61668$e4fe...@dreader3.news.xs4all.nl>...

> In article <36775ed0.03072...@posting.google.com>,
> jthyssen <j...@chem.sdu.dk> wrote:
>
> >The problem is that the two rating estimates will be inconsistent,
> >e.g., the two absolute ratings could be 1800 and 1900, but the
> >difference is estimated to be 200.... Note that it's not possible to
> >"synchronise" the two, since they're based on different numbers (error
> >rate and luck adjusted result).
>
> I thought this would be possible by picking a monotonically increasing
> function f() (which used to be your linear guess) and defining absolute
> rating as f("norm. err. rate. per move") with the consistency constraint.

You can't make this hold for any match played -- you can only do a
function fit. So for any specific match you'll in general find that
the two rating estimates disagree.

>
> However, I often get a funny result: the luck adjusted rating difference
> rates A lower than B, yet the error rate of A was higher than the error
> rate of B. This can't be right.

Why? Suppose B plays gnubg-style, but player A is really the better
player and "sees" moves that gnubg disfavours. The result is that the
luck adjusted result gets A as the favourite even though A seems to have
a higher error rate than B. This is precisely what is meant by the
luck adjustment *not* being biased!

Also note that the luck analysis is usually done on 0-ply, which may
not produce so accurate luck estimates. Olivier Riordan's last
postings indicate that 1-ply is a bit better, and if you do a 2-ply
error analysis anyway, the 1-ply luck analysis is cheap (almost free).

> >I'm happy to reinstate a "gnubg rating" if someone produces some
> >material that relates normalised error rates per move and match length
> >with some estimate of winning the match.
>
> IMAO the best way to do it would be to let gnubg play itself on a number
> of settings, one of which is considered "perfect", the others simulate
> lower rated players (i.e. using noise). For example
>
> 2ply - 0 ply
> 0ply - 0 ply noise .02
> 0ply - 0 ply noise .04
> 0ply - 0 ply noise .06
> 0ply - 0 ply noise .08
>
> For each pair we would have to play a large number of games and
> calculate the resulting ratings in the FIBS manner (perhaps even luck
> adjusted). GNUBG 2 ply we know is roughly 2100 on FIBS so let's define
> its rating to be 2100. We can just fix it and just adjust the right
> column's rating after every match in the set. After the first set we
> compute 0ply rating and for speed reasons then pair the noisy bots with
> 0ply instead of 2 ply. For each pair we also need to measure the
> average norm. error rate per move as measured by gnu 2ply.
>
> This will give us 6 points of the map norm err. rate p move --> rating,
> which you can interpolate for a good rating output.

One of the problems with my suggestion is that we have to redo the
analysis every time we release a new, better neural net...

Another problem is that the fit only works for this specific analysis
setting. If you have an error rate of x on a 0-ply analysis this
surely corresponds to another rating than if you had an error rate of
x on 2-ply. But let's ignore that.

Another possibility is to analyse a huge number of matches from FIBS
between players with known ratings. This should get rid of the
assumption that gnubg 2-ply has a fixed rating of 2100.
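
The fit itself would be straightforward; a sketch, assuming one
(normalised error rate, known FIBS rating) pair per analysed player and
ignoring match length, with hypothetical numbers:

def fit_error_rate_to_rating(samples):
    # samples: list of (error_rate, fibs_rating) pairs taken from
    # analysed FIBS matches between players with known ratings.
    # Ordinary least-squares line: rating = a + b * error_rate.
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data: lower error rates go with higher ratings.
data = [(0.008, 1950), (0.012, 1880), (0.020, 1720), (0.030, 1580)]
a, b = fit_error_rate_to_rating(data)
print(a + b * 0.015)   # predicted rating for a 0.015 error rate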

Jørn

Kees van den Doel

Jul 24, 2003, 11:26:14 AM

>> However, I often get a funny result: the luck adjusted rating difference
>> rates A lower than B, yet the error rate of A was higher than the error
>> rate of B. This can't be right.

>Why? Suppose B plays gnubg-style, but player A is really the better
>player and "sees" moves the gnubg disfavours. The result is that the
>luck adjusted result gets A as the favourite even though A seems to have
>a higher error rate than B. This is precisely what is meant by the
>luck adjustment *not* being biased!

I would believe that from a large number of games, but not from a single
game (where it occurred). The information is just not there.

>Also note that the luck analysis is usually done on 0-ply, which may
>not produce so accurate luck estimates.

That must be the reason. If I redo analysis also at 0 ply the discrepancy
is not there.

>> >I'm happy to reinstate a "gnubg rating" if someone produces some
>> >material that relates normalised error rates per move and match length
>> >with some estimate of winning the match.
>>
>> IMAO the best way to do it would be to let gnubg play itself on a number
>> of settings, one of which is considered "perfect", the others simulate
>> lower rated players (i.e. using noise). For example
>>
>> 2ply - 0 ply
>> 0ply - 0 ply noise .02
>> 0ply - 0 ply noise .04
>> 0ply - 0 ply noise .06
>> 0ply - 0 ply noise .08
>>
>> For each pair we would have to play a large number of games and
>> calculate the resulting ratings in the FIBS manner (perhaps even luck
>> adjusted). GNUBG 2 ply we know is roughly 2100 on FIBS so let's define
>> its rating to be 2100. We can just fix it and just adjust the right
>> column's rating after every match in the set. After the first set we
>> compute 0ply rating and for speed reasons then pair the noisy bots with
>> 0ply instead of 2 ply. For each pair we also need to measure the
>> average norm. error rate per move as measured by gnu 2ply.
>>
>> This will give us 6 points of the map norm err. rate p move --> rating,
>> which you can interpolate for a good rating output.

>One of the problems with my suggestion is that we have to redo the
>analysis everytime we release a new better neural net...

If we assume gnubg 2-ply plays perfectly, the results are generally
valid. In other words, the error in the results will be proportional to
gnubg's imperfection. I think the assumption that this imperfection is
very small is valid.

>Another problem is that the fit only works for this specific analysis
>setting. If you have an error rate of x on a 0-ply analysis this
>surely corresponds to another rating than if you had an error rate of
>x on 2-ply. But let's ignore that.

I repeat, the working assumption is gnubg 2ply plays perfectly.

>Another possibility is to analyse a huge number of matches from FIBS
>between players with known ratings. This should get rid of the
>assumption that gnubg 2-ply has a fixed rating of 2100.

I think that data will be very noisy and your result will be biased
towards FIBS-style players, who one could argue are more volatile than
usual.
Regard the assumption that gnubg 2-ply has a fixed rating of 2100 as a
DEFINITION instead of an assumption. The whole analysis mode will be
used by people under the assumption of gnubg 2ply perfect play, so its
rating should be fixed. (4 ply would be better but impractical, I think
the rating difference between 2ply and 4 ply will be negligibly small
anyway.)

I think my solution is cleaner and easier to execute. An error analysis
of a huge set of FIBS matches would of course be very very interesting.


jthyssen

Jul 25, 2003, 5:14:11 AM
kvan...@xs1.xs4all.nl (Kees van den Doel) wrote in message news:<3f1ffa96$0$145$e4fe...@dreader7.news.xs4all.nl>...

> In article <36775ed0.03072...@posting.google.com>,
> jthyssen <j...@chem.sdu.dk> wrote:
>
> >> However, I often get a funny result: the luck adjusted rating difference
> >> rates A lower than B, yet the error rate of A was higher than the error
> >> rate of B. This can't be right.
>
> >Why? Suppose B plays gnubg-style, but player A is really the better
> >player and "sees" moves the gnubg disfavours. The result is that the
> >luck adjusted result gets A as the favourite even though A seems to have
> >a higher error rate than B. This is precisely what is meant by the
> >luck adjustment *not* being biased!
>
> I would believe that from a large number of games, but not from a single
> game (where it occurred). The information is just not there.

Which information?

You *cannot* say that the error rate and the luck adjusted result will
give the same result! You may very well see cases where they disagree -- I
don't think these are "funny" results: these are exactly the differences
you would expect to see between biased and non-biased results. If
there were no difference we wouldn't be so keen on promoting the luck
adjusted results.

If you're analysing a match between two intermediate players then I'd
expect the error rate and the luck adjusted result to agree in the
sense that they agree on who's the better player, but if you're analysing a
match between /really/ strong players then you cannot expect the two
results to agree!

> >Also note that the luck analysis is usually done on 0-ply, which may
> >not produce so accurate luck estimates.
>
> That must be the reason. If I redo analysis also at 0 ply the discrepancy
> is not there.

Either by chance or because you're analysing a match between players
that play much worse than gnubg.

Yes, but what happens when I analyse a match on 0-ply? The 0-ply
analysis would use the same rating formula as we've calculated for
2-ply, so now the working assumption is that 0-ply plays perfectly.

There is no way to fix this generally. Just think of a match analysed
with mixed options (for example, the user may do rollouts on
individual moves).

> >Another possibility is to analyse a huge number of matches from FIBS
> >between players with known ratings. This should get rid of the
> >assumption that gnubg 2-ply has a fixed rating of 2100.
>
> I think that data will be very noisy and your result will be biased to
> FIBS style players, which one could argue are more volatile than usual.
> Regard the assumption that gnubg 2-ply has a fixed rating of 2100 as a
> DEFINITION instead of an assumption. The whole analysis mode will be
> used by people under the assumption of gnubg 2ply perfect play, so its
> rating should be fixed. (4 ply would be better but impractical, I think
> the rating difference between 2ply and 4 ply will be negligible small
> anyways.)

Yes, but you will still apply the same rating formula for 0-ply and
4-ply analysis.

> I think my solution is cleaner and easier to execute.

Yes, I agree to that.

Jørn

Kees van den Doel

Jul 25, 2003, 12:49:12 PM

>> >> However, I often get a funny result: the luck adjusted rating difference
>> >> rates A lower than B, yet the error rate of A was higher than the error
>> >> rate of B. This can't be right.

>> >Why? Suppose B plays gnubg-style, but player A is really the better
>> >player and "sees" moves the gnubg disfavours. The result is that the
>> >luck adjusted result gets A as the favourite even though A seems to have
>> >a higher error rate than B. This is precisely what is meant by the
>> >luck adjustment *not* being biased!

>> I would believe that from a large number of games, but not from a single
>> game (where it occurred). The information is just not there.

>Which information?

The information on the playing strengths of the players.

>You *cannot* say that the error rate and the luck adjusted result will
>give the same result! You may very well see cases where they disagree -- I
>don't think these are "funny" results: these are exactly the differences
>you would expect to see between biased and non-biased results. If
>there were no difference we wouldn't be so keen on promoting the luck
>adjusted results.

>If you're analysing a match between two intermediate players then I'd
>expect the error rate and the luck adjusted result to agree in the
>sense that they agree on who's the better player, but if you're analysing a
>match between /really/ strong players then you cannot expect the two
>results to agree!

OK, but if both players come close in strength to gnubg both strength
estimates (based on L(uck) A(djusted) R(esult) or based on equity drops)
are unreliable if we are looking at a short match (like a 1 pointer).
LAR reduces the variance but it is still huge for a short match, so its
relative rating estimate is close to meaningless. (Actually, could you
also include the (reduced) variance in the output, so it says something
like "Rel. FIBS rating: -200 +/- 340"?)

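If the relative rating comes from inverting the FIBS win-probability
formula, the spread could presumably be propagated from the variance of
the luck adjusted result; a sketch under that assumption (the names are
illustrative):

import math

def relative_rating_with_error(p, sigma_p, match_length):
    # D(p) = 2000/sqrt(N) * log10(p/(1-p)); propagate the standard
    # deviation of the luck adjusted result p through the derivative
    # dD/dp = 2000 / (sqrt(N) * ln(10) * p * (1-p)).
    d = 2000.0 / math.sqrt(match_length) * math.log10(p / (1.0 - p))
    sigma_d = 2000.0 / (math.sqrt(match_length) * math.log(10) * p * (1.0 - p)) * sigma_p
    return d, sigma_d

# Example: a luck adjusted result of 0.40 +/- 0.15 in a 1-point match.
print(relative_rating_with_error(0.40, 0.15, 1))   # about -352 +/- 543
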
>> >Also note that the luck analysis is usually done on 0-ply, which may
>> >not produce so accurate luck estimates.

>> That must be the reason. If I redo analysis also at 0 ply the discrepancy
>> is not there.

>Either by chance or because you're analysing a match between players
>that play much worse than gnubg.

They did.

Well, in order to get a good rating estimate you have to analyse at
2ply. If you analyse at 0 ply your result may be inaccurate, but that's
always true and has nothing to do with the rating output. For myself, if
I analyse a weak player's match, such that strength(gnu 0ply) >>
strength(players) (">>" stands for "is much greater than"), 0-ply is
reliable, but for stronger players that approach gnubg 0-ply in strength
the ">>" relation fails but if strength(gnu 2ply) >> strength(players)
2ply analysis will still work. If the players are so strong that this
">>" also fails, the only way to estimate their relative strength is
through LAR on a large number of games. In this case the analysis of a
single short match can't really say anything (as gnubg is too weak
to spot their mistakes, and LAR still has too much variance).

Really the whole argument applies equally to the string output
"WorldClass, .., Moron" so if you want to be consistent you should
remove that feature too. (Of course I'd rather you be consistent by
putting back the "Gnestimated Pseudo Rating" instead :-)

>There is no way to fix this generally. Just think of match analysed
>with mixed options (for example, the user may do rollouts on
>individual moves).

I think you are worried about small effects here. Using 2ply (=def 2100)
as a basis with the measurements I proposed you get a realistic rating
up to say 2000. If you use rollouts (or >2ply, anything that improves on
2ply) in analysis you can go all the way up to say 2050. Strictly
speaking the norm err. rate --> rating map was deduced from 2ply
analyses and should be deduced from full rollouts, but if we assume this
function is reasonably smooth the effect of this is negligible. I bet
you can even do all the analysis in the simulations to get this mapping
formula at 0-ply and still get a very good mapping.

>> >Another possibility is to analyse a huge number of matches from FIBS
>> >between players with known ratings. This should get rid of the
>> >assumption that gnubg 2-ply has a fixed rating of 2100.

>> I think that data will be very noisy and your result will be biased to
>> FIBS style players, which one could argue are more volatile than usual.
>> Regard the assumption that gnubg 2-ply has a fixed rating of 2100 as a
>> DEFINITION instead of an assumption. The whole analysis mode will be
>> used by people under the assumption of gnubg 2ply perfect play, so its
>> rating should be fixed. (4 ply would be better but impractical, I think
>> the rating difference between 2ply and 4 ply will be negligible small
>> anyways.)

>Yes, but you will still apply the same rating formula for 0-ply and
>4-ply analysis.

There is no problem with that unless you assume the function behaves
strangely in the region 2000-2100 (2000 being my guess for the 0-ply rating).

>> I think my solution is cleaner and easier to execute.

>Yes, I agree to that.

In summary, I think if this were done the result would be a rating
estimator which is accurate up to playing strengths close to gnubg
itself. There are not many people on earth who come close, so if you
think of the user base I think this feature will work very well.

Meanwhile I'm stuck using July 2 2003's build as I really like the FIBS
rating numbers it produces.


Kees (En iets denken, whereas setar would perform against right-angleism
the Republican works fine with doubling at esfehan in metrische
tensor.)

Jørn Thyssen

Jul 25, 2003, 1:55:09 PM
Kees van den Doel wrote:

[snip]


> OK, but if both players come close in strength to gnubg both strength
> estimates (based on L(uck) A(djusted) R(esult) or based on equity drops)
> are unreliable if we are looking at a short match (like a 1 pointer).

Unreliable in which sense?

> LAR reduces the variance but it is still huge for a short match, so its
> relative rating estimate is close to meaningless. (Actually, could you
> also include the (reduced) variance in the output, so it says something
> like "Rel. FIBS rating: -200 +/- 340"?)

Isn't the contribution to the variance around 40,000?

[snip]

> Well, in order to get a good rating estimate you have to analyse at
> 2ply. If you analyse at 0 ply your result may be inaccurate, but that's
> always true and has nothing to do with the rating output. For myself if
> I analyse a weak player's match, such that strength(gnu 0ply) >>
> strength(players) (">>" stands for "is much greater than"), 0-ply is
> reliable, but for stronger players that approach gnubg 0-ply in strength
> the ">>" relation fails but if strength(gnu 2ply) >> strength(players)
> 2ply analysis will still work. If the players are so strong that this
> ">>" also fails, the only way to estimate their relative strength is
> through LAR on a large number of games. In this case the analysis of a
> single short match can't say anything really (as the gnubg is too weak
> to spot their mistakes, and LAR still has too much variance).
>
> Really the whole argument applies equally to the string output
> "WorldClass, .., Moron" so if you want to be consistent you should
> remove that feature too.

Yes, that's one of the reasons why I wrote earlier that I'm willing to
ignore this problem.


[snip]


> In summary I think if this would be done the result is a rating
> estimator which is accurate up to playing strengths close to gnubg
> itself. There are not many people on earth who will come close so if you
> think of user basis I think this feature will work very well.

Yes, I'm awaiting your results :-)

>
> Meanwhile I'm stuck using July 2 2003's build as I really like the FIBS
> rating numbers it produces.

Sorry, this time I want to get it right rather than introducing some
random rating.

Jørn
