Error rate vs. win probability

Tim Chow

Oct 24, 2016, 5:19:55 PM

Murat the non-troll likes to think that he is saying something original when
he points out that error rate is an unreliable predictor of win probability.
He can be partially forgiven for thinking this since so many people who should
know better give much more credence to error rate (or performance rating) as
a measure of winning probability than is warranted.

Here's a simplified game to illustrate how two players with the same "error
rate" nevertheless perform differently against a "zero error rate" player.

The game is played on a number line. A counter, or football, is initially
placed at 0. One player is trying to get the football to +10 (or higher)
and the other player is trying to get the football to -10 (or lower).

On your turn, you begin by flipping a fair coin, and then you move 2 steps
towards your goal if the coin lands heads, and you move 2 steps away from
your goal if the coin lands tails. This doesn't end your turn, though, even
if you step past the +10 or -10 mark. You have to answer a question. There
are two options:

- You can choose a "tame" question. If you get it right, you stay put.
But if you get it wrong, you move 1 step away from your goal.

- You can choose a "wild" question. This option is available to you only
if the coin landed heads and you moved 2 steps towards your goal. If you
get the wild question right, you move 1 step towards your goal. If you
get it wrong, then you move 4 steps away from your goal (so it is as if
the coin landed tails and you didn't answer a question at all).

This ends your turn. If the football is now at +10 or higher, or at -10 or
lower, then the game ends as well.

Now let's consider three different players.

Player A always opts for a tame question, and always gets it right.

Player B always opts for a tame question, and always gets it wrong.

Player C always opts for a wild question if available, and gets it right
40% of the time and wrong 60% of the time. If the coin lands tails and C
is forced to answer a tame question, C always gets the tame question right.

On average, Player A moves the football 0 steps per turn.

On average, Player B moves the football -1 steps per turn (i.e., 1 step away
from the goal per turn).

On average, Player C also moves the football -1 steps per turn. To see this,
note that 50% of the time, C gets tails and moves -2 steps; (50%)*(60%) = 30%
of the time, C gets heads but gets the wild question wrong and moves a net
of -2 steps; and (50%)*(40%) = 20% of the time, C gets heads and gets the
wild question right and moves a net of +3 steps. So on average, C moves
(50% + 30%)*(-2) + (20%)*(+3) = -1 steps per turn.
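The per-turn averages above can be checked mechanically. The minimal sketch below enumerates the coin flip and the question outcome from the rules, records each turn as net steps toward the mover's own goal, and averages with exact fractions. The helper names `turn_distribution` and `mean` are illustrative, not from the original post.

```python
from fractions import Fraction


def turn_distribution(p_tame_right, p_wild_right=None):
    """Net steps toward the mover's own goal after one turn, as {steps: probability}.

    p_tame_right: chance of answering a tame question correctly.
    p_wild_right: chance of answering a wild question correctly; None means the
                  player never takes a wild question.
    """
    half = Fraction(1, 2)
    dist = {}

    def add(steps, p):
        if p:  # skip impossible outcomes
            dist[steps] = dist.get(steps, 0) + p

    # Tails: 2 steps away, then a tame question (stay put, or 1 more step away).
    add(-2, half * p_tame_right)
    add(-3, half * (1 - p_tame_right))
    if p_wild_right is None:
        # Heads with a tame question: 2 steps toward, then stay or go 1 step back.
        add(+2, half * p_tame_right)
        add(+1, half * (1 - p_tame_right))
    else:
        # Heads with a wild question: +2 then +1 if right, +2 then -4 if wrong.
        add(+3, half * p_wild_right)
        add(-2, half * (1 - p_wild_right))
    return dist


def mean(dist):
    return sum(steps * p for steps, p in dist.items())


players = {
    "A": turn_distribution(Fraction(1)),                  # tame, always right
    "B": turn_distribution(Fraction(0)),                  # tame, always wrong
    "C": turn_distribution(Fraction(1), Fraction(2, 5)),  # wild 40% right, tame always right
}

for name, dist in players.items():
    print(name, sorted(dist.items()), "mean:", mean(dist))
```

C's two heads branches that end at -2 merge with the tails branch, which is why C's turn reduces to just two outcomes: -2 steps with probability 4/5 and +3 steps with probability 1/5.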

(Note that since C is so good at answering tame questions and not so good at
answering wild questions, choosing to answer a wild question is bad strategy,
but we assume that this is just how C behaves.)

By analogy with backgammon, we can say that B and C have the same "error rate"
of 1 step per turn.

However, what happens if Player A plays Player B? And what happens if
Player A plays Player C?

I ran a simulation of 1 million "A versus B" games and 1 million "A versus C"
games. I arranged for A to move first half the time and for A's opponent to
move first half the time. I found that B won about 6.6% of the games while C
won about 8.2% of the games.
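A minimal sketch of that experiment, assuming the rules exactly as stated above (this is not Tim's code, and `MOVES`, `play_game`, and `opponent_win_rate` are hypothetical names). Positions are measured from A's side: A aims for +10, the opponent for -10. Each turn then reduces to two possible displacements: A moves +2 or -2; B moves -1 or +3; C moves +2 with probability 0.8 or -3 with probability 0.2. The exact percentages will vary with the seed and the number of games.

```python
import random

# Football displacement per turn, from A's point of view (+ is toward +10).
MOVES = {
    "A": ([+2, -2], [0.5, 0.5]),  # coin only; tame question always right
    "B": ([-1, +3], [0.5, 0.5]),  # heads: -2 then +1; tails: +2 then +1
    "C": ([+2, -3], [0.8, 0.2]),  # tails or failed wild: +2; wild right: -3
}


def play_game(opponent, a_moves_first, rng):
    """Play one game of A against `opponent`; return the winner's name."""
    position = 0
    mover = "A" if a_moves_first else opponent
    while True:
        steps, weights = MOVES[mover]
        position += rng.choices(steps, weights)[0]
        # The result is only checked at the end of each turn.
        if position >= 10:
            return "A"
        if position <= -10:
            return opponent
        mover = opponent if mover == "A" else "A"


def opponent_win_rate(opponent, games, seed=0):
    """Fraction of games the opponent wins, with A moving first in half of them."""
    rng = random.Random(seed)
    wins = sum(
        play_game(opponent, a_moves_first=(g % 2 == 0), rng=rng) == opponent
        for g in range(games)
    )
    return wins / games
```

For example, `opponent_win_rate("B", 1_000_000)` and `opponent_win_rate("C", 1_000_000)` mirror the size of the run described above.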

What is happening is that B never progresses more than 1 step towards the
goal each turn, whereas C is taking a gamble and trying to advance 3 steps
towards the goal on a single turn. Even though these gambles usually don't
pay off, they pay off big when they do work, and increase C's chances even
though on average C is bleeding away equity at the same rate as B.
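One way to make "taking a gamble" precise (an editorial illustration, not from the original post): measured as net steps toward the mover's own goal, B's turn is +1 or -3 with equal probability, and C's turn is +3 with probability 1/5 or -2 with probability 4/5. The sketch below computes the mean, variance, and third central moment of each. B and C agree on the first two, so the difference lies in C's lopsided payoff: a positive third moment, meaning rare big gains and frequent small losses.

```python
from fractions import Fraction

# Net steps toward the mover's own goal in one turn, as {steps: probability}.
B = {+1: Fraction(1, 2), -3: Fraction(1, 2)}
C = {+3: Fraction(1, 5), -2: Fraction(4, 5)}


def moments(dist):
    """Return (mean, variance, third central moment) of a finite distribution."""
    mu = sum(s * p for s, p in dist.items())
    var = sum((s - mu) ** 2 * p for s, p in dist.items())
    third = sum((s - mu) ** 3 * p for s, p in dist.items())
    return mu, var, third


for name, dist in (("B", B), ("C", C)):
    mu, var, third = moments(dist)
    print(name, "mean", mu, "variance", var, "third central moment", third)
```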

Very roughly speaking, C's strategy is akin to the backgammon strategy of
jacking up the cube value in favorable circumstances. This should increase
your win probability relative to someone else who is bleeding away equity at
the same rate as you but who is timid with the cube.

---
Tim Chow

michae...@gmail.com

Oct 25, 2016, 6:03:57 AM

There are some formulae used by bots. For example, GNUbg uses the formulae below to first calculate your absolute FIBS rating, and there's also another published formula that calculates your winning chances based on your absolute FIBS rating.

How far away from being reliable are these formulae, do you think? Are your results from playing against XG very far away?

NB: Here's an Excel spreadsheet where you can enter your numbers in the yellow cells and get the results in the purple cells.
http://www.filedropper.com/winningpropabilities

And here are the formulae if you don't have Excel.

Absolute FIBS rating in GNUbg

FIBSr = 2050 - (checker(N)*Checker_mEMG + Cube(N)*Cube_mEMG)

N = match length
checker(N) = 8.798 + 25.526/N,
Cube(N) = 0.863 - 0.519/N.

Winning probability
Winning prob. = 1 - (1/(10^((YOU-HIM)*SQRT(ML)/2000) + 1)), where ML = match length

XG ELO = 1 / (1 + Exp(abs_Performance * 40 - 1.12)) * 2000 + 732
abs_Performance = Total mEMG/1000
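For readers without Excel, here is a minimal Python transcription of the three formulas exactly as posted. They have not been checked against GNUbg's or XG's source code. The function names are hypothetical. `checker_memg` and `cube_memg` are assumed to be per-decision error rates in mEMG, and `total_memg` is passed straight through as the post's "Total mEMG".

```python
import math


def fibs_rating(checker_memg, cube_memg, match_length):
    """Absolute FIBS-style rating from checker and cube error rates (formula as posted)."""
    checker_factor = 8.798 + 25.526 / match_length
    cube_factor = 0.863 - 0.519 / match_length
    return 2050 - (checker_factor * checker_memg + cube_factor * cube_memg)


def win_probability(you, him, match_length):
    """FIBS formula: chance that the player rated `you` beats the player rated `him`."""
    exponent = (you - him) * math.sqrt(match_length) / 2000
    return 1 - 1 / (10 ** exponent + 1)


def xg_elo(total_memg):
    """XG-style Elo estimate from the post's 'Total mEMG' figure (formula as posted)."""
    abs_performance = total_memg / 1000
    return 1 / (1 + math.exp(abs_performance * 40 - 1.12)) * 2000 + 732


# Example: rating for 5 mEMG checker / 3 mEMG cube error rates in 7-point matches,
# and that player's chance against a 1900-rated opponent.
me = fibs_rating(5.0, 3.0, 7)
print(round(me), round(win_probability(me, 1900, 7), 3))
```

Note that `win_probability(a, b, n) + win_probability(b, a, n)` is always 1, as a two-player formula should be.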

Tim Chow

Oct 25, 2016, 6:08:45 PM

On Tuesday, October 25, 2016 at 6:03:57 AM UTC-4, michae...@gmail.com wrote:
> How far away from being reliable are these formulae do you think?
> Are your results from playing against XG very far away?

I don't know the answer to either of these questions. I don't pay attention
to how my predicted win rate against XG compares with my actual win rate.

However, I would expect that the formulas are reasonably good on average, but
that if you (for example) picked a particular error rate and looked at players
with that error rate, there would be a range of win rates.

---
Tim Chow

Paul

Oct 26, 2016, 6:28:49 AM

Tim,

Let u_k be the probability of winning, given that the player is at position (k - 10). So, for the aim-high player, u_0 = 0. And the u_k numbers are different for different players.

We then have a set of 23 recursive linear equations in 23 unknowns. (The domain is from -10 to 12 which is 23 numbers. We go to 12 instead of 10 because we may need to jump back.)
Surely, solving such a small matrix equation is trivial with packages
available (though I haven't tried it). So exact numbers should be readily
attainable.
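Paul's point can be sketched in Python. This is one possible setup, not Paul's: instead of his 23 unknowns, it tracks the football's position from A's side (-9 to +9 while the game is unfinished) together with whose turn it is, giving 38 unknowns. It then solves the linear system with plain Gaussian elimination. Each turn is reduced to two displacements derived from the rules in the original post: A moves +2 or -2, B moves -1 or +3, and C moves +2 (probability 0.8) or -3 (probability 0.2). The names `MOVES`, `solve`, and `a_win_probability` are hypothetical. If this model matches Tim's rules, the opponents' exact win rates should land near his simulated 6.6% and 8.2%.

```python
# Football displacement per turn, from A's side (+ is toward A's goal at +10).
MOVES = {
    "A": [(+2, 0.5), (-2, 0.5)],
    "B": [(-1, 0.5), (+3, 0.5)],
    "C": [(+2, 0.8), (-3, 0.2)],
}


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def a_win_probability(opponent):
    """Exact P(A wins) against `opponent`, averaged over who moves first."""
    states = [(pos, mover) for pos in range(-9, 10) for mover in ("A", opponent)]
    index = {state: i for i, state in enumerate(states)}
    n = len(states)
    # Row i encodes  w_i - sum_j P(i -> j) w_j = P(A wins on this turn).
    matrix = [[0.0] * n for _ in range(n)]
    rhs = [0.0] * n
    for (pos, mover), i in index.items():
        matrix[i][i] += 1.0
        next_mover = opponent if mover == "A" else "A"
        for step, p in MOVES[mover]:
            new_pos = pos + step
            if new_pos >= 10:
                rhs[i] += p  # A reaches +10: A wins
            elif new_pos > -10:
                matrix[i][index[(new_pos, next_mover)]] -= p
            # new_pos <= -10: the opponent wins, contributing 0 to A's chance
    w = solve(matrix, rhs)
    return 0.5 * (w[index[(0, "A")]] + w[index[(0, opponent)]])


for name in ("B", "C"):
    print(name, "wins with probability", 1 - a_win_probability(name))
```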

Interesting thought experiment and simulation but it ain't rock and roll to me.

Paul

mu...@compuplus.net

Oct 27, 2016, 2:19:03 AM

October 24, 2016 at 3:19:55 PM UTC-6, Tim Chow wrote:

> Murat the non-troll

You never disappoint. You have lost the argument in the
other thread. So, you start a new one and try to earn
"credibility on credit" by demeaning me right from the
start, before saying anything else.

Your resorting to this after my having discussed with
you politely for the past week, only shows your lack
of character. But don't worry, I won't yet sink to your
level of "juvenile tactfulness" by calling you "Tim the
non-cocksucker, non-asshole, non-scumbug, non-etc..."

> ... likes to think that he is saying something original

> when he points out that error rate is an unreliable
> predictor of win probability.

Don't water down my argument, which is that error rate,
perror rate, PELO, etc. are all total bullshit with no
practical value whatsoever in any context, not just win
probability!

> He can be partially forgiven for thinking this since so
> many people who should know better give much more credence
> to error rate (or performance rating) as a measure of
> winning probability than is warranted.

As I have said many times, it's not a matter of degree! It's
not a matter of "more credence than warranted". It's not
a matter of "imperfection" or negligible "inaccuracy".

ER/PR/ELO are all based on bots picking the best moves
by doing mini/full rollouts.

And I have given you step-by-step instructions to prove it
to yourselves that early and especially early-cubeful
rollouts are total bullshit that don't correspond to
real-world results.

Have you done your homework, little "match PHD"..??

No! Of course not. You and your ilk are so badly treed
that you have no option but to ignore my challenges and
go on bullshitting in your little fantasy world... :(

> Here's a simplified game to illustrate how two players

Your fantasy doesn't relate to backgammon...

> Very roughly speaking, C's strategy is akin to the
> backgammon strategy of jacking up the cube value in
> favorable circumstances. This should increase your
> win probability relative to someone else who is
> bleeding away equity at the same rate as you but who
> is timid with the cube.

This is all you needed to say and it's all worthless!

If a player can defeat your so-called "cube skill" by
jacking up the cube or whatever hypothetical bullshit
you try to concoct, then the only conclusion can be
that there is no such thing as "cube skill"...!

Furthermore, you have to keep talking of players A, B
and C because you can't name real people in the BG world
to support your argument...!!

You are a pathetic moron trying to get acknowledgement
by being the ass-kisser of the cock-suckers... :((

MK

mu...@compuplus.net

Oct 27, 2016, 2:25:39 AM

October 25, 2016 at 4:03:57 AM UTC-6, michae...@gmail.com wrote:

> How far away from being reliable are these formulae do you
> think?Are your results from playing against XG very far away?

Michael, this thread is déjà vu of another one that I had
started in November 2014, in which you had also participated.

See: https://groups.google.com/d/msg/rec.games.backgammon/X4t7ixL6-rA/oEifben73dMJ

Nothing ever sticks to these "teflon assholes"... They just
rewind the tape and replay from where they left off...

One good thing though, may be that they never have to buy
toilet paper or vaseline...

MK

michae...@gmail.com

Oct 27, 2016, 7:00:07 AM

Murat, I personally respect everyone's views, whatever those might be.
I use the formulae GNUbg uses to check whether my win/loss results agree or not when I play against humans. They do agree, within ±3%!!
One of your arguments is that the cube skill is a total B**t in determining someone's overall skill.
What if I tell you that GNUbg agrees to some extent with you?
Do download my Excel sheet, enter, say, 12 for checker error rate and 12 for cube error rate. Then have a look at the loss of checker and loss of cube rating points. What is the ratio? Less than 1 to 10 ;-)
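Michael's ratio can be checked directly from the GNUbg formula posted earlier in the thread (rating loss = factor × error rate). With equal checker and cube error rates, the ratio of cube loss to checker loss is Cube(N)/checker(N), which does not depend on the error rate itself. A short sketch, with hypothetical helper names:

```python
def checker_factor(n):
    """checker(N) from the posted GNUbg formula; n is the match length."""
    return 8.798 + 25.526 / n


def cube_factor(n):
    """Cube(N) from the posted GNUbg formula; n is the match length."""
    return 0.863 - 0.519 / n


error_rate = 12  # mEMG for both checker and cube play, as in Michael's example
for n in (1, 3, 5, 7, 11, 25):
    checker_loss = checker_factor(n) * error_rate
    cube_loss = cube_factor(n) * error_rate
    print(f"N={n:2d}: checker loss {checker_loss:6.1f}, cube loss {cube_loss:5.1f}, "
          f"ratio {cube_loss / checker_loss:.3f}")
```

The ratio grows with match length but stays below 0.863/8.798 ≈ 0.098, so under this formula "less than 1 to 10" holds for every match length.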

Keene

Oct 27, 2016, 11:21:59 AM

I watch these forums, I read some posts, I look at the positions, but I don't typically comment.

In reading this thread, I am finding it amusing that throughout all your posts, you:

1. Declare that the measurement ratings used by the current bots (GNU, XG etc) are meaningless
2. Point out undocumented (and without doubt 'selected') evidence that you have collected through your own processes that prove your theory
3. Are unwilling to play anyone straight up for money in a public environment
4. Are against the gambling aspect of backgammon - while offering to bet on yourself

And now, you are claiming there is no such thing as cube skill, yet this is the very tool that you use in order to 'prove' how the bots are wrong and PR, ER, ELO etc are meaningless.

I am not going to get into a debate over semantics, I likely won't even respond to anything you add to this. I would like to point out your inconsistencies, and also acknowledge that you are only willing to play under your specific circumstances, and not allow anyone to change your environment to adjust to their needs.

On the subject of money play in backgammon, you can be against it, that's fine, but it is an aspect of the game, and it is played differently than match play is. You should try it out, it's fun. Let's see how your 'cube skill' argument holds up when you stand to lose something that matters to you based on your clearly reckless attitude on cube management. Remember, it's not how much you win or lose really, because that part doesn't matter, it's mostly about your reputation.

As a side note, I used to play on the play65 sites, TMG, PG, etc. I (many times) saw a player on those sites with the name Murat. I have no idea if that was you, but I do know that the stronger players on those sites couldn't get enough of that player, as he was pouring money out for fun. If it was you, I am sorry that you didn't have a better experience. If it wasn't you, then no matter.

Regarding the name calling, please stop, it's unnecessary; that's for TC too. When discussing the game of backgammon, please do try to be more respectful.

And finally! As far as your claims regarding PR, ER etc. go, you must surely be aware that you are using small data sets to apply ideas that belong over data sets that represent millions of games. The results you experience are, in fact, expected results, in that they exist as lower-probability events within your dataset expectations. You should take ALL the results (without selection), and look at that data as a whole. Your arguments in this area are strikingly similar to those of the intermediate player who defeats a very strong player in one or two individual games and concludes that he is in fact very strong himself. Puerile, non-representative, and not enough data to draw a formal and appropriate conclusion.

I do look forward to reading the responses, but as I mentioned, no more from me on this subject.

Keene

Tim Chow

Oct 27, 2016, 6:58:47 PM

On Thursday, October 27, 2016 at 11:21:59 AM UTC-4, Keene wrote:
> Regarding the name calling, please stop, it's unnecessary; that's for TC too.

Actually, I think it is necessary. Clearly, when Murat resorts to name calling,
it's because he recognizes that he's lost the argument and has no more content
to contribute. Given that he won't actually concede defeat in plain English,
how else is he supposed to signal his concession? The name calling is an easily
recognizable way to identify when he has nothing of substance to contribute.

As for my name-calling---the only purpose of threads involving Murat is their
entertainment value. Calling him a non-troll is an easy way to generate that
entertainment, so why not do it? It doesn't bother him. On the contrary, he
enjoys the opportunity to feel that his own name-calling is justified. Why
should I take that enjoyment away from him?

---
Tim Chow

Tim Chow

Oct 27, 2016, 7:16:52 PM

On Thursday, October 27, 2016 at 2:19:03 AM UTC-4, mu...@compuplus.net wrote:
> But don't worry, I won't yet sink to your
> level of "juvenile tactfulness" by calling you "Tim the
> non-cocksucker, non-asshole, non-scumbug, non-etc..."

That's a relief. I couldn't sleep last night because I was lying awake
worrying that you might call me a non-scumbug.

---
Tim Chow

michae...@gmail.com

Oct 28, 2016, 10:54:56 AM

I've kept statistics over the last few days. According to the FIBS formulae I should have won 56.71% of the 48 7-point matches I have played. Well, I won 27 of them! The formulae are pretty damn accurate!! The result fluctuated over time, of course, a bit more or a bit less, but nothing dramatic.
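A quick binomial check of those numbers. The binomial model, which treats the 48 matches as independent with a fixed 56.71% win chance, is an editorial assumption. Under it, 27 wins is about 0.2 wins below the expected 27.2, well inside one standard deviation of about 3.4 wins (roughly 7 percentage points). So the sample agrees with the formula, but 48 matches cannot distinguish it from a formula that is off by a few percentage points either way.

```python
import math

matches = 48
predicted = 0.5671  # win probability from the FIBS formula
won = 27

expected_wins = matches * predicted
sd_wins = math.sqrt(matches * predicted * (1 - predicted))  # binomial standard deviation
z = (won - expected_wins) / sd_wins  # distance from expectation, in standard deviations

print(f"actual win rate {won / matches:.2%}, expected wins {expected_wins:.1f}")
print(f"standard deviation {sd_wins:.1f} wins ({sd_wins / matches:.1%}), z = {z:.2f}")
```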

Strange thing, though: my total luck in all those matches is +206 mEMG and my opponent's is +218 (roughly +4.5 mEMG per match each). Shouldn't they both be near zero after 48 matches??

mu...@compuplus.net

Oct 29, 2016, 10:46:05 PM

October 27, 2016 at 5:00:07 AM UTC-6, michae...@gmail.com wrote:

> One of your arguments is that the cube skill is a total
> B**t in determining someone's overall skill. What if I
> tell you that GNUbg agrees to some extent with you?

I would scream "Oh nooo! I'm infected too...!"

> Do download my Excel sheet, enter, say, 12 for checker
> error rate and 12 for cube error rate. Then have a
> look at the loss of checker and loss of cube rating points.
> What is the ratio? Less than 1 to 10 ;-)

I didn't understand any of this at all. So, I must still
be okay... :)

I do, however, seriously wonder how much (even if very
little) I may have become bot-like after so many years
of playing against Jellyfish, Snowie, Gnubg and XG...??

And if that change is making me perform worse against
the bots?

And, of course, if I'm playing more bot-like now against
humans also? I sure wouldn't want to hear an old friend
make that observation while playing against me now. :(

MK

mu...@compuplus.net

Oct 30, 2016, 12:44:24 AM

October 27, 2016 at 9:21:59 AM UTC-6, Keene wrote:

> In reading this thread, I am finding it amusing that....

That could be because you don't understand the arguments or
you don't want to accept what you understand as a survival
mechanism.

> 1. Declare that the measurement ratings used by the
> current bots (GNU, XG etc) are meaningless

Why is this amusing? Have you followed the step-by-step
instructions I gave, in other threads, to prove this to
yourselves by inserting a "meaningless" cube decision
logic into a bot's roll-out code and seeing that it won't
affect, for example, the cubeful equity results of an
opening roll?

I bet you haven't because you wouldn't find it "amusing"
if you had... ;)

> 2. Point out undocumented (and without doubt 'selected')
> evidence that you have collected through your own processes
> that prove your theory

If you had trusted and looked at what I provided with an
"open mind", you might have discovered and learned something
but as they say, "you can take the horse to the water but
you can't make him drink"... So, just smile and ignore it.

> 3. Are unwilling to play anyone straight up for money in
> a public environment

Why is this amusing? I always made it clear that I never
claimed to be better than other human players and that I'm
not interested in playing for money just for the money.
So, maybe you have problems comprehending what you read
or maybe you are too good to bother even reading what I
wrote before jumping to your "amusing" conclusions..??

> 4. Are against the gambling aspect of backgammon - while
> offering to bet on yourself

You are somewhat right on this contradiction but if you
consider the history of this debate, which you probably
are ignorant of, I had started out wanting to conduct
some experiments "observed by others" but everybody acted
like their time was too valuable for that and I was asked
to backup my claims with money.

Even though it wasn't my original preference, in time, I
came to like the idea as I saw that they themselves grew
insecure and unwilling to backup their claims with money.

Personally, I find that to be "amusing"... :)

> And now, you are claiming there is no such thing as cube skill,

Just to make a minor clarification, as I have done at times in
the past, I don't mean no cube skill at all but too little to
make a big deal about it.

Any game (chess, tennis, basketball, etc.) can be played with
the cube, but that would only bastardize and degrade that
game, instead of adding another layer of skill to it.

> yet this is the very tool that you use in order to 'prove' how
> the bots are wrong and PR, ER, ELO etc are meaningless.

You dare me to a gunfight. While you try to draw your gun and
fire at me, aiming with six-decimal bullet accuracy, I just
throw my gun at your face and knock you out... ;)

How else can I defy "cube skill" other than by using the
cube "without skill"?? Do you have a suggestion??

Maybe like this: https://www.youtube.com/watch?v=JiJiTCcWL_g

> ..... you are only willing to play under your specific
> circumstances, and not allow anyone to change your
> environment to adjust to their needs.

This is not true. Many possibilities have been considered and
discussed at length in the past, but none materialized, for
various reasons (some mine and some others').

> Let's see how your 'cube skill' argument holds up when you
> stand to lose something that matters to you based on your
> clearly reckless attitude on cube management.

Thank you for arguing my point that the so-called "cube skill"
is only meaningful, and only applicable, in the restricted
environment of gambling.

If there were non-gambling "cube skill olympics" open to
reckless amateurs also, some of those so-called "giants" would
surely shrink into "midgets of cube skill"...(?:)

In the alternative, I can argue that if I were a billionaire,
I could play just as recklessly as I pleased and debunk the
"cube skill" without any interfering fears of losing money.

> Remember, it's not how much you win or lose really, because
> that part doesn't matter, it's mostly about your reputation.

Well, so, why are you daring me to play for money?

Is it up to you to determine whether I value my reputation as
much as or even more than money??!!

> Regarding the name calling, please stop, it's unnecessary;
> that's for TC too.

I appreciate this comment, but I would have appreciated it more
if you had written it in response to TC's post and then said
"that's for MK too"...

BTW, this comment goes to Michael too... :)

> And finally! As far as your claims regarding PR, ER etc go,
> you must surely be aware that you are using small data sets
> to apply ideas that belong over data sets that represent
> millions of games.

I agree. Any bets or experiments involving human players will
never be in the thousands, let alone millions of games.

However, you can follow my step-by-step instructions that I
mentioned above to do as many millions of roll-outs to prove
to yourself that ER/PR/ELO/Etc. are bot biased and inaccurate
by an undeterminable amount and thus "meaningless" for having
no practical use or value, (that at least nobody has been
willing to bet money on)...!

> I do look forward to reading the responses,

I answered not for you or anyone else in particular but simply
to clarify my position that "although I want bots to become
better, which I believe they eventually, inevitably will,
I don't believe the current bots are better than humans,
partially because of the cube-skill fallacy".

So, I don't mind sparing some time to contribute what I can.

> but as I mentioned, no more from me on this subject.

Why deprive the world of your wisdom...??

MK

michae...@gmail.com

Oct 30, 2016, 9:20:37 AM

On Sunday, October 30, 2016 at 6:44:24 AM UTC+2, mu...@compuplus.net wrote:
>
> BTW, this comment goes to Michael too... :)
>
>

Huh!?
You are the one who started it, Murat, and your insults towards Tim are way too "heavy", even beyond the limit of a simple personal attack.
Not only that, but you reveal a personal envy or inferiority complex towards PhD holders or mathematicians.
Compare that to the one and only name Tim called you, and it's questionable whether even that one is untrue.

mu...@compuplus.net

Oct 31, 2016, 2:21:31 AM

October 30, 2016 at 7:20:37 AM UTC-6, michae...@gmail.com wrote:

> October 30, 2016 at 6:44:24 AM UTC+2, mu...@compuplus.net wrote:

>> BTW, this comment goes to Michael too... :)

> Huh!? You are the one who started it, Murat

Okay let's not waste time on who started.

> and your insults towards Tim are way too "heavy".
> Even beyond the limit of a simple personal attack.

Not even. I have never met anybody here in person and
I really don't care about anything besides what they
write in this forum.

Too bad if it's not obvious that by using an array of
exaggerated expressions in a single sentence, I am
making a mockery out of name calling... :(

Which, btw, I learned in rgb. I don't know exactly
why I like doing it and keep doing it but perhaps
because most debates degrade to a level where there
is nothing more meaningful left to say.

Do you ever watch debates about god between believer
"religious scholars" (who admit that no amount of
evidence will change their minds) and scientists (who
know that they are whistling against the wind)...(?)

> Not only that but you reveal a personal envy or
> inferiority complex Vs PHD holders or mathematicians.

Not at all. I do respect most PhDs' knowledge in their
fields (especially in the sciences) even if I don't like
their personalities.

But the "general credibility and trust" people get solely
based on their being PhDs, judges, Catholic priests, etc.
can also be abused very easily.

I won't respect a doctor who promotes an unproven cure for
baldness, even if he is a real doctor, and especially if it's
outside of his specialty.

My problem with Chow is that he is trying to give false
credibility to "cube skill", "roll outs", etc. often
using inapplicable analogies and examples from other
real or imaginary games... :(

You can't measure a yard stick by itself and say it's
one yard long. Same goes for bg bots. You can't measure
a bot by itself. And science without measurements is not
science! It's bad-science, folk-science, fantasy... :(

> Compare that to his one and only name that Tim used
> to call you which is questionable if it's not true.

It's not just words. Read carefully some of the stuff
they write about me among themselves. They are like a
pack of hyenas, trying to find strength in numbers,
and they have been insulting me in every insidious way
for almost two decades now... :((

MK

michae...@gmail.com

Oct 31, 2016, 3:23:29 PM

On Monday, October 31, 2016 at 8:21:31 AM UTC+2, mu...@compuplus.net wrote:
>
> and they have been insulting me in every insidious way
> for almost two decades now... :((
>
> MK

Sorry to hear that, MK; I was not here to witness it.
You are free to express your views about backgammon, and as far as I am concerned, I don't rule out the possibility of your being right.