Borrowing a page from Paul Epstein's book

Timothy Chow

May 24, 2023, 12:29:44 PM

Paul has more than once suggested that it might be nice to
take into account not just a person's choice of play, but how
confident the person is in the play. Here's an idea for how
to score the Othello quiz in a way that partially takes into
account your confidence.

For each problem in the quiz, you may, in addition to selecting
a play, optionally declare, "I am confident my play is correct."

If you make such a declaration and your play is indeed correct,
then you score 2 points. But if you make such a declaration and
your play is incorrect, then you score -2 points.

If you do not declare that you are confident, then you score
1 point for a correct play and 0 points for an incorrect play
(or no play at all).

In effect, this scoring system allows you to offer 2 to 1 odds
that your play is correct. That is, you should declare, "I am
confident" if you believe your chances of being correct are at
least 2/3; otherwise, you should remain silent.

To get a feeling for this scoring system, let's consider some
examples. Suppose someone indiscriminately declares, "I am
confident" for every problem. This strategy will boost the
contestant's score if the contestant gets at least 7 out of 10
problems right, but will decrease the contestant's score
otherwise. So weaker contestants cannot trivially boost their
scores this way.

On the other hand, someone who only gets 3 problems right but
correctly declares "I am confident" for those 3 problems and no
others will get credit for that confidence, and will score as
well as someone who gets 6 problems right but isn't confident
about any problems.
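
The arithmetic in these examples can be checked with a short Python sketch. The names `score` and `total` are illustrative only and not part of the proposal, and a ten-problem quiz is assumed, as in the example above.

```python
def score(correct: bool, confident: bool) -> int:
    """Points for one problem under the proposed scheme."""
    if confident:
        return 2 if correct else -2
    return 1 if correct else 0


def total(n_correct: int, n_problems: int, confident: bool) -> int:
    """Total when every answer carries the same confidence declaration."""
    n_wrong = n_problems - n_correct
    return n_correct * score(True, confident) + n_wrong * score(False, confident)


# Compare "never confident" with "always confident" for each number of
# correct answers out of 10 (assumed quiz length).
for k in range(11):
    print(k, total(k, 10, confident=False), total(k, 10, confident=True))

# 3 correct-and-confident answers plus 7 wrong answers without a declaration,
# compared with 6 correct answers and no declarations at all.
selective = 3 * score(True, True) + 7 * score(False, False)
print(selective, total(6, 10, confident=False))
```

With 7 correct, blanket confidence gives 8 against 7; with 6 correct, it gives 4 against 6. The selective contestant and the 6-correct contestant both end on 6.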

---
Tim Chow

MK

May 24, 2023, 8:06:53 PM

On May 24, 2023 at 6:29:44 AM UTC-6, Timothy Chow wrote:

> your play is incorrect, then you score -2 points.
>
> If you do not declare that you are confident,
> then you score 1 point for a correct play and
> 0 points for an incorrect play (or no play at all).

0 for no play is okay but correct/incorrect
plays should be +1/-1 similar to +2/-2

MK

peps...@gmail.com

May 24, 2023, 10:55:52 PM

This is a simpler scheme, and might be more practical
because there are already clear precedents with many
academic multiple-choice exams having negative grading.

I think Tim's idea was to enable a solver to say "I think
X is the play but I'm not confident."
The 1/0/-1 system doesn't really cater to that.
But I think the 1/0/-1 system is a better suggestion.
It decreases the luck in the test because it discourages
wild guessing.

Paul

Simon Woodhead

May 25, 2023, 1:09:27 AM

Rather than right or wrong, the /size/ of the error is what is
most interesting (to me at least).

MK

May 25, 2023, 1:24:39 AM

On May 24, 2023 at 4:55:52 PM UTC-6, peps...@gmail.com wrote:

> On Wednesday, May 24, 2023 at 9:06:53 PM UTC+1, MK wrote:

>> 0 for no play is okay but correct/incorrect
>> plays should be +1/-1 similar to +2/-2

> I think Tim's idea was to enable a solver to
> say "I think X is the play but I'm not confident."

I understood that. His scoring system is unfair
regardless of what his initial idea/intention was.

If you can't say it for fear of hurting his feelings,
you're contributing to RGB incorrectly/negatively.

MK

MK

May 25, 2023, 2:15:04 AM

On May 24, 2023 at 7:09:27 PM UTC-6, Simon Woodhead wrote:

> Rather than right or wrong, the /size/ of the
> error is what is most interesting (to me at least).

I agree in principle but we would need better
bots with consistently more accurate rollouts
for that kind of measuring/scoring.

I was just thinking about posting an article
likening the current bots to accordions, biased
rollouts to bellows and the estimated equities
to the folds of an accordion's bellows.

The maximum size of the "error" expands and
contracts on multiple axes at once, while the
distances between intermediate error values
also expand and contract (but not proportionally),
depending even on things like how many trials
a rollout runs for.

Until we have unbiased AI bots, it's as absurd
to take quizzes, do rollouts, compare ERs/PRs,
etc. as walking around measuring things with
a rubber tape...

MK

peps...@gmail.com

May 25, 2023, 11:41:59 AM

On Thursday, May 25, 2023 at 2:09:27 AM UTC+1, Simon Woodhead wrote:
> Rather than right or wrong, the /size/ of the error is what is
> most interesting (to me at least).

Another brilliant point.
We should perhaps create an advertising campaign along the lines of:
"rec.games.backgammon --- simply the wisest voices on the web!"

This idea has been discussed before, with Tim being a major participant.
It suggests (to me, anyway) the idea of scoring Othello answers by lost equity,
rather than the number of correct replies.
However, the problem with this is that this lost equity is very hard to ascertain,
and so scores will oscillate according to changes in bot technology and rollout settings etc.

This raises the question: Has it ever happened that improvements in backgammon understanding
or rollout settings have changed the consensus on an Othello answer?

If so, what happens then?

Paul

Timothy Chow

May 25, 2023, 1:02:46 PM

On 5/24/2023 6:55 PM, peps...@gmail.com wrote:
> I think Tim's idea was to enable a solver to say "I think
> X is the play but I'm not confident."
> The 1/0/-1 system doesn't really cater to that.
> But I think the 1/0/-1 system is a better suggestion.
> It decreases the luck in the test because it discourages
> wild guessing.

I'm not too happy with 0 for no answer because in a real game
you have to make a play. So I don't think there should be any
incentive to leave an answer blank.

In any case, if someone decides to make a play, then under my
scoring system you get

+2 for correct + confident
+1 for correct + not confident
 0 for incorrect + not confident
-2 for incorrect + confident

Under the system you and Murat are proposing, you get

+2 for correct + confident
+1 for correct + not confident
-1 for incorrect + not confident
-2 for incorrect + confident

The trouble with the latter system is that if you're only 50/50
about a play then you have no disincentive to claim you're
confident. You gain 1 point if you're right and you lose 1
point if you're wrong. Under my system, you gain 1 point if
you're right and you lose 2 points if you're wrong. So you're
disincentivized to claim confidence unless you're at least
2/3 (about 67%) confident.
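
The comparison between the two systems can be sketched in Python as follows. The table names `CHOW` and `SYMMETRIC` and the helper `confidence_threshold` are made up for this illustration. The only assumption is that a solver declares confidence exactly when doing so does not lower the expected score.

```python
from fractions import Fraction

# Payoff tables keyed by (correct, confident) -> points.
CHOW = {(True, True): 2, (True, False): 1, (False, False): 0, (False, True): -2}
SYMMETRIC = {(True, True): 2, (True, False): 1, (False, False): -1, (False, True): -2}


def confidence_threshold(table) -> Fraction:
    """Smallest probability of being correct at which declaring confidence
    does not lower the expected score."""
    gain = table[(True, True)] - table[(True, False)]    # extra points if right
    loss = table[(False, False)] - table[(False, True)]  # points given up if wrong
    # p * gain - (1 - p) * loss >= 0  is equivalent to  p >= loss / (gain + loss)
    return Fraction(loss, gain + loss)


print(confidence_threshold(CHOW))       # 2/3
print(confidence_threshold(SYMMETRIC))  # 1/2
```

Under the +1/-1 variant the threshold drops to 1/2, which is the "50/50" case described above.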

---
Tim Chow

Timothy Chow

May 25, 2023, 1:14:54 PM

On 5/24/2023 9:09 PM, Simon Woodhead wrote:
> Rather than right or wrong, the /size/ of the error is what is
> most interesting (to me at least).

The book 'Backgammon Super Genius Quiz' scores things both ways,
but the official ranking was based on 1 for correct and 0 for
incorrect, rather than on equity. I don't entirely agree with
the justification given in the book, though, which among other
things assumes that it's not possible to come up with enough
discriminating problems with large equity differences. (The
Othello quiz is full of counterexamples, after all!)

There is a lot to be said for scoring based on the size of the
error, but there are two problems I see with that.

1. It means that for match play, you pretty much have to use EMG
to measure error size, and EMG has well-known problems.

http://www.fortuitouspress.com/emg

2. Size of error depends too sensitively on the rollout settings,
choice of bot, etc. The Othello quiz is carefully designed so that
the correct answer is robust to these variations. You can go back
to the earliest Othello quiz and use Snowie, GNU, BGBlitz, or XG,
on anything but the very weakest settings, and they will all agree
on what the top play is. But if you ask them for the size of the
errors then they may disagree rather significantly.

---
Tim Chow

Bradley K. Sherman

May 25, 2023, 1:30:58 PM

Timothy Chow <tchow...@yahoo.com> wrote:
> ...
> http://www.fortuitouspress.com/emg
> ...

Nice article. But I wish people would stop using light gray (light grey,
Paul) text on white backgrounds. Paint it black!

--bks

Timothy Chow

May 25, 2023, 1:45:04 PM

If one wishes to score "no answer" and "wrong answer"
differently, then here would be my proposal.

+3 for correct + confident
+2 for correct + not confident
0 for no answer (confidence is ignored)
-1 for incorrect + not confident
-3 for incorrect + confident

Another feature of the Othello quiz problems is that there
are usually many plausible options---at least 4 or 5 in most
cases, and sometimes even more than that. The above scoring
system encourages you to submit an answer as long as you
think you have at least a 1/3 chance of being correct. This
will discriminate between people who have absolutely no clue
from people who are able to narrow down the choices to no
more than 3 candidates.

A declaration of confidence earns you 1 point if you are right
and -2 points if you are wrong. So you are discouraged from
declaring confidence unless you think your chances of being
right are at least 2/3.
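
As a rough check of the two thresholds, here is a small Python sketch that picks the option with the highest expected score. The function name `best_action` and the constant names are hypothetical. It assumes the solver knows the probability `p` that the chosen play is correct, and it breaks ties in favour of the more cautious option.

```python
from fractions import Fraction

# Points in the five-outcome scheme above.
CORRECT_CONFIDENT, CORRECT_PLAIN = 3, 2
NO_ANSWER = 0
WRONG_PLAIN, WRONG_CONFIDENT = -1, -3


def best_action(p: Fraction) -> str:
    """Return the option with the highest expected score for a solver who
    believes the chosen play is correct with probability p."""
    expected = {
        "no answer": Fraction(NO_ANSWER),
        "answer": p * CORRECT_PLAIN + (1 - p) * WRONG_PLAIN,
        "answer + confident": p * CORRECT_CONFIDENT + (1 - p) * WRONG_CONFIDENT,
    }
    # max() keeps the first maximal entry, so ties go to the more cautious option.
    return max(expected, key=expected.get)


for p in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
    print(p, best_action(p))
```

Answering beats a blank once `p` exceeds 1/3, and declaring confidence beats a plain answer once `p` exceeds 2/3. At exactly those values the two options tie.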

---
Tim Chow