Nolot's test vs. Spector

Steven J Edwards

Jul 29, 1994, 1:27:33 PM
Last night, after I produced the EPD version of the eleven-position
Nolot test, I fed it into Spector with the primary goal of checking my
conversion of the source material. The EPD went through the program
okay, which I verified by looking at Spector's "Search Report
Summary". The report does not give predicted variations or other
similar data, but instead gives counts of total and solved positions
along with lists of which problems fall into which categories
(solved/unsolved/mate). I designed it this way so it could easily
handle real test suites with hundreds to thousands of positions and
still fit the output on a page or two. After I had the confirmation
of the report, I posted the EPD along with the comment that a few
problems were solved.
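The match/no-match tallying the summary report performs can be sketched in a few lines. This is an illustrative sketch in Python, not Spector's actual code; it assumes the standard EPD `bm` ("best move") opcode carries the solution, with a deliberately simplified record layout:

```python
# Sketch of an EPD test summarizer: compare an engine's answer for each
# position against the solution move(s) in the EPD "bm" opcode, and
# report only counts and category lists, never predicted variations.

def parse_bm(epd_line):
    """Return the set of solution moves from an EPD record's bm opcode."""
    fields = epd_line.split(None, 4)   # four position fields, then operations
    if len(fields) < 5:
        return set()
    for op in fields[4].split(";"):
        tokens = op.split()
        if tokens and tokens[0] == "bm":
            return set(tokens[1:])
    return set()

def summarize(results):
    """results: iterable of (epd_line, engine_move) pairs."""
    report = {"total": 0, "solved": [], "unsolved": []}
    for number, (epd, move) in enumerate(results, start=1):
        report["total"] += 1
        key = "solved" if move in parse_bm(epd) else "unsolved"
        report[key].append(number)     # record the problem number only
    return report
```

A report of this shape stays a page or two long no matter how many positions the suite holds, which is the design goal described above.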

The summary report indicated that five of the eleven problems were
solved. However, the solution given on my input for problem number
eleven may have been wrong, and so Spector's match for it may be
spurious. The other four problems were solved, not by tree search,
but by the new position database I installed a couple of weeks ago.
This is what I learned after I made the EPD posting. Spector sees no
reason to do a search if some other method will work for the given
position; this is its standard tournament practice. However, there is
no provision for the summarizer code to distinguish among problems
solved by search, database probe, or fortuitous cosmic ray influence.

-- Steven (s...@world.std.com)

Feng-Hsiung Hsu

Jul 29, 1994, 2:28:13 PM
In article <Ctppu...@world.std.com> s...@world.std.com (Steven J Edwards) writes:
>The summary report indicated that five of the eleven problems were
>solved. However, the solution given on my input for problem number
>eleven may have been wrong, and so Spector's match for it may be
>spurious. The other four problems were solved, not by tree search,
>but by the new position database I installed a couple of weeks ago.

Now I am agreeing with the Luddites. This is cheating:-). DT-2 would
have "solved" 8 of the 11 positions by using the same method in one
second total:-). Seriously, what would Spector do if opponents do not play
the game moves? Aren't you getting your program into something that
it cannot comprehend?

Feng-Hsiung Hsu

Jul 29, 1994, 5:45:18 PM
In article <Ctpz4...@world.std.com> s...@world.std.com (Steven J Edwards) writes:
>Perhaps we need a new consensus on what "solve" means. To me, "solve"
>means "get the right answer to a problem". The solutions Spector

How do you know it is the RIGHT answer? Just because someone else played
it does not make it right. And if the machine does not have refutations for
anything that can come up, it has not solved it. Any decent chess player
would have told you that guessing the right move for a chess quiz isn't
"solving" it. (OK, I am not a decent chess player, but...)

>Spector's database library public, perhaps Deep Thought's database
>library could be similarly uploaded for comparison and so that others
>might use it as well.

Ever heard of copyright? You are asking for the impossible. IBM would be
sued for big bucks if we do. We all should respect others' intellectual
properties. You own endgame CD-ROMs. Does that give you the right to upload
them without the author's permission?

IM Mike Valvo, whom you met at the ACM, has 700,000+ games in his database.
Do you think he can make it ALL public? Copyright again.

Steven J Edwards

Jul 29, 1994, 7:36:33 PM
f...@watson.ibm.com (Feng-Hsiung Hsu) writes:

>In article <Ctpz4...@world.std.com> s...@world.std.com (Steven J Edwards) writes:
>>Perhaps we need a new consensus on what "solve" means. To me, "solve"
>>means "get the right answer to a problem". The solutions Spector

>How do you know it is the RIGHT answer? Just because someone else played
>it does not make it right. And if the machine does not have refutations for
>anything that can come up, it has not solved it. Any decent chess player
>would have told you that guessing the right move for a chess quiz isn't
>"solving" it. (OK, I am not a decent chess player, but...)

The idea is that the "right" answer is established by a consensus of
results of strong players, including programs. To help form this
consensus, the tests are made easily available so all can comment.

Spector's position database includes moves that were not just played,
but played in games that were won against strong players. It is not a
random selection.
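A filter of the kind described, keeping only moves played by the eventual winner against strong opposition, might look like this. This is a hypothetical Python sketch; how Spector actually keys positions and stores moves is not described in the thread:

```python
# Hypothetical sketch: build a position -> move table that keeps only
# moves played by the eventual winner of each game, so a probe never
# hands back a losing side's choice.

def build_position_db(games):
    """games: iterable of (winner, moves), where winner is "w" or "b"
    and moves is a list of (position_key, side_to_move, move_played)."""
    db = {}
    for winner, moves in games:
        for pos, side, move in moves:
            if side == winner:            # keep only the winner's moves
                db.setdefault(pos, move)  # first stored example is kept
    return db

def probe(db, pos):
    """Return a stored move for pos, or None to fall back on search."""
    return db.get(pos)
```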

>>Spector's database library public, perhaps Deep Thought's database
>>library could be similarly uploaded for comparison and so that others
>>might use it as well.

>Ever heard of copyright? You are asking for the impossible. IBM would be
>sued for big bucks if we do. We all should respect others' intellectual
>properties. You own endgame CD-ROMs. Does that give you the right to upload
>them without the author's permission?

I'm not now, or ever, asking anyone to divulge proprietary data. I'm
only asking for unannotated game data from events played in a public
forum. If such is not part of Deep Thought's library, then you can
forget the question.

I do not own any endgame CD-ROMs. I am building my own endgame
tablebases with my own algorithm. This is not a slight against
anyone's work; I merely want to try out my own ideas. As the
tablebases are completed, I will give them away to anyone who wants
them. They are mere calculations and not subject to copyright. If
they had annotations or some other added value representing original
work, then that would be a different story.

How can annotators' intellectual property rights be infringed when the
annotations are removed prior to any distribution? This is like
copyrighting blank space. Annotating a game at best grants ownership
of the annotations, not the game itself.

>IM Mike Valvo, whom you met at the ACM, has 700,000+ games in his database.
>Do you think he can make it ALL public? Copyright again.

If Mike Valvo or anyone else were willing to provide unannotated game
data from events played in a public forum, I'm sure many readers of
this newsgroup would be very grateful.

-- Steven (s...@world.std.com)

Hal Bogner

Jul 30, 1994, 2:10:27 AM
Whoa! (Or whatever it is you call to your horse to tell it to slow down...)

In article <Ctq1r...@hawnews.watson.ibm.com> f...@watson.ibm.com (Feng-Hsiung Hsu) writes:
>In article <Ctpz4...@world.std.com> s...@world.std.com (Steven J Edwards) writes:
>>Perhaps we need a new consensus on what "solve" means. To me, "solve"
>>means "get the right answer to a problem". The solutions Spector
>
>How do you know it is the RIGHT answer? Just because someone else played
>it does not make it right. And if the machine does not have refutations for
>anything that can come up, it has not solved it. Any decent chess player
>would have told you that guessing the right move for a chess quiz isn't
>"solving" it. (OK, I am not a decent chess player, but...)
>

Good point. But I think the term you are looking for is "proof." If Steve
(or anyone else) can produce a program that, through "good judgement," plays
more strongly than a program that searches deeper, then arguing with his or
her methods won't change the results. But here, I gather that M. Nolot's
criterion was (or should be taken to be) "demonstrate the correct move
through searching."

>>Spector's database library public, perhaps Deep Thought's database
>>library could be similarly uploaded for comparison and so that others
>>might use it as well.
>
>Ever heard of copyright? You are asking for the impossible. IBM would be
>sued for big bucks if we do. We all should respect others' intellectual
>properties. You own endgame CD-ROMs. Does that give you the right to upload
>them without the author's permission?
>
>IM Mike Valvo, whom you met at the ACM, has 700,000+ games in his database.
>Do you think he can make it ALL public? Copyright again.

This is an over-reaction, in my opinion. The games themselves are not
copyrightable. Now, if they contain annotations, then those annotations are
copyrightable. But neither IBM nor Mike would be liable for any copyright
violations if you were to make any number of played chess games public.

This is not a call for either of you to actually do what was suggested; just a
small attempt to keep the record clear on the status of certain worries and
claims about the purported "proprietary nature" of general collections of
actual played chess games, and of all individual chess games.

-hal bogner

Steven J Edwards

Jul 29, 1994, 4:48:10 PM
f...@watson.ibm.com (Feng-Hsiung Hsu) writes:

# In article <Ctppu...@world.std.com> s...@world.std.com (Steven J Edwards) writes:
# >The summary report indicated that five of the eleven problems were
# >solved. However, the solution given on my input for problem number
# >eleven may have been wrong, and so Spector's match for it may be
# >spurious. The other four problems were solved, not by tree search,
# >but by the new position database I installed a couple of weeks ago.

# Now I am agreeing with the Luddites. This is cheating:-). DT-2 would
# have "solved" 8 of the 11 positions by using the same method in one
# second total:-). Seriously, what would Spector do if opponents do not play
# the game moves? Aren't you getting your program into something that
# it cannot comprehend?

First, let me say that the summarizer program is a separate entity
from Spector; it was written some time ago and long before the
position database probe was installed into Spector. The summarizer is
implemented as a standalone program, not just because that is a better
design, but because it allows me to distribute the tool to other
researchers who might find it useful. As it took me some time to
write (the current version is about seven hundred lines, I think), it
has the potential to save a similar amount of time for each developer
who uses it. A minor added benefit is that the search-result abstracts
it produces are directly comparable across different chessplaying
programs.

Now, is position database look-up cheating? The mechanism used in
Spector for this is the new, general opening library subsystem. It's
configured to be active unless explicitly deselected; I honestly
hadn't thought of it at the time. I had just wanted to get the EPD
data validated so that others could use it.

Perhaps we need a new consensus on what "solve" means. To me, "solve"
means "get the right answer to a problem". The solutions Spector
derived for the four position database probe matches would have been
located under tournament conditions as well as on the test. Now, if
someone wants "solve" to mean something other than the above, then an
explicit redefinition is in order. We could have "solve without an
opening library", "solve without position database probe", "solve
without tablebase look-up", "solve without resorting to disk", "solve
without accessing a network", "solve without use of a government
research grant", etc.

I'm pleased to see that Deep Thought could locate eight of the eleven
test positions in its database library. This is certainly an
improvement over Spector's four out of eleven. As I have already made
Spector's database library public, perhaps Deep Thought's database
library could be similarly uploaded for comparison and so that others
might use it as well.

As to whether or not Spector, or any other program, can truly
"comprehend" anything, this is hard to measure by objective means. Is
there anyone out there who will publicly claim the ability of true
comprehension for their program? This is more likely a topic for
comp.ai.philosophy, not chess. I am more interested in working on
providing data, programs, and specifications that others can use, and
that work more than fills my time.

-- Steven (s...@world.std.com)

Marc-Francois Baudot

Aug 1, 1994, 5:58:46 AM
s...@world.std.com (Steven J Edwards) writes:


>Summary". The report does not give predicted variations or other
>similar data, but instead gives counts of total and solved positions
>along with lists of which problems fall into which categories

Then it is useless for tests where you must give a score and
a variation to make sure you have found the right move for
the right reason, or to understand why you have failed.


>The summary report indicated that five of the eleven problems were
>solved. However, the solution given on my input for problem number
>eleven may have been wrong, and so Spector's match for it may be
>spurious. The other four problems were solved, not by tree search,
>but by the new position database I installed a couple of weeks ago.
>This is what I learned after I made the EPD posting. Spector sees no
>reason to do a search if some other method will work for the given
>position; this is its standard tournament practice.

Agreed for tournaments, even though it can be dangerous to get into
tactical lines which are too deep for the program to handle correctly.
But that is a case of opening preparation rather than tactical skill.


>However, there is
>no provision for the summarizer code to distinguish among problems
>solved by search, database probe, or fortuitous cosmic ray influence.

It would be really useful though...

>-- Steven (s...@world.std.com)

Steven J Edwards

Aug 1, 1994, 6:23:36 AM
m...@cnam.cnam.fr (Marc-Francois Baudot) writes:

# s...@world.std.com (Steven J Edwards) writes:

# >Summary". The report does not give predicted variations or other
# >similar data, but instead gives counts of total and solved positions
# >along with lists of which problems fall into which categories

# Then it is useless for tests where you must give a score and
# a variation to make sure you have found the right move for
# the right reason, or to understand why you have failed.

Perhaps, but one should ask how useful such tests are in the first
place. For better or worse, experts in human psychometrics have come
to the consensus that more questions are superior to fewer when
constructing a test of aptitude or a personality inventory. Nearly all
such human tests are summarized in the way my statistics program
works: match or fail to match. If one is interested in measuring
performance and only a finite amount of experimentation time is
available, then the obvious conclusion is to maximize the reliability
of the performance metric by increasing the question count.
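The statistical point can be made concrete with a textbook binomial calculation (mine, not anything from the posts): the standard error of an observed solve rate shrinks with the square root of the question count, so an eleven-position suite is a far noisier instrument than one with a thousand positions.

```python
import math

def solve_rate_stderr(solved, total):
    """Standard error of an observed solve rate under a binomial model."""
    p = solved / total
    return math.sqrt(p * (1 - p) / total)

# The same ~45% solve rate measured on 11 positions versus 1100:
small = solve_rate_stderr(5, 11)       # about 0.15
large = solve_rate_stderr(500, 1100)   # about 0.015, ten times tighter
```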

One could argue that attempting to gauge a chess program's performance
with a single number produced by a bunch of match/no-match results is
flawed in itself. Maybe so, but is it any different from using a
rating number handed out by a chess federation?

-- Steven (s...@world.std.com)


Marc-Francois Baudot

Aug 1, 1994, 6:50:24 AM
s...@world.std.com (Steven J Edwards) writes:

>Now, is position database look-up cheating? The mechanism used in

That's not the point. Database look-up just doesn't answer the question.


>Perhaps we need a new consensus on what "solve" means. To me, "solve"
>means "get the right answer to a problem". The solutions Spector
>derived for the four position database probe matches would have been
>located under tournament conditions as well as on the test. Now, if
>someone wants "solve" to mean something other than the above, then an
>explicit redefinition is in order. We could have "solve without an

I don't think we need a new consensus or any kind of redefinition.
Database lookup is only interesting for finding the games in which
the positions arose. It was obvious that Pierre Nolot was interested
in something other than that. I really don't want "solve" to mean what
you say in a test of a program's tactical abilities!
As for over-the-board play, what would you say if Spector played
the first move of the combination from its opening book, but due to an
unexpected reply from its opponent got out of book, failed to
find the right moves, and then lost the game?

Marc-Francois Baudot

Steven Rix

Aug 1, 1994, 7:09:07 AM

In article <Ctpz4...@world.std.com>, s...@world.std.com (Steven J Edwards) writes:
->f...@watson.ibm.com (Feng-Hsiung Hsu) writes:
->
-># Now I am agreeing with the Luddites. This is cheating:-). DT-2 would
-># have "solved" 8 of the 11 positions by using the same method in one
-># second total:-). Seriously, what would Spector do if opponents do not play
-># the game moves? Aren't you getting your program into something that
-># it cannot comprehend?
->
->Now, is position database look-up cheating? The mechanism used in
->Spector for this is the new, general opening library subsystem. It's
->configured to be active unless explicitly deselected; I honestly
->hadn't thought of it at the time. I had just wanted to get the EPD
->data validated so that others could use it.

The challenge is for a computer program to "solve" given positions,
by analysing each position on its merits. Of course looking up the
answer in a database is cheating: the program just knows that somebody
in this position once played this move and went on to win, possibly
because the move was good, possibly because of a later blunder from his
opponent. Now, does the program think that the given move is a mistake
because of a superficial tactical refutation, does it think that the move
wins because it has analysed everything out to checkmate, or has it just
noted that the move was once played in a game "and so it must be good"?

->Perhaps we need a new consensus on what "solve" means. To me, "solve"
->means "get the right answer to a problem".

How about "get the right answer to a problem for the right reasons"?
In a typical legal position, there may be 50 moves available, some of
which are "irrelevant". One can use a random number generator to help
choose which one to go for, but just because one happens to hit on the
correct answer doesn't mean that one has "solved" the position. Similarly,
getting the answer right because you earlier overlooked the main try
at refutation is dangerous, because next time round there might not be a
way out...

FHH seems to be using "solve" in the sense defined above, because he
has stated that DT-2 wants to play certain moves after a certain amount
of computation, yet it only saw the full solution some time later (e.g.
it only then realised the significance of that pin in eight moves' time).
This is the time FHH claims is necessary to solve the problem. FHH is only
happy with DT-2's analysis when he can freeze the analysis, save the
results and therein find all the main variations of the problem. Then he
knows that DT-2 has not only found the "refutations", but also the
refutations of the "refutations".

--
Steve Rix (ste...@chemeng.ed.ac.uk)
"A morbid, Edinburgh-based Chemical Engineer" - and no misprint!

Steven J Edwards

Aug 1, 1994, 7:43:32 AM
m...@cnam.cnam.fr (Marc-Francois Baudot) writes:

>s...@world.std.com (Steven J Edwards) writes:

[deletia]

>As for over the board play, what would you say if Spector played
>the first move of the combination from its opening book, but due to an
>unexpected reply from his opponent got out of book and failed to
>find the right moves, and then lost the game?

I would say that the deficiency was in the inability of the program to
correctly locate the follow-up move. The fault is not that the
combination is included in the book. Removing the combination from
the book may fix the symptom, but will not treat the underlying cause
of the problem.

-- Steven (s...@world.std.com)

T. M. Cuffel

Aug 2, 1994, 1:23:39 AM
In article <31il4j$2...@aban.chemeng.ed.ac.uk>,

Steven Rix <ste...@chemeng.ed.ac.uk> wrote:
>
>In article <Ctpz4...@world.std.com>, s...@world.std.com (Steven J Edwards) writes:

>->Now, is position database look-up cheating? The mechanism used in
>->Spector for this is the new, general opening library subsystem. It's
>->configured to be active unless explicitly deselected; I honestly
>->hadn't thought of it at the time. I had just wanted to get the EPD
>->data validated so that others could use it.
>
>The challenge is for a computer program to "solve" given positions,
>by analysing each position on its merits. Of course looking up the
>answer in a database is cheating: the program just knows that somebody
>in this position once played this move and went on to win, possibly
>because the move was good, possibly because of a later blunder from his
>opponent.

How is this any different from what humans do?

What would you do if I asked you to solve a calculus problem, one you knew
you could solve?

Would you first derive arithmetical theorems, build them into algebraic
theorems, prove a few derivative forms, move on to the Fundamental
Theorem of Calculus, prove the integral forms you need, and then solve
your problem? If it required a numerical solution, would you do all
the exponentiation and logarithms by hand, instead of consulting a
table or device somebody once constructed?

Of course not. You would consult a database, your brain in some cases,
perhaps a reference in others, and rely on them to produce results
that somebody, perhaps yourself, once found.

That is all the computer is doing. A computer should not have to
recompute whether two bishops are worth more than a rook in each
position, nor should it be forbidden from remembering other things.
Don't penalize a computer for being good at this.

Jeff Mallett

Aug 2, 1994, 6:10:35 AM
Ah, I just finished a program that plays random moves. I ran the Nolot
test a few times, but I get different results each time. I'll upload the
final stats when it gets 8 out of 11. :)

Jeff Mallett
"All positional sacrifices are just very deep tactical combinations" - me

Steven Rix

Aug 3, 1994, 5:38:27 AM

In article <31kl8r$d...@lace.Colorado.EDU>, cuf...@cs.colorado.edu (T. M. Cuffel) writes:
->In article <31il4j$2...@aban.chemeng.ed.ac.uk>,
->Steven Rix <ste...@chemeng.ed.ac.uk> wrote:
->>
->>The challenge is for a computer program to "solve" given positions,
->>by analysing each position on its merits. Of course looking up the
->>answer in a database is cheating: the program just knows that somebody
->>in this position once played this move and went on to win, possibly
->>because the move was good, possibly because of a later blunder from his
->>opponent.
->
->How is this any different from what humans do?

These problems have been selected because a difficult combination was
played in the game. If you just look up the game moves, then you get the
right answer. However, I could just as easily devise a different set of
problems, by collecting games where one side made a bad blunder. In fact,
early ChessBase magazine disks featured small collections of recent
blunders; I'd just have to look up some of the more outrageous ones.
Here, if you just look up the game moves, then you get the wrong answer.

There is nothing wrong with looking up positions in order to *suggest*
candidate moves, but it is very dangerous to trust these moves implicitly,
for a number of reasons. So, someone once sacrificed his queen in this
position. Before playing the move, you need to know whether the sacrifice
was sound, i.e. you must combine the look-up with some analysis. Did the
sacrifice lead to a win by force? Did it only win because of very poor
defence, or because the opponent lost on time? Should it have won with
best follow-up, only the player made a mistake and had to take a perpetual
check? This is what humans do, on the whole; they don't play a move just
because Karpov once played it too. Of course, Karpov playing a move *might*
be a *recommendation*, but even Karpov plays sub-optimally some of the time
(for example, he might just chop wood and agree a draw when he's tired).
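The policy sketched above, treating a database hit as a candidate to be verified rather than an answer, could look like this in outline. This is an illustrative Python sketch; the `search` function, the time budgets, and the agreement rule are all invented for the example:

```python
def choose_move(pos, book, search, base_time=1.0, bonus=3.0):
    """Use a book hit only as a suggestion: a longer verifying search
    must independently agree before the book move is trusted.

    search(pos, seconds) -> (best_move, score) is assumed supplied
    by the engine; book maps position keys to previously played moves."""
    candidate = book.get(pos)
    if candidate is None:
        best, _ = search(pos, base_time)        # no hit: plain search
        return best
    best, _ = search(pos, base_time * bonus)    # hit: think longer
    # Play the remembered move only if analysis confirms it; otherwise
    # the search's own choice stands and the "Karpov played it" move
    # is quietly discarded.
    return candidate if best == candidate else best
```

The design point is that the lookup can only ever add thinking time and suggestions, never bypass analysis, so a blunder stored in the book cannot be copied unexamined.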

[How to solve an integral?]
->You would consult a database, your brain in some cases,
->perhaps a reference in others, and rely on them to produce results
->that somebody, perhaps yourself, once found.
->
->That is all the computer is doing. A computer should not have to
->recompute whether two bishops are worth more than a rook each position,
->nor should it be forbidden from remembering other things. Don't
->penalize a computer for being good at this.

I'm not proposing to. If a computer looks up a previously-played move,
finds that it's surprising, analyses for double or triple its normal
thinking time and then plays it because it has now seen a refutation of
what it thought was the refutation, then fine. In the deleted example,
after looking up a standard integral, I'd sit down for a few moments and
check that it looks right. It is dangerous to look something up and trust
it completely; maybe you are copying a bad mistake... Also, an opponent
need only copy a recent drawn game from your book to achieve an easy draw.

Steven Edwards hasn't actually said that his computer *would* have played
the key move in the 11 positions of Nolot's test, just that it identified
a previously-played move in those positions. Now, does that mean that his
program would play the move automatically, or would it analyse first? If
it analysed first, would it play the move nevertheless after an hour's
computation?
