
The new YAZGAC - test-suite


mclane

Apr 15, 1997, 3:00:00 AM


Hello,

The famous Computer-Schach & Spiele, edition 2/97, is on the market.

Once again, mathematician and Dipl.-Ing. YAZGAC has published an incredible
test.

It has 40 positions.

I have typed in these positions and will give you the EPDs for
CSTal (because Chris is still kind enough to use his own non-EPD-conforming
format! GRRRRRRRRR!) and for normal EPD programs:

You have to run the positions and write down the time in seconds for
the key moves (in CSTal format the key moves follow the EPD in
algebraic notation. Some positions have more than one key move; Chris's
format is at the moment not capable of handling this...).

10 minutes is the maximum computing time.
Above this, you get zero points for that position.
Each position has its own weighting, from 20 to 110. A good idea!
But: Yazgac uses a strange way of determining the solution time. For
my own purposes I use the usual way: writing down the time when the
program first finds the move and holds it to the end.

Formula for points is:

points = weighting * (1 - seconds / 600)
You can find the weightings in my list below.
If you have a p0 value (the sum of the points over all 40 positions),
you can either read the Elo off a chart, using the measurements of
2 previously rated programs as reference, or, because not all of us
have Computer-Schach & Spiele and I have no time to type in the
charts... use formula 5 (which only works from 2000-2600 Elo).
All in all I think some of the positions are too easy for state-of-the-art
programs. Maybe Mr. Yazgac should update some of his tested
programs, or buy a faster PC than the P90 he used, so that he gets better
formulas and more useful test suites. But maybe for him formulas and
being a mathematician are more important than having working values?!
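A minimal Python sketch of the per-position scoring rule above, assuming times are written in the minutes'seconds" notation of the CSTal results table (e.g. 6'5", with over-the-limit times in [brackets]) and that an unsolved position, or one over the 600-second limit, scores zero. parse_time and position_points are illustrative names, not anything from the CSS article.

```python
import re
from typing import Optional

# Matches 6'5", 21", 7' and bracketed over-the-limit times such as [11'36"].
_TIME_RE = re.compile(r"\[?(?:(\d+)')?(?:(\d+)\")?\]?")


def parse_time(text: str) -> int:
    """Convert a minutes'seconds" time string to seconds."""
    m = _TIME_RE.fullmatch(text.strip())
    if m is None or (m.group(1) is None and m.group(2) is None):
        raise ValueError(f"unrecognised time {text!r}")
    return 60 * int(m.group(1) or 0) + int(m.group(2) or 0)


def position_points(weighting: float, seconds: Optional[float], limit: float = 600.0) -> float:
    """points = weighting * (1 - seconds / 600); zero if unsolved or over the limit."""
    if seconds is None or seconds > limit:
        return 0.0
    return weighting * (1.0 - seconds / limit)


if __name__ == "__main__":
    # A few (time, weighting) rows in the style of the CSTal table.
    for time_text, weighting in [("1\"", 20), ("6'5\"", 20), ("9'33\"", 110), ("[11'36\"]", 70)]:
        secs = parse_time(time_text)
        print(time_text, secs, round(position_points(weighting, secs), 2))
```

For example, a 6'5" solution on a position weighted 20 gives 20 * (1 - 365/600), about 7.83 points.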


Yaz2.epd:
4rk2/2p2prp/pq2b2N/1p6/8/2PR1Q2/PP4PP/5R1K w - -
2r2rk1/pp4pp/2n1pq2/2bp4/8/2N3PB/PP2PP1P/R2Q1RK1 w - -
8/4Pn2/7p/3N1k2/4q3/5pK1/3Q4/8 w - -
5rk1/1pp3pp/p1p5/2b5/3n2bq/2NPB3/PPPQ2PP/R3N1K1 b - -
1rb2rk1/3nqppp/p1n1p3/1p1pP3/5P2/2NBQN2/PPP3PP/2KR3R w - -
3R4/5bkp/6p1/3P1p2/3q4/Q5PP/P1r1r1B1/5R1K b - -
r1b2rk1/3n1ppp/p7/Qp1q2B1/P1pPp3/2P1P3/2B2PPP/R3R1K1 w - -
r4rk1/1b2bppp/ppq1pn2/2ppB3/5P2/1P1BP1N1/P1PPQ1PP/R4RK1 w - -
rn1qr1k1/p3bppp/bp3n2/2p2N2/2Q5/3BB1N1/PPP2PPP/2KR3R w - -
3r2k1/pq1P1rb1/1p3ppp/2p5/1nQ2B1P/5N2/P4PP1/1R1R2K1 w - -
1k1rr3/1p3p1p/p1p5/2P3p1/Q2b2q1/1R1BB1P1/PP3P1P/6K1 w - -
3r1r1k/1b4p1/pb5p/2ppPp1q/2p1nP2/2P1BN1N/PPQ3PP/3R1R1K b - -
r2q1rk1/pb3ppp/1p2pnn1/6B1/2BP4/P1NQ4/1P3PPP/3R1RK1 w - -
r3k3/1p3p2/p2p1b1r/3R3P/4q3/2P5/PP1QB3/R2K4 w q -
r1r2qk1/pR3pp1/4p3/6P1/8/4PQ2/5PP1/5RK1 b - -
3r2k1/5pp1/7p/pq2r3/R2P4/6P1/1P1R3P/Q5K1 b - -
r3nrk1/ppq2pbp/2n1b1p1/4pP2/2P5/4BNN1/PP4PP/RQ2KB1R b KQ -
8/p2Q2p1/1k3qbp/1pn5/2p4P/2P2P2/5BPK/5B2 w - -
r2rb1k1/pp1n1ppp/2p1pq2/4N3/3P4/4Q1P1/PPP2PBP/2KR3R w - -
q4r2/1bn1ppkp/3p1np1/1ppP4/r3P3/2PBQNNP/1P3PP1/3RR1K1 w - -
4k1r1/1b1r4/1nppNp2/1p3Pp1/1P2P1P1/2N3KR/2P5/7R w - -
r4k2/2p5/P1pp1n1p/2r2qp1/Q7/6BP/2P2PP1/R3R1K1 w - -
4r1k1/2rb1pp1/p1n1p2p/1p1pP3/3N1P2/1P1BK1P1/P1P2R1P/R7 w - -
3r2k1/1pq1r1p1/2pnp1Bp/p2p4/3P1P2/4PR1P/PPQ3P1/2R3K1 w - -
2kr3r/ppp1qppp/2p2n2/2b2b2/2P1pP2/1P2P3/PBQPB1PP/RN2K2R b KQ -
r4rk1/pp2bppp/3p1n2/4pP2/2q1P3/2N1B3/PPP3PP/R3QRK1 w - -
2rq1rk1/pp2bpp1/2pn1n1p/3p4/3P1B1P/2NBP3/PPQ2PP1/R3K2R w KQ -
1n3rk1/r4ppp/pp1q4/3p4/Q2P4/5N2/PP3PPP/2R1R1K1 w - -
6r1/3bbk2/4p2p/1p1pPp2/2pP1Pr1/P4K2/1P4NP/1R2B1R1 b - -
2nq2rb/3pr2k/1pp1p2p/p3Pp1B/3P1P2/PPN5/5Q1P/R5RK w - -
7k/8/p4n2/2pP1r2/2P1p2P/8/P2N1p2/3K3R b - -
8/8/8/8/p2K4/3Q4/kp3P2/8 w - -
8/8/8/5R2/4K1k1/pp6/8/8 w - -
8/5p2/p3p1p1/1pk1P2p/5P1P/1P1K2P1/P7/8 b - -
8/8/8/1P2kp2/P2p2p1/6P1/3K4/8 b - -
8/1K6/P7/4p3/4Pk2/2pBb3/8/8 w - -
8/3k4/4R3/1p1r1p2/p4P2/P1K1P3/1P6/8 w - -
3K4/3P1k2/8/8/8/2r5/8/4R3 w - -
8/N5pp/8/8/3knP2/3p2P1/P6P/2K5 b - -
8/5r2/3k1P2/3p4/1p1K4/1P6/5R2/8 w - -


Yaz2.pos:
40 positions : 40 Yazgac-Test. BT=2630
4rk2/2p2prp/pq2b2N/1p6/8/2PR1Q2/PP4PP/5R1K/w d3d7
2r2rk1/pp4pp/2n1pq2/2bp4/8/2N3PB/PP2PP1P/R2Q1RK1/w c3d5
8/4Pn2/7p/3N1k2/4q3/5pK1/3Q4/8/w e7e8Q
5rk1/1pp3pp/p1p5/2b5/3n2bq/2NPB3/PPPQ2PP/R3N1K1/b h4h6
1rb2rk1/3nqppp/p1n1p3/1p1pP3/5P2/2NBQN2/PPP3PP/2KR3R/w d3h7
3R4/5bkp/6p1/3P1p2/3q4/Q5PP/P1r1r1B1/5R1K/b f7e8
r1b2rk1/3n1ppp/p7/Qp1q2B1/P1pPp3/2P1P3/2B2PPP/R3R1K1/w a4b5
r4rk1/1b2bppp/ppq1pn2/2ppB3/5P2/1P1BP1N1/P1PPQ1PP/R4RK1/w g3h5
rn1qr1k1/p3bppp/bp3n2/2p2N2/2Q5/3BB1N1/PPP2PPP/2KR3R/w f5h6
3r2k1/pq1P1rb1/1p3ppp/2p5/1nQ2B1P/5N2/P4PP1/1R1R2K1/w b1b4
1k1rr3/1p3p1p/p1p5/2P3p1/Q2b2q1/1R1BB1P1/PP3P1P/6K1/w b3b7
3r1r1k/1b4p1/pb5p/2ppPp1q/2p1nP2/2P1BN1N/PPQ3PP/3R1R1K/b d5d4
r2q1rk1/pb3ppp/1p2pnn1/6B1/2BP4/P1NQ4/1P3PPP/3R1RK1/w f2f4
r3k3/1p3p2/p2p1b1r/3R3P/4q3/2P5/PP1QB3/R2K4/w d5d3
r1r2qk1/pR3pp1/4p3/6P1/8/4PQ2/5PP1/5RK1/b a7a5
3r2k1/5pp1/7p/pq2r3/R2P4/6P1/1P1R3P/Q5K1/b b5e8
r3nrk1/ppq2pbp/2n1b1p1/4pP2/2P5/4BNN1/PP4PP/RQ2KB1R/b e5e4
8/p2Q2p1/1k3qbp/1pn5/2p4P/2P2P2/5BPK/5B2/w g2g3
r2rb1k1/pp1n1ppp/2p1pq2/4N3/3P4/4Q1P1/PPP2PBP/2KR3R/w f2f4
q4r2/1bn1ppkp/3p1np1/1ppP4/r3P3/2PBQNNP/1P3PP1/3RR1K1/w e4e5
4k1r1/1b1r4/1nppNp2/1p3Pp1/1P2P1P1/2N3KR/2P5/7R/w e4e5
r4k2/2p5/P1pp1n1p/2r2qp1/Q7/6BP/2P2PP1/R3R1K1/w c2c4
4r1k1/2rb1pp1/p1n1p2p/1p1pP3/3N1P2/1P1BK1P1/P1P2R1P/R7/w d4c6
3r2k1/1pq1r1p1/2pnp1Bp/p2p4/3P1P2/4PR1P/PPQ3P1/2R3K1/w g2g4
2kr3r/ppp1qppp/2p2n2/2b2b2/2P1pP2/1P2P3/PBQPB1PP/RN2K2R/b f6g4
r4rk1/pp2bppp/3p1n2/4pP2/2q1P3/2N1B3/PPP3PP/R3QRK1/w e3g5
2rq1rk1/pp2bpp1/2pn1n1p/3p4/3P1B1P/2NBP3/PPQ2PP1/R3K2R/w g2g4
1n3rk1/r4ppp/pp1q4/3p4/Q2P4/5N2/PP3PPP/2R1R1K1/w a4c2
6r1/3bbk2/4p2p/1p1pPp2/2pP1Pr1/P4K2/1P4NP/1R2B1R1/b b5b4
2nq2rb/3pr2k/1pp1p2p/p3Pp1B/3P1P2/PPN5/5Q1P/R5RK/w c3e4
7k/8/p4n2/2pP1r2/2P1p2P/8/P2N1p2/3K3R/b e4e3
8/8/8/8/p2K4/3Q4/kp3P2/8/w d3c2
8/8/8/5R2/4K1k1/pp6/8/8/w f5f1
8/5p2/p3p1p1/1pk1P2p/5P1P/1P1K2P1/P7/8/b c5b4
8/8/8/1P2kp2/P2p2p1/6P1/3K4/8/b f5f4
8/1K6/P7/4p3/4Pk2/2pBb3/8/8/w d3b1
8/3k4/4R3/1p1r1p2/p4P2/P1K1P3/1P6/8/w e6e5
3K4/3P1k2/8/8/8/2r5/8/4R3/w e1f1
8/N5pp/8/8/3knP2/3p2P1/P6P/2K5/b e4d6
8/5r2/3k1P2/3p4/1p1K4/1P6/5R2/8/w f2a2

Positions with more than one key move are:
4 h4g5, h4h6
24 g2g4, g1h1, g1h2
All other key moves can be taken from
the algebraic list in yaz2.pos.
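Typing errors in positions like these are easy to make, and a quick mechanical check can catch some of them. The sketch below (Python; the function names are hypothetical) parses one line of the CSTal-style list, that is piece placement, "/side to move", then the key move in coordinate notation. It checks that every rank has 8 squares, that each side has exactly one king, and that the key move starts on a square holding a piece of the side to move. It cannot tell whether a position matches the magazine, only whether the line is self-consistent.

```python
FILES = "abcdefgh"


def parse_placement(placement: str) -> dict:
    """Map squares such as 'e1' to piece letters from a FEN/EPD placement field."""
    ranks = placement.split("/")
    if len(ranks) != 8:
        raise ValueError(f"expected 8 ranks, got {len(ranks)}")
    board = {}
    for i, rank in enumerate(ranks):
        rank_no = 8 - i  # the first field is rank 8
        file_idx = 0
        for ch in rank:
            if ch in "12345678":
                file_idx += int(ch)  # a digit is a run of empty squares
            elif ch in "KQRBNPkqrbnp":
                if file_idx > 7:
                    raise ValueError(f"rank {rank_no} is too long")
                board[FILES[file_idx] + str(rank_no)] = ch
                file_idx += 1
            else:
                raise ValueError(f"bad character {ch!r} on rank {rank_no}")
        if file_idx != 8:
            raise ValueError(f"rank {rank_no} has {file_idx} squares, not 8")
    return board


def check_cstal_line(line: str) -> None:
    """Sanity-check one 'placement/side keymove' line; raise ValueError on problems."""
    fen, key_move = line.split()
    placement, side = fen.rsplit("/", 1)
    if side not in ("w", "b"):
        raise ValueError(f"side to move must be w or b, got {side!r}")
    board = parse_placement(placement)
    pieces = list(board.values())
    if pieces.count("K") != 1 or pieces.count("k") != 1:
        raise ValueError("each side needs exactly one king")
    from_sq = key_move[:2]
    piece = board.get(from_sq)
    if piece is None:
        raise ValueError(f"key move {key_move} starts on empty square {from_sq}")
    if piece.isupper() != (side == "w"):
        raise ValueError(f"key move {key_move} moves a piece of the wrong colour")


if __name__ == "__main__":
    check_cstal_line("3K4/3P1k2/8/8/8/2r5/8/4R3/w e1f1")  # position 38
    print("ok")
```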

The magnificent formulas, which come with a three-page explanation,
can be found on pages 49-51.
I wonder why a shallow magazine like CSS devotes 3 pages to
formulas, but I think this was done to impress the readers, to show how
high the standards of the CSS writers are (Hello Dieter, hello
Pitters!)! I am impressed too: 3 pages of formulas about an irrelevant
weighting scheme that any pupil knows from CIRCUIT TRAINING in
school sports (weightings of PUSH-UPS compared to weightings of
ARM WRESTLING), but only 1 1/2 pages about the
Paderborn tournament.
What is more important? CSS shows what is important here (to them)!
Anyway... or, as my friend Jaap van den Herik always says:

NEVERTHELESS.... :-)

I have run this test on my Pentium 120 machine with version
224 of Chess System Tal, because the Diogenes programmer (impressed
that his program had been tested by anybody at all) told me that the
test seems to produce good Elo values. (Although his program is not
strong, he knows enough about chess, so I trusted his advice, and we
agreed that the test looks interesting... :-))

I have the SENDINFO files that document CSTal's solution times
with me; you can get them on request.

Moritz, can you post about this article in rgcc, together with
the positions and maybe some formulas, and FAQ or HOW-TO
instructions?? I don't have the time because of AEGON.

Chris:
I have a real problem with the fact that CSTal still cannot
find Rf1+ in position 38.
I thought we talked about these EASY endgames often, often, often, often.
Maybe not often enough.... GRRRRRR! Lazy Brits! :-)

Here are the results of Chess System Tal:

(To distinguish this Yazgac test from its older predecessor, I call
the new one no. 2. A time in [brackets] means it is over 10' and was
too long: no points for that, but maybe on a faster machine...)

Yazgac2:
positions 1-40, times 1" to 10', weightings 20-110
No. time points weighting
1 1" 19,96 20
2 6'5" 7,83 20
3 21" 19,3 20
4 13" 19,56 20
5 0" 20 20
6 0" 30 30
7 6" 39,6 50
8 7' 15 60
9 5'44" 25,6 60
10 [11'36"] - 70
11 4" 69,53 70
12 - - 110
13 - - 20
14 26" 19,13 20
15 1'59" 24,05 30
16 4" 29,8 30
17 1'53" 40,58 50
18 [12'21"] - 50
19 - - 70
20 4" 69,53 70
21 4'35" 43,33 80
22 - - 90
23 0" 100 100
24 - - 100
25 14" 97,6 100
26 8" 108,53 110
27 - - 110
28 4" 109,26 110
29 - - 110
30 9'33" 4,95 110
31 14" 19,53 20
32 12" 19,6 20
33 7'14" 5,53 20
34 0" 20 20
35 1" 29,95 30
36 1'58" 24,1 30
37 - - 40
38 - - 60
39 9'20" 4,66 70
40 2'41" 58,53 80

So CSTal's p0 (total points) is 1095,04! (if I haven't made a
mistake...)


With formula 5 you should (normally) be able to calculate the Elo:
ELO = 1050 + 5,817 * p0 - 0,008847 * p0*p0 + 0,0000046 * p0*p0*p0

When I put this into formula 5 and use a very good calculator,
I get 2851.144 Elo.

Because Yazgac claims that his formulas only work for Elo ranges between
2000 and 2600, we cannot USE this Elo here.

OK, forget about the Elo. The points are high enough.

Now I am interested in your comments, and in the
Elo/points of Hiarcs6, Shredder, Mchess6 and Genius5
(Rebel8, MCP5, Fritz4, Fritz3 and Wchess were tested).

Thanks.

I hope I have not made any typing or calculation mistakes,
but I would be pleased if you told me about any wrong position
or wrong value.

Criticism welcome.


mclane

Apr 15, 1997, 3:00:00 AM

Here are some results of chess programs in this test suite (if the
version number is unknown, it is given as V?). It is interesting to see
that many values correlate very closely with the SSDF Elos, so you get
an idea of how good your program is:


Program Points Yazgac-Elo Diff to ssdf

ChessSystemTal 1095 2851 ? formula fails...

Mchess6 920 2496 +61
Rebel8 904 2477 +15
Isichess V? 860 2449
Rebel7 841 2421 +13
Genius4 828 2412 +1
The King V? 814 2404

Fritz4 785 2390 +54
Mchess5 779 2387 -24
Comet A.32 764 2382
Hiarcs3 752 2378 -1
Genius3 749 2377 -33
Hiarcs4 745 2375 -18
Socrates 3 728 2371
Zarkov V? 717 2368
Rebel Decade V? 716 2368

Kallisto 1.98 647 2356 +22
Fritz3 640 2355 +1
VirtuaChess V? 627 2353
Nimzo3.0 597 2348 +35
Y-chess 558 2340
Wchess1.06 517 2328 +4

Saitek Risc 2500 440 2289 +97
Rexchess 395 2251

Comet A.12 362 2215 -9
Diogenes 250 316 2150
Now1.1 310 2140
Novag Sapphire 303 2128 +42
CompleteChessSystem 298 2120
Gandalf2.01 259 2043 2

SiberianChess v2.01 200 1896

I am awaiting your own results with your chess programs.
It is also very interesting for me to see which programs have nearly
the same Yazgac-Elo as SSDF Elo.

For some programs the difference is very small, for others very large.
Could the difference have something to do with the NPS/knowledge
ratio, and also with the balance between brute-force and selective
search???

It is also very nice that we have many amateur programs tested so far,
which gives an impression of their Elo.

WE NEED MORE RESULTS!!!!!

Thanks to Computer-Schach & Spiele and of course to Mr. Yazgac for his
ideas.

Please, Mr. Yazgac: give us a better formula, so that I can calculate
Chess System Tal's Elo more properly. Or isn't the formula broken for
these high values??? Of course we need a formula that can handle
higher point totals. This test suite has not been published before, so
we can be sure that until now nobody has tuned his algorithms to it.


Moritz Berger

Apr 15, 1997, 3:00:00 AM

On Tue, 15 Apr 1997 09:36:31 GMT, mcl...@prima.ruhr.de (mclane) wrote:
>Hello,

>The famous Computer-Schach & Spiele, edition 2/97, is on the market.

>Once again, mathematician and Dipl.-Ing. YAZGAC has published an incredible
>test.

>It has 40 positions.


>Moritz, can you post about this article in rgcc, together with
>the positions and maybe some formulas, and FAQ or HOW-TO
>instructions?? I don't have the time because of AEGON.

>So CSTal's p0 (total points) is 1095,04! (if I haven't made a
>mistake...)

Yazgac recommends first testing at 1 min./move (average) and then trying
the remaining (unsolved) positions at 3 min./move or tournament level.
If a solution is not found within 10 minutes, the score is 0 for
that position.

The most important thing missing from Thorsten's post (and the reason
why you will still have to buy CSS, and why he will probably not get
into a copyright mess) is the solution lines: the author insists that
programs should find the correct lines in each position, not just key
moves. He gives a 4-8 move line of play for each position, sometimes
with alternative solutions. I don't have the time to post them here;
maybe somebody else could do this ...

>When I put this into formula 5 and use a very good calculator,
>I get 2851.144 Elo.

Is the calculator also programmed by Chris? ;-)))

The result certainly is OK and shows that you and Chris have
implemented the right chess knowledge to solve typical 'human'
problems that other PC programs struggle with (the test suite
yields ratings as expected for Rebel, MCP, Genius etc.).

Well done, Thorsten and Chris!

>Because Yazgac claims that his formulas only work for Elo ranges between
>2000 and 2600, we cannot USE this Elo here.

Still nice ... I remember that Complete Chess System (SSDF ELO <2000)
also claimed an ELO of about 2700 ... As far as I remember, it can
still be downloaded from the Oxford Software homepage, so everybody
can take a look and Chris W. will probably turn deep purple ;-)

>OK, forget about the Elo. The points are high enough.

I'm sure that nobody will question the validity of the Yazgac test,
since it's obvious that CST is so much better than Kasparov ... At
least we can be sure that it's better than 2600 ELO, although the test
doesn't give us enough data about how much stronger :-)

Thorsten, be alert: They will probably ban you from AEGON, because CST
is too strong for them ... The rate of suicides among grandmasters
would climb to intolerable heights if you were allowed to crush them
with this (attention, KK!) MONSTER. ;-)

Moritz
-------------
Moritz...@msn.com

"The truth will always come out in the end."
(Komputer Korner, 28/03/1997 08:22 on rec.games.chess.computer)

Andreas Mader

Apr 15, 1997, 3:00:00 AM

mclane wrote:
>
>
> Hello,
>
> The famous Computer-Schach & Spiele, edition 2/97, is on the market.
>
> Once again, mathematician and Dipl.-Ing. YAZGAC has published an incredible
> test.
>
> It has 40 positions.
>
> I have typed in these positions and will give you the EPDs for
> CSTal (because Chris is still kind enough to use his own non-EPD-conforming
> format! GRRRRRRRRR!) and for normal EPD programs:
>
> You have to run the positions and write down the time in seconds for
> the key moves (in CSTal format the key moves follow the EPD in
> algebraic notation. Some positions have more than one key move; Chris's
> format is at the moment not capable of handling this...).
>
> 10 minutes is the maximum computing time.
> Above this, you get zero points for that position.
> Each position has its own weighting, from 20 to 110. A good idea!
> But: Yazgac uses a strange way of determining the solution time. For
> my own purposes I use the usual way: writing down the time when the
> program first finds the move and holds it to the end.


Hello Thorsten!

What you are doing is what Thomas Mally called the "Piff Paff Puff
method". The tester looks at the display, sees that the key move is
found and writes down the time. It is unimportant for him/her if the
program has "understood" the position (= show a clearly high or low
evaluation AND the right main line) or not. Maybe the program made the
right move for wrong reasons? Unimportant, the move is on the display
and that's it! Every test which is based on such "results" is nonsense.

Please do not fall into such a way of testing just because CST gets good
results by doing so! You always argued against the "Piff Paff Puff"ers
and now your arguments are based on the same method!


>
> Formula for points is:
>
> points = weighting * (1 - seconds / 600)
> You can find the weightings in my list below.
> If you have a p0 value (the sum of the points over all 40 positions),
> you can either read the Elo off a chart, using the measurements of
> 2 previously rated programs as reference, or, because not all of us
> have Computer-Schach & Spiele and I have no time to type in the
> charts... use formula 5 (which only works from 2000-2600 Elo).
> All in all I think some of the positions are too easy for state-of-the-art
> programs. Maybe Mr. Yazgac should update some of his tested
> programs, or buy a faster PC than the P90 he used, so that he gets better
> formulas and more useful test suites. But maybe for him formulas and
> being a mathematician are more important than having working values?!
>

I wouldn't put it so harshly, but you are on the right track with your
suggestions...

<test positions snipped>

> The magnificent formulas, which come with a three-page explanation,
> can be found on pages 49-51.

Are you impressed by these formulas? I hope not too much.
The problem is that if you take other programs as the basis for the
parameters, you will probably get clearly different results (e.g.
different weighting factors).

<usual complaining about CSS snipped> :)

> NEVERTHELESS.... :-)
>
> I have run this test on my Pentium 120 machine with version
> 224 of Chess System Tal, because the Diogenes programmer (impressed
> that his program had been tested by anybody at all) told me that the
> test seems to produce good Elo values. (Although his program is not
> strong, he knows enough about chess, so I trusted his advice, and we
> agreed that the test looks interesting... :-))
>
> I have the SENDINFO files that document CSTal's solution times
> with me; you can get them on request.

PLEASE be careful when you speak about "solution times"! Is the position
REALLY solved? Or has CST played the move by chance? You are not a "Piff
Paff Puff"er, are you?????

<snip>
> Criticism welcome.

Criticism made.

Best wishes
Andreas

Moritz Berger

Apr 15, 1997, 3:00:00 AM

On Tue, 15 Apr 1997 12:34:35 -0700, Andreas Mader
<ma...@p6.gud.siemens.co.at> wrote:
< snip >
>Hello Thorsten!

>What you are doing is what Thomas Mally called the "Piff Paff Puff
>method". The tester looks at the display, sees that the key move is
>found and writes down the time. It is unimportant for him/her if the
>program has "understood" the position (= show a clearly high or low
>evaluation AND the right main line) or not. Maybe the program made the
>right move for wrong reasons? Unimportant, the move is on the display
>and that's it! Every test which is based on such "results" is nonsense.

< snip >

The Yazgac test suite is not "crash boom bang" since it doesn't look
at key moves but complete lines of play ...

Andreas Mader

Apr 15, 1997, 3:00:00 AM

Moritz Berger wrote:
>
> On Tue, 15 Apr 1997 12:34:35 -0700, Andreas Mader
> <ma...@p6.gud.siemens.co.at> wrote:
> < snip >
> >Hello Thorsten!
>
> >What you are doing is what Thomas Mally called the "Piff Paff Puff
> >method". The tester looks at the display, sees that the key move is
> >found and writes down the time. It is unimportant for him/her if the
> >program has "understood" the position (= show a clearly high or low
> >evaluation AND the right main line) or not. Maybe the program made the
> >right move for wrong reasons? Unimportant, the move is on the display
> >and that's it! Every test which is based on such "results" is nonsense.
> < snip >
>
> The Yazgac test suite is not "crash boom bang" since it doesn't look
> at key moves but complete lines of play ...
>

Maybe Mr. Yazgac's test suite isn't "crash boom bang", but Thorsten's test
method is (remember, Thorsten wrote that Yazgac's method is "strange" and
that he did it the old way to test CST: the Piff-Paff-Puff way!)

Best wishes
Andreas

Robert Hyatt

Apr 15, 1997, 3:00:00 AM

Andreas Mader (ma...@p6.gud.siemens.co.at) wrote:

: Best wishes
: Andreas

I'm going to run the suite. Any chance someone can post the
solutions? (Thorsten posted the key move or moves, but not the
expected score or a PV to evaluate "when" a program finds the right
move for the right reason.)

Meanwhile, I'm going to at least run the suite. If no one wants to
translate the key PVs and post them, I'll be happy to simply post
times and "best" PV from Crafty if someone would compare 'em to the
CSS article to see if a position should be counted as correct or
incorrect, and at what "time"...

Bob


chrisw

Apr 15, 1997, 3:00:00 AM

--
http://www.demon.co.uk/oxford-soft

Andreas Mader <ma...@p6.gud.siemens.co.at> wrote in article
<3353F4...@p6.gud.siemens.co.at>...


> Moritz Berger wrote:
> >
> > On Tue, 15 Apr 1997 12:34:35 -0700, Andreas Mader
> > <ma...@p6.gud.siemens.co.at> wrote:
> > < snip >
> > >Hello Thorsten!
> >
> > >What you are doing is what Thomas Mally called the "Piff Paff Puff
> > >method". The tester looks at the display, sees that the key move is
> > >found and writes down the time. It is unimportant for him/her if the
> > >program has "understood" the position (= show a clearly high or low
> > >evaluation AND the right main line) or not. Maybe the program made the
> > >right move for wrong reasons? Unimportant, the move is on the display
> > >and that's it! Every test which is based on such "results" is nonsense.
> > < snip >
> >
> > The Yazgac test suite is not "crash boom bang" since it doesn't look
> > at key moves but complete lines of play ...
> >
>
> Maybe Mr. Yazgac's test suite isn't "crash boom bang", but Thorsten's test
> method is (remember, Thorsten wrote that Yazgac's method is "strange" and
> that he did it the old way to test CST: the Piff-Paff-Puff way!)

Then maybe Thorsten should test it properly ...... ?

I don't trust these tests.

They get generated like this:

1. Get a load of positions.
2. Get a load of solve times for as many programs as possible.
3. Generate a polynomial to predict Elo from the results, using a
coefficient for each position. 'adjust' the importance of each coefficient
until you get the best fit with the known (SSDF) Elo results - this is
quite easy by computer, just write an iteration program.
4. Hey presto, the graph of results and known Elo's fit nicely :)

Then use it to 'predict' the Elo of other programs; hopefully there won't
be too many of those, because lots got included in the test preparation.

Now, programs which behave typically, just like all the others, will
probably fit nicely onto the graph; but programs like CSTal, which don't
behave like the others could end up anyplace (conveniently for me,
apparently at the top, but maybe a non-piff-paff-puff test method would
give another result).

All this proves is that programs operating to one paradigm (the typical
one) will behave, and programs operating differently won't.

Another point: while it is true that this test is new and therefore
programs aren't tuned to it (yet), it is also true that the test, in its
creation, was tuned to the participating programs - therefore participating
programs with 'good' results are, by definition, tuned to the test.

Chris Whittington


>
> Best wishes
> Andreas
>

Robert Hyatt

Apr 15, 1997, 3:00:00 AM

> So CSTal's p0 (total points) is 1095,04! (if I haven't made a mistake...)
>
> With formula 5 you should (normally) be able to calculate the Elo:
> ELO = 1050 + 5,817 * p0 - 0,008847 * p0*p0 + 0,0000046 * p0*p0*p0
>
> When I put this into formula 5 and use a very good calculator,
> I get 2851.144 Elo.
>
> Because Yazgac claims that his formulas only work for Elo ranges between

I assume there is a typo above... otherwise your calculator is
broke??? I.e., 5,817 * P0: is that "," really a "." (comma for a period)???

I'm going to try 5.8*P0 - .008847*P0^2, etc... to see... I seem
to remember an oddity about European decimal numbers replacing our
"." with a "," but thought that was only in certain places???

"help"?? :)

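Hyatt's guess is right: German usage writes the decimal separator as a comma, so 5,817 means 5.817 and 1095,04 means 1095.04. A small Python sketch for reading the posted numbers, assuming they contain no thousands separators (German uses a point for those, which this simple replacement would misread); parse_decimal_comma is a hypothetical helper.

```python
def parse_decimal_comma(text: str) -> float:
    """Read a number written with a decimal comma, e.g. '5,817' -> 5.817.

    Assumes no thousands separators: '1.095,04' would NOT be handled.
    """
    return float(text.strip().replace(",", "."))


# Coefficients of formula 5 exactly as they were posted.
POSTED = ["1050", "5,817", "0,008847", "0,0000046"]
c0, c1, c2, c3 = (parse_decimal_comma(s) for s in POSTED)

if __name__ == "__main__":
    print(c0, c1, c2, c3)
```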

chrisw

Apr 15, 1997, 3:00:00 AM

--
http://www.demon.co.uk/oxford-soft

Robert Hyatt <hy...@crafty.cis.uab.edu> wrote in article
<5j006i$4...@juniper.cis.uab.edu>...
> Andreas Mader (ma...@p6.gud.siemens.co.at) wrote:


> : Moritz Berger wrote:
> : >
> : > On Tue, 15 Apr 1997 12:34:35 -0700, Andreas Mader
> : > <ma...@p6.gud.siemens.co.at> wrote:
> : > < snip >
> : > >Hello Thorsten!
> : >
> : > >What you are doing is what Thomas Mally called the "Piff Paff Puff
> : > >method". The tester looks at the display, sees that the key move is
> : > >found and writes down the time. It is unimportant for him/her if the
> : > >program has "understood" the position (= show a clearly high or low
> : > >evaluation AND the right main line) or not. Maybe the program made the
> : > >right move for wrong reasons? Unimportant, the move is on the display
> : > >and that's it! Every test which is based on such "results" is nonsense.
> : > < snip >
> : >
> : > The Yazgac test suite is not "crash boom bang" since it doesn't look
> : > at key moves but complete lines of play ...
> : >
>
> : Maybe Mr. Yazgac's test suite isn't "crash boom bang", but Thorsten's test
> : method is (remember, Thorsten wrote that Yazgac's method is "strange" and
> : that he did it the old way to test CST: the Piff-Paff-Puff way!)
>

> : Best wishes
> : Andreas
>
> I'm going to run the suite. Any chance someone can post (a) the
> solutions (Thorsten posted the key move or moves, but not the
> expected score or a PV to evaluate "when" a program finds the right
> move for the right reason.)
>
> Meanwhile, I'm going to at least run the suite. If no one wants to
> translate the key PV's and post them, I'll be happy to simply post
> times and "best" PV from Crafty if someone would compare 'em to the
> CSS article to see if a position should be counted as correct or
> incorrect, and at what "time"...

If anyone is worried about copyright when posting the suite and test method,
then would several of the Germans each posting a section of the test, say
10 positions each, avoid copyright problems?

I'm not, of course, advocating this, just posing the question.

Chris Whittington

>
> Bob
>
>

chrisw

Apr 15, 1997, 3:00:00 AM


--
http://www.demon.co.uk/oxford-soft

Robert Hyatt <hy...@crafty.cis.uab.edu> wrote in article
<5j03p9$6...@juniper.cis.uab.edu>...

Arrogant American pig :)

You replaced our comma with a period.

And it's not a period, it's a full stop.

Chris international goodwill Whittington

>
> "help"?? :)
>
>

Moritz Berger

Apr 15, 1997, 3:00:00 AM

On Tue, 15 Apr 1997 09:36:31 GMT, mcl...@prima.ruhr.de (mclane) wrote:
>Hello,

>The famous Computer-Schach & Spiele, edition 2/97, is on the market.
>Once again, mathematician and Dipl.-Ing. YAZGAC has published an incredible
>test.

>It has 40 positions.

Here are the results for Genius 5 at 1 min./move on a P5/133.
ce is Genius's evaluation in centipawns and pv its main line. cpv means
'correct PV' and is only given where the complete solution was not
found by Genius at this level. Moves in square brackets within a cpv
are alternative lines, replacing the move in front of the brackets.

I propose to name this test suite the 'Y' test (pronunciation: 'why
test?' :-))) after its author, Mr. Yazgac.

I think the solution times below show that the test suite is not too
easy if complete PVs are mandatory. Genius 5 gets only 10 out of 40
positions right at 1 min./move. Thorsten, did you consider complete
main lines for your 2850 ELO result with CST?

Moritz

ce 0342; pv d3-d7 e8-b8 h6xf7 e6-g4 f3-f4 f8-g8 f7-h6+b6xh6 f4xh6 ;
cpv Rd7 Rb8 Nxf7 Bxd7 Nd8+ Ke7 Qf8#
ce 0048; pv c3xd5 f6-h6 g1-g2 c8-d8 d1-c1 e6xd5 c1xc5 ;
ce 0594; pv e7-eQ e4-g4+g3-f2 g4-g2+f2-e3 g2-g5+e3-d3 f7-e5+d3-c3 ;
cpv e8Q! Qxe8 Qf4+ Kg6 Qg4+ Ng5 Qh5+ Kxh5 Nf4#
ce 0866; pv h4-g5 h2-h3 d4-e2+d2xe2 c5xe3+e2xe3 g5xe3+g1-h2 f8-f1 ;
cpv ..Qg5! [..Qh6] Qf2 Qxe3 Qxe3 Ne2+ Nxe2 Bxe3+ Kh1 Rf1+ Ng1 Rxg1#
ce 0151; pv d3xh7+g8-h8 f3-d4 e7-h4 d4xc6 b8-b6 h7-d3 b6xc6 g2-g3 ;
cpv Bxh7+ Kxh7 Ng5+ Kg6 Qh3 Ndxe5 Qh7+ Kf6 Nce4+ dxe4 Nxe4#
ce 0125; pv f7-e8 h3-h4 d4-f6 d8xe8 e2xe8 f1-c1 c2xc1+a3xc1 e8-e2 ;
cpv Be8 Rg1 Qxg1+ Kxg1 Rxg2+ Kh1 Rh2+ Kg1 Rcg2+ Kf1 Bb5+ Qd3 Bxd3+ Ke1
Rh1#
ce 0160; pv a4xb5 d5xg5 c2xe4 a8-b8 b5xa6 g5-h4 e4-c2 b8-b2 a6-a7 ;
cpv axb5 Qxg5 Bxe4 Rb8 bxa6 Rb5 Qc7 Nb6 a7 [Reb1]
ce -0009; pv g3-h5 f6xh5 e2xh5 f7-f5 f1-f3 c5-c4 b3xc4 d5xc4 d3-f1 ;
cpv Nh5 Nxh5 Bxh7+ Kxh7 Qxh5+ Kg8 Bxg7 Kxg7 Qg4+ Kh7 Rf3 e5 Rh3+ Qh6
Rxh6+ Kxh6 Qd7
ce 0060; pv c4-f4 a6xd3 d1xd3 b8-d7 f4-g5 g7-g6 f5-h6+g8-f8 g5-e5 ;
cpv Nh6+ gxh6 Bxh7+ Nxh7 Qg4+ Kh8 Rxd8 Rxd8 Qe4 Nc6 Qxc6
ce 0057; pv b1xb4 c5xb4 f4-c7 b6-b5 c4xf7+g8xf7 c7xd8 ;
ce 0087; pv d3xa6 g4-d1+g1-g2 d4xe3 b3xb7+b8-a8 a4xc6 d1-d5+c6xd5 ;
cpv Rxb7+ Kxb7 Qxa6+ Kb8 Qb6+ Ka8 Qxc6+ Kb8 Qb6+ Ka8 Bb5
ce 0036; pv b7-c6 f1-e1 d8-e8 h3-f2 e8-e7 c2-b1 e7-e6 b1-c2 f8-e8 ;
cpv ..d4 cxd4 cxd4 Bxd4 Bxd4 Rxd4 Rxd4 Nxd4 Qxh3 gxh3 Nf2+ Kg1 Nxh3#
ce 0051; pv f2-f4;
ce 0042; pv d5-d3 e4-h1+d1-c2 h1xa1 d2xh6;
ce -0009; pv a7-a5;
ce 0067; pv b5-e8;
ce 0063; pv e5-e4;
ce 0000; pv f2xc5+b6xc5 d7xa7+c5-c6 a7-a6+c6-c5 a6-a7+c5-c6 a7-a6+;
cpv g3 a5 Bg2
ce 0042; pv f2-f4 a7-a5 h2-h4 f6-e7 d1-e1 c6-c5 e5xd7 e7xd7 d4xc5 ;
cpv f4 Nxe5 dxe5 Qe7 Rxd8 Qxd8 Rd1
ce -0027; pv e4-e5 d6xe5 e3xc5 f6xd5 f3xe5 c7-e6 c5xb5 d5-c7 b5-d7 ;
cpv e5 dxe5 Qxe5 Ncxd5 Bxb5
ce 0078; pv h3-h6 e8-e7 h6-h7+e7-e8 h7xd7 b6xd7 h1-h7 b7-c8 g3-f3 ;
cpv e5! dxe5 Ne4
ce 0096; pv a6-a7 f5xc2 a4xc2 c5xc2 e1-b1 f6-d7 b1-b7 c2-e2 b7xc7 ;
cpv c4
ce -0033; pv c2-c3 e8-c8 a1-d1 c6-e7 d4-e2 a6-a5 a2-a3 a5-a4 b3-b4 ;
cpv Nxc6 Bxc6 Kd4
ce 0027; pv g6-h7+g8-h8 h7-d3 d8-f8 a2-a3 e7-f7 d3-g6 f7-f6 f3-f2 ;
cpv g4
ce 0048; pv h8-e8 e1-g1 c8-b8 b2-e5 e7-d7 c2-c3 d7-e6 c3-a5 e6-e7 ;
cpv ...Ng4
ce -0012; pv e1-g3 f8-c8 e3-h6 e7-f8 a1-e1 c4-b4 h6-c1 c8-c4 g3-e3 ;
cpv Bg5 Rfe8 Bxf6 [Rd1 Rad8] Bxf6 Nd5
ce 0018; pv h4-h5 a7-a5 a2-a3 f8-e8 c3-a4 d6-e4 f2-f3 e4-g5 a4-c5 ;
cpv g4 Nfe4 g5 h5 Bxe4 Nxe4 Nxe4 dxe4 Qxe4
ce 0060; pv h2-h3 a7-e7 e1xe7 d6xe7 a4-b3 e7-d6 f3-e5 f7-f6 e5-f3 ;
cpv Qc2 Re7 Rxe7 Qxe7 Qc7 Qxc7 Rxc7
ce 0066; pv g4-g7 b1-c1 d7-e8 c1-c2 f7-f8 f3-e3 e8-h5 h2-h3 f8-f7 ;
cpv b4 axb4 Ba4
ce -0048; pv f2-h4 d8-f8 g1xg8 f8xg8 a1-g1 e7-g7 g1xg7+g8xg7 h5-f3 ;
cpv Ne4 fxe4 f5 Rg5 Rxg5 hxg5 f6 Kh6 fxe7
ce 0193; pv e4-e3 d2-f1 f5-e5 d1-e2 f6-h5 f1xe3 h5-g3+e2xf2 g3xh1+;
cpv e3 Nf1 Re5 Ke2 Nh5 Kf3 e2
ce 1069; pv d3-c2 a2-a3 c2-b1 a3-b3 d4-d3 b3-a3 d3-c2 a3-b4 c2xb2 ;
cpv Qc2 a3 Kc4 Ka1 Qc3
ce 0000; pv f5-f1 a3-a2 f1-g1+g4-h4 e4-f4 h4-h3 f4-f3 h3-h4 f3-f4 ;
cpv Rf1 b2 Rg1+ Kh3 Kf3 Kh4 Kf4
ce 0012; pv c5-b4 d3-c2 b4-a3 c2-b1 a6-a5 b1-a1 a5-a4 b3xa4 a3xa4 ;
ce 0506; pv f5-f4 g3xf4+e5-d6 a4-a5 g4-g3 a5-a6 d6-c7;
ce 0012; pv d3-c2 e3-d4 a6-a7 d4xa7 b7xa7 f4-e3 a7-b6 e3-d2 c2-b3 ;
cpv Bb1 Bd4 a7 Bxa7 Kxa7 Ke3 Kb6
ce 0206; pv e6-e5 d5xe5 f4xe5 d7-e7 c3-b4 e7-e6 b4xb5 e6xe5 b5xa4 ;
cpv Re5 Rxe5 fxe5 Ke7 3.Kd3 [3.Kd2, 3.e4] Kd7 e4 f4 Ke2 Ke6 Kf2
ce 0618; pv e1-f1+f7-g7 f1-f4 c3-c2 d8-e7 c2-e2+e7-d6 e2-d2+d6-e6 ;
cpv Rf1+ Kg7 Rf4 Rc1 Ke7 Re1+ Kd6 Rd1+ Ke6 [Kc6] Re1+ Kd5 Rd1+ Rd4
ce 0121; pv e4-d6 a7-c6+d4-e3 c6-a5 d6-e4 a5-c4+e3-e2 a2-a4 d3-d2+;
cpv Nd6 Kd2 Nc4+ Kc1 d2+ [Ke3 Nb5 d2+ Kc2 Na3+ Nxa3 Ke2] Kc2 Ke3 Nb5
Na3+ Nxa3 Ke2
ce 0157; pv f2-a2 d6-e6 a2-a6+e6-f5 d4xd5 f7-d7+d5-c5 d7-b7 a6-b6 ;
cpv Ra2 Rc7 Ra6+ Kd7 Rb6

Robert Hyatt

Apr 15, 1997, 3:00:00 AM

Here are the results from Crafty, although the 5.8 *did* not quite
work :)

solution times (seconds)
+---------------------------------+
1- 8 | 14 0 1 4 16 0 8 2 |
9-16 | 7 39 287 -- 3 104 0 102 |
17-24 | 12 -- -- 91 1 -- 0 -- |
25-32 | -- 265 -- 0 -- -- 0 1 |
33-40 | 5 0 1 0 0 0 205 0 |
+---------------------------------+


solution points
+-----------------------------------------------------------------+
1- 8 | 19.533 20.000 19.967 19.867 19.467 30.000 49.333 59.800 |
9-16 | 59.300 65.450 36.517 0.000 19.900 16.533 30.000 24.900 |
17-24 | 49.000 0.000 0.000 59.383 79.867 0.000 100.000 0.000 |
25-32 | 0.000 61.417 0.000 110.000 0.000 0.000 20.000 19.967 |
33-40 | 19.833 20.000 29.950 30.000 40.000 60.000 46.083 80.000 |
+-----------------------------------------------------------------+
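The points in this second table follow the formula from the opening post, points = weighting * (1 - seconds / 600), with a position scoring zero when the key move is not found inside the 10-minute limit. A minimal Python check of that arithmetic is sketched below; the weights used (20, 50, 70, 70, 110) are back-calculated from the posted times and points, not taken from the magazine, so treat them as assumptions.

```python
TIME_LIMIT = 600  # seconds: the 10-minute maximum from the opening post

def position_points(weight, seconds):
    """Score one position as weight * (1 - seconds / TIME_LIMIT).

    `seconds` is None when the key move was not found ("--" in the table);
    such positions, like anything at or past the limit, score zero.
    """
    if seconds is None or seconds >= TIME_LIMIT:
        return 0.0
    return weight * (1 - seconds / TIME_LIMIT)

# (position, weight, Crafty's time in seconds). The weights are inferred
# from the posted tables, not copied from Computer Schach & Spiele.
samples = [(1, 20, 14), (7, 50, 8), (10, 70, 39), (11, 70, 287), (26, 110, 265)]

for number, weight, seconds in samples:
    print(number, round(position_points(weight, seconds), 3))
```

Rounded to three decimals, these should agree with the posted 19.533, 49.333, 65.450, 36.517 and 61.417.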


rating = 3867.83


Robert Hyatt

Apr 15, 1997, 3:00:00 AM

While posting, would someone check the formula that Thorsten posted?
It produced ridiculous numbers when I fed Crafty's results into it.

BTW, I did not compare PV's since I didn't have 'em, but I agree with
Moritz that a program ought to produce the PV, or else the original
tests used to produce the rating equation need to be rerun using the
"right move, wrong idea is still correct" paradigm...

And I agree with Chris that this "rating" is hopeless. You do a
curve fit to a set of data points from known programs, (a small set
of known programs) which is a rough approximation at best. Then you
tweak the shape of the curve and the weights for each position to
get the best possible fit with those programs. Which is an
approximation of an approximation. And then you use this to predict
the rating of a program not in the original set, which might be much
faster, or much more tactical, or much more speculative, and you get
an approximation of an approximation of an approximation, "approximately"...

:)
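The chain of approximations described above can be made concrete with a minimal sketch. It assumes the simplest possible calibration, an ordinary least-squares straight line of Elo against total suite points; the four calibration pairs are invented for illustration and are not Yazgac's data, and his actual formula (formula 5, which the opening post says only holds from 2000 to 2600 Elo) is not reproduced here.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical calibration set: (total suite points, Elo known from games).
calibration = [(400, 2000), (500, 2150), (600, 2300), (700, 2450)]
a, b = fit_line([p for p, _ in calibration], [elo for _, elo in calibration])

def predict_elo(points):
    """Apply the fitted line; only meaningful inside the calibrated range."""
    return a + b * points

print(predict_elo(550))   # interpolation, between calibration programs
print(predict_elo(1300))  # extrapolation, far outside anything fitted
```

With these made-up numbers the fit is exact (a = 1400, b = 1.5), so 550 points maps to 2225. At 1300 points the same line returns 3350, a figure no game result ever checked; out-of-range extrapolation of this kind is one plausible way to arrive at a number like the 3867.83 posted above.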

Moritz Berger

Apr 15, 1997, 3:00:00 AM

On 15 Apr 1997 16:25:52 GMT, hy...@crafty.cis.uab.edu (Robert Hyatt)
wrote:

>Here's the results from Crafty, although the 5.8 *did* not quite
>work: :)
>
> solution times (seconds)
> +---------------------------------+
> 1- 8 | 14 0 1 4 16 0 8 2 |
> 9-16 | 7 39 287 -- 3 104 0 102 |
>17-24 | 12 -- -- 91 1 -- 0 -- |
>25-32 | -- 265 -- 0 -- -- 0 1 |
>33-40 | 5 0 1 0 0 0 205 0 |
> +---------------------------------+

< snip >
>rating = 3867.83

You need to take into account the exact pv (see my previous post). I'm
sure that this will somewhat reduce your rating ...

Moritz

Robert Hyatt

Apr 15, 1997, 3:00:00 AM

Moritz Berger (Moritz...@msn.com) wrote:
: On 15 Apr 1997 16:25:52 GMT, hy...@crafty.cis.uab.edu (Robert Hyatt)
: wrote:

: Moritz
: -------------
: Moritz...@msn.com

I had noted this of course. I don't have "PV" information, which makes
it difficult to compare. If some kind soul will send me some PV's, I'll
certainly check 'em and report... :)

Bob


Jouni Uski

Apr 16, 1997, 3:00:00 AM

I ran the test suite yesterday with Fritz3 (Windows version) and noticed
that something is wrong, at least in the endgame positions, because Fritz3
finds all moves except one in 0-2 seconds, and it's not considered an
endgame expert! The tactical positions are also mostly too easy for it...
But I don't have the complete instructions yet (CSS takes 2 weeks to get
to Finland).

Always remember that 100 real tournament games are equal to 5000 or more
individual test positions!!!

I myself run BT2630, LCTII, WAC, an endgame test and the Kaufman test with
all new programs. After that I know the real strength of a program quite
well. E.g. Hiarcs 6 got the best points ever for LCTII, the endgame test
and BT2630, so it must be strong!

Jouni Uski

Andreas Mader

Apr 16, 1997, 3:00:00 AM

chrisw wrote:
>
> --
> http://www.demon.co.uk/oxford-soft
>
> Andreas Mader <ma...@p6.gud.siemens.co.at> wrote in article
> <3353F4...@p6.gud.siemens.co.at>...
> > Moritz Berger wrote:
> > >
> > > On Tue, 15 Apr 1997 12:34:35 -0700, Andreas Mader
> > > <ma...@p6.gud.siemens.co.at> wrote:
> > > < snip >
> > > >Hello Thorsten!
> > >
> > > >What you are doing is what Thomas Mally called the "Piff Paff Puff
> > > >method". The tester looks at the display, sees that the key move is
> > > >found and writes down the time. It is unimportant for him/her if the
> > > >program has "understood" the position (= show a clearly high or low
> > > >evaluation AND the right main line) or not. Maybe the program made the
> > > >right move for wrong reasons? Unimportant, the move is on the display
> > > >and that's it! Every test which is based on such "results" is
> > > >nonsense.
> > > < snip >
> > >
> > > The Yazgac test suite is not "crash boom bang" since it doesn't look
> > > at key moves but complete lines of play ...
> > >
> >
> > Maybe Mr. Yazgac's test suite isn't "crash boom bang", but Thorsten's test
> > method is (Remember, Thorsten wrote that Yazgac's method is "strange" and
> > that he did it the old way to test CST: The Piff-Paff-Puff-way!)
>
> Then maybe Thorsten should test it properly ...... ?
>
> I don't trust these tests.
>
> They get generated like this:
>
> 1. Get a load of positions.
> 2. Get a load of solve times for as many programs as possible.
> 3. Generate a polynomial to predict Elo from the results, using a
> coefficient for each position. 'adjust' the importance of each coefficient
> until you get the best fit with the known (SSDF) Elo results - this is
> quite easy by computer, just write an iteration program.
> 4. Hey presto, the graph of results and known Elo's fit nicely :)
>
> Then use it to 'predict' the Elo of other programs, hopefully there won't
> be too many of them, because lots got included in the test preparation.
>
> Now, programs which behave typically, just like all the others, will
> probably fit nicely onto the graph; but programs like CSTal, which don't
> behave like the others could end up anyplace (conveniently for me,
> apparently at the top, but maybe a non-piff-paff-puff test method would
> give another result).
>
> All this proves is that programs operating to one paradigm (the typical
> one) will behave, and programs operating differently won't.
>
> Another point: while it is true that this test is new and therefore
> programs aren't tuned to it (yet), it is also true that the test, in its
> creation, was tuned to the participating programs - therefore participating
> programs with 'good' results are, by definition, tuned to the test.
>
> Chris Whittington
>

You are quite right....

H. Bednorz and H. Thiele created a couple of test suites a few years
ago. Typical "Piff-Paff-Puff" suites, of course. Hubert Bednorz argued
that it isn't relevant if a program finds the right move for the right
reasons, the only important thing is that the move IS PLAYED by the
program. Many, many people believed in the results and they still do. And
now comes the unbelievable thing: the same people are complaining
about the SSDF! I really don't know what a test suite with 40 test
positions can do that thousands of computer-computer games cannot do.

Test suites are a phenomenon: The same thing happens again with Yazgac's
suite. I believe that there will be loads of responses to CSS concerning
this test. The formulas are very impressive for the average chess
program user, so the suite "has to" be taken seriously. Yazgac is no
"Piff-Paff-Puff"er, but just see what happened: Thorsten found this
method "strange" and did it just like before. Many people will do it
like Thorsten and there will be really "strange" results, just like the
one for CST...

I do not believe in test suites. Just take other programs for the
"calibration" of the parameters and you will get other weightings for the
positions. Take a more "complicated" polynomial and you will get
different results. There will be big differences.... And still one
cannot prove that the ratings for new programs are valid. So what's the
point?

Best wishes
Andreas

chrisw

Apr 16, 1997, 3:00:00 AM

--
http://www.demon.co.uk/oxford-soft

Robert Hyatt <hy...@crafty.cis.uab.edu> wrote in article
<5j0ka7$d...@juniper.cis.uab.edu>...

>
> while posting, would someone check the formula that Thorsten posted?
> It produced ridiculous numbers when I fed Crafty's results into it.
>
> BTW, I did not compare PV's since I didn't have 'em, but I agree with
> Moritz that a program ought to produce the PV, or else the original
> tests used to produce the rating equation need to be rerun using the
> right move, wrong idea is still correct paradigm...
>
> And I agree with Chris that this "rating" is hopeless.

I did not say it was hopeless.

I said I don't trust test suites.

However, given their limitations they are useful.

I'd prefer a world with test suites to one without.


> You do a
> curve fit to a set of data points from known programs, (a small set
> of known programs)

Actually, Yazgac used a very large set of programs, including some
amateurs. Very thorough, this Yazgac.

Chris Whittington

chrisw

Apr 16, 1997, 3:00:00 AM


--
http://www.demon.co.uk/oxford-soft

Andreas Mader <ma...@p6.gud.siemens.co.at> wrote in article
<3354E2...@p6.gud.siemens.co.at>...

What's the point of life ?

What's the point of getting up in the morning ?

Some people find it interesting. Strange things, people.

Or there's nowt so queer as folk as they say in Yorkshire.

Robert Hyatt

Apr 16, 1997, 3:00:00 AM

chrisw (chr...@cpsoft.demon.co.uk) wrote:

: --
: http://www.demon.co.uk/oxford-soft

: Robert Hyatt <hy...@crafty.cis.uab.edu> wrote in article
: <5j0ka7$d...@juniper.cis.uab.edu>...
: >
: > while posting, would someone check the formula that Thorsten posted?
: > It produced ridiculous numbers when I fed Crafty's results into it.
: >
: > BTW, I did not compare PV's since I didn't have 'em, but I agree with
: > Moritz that a program ought to produce the PV, or else the original
: > tests used to produce the rating equation need to be rerun using the
: > right move, wrong idea is still correct paradigm...
: >
: > And I agree with Chris that this "rating" is hopeless.

: I did not say it was hopeless.

: I said I don't trust test suites.

: However, given their limitations they are useful.

: I'd prefer a world with test suites to one without.

I agree, but with reporting like "32 right, 8 wrong" rather than
"Elo=3800"... :)
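Reporting a plain solved/unsolved count instead of an Elo figure needs only the raw times. A minimal sketch using Crafty's solution times as posted in this thread, with "--" entered as None; note that it counts the key move only, not the PV that Moritz asked for.

```python
TIME_LIMIT = 600  # seconds

# Crafty's posted solution times, positions 1-40 in order;
# None stands for "--" (key move not found within the limit).
crafty_times = [
    14, 0, 1, 4, 16, 0, 8, 2,
    7, 39, 287, None, 3, 104, 0, 102,
    12, None, None, 91, 1, None, 0, None,
    None, 265, None, 0, None, None, 0, 1,
    5, 0, 1, 0, 0, 0, 205, 0,
]

def solved_report(times, limit=TIME_LIMIT):
    """Return (solved, unsolved); solved means a time under `limit` was recorded."""
    solved = sum(1 for t in times if t is not None and t < limit)
    return solved, len(times) - solved

right, wrong = solved_report(crafty_times)
print(f"{right} right, {wrong} wrong")
```

For the posted times this gives 31 right and 9 wrong.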


: > You do a
: > curve fit to a set of data points from known programs, (a small set
: > of known programs)

: Actually, Yazgac used a very large set of programs, including some
: amateurs. Very thorough, this Yazgac.

I haven't seen/read what he did, but in computer chess, with the variety
that's there in algorithms, it would have to be a *very* big suite... and
a few programs are going to contribute solution times that would be called
"out-lying" and a curve-fit is going to nearly ignore them... Your program
might be one example... But many of the positions are way too easy
tactically. Most of the ones Crafty solved had big evals... showing at
least a couple of pawns +, to much more...


: Chris Whittington


Keith Ian Price

Apr 16, 1997, 3:00:00 AM

On Tue, 15 Apr 1997 09:36:31 GMT, mcl...@prima.ruhr.de (mclane) wrote:

I may be confused, but why are some of the best moves indicated here
in mclane's post not equivalent to the first move in the PVs posted in
Moritz' posting? Did someone type wrong, or am I reading one or the
other post wrong. Is there someplace in the US where one can order
CSS? (And still get this issue?)

kp


Jean-Peter Fendrich

Apr 16, 1997, 3:00:00 AM

Could it be that the difference has something to do with which programs
were used by Yazgac to construct the test?

>
> It is also very nice that we have many amateur-programs tested so far
> to get an impression about their elo.
>
> WE NEED MORE RESULTS!!!!!
>
> Thanks to Computer-Schach & Spiele and of course to Mr. Yazgac for his
> ideas.
>
> Please Mr. Yazgac: make us a better formula, that I can calculate
> Chess System Tal's Elo more properly. Or is the formula not broken for
> these high values ??? Of course we need a formula that can handle
> higher points. This test-suite was not published before, so we can be
> sure, 'til now nobody has tuned his algorithms on it.

Programs used to build up the whole thing don't even have to be tuned.
It would be interesting to know which programs were used,
because their results are completely pointless.

Otherwise I'm sure test suites like this are useful for programmers
to get a first insight into what has happened when a new version of
the program is tested. Big differences mean something has happened!
However I wouldn't rely on ELO computed in this way.

--
J-P Fendrich

Moritz Berger

Apr 17, 1997, 3:00:00 AM

On Wed, 16 Apr 1997 17:20:55 GMT, kpr...@teleport.com (Keith Ian
Price) wrote:

>I may be confused, but why are some of the best moves indicated here
>in mclane's post not equivalent to the first move in the PVs posted in
>Moritz' posting? Did someone type wrong, or am I reading one or the
>other post wrong. Is there someplace in the US, that one can order
>CSS? (And still get this issue?)

>kp

I have re-checked Thorsten's and my post: both are correct ;-)

Maybe my posts need some more clarification: I published some results
from Chess Genius 5 (lines that start with pv). In cases where Genius
deviated from the correct solution, I also gave the correct line
(cpv). Some of the cpv's contain alternate moves which I put in square
brackets.
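The notation described here (a cpv line with alternatives in square brackets) is regular enough to split mechanically. A minimal sketch, assuming that commas inside a bracket separate alternative choices while spaces separate the moves of one sequence, and that prefixes like "3." are only move numbers. The posts do not say which main-line move each bracket replaces, so the parser just records how many main-line moves precede it.

```python
import re

MOVE_NUMBER = re.compile(r"^\d+\.+")   # strips prefixes such as "3." or "3..."
BRACKET = re.compile(r"\[([^\]]*)\]")  # captures the text inside [...]

def _clean(tokens):
    """Drop move-number prefixes and any tokens left empty by that."""
    stripped = (MOVE_NUMBER.sub("", t) for t in tokens)
    return [t for t in stripped if t]

def parse_cpv(line):
    """Parse a 'cpv ...' line into (main_line, branches).

    Each branch is (offset, alternatives): offset is the number of main-line
    moves before the bracket, alternatives a list of move sequences, one per
    comma-separated entry inside the bracket.
    """
    body = line.strip()
    if body.startswith("cpv"):
        body = body[len("cpv"):]
    main, branches = [], []
    # With a capturing group, split() puts bracket contents at odd indices.
    for i, chunk in enumerate(BRACKET.split(body)):
        if i % 2 == 0:
            main.extend(_clean(chunk.split()))
        else:
            alternatives = [_clean(part.split()) for part in chunk.split(",")]
            branches.append((len(main), [alt for alt in alternatives if alt]))
    return main, branches

main, branches = parse_cpv("cpv Re5 Rxe5 fxe5 Ke7 3.Kd3 [3.Kd2, 3.e4] Kd7 e4 f4 Ke2 Ke6 Kf2")
print(main)
print(branches)
```

On that line (one of the cpv lines posted above) the main line has Kd3 as its fifth move, and the single branch, recorded after five moves, offers Kd2 or e4.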

Hope this helps

Moritz
-------------
Moritz...@msn.com

"The truth will always come out in the end."

Keith Ian Price

Apr 19, 1997, 3:00:00 AM

On Thu, 17 Apr 1997 15:51:35 GMT, Moritz...@msn.com (Moritz Berger)
wrote:

>On Wed, 16 Apr 1997 17:20:55 GMT, kpr...@teleport.com (Keith Ian
>Price) wrote:
>

>>I may be confused, but why are some of the best moves indicated here
>>in mclane's post not equivalent to the first move in the PVs posted in
>>Moritz' posting? Did someone type wrong, or am I reading one or the
>>other post wrong. Is there someplace in the US, that one can order
>>CSS? (And still get this issue?)
>
>>kp
>

>I have re-checked Thorsten's and my post: both are correct ;-)
>
>Maybe my posts need some more clarification: I published some results
>from Chess Genius 5 (lines that start with pv). In cases where Genius
>deviated from the correct solution, I also gave the correct line
>(cpv). Some of the cpv's contain alternate moves which I put in square
>brackets.
>
>Hope this helps
>
>Moritz
>-------------
>Moritz...@msn.com
>
>"The truth will always come out in the end."
>(Komputer Korner, 28/03/1997 08:22 on rec.games.chess.computer)

Yes, I was wondering what the cpv meant. This explains it. Now, if I
could only find this magazine in the U.S. It is hard to find
foreign-language specialty periodicals here. If anyone knows where
this issue can be ordered, please speak up. Thanks.

kp

mclane

Apr 26, 1997, 3:00:00 AM

hy...@crafty.cis.uab.edu (Robert Hyatt) wrote:

>Here's the results from Crafty, although the 5.8 *did* not quite
>work: :)

> solution times (seconds)
> +---------------------------------+
> 1- 8 | 14 0 1 4 16 0 8 2 |
> 9-16 | 7 39 287 -- 3 104 0 102 |
>17-24 | 12 -- -- 91 1 -- 0 -- |
>25-32 | -- 265 -- 0 -- -- 0 1 |
>33-40 | 5 0 1 0 0 0 205 0 |
> +---------------------------------+


> solution points
> +-----------------------------------------------------------------+
> 1- 8 | 19.533 20.000 19.967 19.867 19.467 30.000 49.333 59.800 |
> 9-16 | 59.300 65.450 36.517 0.000 19.900 16.533 30.000 24.900 |
> 17-24 | 49.000 0.000 0.000 59.383 79.867 0.000 100.000 0.000 |
> 25-32 | 0.000 61.417 0.000 110.000 0.000 0.000 20.000 19.967 |
> 33-40 | 19.833 20.000 29.950 30.000 40.000 60.000 46.083 80.000 |
> +-----------------------------------------------------------------+


>rating = 3867.83


Aha! Have you shown Kasparov these results?!

Maybe you should send them to Computerschach & Spiele....

Robert Hyatt

Apr 27, 1997, 3:00:00 AM

mclane (mcl...@prima.ruhr.de) wrote:
: hy...@crafty.cis.uab.edu (Robert Hyatt) wrote:


: >rating = 3867.83


I only hope someone will give me a good formula to use... then I can produce a sensible
number instead of the above nonsense... :)

Bob


Keith Ian Price

Apr 29, 1997, 3:00:00 AM

On 27 Apr 1997 23:53:57 GMT, hy...@crafty.cis.uab.edu (Robert Hyatt)
wrote:


>I only hope someone will give me a good formula to use... then I can produce a sensible
>number instead of the above nonsense... :)
>
>Bob
>

Did you re-run the test suite using the cpv values from Moritz Berger
that I sent you? The 3800+ result was from the times required to first
find the best move, and not the correct pv. The result might be more
meaningful if you were to time it that way. If it was the same result,
then let me know what company makes the PPro you have; I want to order
from them.

Also on another note from a different post, you stated that 4
processors on a Pentium Pro would not make much difference due to
memory access conflicts between processors. Would the new PPro with
1MB on-chip L2 cache make a significant difference? In other words,
does the program loop a lot in a 1MB space, or is it accessing hash
tables so often that it would not help much?

kp

Robert Hyatt

Apr 30, 1997, 3:00:00 AM

Keith Ian Price (kpr...@teleport.com) wrote:
: On 27 Apr 1997 23:53:57 GMT, hy...@crafty.cis.uab.edu (Robert Hyatt)
: wrote:

: kp

This last is a question I can't answer. Memory bandwidth is reasonably high,
however. Crafty has a pretty large "instruction/cache footprint" because a lot
of the code is written linearly rather than in loops, because it is much more
efficient to can the loops and let the cache pre-fetch instructions before they
are needed, rather than putting up with lots of bad branch predictions and the
problems of "skinny loops" with hardly anything in between branches...

In short, memory bandwidth is pretty high in Crafty, but I can't guarantee you
it's high enough that two cpu's won't produce decent performance. I hope we'll
find out later this year... I'll make crafty just as it is today, something that
will compile on such a platform and use both cpus when available, or which can compile
on a uniprocessor and still be efficient. We'll get data then. But, in general, a
single P6 cpu is memory-bound pretty badly, both from a bandwidth point-of-view as
well as a pure latency point-of-view... Both hurt significantly... When you factor
in fast/wide SCSI peripherals, suddenly memory becomes a focal point for activity...

I have not rerun the suite yet... actually, just comparing the PV's... However, I
don't see any way to do it... For example, in some positions, crafty has a different
pv, dramatically, but an eval of +5... I don't think that can be wrong if the right
move is chosen, but the PV is different. In fact, some of the positions might not
be verifiable like this anyway. For example, Win at Chess #2, which crafty solves
instantly, and the PV won't show the winning plan... because a short search can't
run down the many checks by the white rook that have to be handled before the passed
pawns advance and promote. But its eval "knows" about the passed pawns and that they
can't be stopped, so it likes the Rxb2 almost instantly. Is it wrong if it has the
right eval, right move, but wrong sequence of moves? Difficult to say...
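The question (right move, decisive eval, different PV) can be turned into an explicit grading rule if one is willing to pick thresholds. The sketch below is one hypothetical rule, not Yazgac's method and not something proposed in this thread: full credit when the engine PV matches an accepted line for the first few plies, partial credit when only the first move matches but the eval is decisive. The three-ply prefix and the 3-pawn threshold are arbitrary assumptions, and both lines must be in the same notation.

```python
def grade(engine_pv, correct_lines, eval_pawns, prefix_plies=3, decisive_eval=3.0):
    """Hypothetical grading rule; returns 'full', 'move-only' or 'fail'.

    engine_pv      -- the engine's principal variation, as a list of moves
    correct_lines  -- accepted solution lines in the same notation
    eval_pawns     -- engine evaluation in pawns, from the mover's side
    """
    if not engine_pv:
        return "fail"
    if not any(line and line[0] == engine_pv[0] for line in correct_lines):
        return "fail"  # wrong key move
    if any(engine_pv[:prefix_plies] == line[:prefix_plies] for line in correct_lines):
        return "full"  # right move and the right idea
    if eval_pawns >= decisive_eval:
        return "move-only"  # right move, decisive eval, different line
    return "fail"

# The accepted line is one of Moritz's cpv lines; the engine lines are made up.
correct = [["Ra2", "Rc7", "Ra6+", "Kd7", "Rb6"]]
print(grade(["Ra2", "Rc7", "Ra6+", "Kd7"], correct, 1.5))  # same line
print(grade(["Ra2", "Kd7", "Rb2"], correct, 5.0))         # other line, big eval
print(grade(["Ra2", "Kd7", "Rb2"], correct, 0.5))         # other line, small eval
```

Whether "move-only" should earn any points is exactly the judgment call raised here; the rule only makes that choice explicit.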

