
PGN and other related Issues. POINTS ARISING


M.D.Cr...@bradford.ac.uk

Jul 29, 1994, 12:41:10 PM
PGN and other related Issues.

Thanks for all the replies. Mostly via E-Mail. I was trying
to raise the issues as I see them, as both a provider of
games and as a consumer of PGN games (I use Chess Assistant.)
I put a lot of ideas in the air yesterday, but left one important
thing out, which is to thank those people who produce the
wonderful utilities and the people who archive PGN. They have taken
my comments in the spirit with which they were intended. Below
I'll try and round up some of the points made to me, and add
some further ideas.

SAN
---

One of the issues I was trying to raise was the difficulty
faced by those trying to post games in PGN. Some worry
that their games are not proper PGN. My mini guide to
posting was trying to say that, within reason, it is not
important to post in the version of SAN that is outlined
in the Standard for PGN. Pseudo-PGN is good enough to make
the games portable in most instances. For instance, the
text output from NICBASE is of the form:

1.e4 g6 2.d4 Bg7 3.Nf3 d6 4.Be3 Nf6 5.Nc3 O-O 6.Qd2 a6

whereas the SAN Kit's output is of the form:

1. e4 g6 2. d4 Bg7 3. Nf3 d6 4. Be3 Nf6 5. Nc3 O-O 6. Qd2 a6

I don't believe any program rejects the first in favour of
the second. But if one did, the SAN Kit can do the conversion.
A third form omits the periods after the move numbers:

1 e4 g6 2 d4 Bg7 3 Nf3 d6 4 Be3 Nf6 5 Nc3 O-O 6 Qd2 a6

This format is a problem for the current version of CBASCII
but is easily dealt with by the SAN Kit, which will turn
it into the SAN Kit form with minimal effort. In fact, provided
a game has some PGN headers and an unambiguous gametext,
turning it into PGN capable of being read by any utility is very
easy. The greatest effort is in the construction of the header
fields in the first place. [More effort is required with the
5. Nge2 problem outlined in my original posting than to deal
with the above.] So people who post games should not lose too much
sleep over exact adherence to SAN. [I think that when it comes to
archiving the material they should, though.]
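
A minimal sketch of the move-number clean-up described above, in Python.
It is not how the SAN Kit or CBASCII work internally; the function name
normalize_move_numbers is hypothetical, and the sketch only rewrites the
move numbers, leaving the moves themselves untouched.

```python
import re

# A move number at the start of a token, with or without a trailing
# period and with or without a following space: "1.e4", "1. e4", "1 e4".
# The lookahead for a letter keeps results such as "1-0" untouched.
_MOVE_NUMBER = re.compile(r"(?<!\S)(\d+)\.?\s*(?=[A-Za-z])")


def normalize_move_numbers(movetext: str) -> str:
    """Rewrite every move number as 'N. ' (hypothetical helper)."""
    return _MOVE_NUMBER.sub(r"\g<1>. ", movetext)


if __name__ == "__main__":
    print(normalize_move_numbers("1.e4 g6 2.d4 Bg7"))
    print(normalize_move_numbers("1 e4 g6 2 d4 Bg7"))
```

Both calls should give the spaced form shown for the SAN Kit above.
Ambiguous or over-specified moves such as 5. Nge2 are a separate problem
that needs knowledge of the position, which this sketch does not have.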

ELO
---

I had two E-Mails from people regarding the usefulness of
people adding the fields:

[WhiteElo "2430"]
[BlackElo "2225"]

which would appear in my template for pgn headers in the following
manner.

[Event ""]
[Site ""]
[Date "????.??.??"]
[Round ""]
[White ""]
[Black ""]
[Result ""]
[WhiteElo ""]
[BlackElo ""]

I'll get rid of my prejudice against this first, before outlining
the reasons why this is so useful. When I'm posting games I find
the accurate discovery and addition of these grades a little too
fiddly. Also, I know some people already think the header field
too long without the addition of these non-compulsory fields.
That aside, there are a large number of ChessBase users for whom
the addition of the ELO field is important. It allows them direct
compatibility with their own database system. The same may be true
for NIC too (a system I'm not familiar with).

Franz Hemmer's postings in PGN are a marvelous object lesson in
the accurate addition of ELO ratings, naming of players and general
use of PGN tailored towards ChessBase. This very consistency
means that global replacement of any of his decisions you don't
like, because they don't suit your database system, is very easy.
An example is:

[Event "?"]
[Site "Dortmund, Cat.16"]
[Date "1994.07.15"]
[Round "01"]
[White "Piket,J"]
[Black "Kortchnoi,V"]
[Result "1-0"]
[ECO "E04"]
[WhiteElo "2640"]
[BlackElo "2615"]

1.d4 Nf6 2.c4 e6 3.g3 d5 4.Bg2 dxc4 5.Nf3 Nbd7 6.O-O Nb6 7.Nbd2 c5 8.Nxc4 Nxc4
9.Qa4+ Bd7 10.Qxc4 Rc8 11.Ne5 b5 12.Qd3 Bd6 13.Bg5 c4 14.Qe3 h6 15.Bxf6 gxf6
16.Nxd7 Qxd7 17.Rad1 h5 18.h4 Ke7 19.d5 e5 20.Kh2 Bc5 21.Qe4 Qd6 22.e3 Kf8
23.a4 bxa4 24.Qxc4 Rb8 25.Rc1 Bb4 26.Qe4 Kg7 27.Rc6 Qd8 28.Qf5 Bd6 29.Rfc1 Qe7
30.Be4 Rxb2 31.g4 Rb7 32.gxh5 Rd7 33.Qg4+ Kh6 34.Rg1 Qf8 35.Qxd7 1-0

Note that he uses Piket,J instead of Piket, J as outlined in the
official standard. [Steven J Edwards, the guardian of PGN, would
have preferred Franz's method but was voted down.] However, global
replacement of comma with comma-space is very easy, because you know
he has taken that decision. Unfortunately, anyone who has uploaded
large amounts of PGN data will notice that even within the same file
spellings and decisions are different (and I have been, in spite of
my best efforts, as guilty of this as anyone). ChessAssistant
automatically deletes the comma on import to be compatible with the
way its names are presented. Note: it must take Franz Hemmer hours
to achieve the level of accuracy he does.
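
The comma to comma-space replacement can be sketched in a few lines of
Python. This is only an illustration: the function name space_after_comma
is hypothetical, and it assumes each tag pair sits on its own line, as in
the example above. Only the White and Black tags are touched, so a comma
in the Site tag is left as it is.

```python
import re

# Only the White and Black tag pairs; "WhiteElo" does not match because
# the tag name must be followed directly by a space.
_NAME_TAG = re.compile(r'^\[(White|Black) "([^"]*)"\]', re.MULTILINE)


def space_after_comma(pgn_text: str) -> str:
    """Turn 'Piket,J' into 'Piket, J' in player name tags (hypothetical helper)."""
    def fix(match: re.Match) -> str:
        name = re.sub(r",\s*", ", ", match.group(2))
        return f'[{match.group(1)} "{name}"]'

    return _NAME_TAG.sub(fix, pgn_text)


if __name__ == "__main__":
    sample = '[Site "Dortmund, Cat.16"]\n[White "Piket,J"]\n[Black "Kortchnoi,V"]\n'
    print(space_after_comma(sample))
```

Because the pattern also swallows any spaces already following the comma,
running it twice over the same file changes nothing further.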

Note the:

[Event "?"]
[Site "Dortmund, Cat.16"]

This is another good illustration of a problem. ChessBase, NicBase
and ChessAssistant all use these fields for different purposes. No
standard is ever going to be able to deal with the problem of
what to put in these fields. This has been my choice, closer to the
standard, but probably not much use to anyone:

[Event "Dortmund International Tournament"]
[Site "Dortmund FDR"]

If I wanted to import into the ChessAssistant system I would
probably have made the choice:

[Event "It 16"]
[Site "Dortmund (GER)"]

It seems to me that there is a need for a global editor that
might facilitate the fast editing of pgn headers to suit the
purpose that the user might want it for.


CARSTEN HANSEN'S UTILITIES.
---------------------------

I'm extremely glad to say that Carsten solved the Alien data
problem last March, or at least no-one has complained since!
Also, as Steven Rix points out in his article:

"CB2PGN has a -short option, which can help. Maybe NIC2PGN does too."

So this may be worth investigating for those who output using
his utilities. [It's amazing the things you find out, isn't it?]

SOME SUGGESTIONS
----------------

SAN KIT.
-------

1) It would be nice if the kit were faster!
2) That it skipped illegal games rather than breaking down.
3) It provided options to keep non essential fields such as
the ELO field.
4) It parsed LAN without resort to the emde function which
should be reserved for the worst cases.

PGN EDITOR
----------

It would be lovely to have a PGN editor to cut across the
problems outlined at the end of the ELO section. You could
add players' ELOs automatically if it was written correctly.
Perhaps a standard spellchecker for chess players' names and
places too.

COMPRESSION
-----------

It would be nice to have a binary compression of PGN
[perhaps one that lists the fields once, however many,
then deletes them, then generally compresses the whole
file, with, obviously, a util to unpack at the other end].
Some of the current PGN files take ages to upload. The
text files are really large. [It would have to be MAC, DOS
and UNIX compatible.]

Mark Crowther

Steven J Edwards

Jul 29, 1994, 4:08:32 PM
M.D.Cr...@bradford.ac.uk writes:

>PGN and other related Issues.

>I had two E-Mails from people regarding the usefulness of
>people adding the fields:

>[WhiteElo "2430"]
>[BlackElo "2225"]

>which would appear in my template for pgn headers in the following
>manner.

>[Event ""]
>[Site ""]
>[Date "????.??.??"]
>[Round ""]
>[White ""]
>[Black ""]
>[Result ""]
>[WhiteElo ""]
>[BlackElo ""]

A minor point here: although the template may have empty strings for
many fields, the single character string "?" should be used for those
tag values that are unknown after the rest of the data is entered.
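
A small Python sketch of that final pass over a filled-in template. The
function name fill_unknown is hypothetical. As a deliberate assumption, it
only touches Event, Site, Round, White and Black; Date keeps its own
"????.??.??" placeholder from the template, and every other tag is left
exactly as found.

```python
import re

# Tags in this sketch that take "?" when their value is unknown.
# Restricting the list is an assumption of the sketch, not a rule
# taken from any existing tool.
_EMPTY_TAG = re.compile(
    r'^\[(Event|Site|Round|White|Black) ""\]', re.MULTILINE
)


def fill_unknown(pgn_text: str) -> str:
    """Replace empty values of the listed tags with "?" (hypothetical helper)."""
    return _EMPTY_TAG.sub(r'[\g<1> "?"]', pgn_text)


if __name__ == "__main__":
    print(fill_unknown('[Event ""]\n[Site "Dortmund"]\n[Round ""]\n'))
```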

I had wanted it to be "Victor Kortchnoi", but there was too much
well-argued opposition. The comma was introduced as a necessary
measure to separate a player's surname in a uniform manner. The space
following the comma is present due to traditional punctuation rules,
among other reasons. In any case where a name is abbreviated to an
initial (including transliterated initials), the abbreviation should
have a single period suffix.

Minimum: [White "Kortchnoi"]

Slightly better: [White "Kortchnoi, V."]

Best: [White "Kortchnoi, Victor"]

Oh, and the Round tag value should be "1" instead of "01".

>replacement of comma to comma space is very easy, because you know
>he has taken that decision. Unfortunately anyone who has uploaded
>large amounts of PGN data will notice that even within the same file
>spellings and decisions are different. (and I have been, inspite of
>my best efforts as guilty of this as anyone.) ChessAssistant
>automatically deletes the comma on import to be compatable with the
>way its names are presented. Note: it must take Franz Hemmer hours
>to achieve the level of accuracy he does.

Quite true, I'm sure. Now that we finally have a standard for
representing player name and other information, let's all use it so we
can work on chess instead of clerical stuff. Mr. Hemmer, as well as
yourself, has done a great service by posting portable data to the
newsgroup. Many, many people have benefited as a result.

If a particular product unconditionally diddles with name data in an
inappropriate way, the vendor should be told so that the problem can
be corrected.

>Note the :

>[Event "?"]
>[Site "Dortmund, Cat.16"]

>this is another good illustration of a problem. ChessBase, NicBase
>and ChessAssistant all use this field for different purposes. No
>standard is ever going to be be able to deal with the problem of
>what to put in these fields. This has been my choice, closer to the
>standard, but probably not much use to anyone.

The above is one of the most troublesome issues. The legacy data we
have is rather wanting in the above area, and it's mostly because of
the temptation to cram as many things as possible into an unexpandable
format. The adverbs "what" and "where" are clearly distinct, even to
youngsters. That's why PGN has "Event" and "Site". What could be
simpler? These are separate things that database users would like to
see as separate key fields.

Here's a suggestion for those who wish to move data between PGN and
some other format where the other format crams together event and site
information: Select some rarely used printing ASCII character like,
say, '^' and use it for a special internal delimiter for crammed
event and site data in a non-PGN format. A converter program can use
this to split (or merge) the data when importing (or exporting) to (or
from) PGN.

Example:

Old crammed way: "Liverpool Dockworker's Delight"

New crammed way: "Dockworker's Delight^Liverpool ENG"

PGN: [Event "Dockworker's Delight"]
[Site "Liverpool ENG"]
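
A minimal Python sketch of such a converter step, assuming '^' is the chosen
delimiter. The function names split_crammed and merge_crammed are
hypothetical. How to treat an old-style field with no delimiter is also an
assumption here: the whole string is kept as the event and the site becomes
"?".

```python
from typing import Tuple

DELIMITER = "^"  # the rarely used printing ASCII character chosen above


def split_crammed(field: str) -> Tuple[str, str]:
    """Split 'event^site' into (event, site) for PGN import (hypothetical helper)."""
    event, found, site = field.partition(DELIMITER)
    if not found:
        # Old crammed data: nothing marks where the event ends, so keep
        # it all as the event and mark the site as unknown.
        return field.strip() or "?", "?"
    return event.strip() or "?", site.strip() or "?"


def merge_crammed(event: str, site: str) -> str:
    """Join event and site into one crammed field for export from PGN."""
    return f"{event}{DELIMITER}{site}"


if __name__ == "__main__":
    event, site = split_crammed("Dockworker's Delight^Liverpool ENG")
    print(f'[Event "{event}"]')
    print(f'[Site "{site}"]')
```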

>SOME SUGGESTIONS
>----------------

>SAN KIT.
>-------

>1) It would be nice if the kit were faster!

Of course. No argument from me. However, the SAN Kit is also fairly
small and so it is a good candidate for a modest, second hand computer
that needs something to do. I've seen such for sale for under US$100.
Another idea is to use a multiprocessing system so the Kit can be run
in the background.

>2) That it skipped illegal games rather than breaking down.

When I convert a long file, I use the comment braces to mark off the
section of "known good" data starting at the top of the input file. I
reset the closing brace after each fix, and after the last fix I
remove the pair before making a final pass. I keep an editor in one
window while the Kit runs in another.

>3) It provided options to keep non essential fields such as
>the ELO field.

This will be fixed, as will other shortcomings.

>4) It parsed LAN without resort to the emde function which
>should be reserved for the worst cases.

At some point I had to decide what the effects of nmde and emde were
vs. the default. I suppose someone could add another option for
something between the default and full emde. The source is there for
the adventurous.

>PGN EDITOR
>----------

>It would be lovely to have a PGN editor to cut across the
>problems outlined at the end of the ELO section. You could
>add players ELOs automatically if it was written correctly.
>Perhaps a standard spellchecker for chessplayers names and
>places too.

This is starting to get into the realm of what probably should be a
commercial product. There are many users who will not likely be happy
unless they get the kind of support that "only money can buy", and
producing such a beast will need time and resources, both of which
must be financed somehow.

>COMPRESSION
>-----------

>It would be nice to have a binary compression of pgn
>[perhaps one that lists the fields once, however many,
>then deletes them. Then generally compresses the whole
>file. Then obviously and util to unpack at the other end.
>Some of the current pgn files take ages to upload. The
>text files are really large. [It would have to be MAC, DOS
>and UNIX compatable.]

This is described in the section on PGC (Coded PGN) in the PGN
Standard. PGC allows very good compression of game data. I have
moved PGC files among the three machines you've mentioned without
difficulty.

-- Steven (s...@world.std.com)

Steven Rix

Aug 1, 1994, 6:09:41 AM

In article <CtpxA...@world.std.com>, s...@world.std.com (Steven J Edwards) writes:
->M.D.Cr...@bradford.ac.uk writes:
->
->If a particular product unconditionally diddles with name data in an
->inappropriate way, the vendor should be told so that the problem can
->be corrected.

Possibly because of the way it treats names, Chess Assistant-derived PGN
doesn't handle double names well. If Jean-Pierre Boudre is playing White
against Karpov, CA produces White "Boudre Jean", Black "Pierre Karpov
Anatoly". Jean-Rene Koch was another troublesome name.

->>Note the :
->>[Event "?"]
->>[Site "Dortmund, Cat.16"]
->>this is another good illustration of a problem. ChessBase, NicBase
->>and ChessAssistant all use this field for different purposes. No
->>standard is ever going to be be able to deal with the problem of
->>what to put in these fields. This has been my choice, closer to the
->>standard, but probably not much use to anyone.
->
->The above is one of the most troublesome issues. The legacy data we
->have is rather wanting in the above area, and it's mostly because of
->the temptation to cram as many things as possible into an unexpandable
->format. The adverbs "what" and "where" are clearly distinct, even to
->youngsters. That's why PGN has "Event" and "Site". What could be
->simpler? These are separate things that database users would like to
->see as separate key fileds.

Yeah, but by the time you've written Event "New York Open", Site "New York,
NY, USA", you might have taken up as much space as the game record itself
(when encoded). If the New York Open is obviously in New York, much of
the information is redundant. This may be the case in general for chess
tournaments, eg Linares 1994 is an adequate description of the event.
What else to put in the Event field? I don't like Chess Assistant's "It"
(presumably International Tournament), because most events are either
restricted nationality (a Championship) or "international".


->>SAN KIT.
->>-------
->>1) It would be nice if the kit were faster!
->Of course. No argument from me. However, the SAN Kit is also fairly
->small and so it is a good candidate for a modest, second hand computer
->that needs something to do.

CBASCII converts PGN to ChessBase at over ten times the rate of the SAN
kit doing pseudo-PGN to PGN; these must be comparable tasks. CBASCII is
only 40Kb or so when Zipped (eg pub/chess/Game-Databases/Tools/ChessBase/
cba1_3.zip on chess.uoknor.edu); some of that is documentation and an
illustrative test suite.

->>2) That it skipped illegal games rather than breaking down.
->
->When I convert a long file, I use the comment braces to mark off the
->section of "known good" data starting at the top of the input file. I
->reset the closing brace after each fix, and after the last fix I
->remove the pair before making a final pass. I keep an editor in one
->window while the Kit runs in another.

Why not run straight through, writing validated games to the output file
and invalid games to an error file (eg invalid.pgn)? Even better, highlight
in some way the bit in the PGN which caused the problem. This way, you just
leave the SAN kit chugging away, wake up a few hours later and fix the file
of dud games, appending them to the first file. Would this be hard to do?
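
As a rough Python sketch of that workflow (not a patch to the SAN Kit): the
games are split apart, each one is handed to a checker, and the two piles
are written to separate files. The names split_games, partition_games and
write_games are hypothetical, the checker is_valid is a stand-in for a real
move validator, and the sketch assumes every game begins with an [Event
tag at the start of a line.

```python
from typing import Callable, List, Tuple


def split_games(pgn_text: str) -> List[str]:
    """Split PGN text into games, assuming each starts with an [Event line."""
    games: List[str] = []
    current: List[str] = []
    for line in pgn_text.splitlines():
        if line.startswith("[Event ") and current:
            games.append("\n".join(current))
            current = []
        current.append(line)
    games.append("\n".join(current))
    return [game.strip() for game in games if game.strip()]


def partition_games(
    games: List[str], is_valid: Callable[[str], bool]
) -> Tuple[List[str], List[str]]:
    """Sort games into (good, bad); a checker that raises counts as bad."""
    good: List[str] = []
    bad: List[str] = []
    for game in games:
        try:
            ok = is_valid(game)
        except Exception:
            ok = False
        (good if ok else bad).append(game)
    return good, bad


def write_games(path: str, games: List[str]) -> None:
    """Write games to a file, separated by blank lines."""
    with open(path, "w") as handle:
        handle.write("\n\n".join(games) + "\n")
```

A run would then end with something like write_games("valid.pgn", good)
and write_games("invalid.pgn", bad). Highlighting the offending move would
need the checker to report where it failed, which this sketch leaves out.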

->>4) It parsed LAN without resort to the emde function which
->>should be reserved for the worst cases.
->
->At some point I had to decide what the effects of nmde and emde were
->vs. the default. I suppose someone could add another option for
->something between the default and full emde. The source is there for
->the adventurous.

The point is that a conversion from pseudo-PGN to Standard PGN is a
many-to-one operation. Many alternative ways of writing the notation, one
Standard output. When trying to convert, plausible and common alternative
nomenclatures should be included naturally, if they are simply different
ways to write the move without ambiguity. Examples are long algebraic, use
of colon rather than "x" to denote captures, presence or absence of a full-
stop after the move number (and space after this full stop), as well as
over-specified moves such as Nge2. All these are commonly-seen ways to
describe the moves, it's just that they aren't the standard. Mark thinks
that "svop emde" should only be needed for games copied from UK teletext,
where rather extraordinary effort is needed to work out what the hell is
going on...
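
One of these many-to-one rewrites can be done on the text alone. The Python
sketch below handles only the colon-for-capture case, and only where the
colon sits between the piece (or pawn file) and the target square. The name
colon_to_x is hypothetical. Long algebraic and over-specified moves such as
Nge2 are not attempted, since deciding what is redundant needs a model of
the position.

```python
import re

# A piece letter with optional file/rank disambiguation, or a pawn file,
# followed by ':' and a target square. The lookbehind avoids matching in
# the middle of an ordinary word inside a comment.
_COLON_CAPTURE = re.compile(
    r"(?<![A-Za-z])([KQRBN][a-h]?[1-8]?|[a-h]):([a-h][1-8])"
)


def colon_to_x(movetext: str) -> str:
    """Rewrite captures such as 'B:f6' as 'Bxf6' (hypothetical helper)."""
    return _COLON_CAPTURE.sub(r"\g<1>x\g<2>", movetext)


if __name__ == "__main__":
    print(colon_to_x("15.B:f6 g:f6 16.N:d7 Q:d7"))
```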

->>PGN EDITOR
->>----------
->
->>It would be lovely to have a PGN editor to cut across the
->>problems outlined at the end of the ELO section. You could
->>add players ELOs automatically if it was written correctly.
->>Perhaps a standard spellchecker for chessplayers names and
->>places too.

It's called ChessBase and CBASCII, Mark! You enter the moves in ChessBase
with a mouse (fast and accurate), enter the player details, Elo ratings,
source, etc, leave ChessBase, run the ECO utility to include the ECO codes
(some of which will actually be correct), then CBASCII to create standard
PGN. Okay, so it doesn't spell check for you, but you do get a database
program thrown in!

->>COMPRESSION
->>-----------
->
->>It would be nice to have a binary compression of pgn
->>[perhaps one that lists the fields once, however many,
->>then deletes them. Then generally compresses the whole
->>file. Then obviously and util to unpack at the other end.
->>Some of the current pgn files take ages to upload. The
->>text files are really large. [It would have to be MAC, DOS
->>and UNIX compatable.]

I think it would be worth converting the PGN databases at uoknor to
ChessBase format. The advantages are that the files downloaded would be
significantly smaller than file.pgn.gz and so would arrive faster,
ChessBase and Chess Assistant users can import the games directly, NIC
users can do CB2NIC (which presumably takes about as long as PGN2NIC).
If you want the PGN, then tools like CBASCII and CUPGN can convert.
Can CBASCII be compiled to run on Unix as well as DOS?

--
Steve Rix (ste...@chemeng.ed.ac.uk)
"A morbid, Edinburgh-based Chemical Engineer" - and no misprint!
