ISO 7185 itself

ashkotin

Aug 8, 2008, 4:25:42 AM

Hi all,

I am reading the Standard (thanks, Scott Moore) and have a question.

In 6.1.7 there is a forward reference to 6.4.2.2, which is not nice at
all, especially while we are still at the lexical level.
But if we look at the requirements for char-type, they can be split
into 2 parts: the necessary chars and the enumeration.

So the question is:

Is it true that the smallest implementation-defined set of chars is:
apostrophe, 0..9, a..z?

Alex

Scott Moore

Aug 9, 2008, 10:12:51 PM

The characters '0'..'9', 'A'..'Z', 'a'..'z' simply have special
requirements of the standard, which it states:

6.4.2.2 Required simple-types

...

1) The subset of character values representing the digits 0 to 9
shall be numerically ordered and contiguous.

2) The subset of character values representing the upper case
letters A to Z, if available, shall be alphabetically ordered
but not necessarily contiguous.

3) The subset of character values representing the lower case
letters a to z, if available, shall be alphabetically ordered but
not necessarily contiguous.

It means that 'Z' (say) cannot be lower than 'A', and the letters
must be in order. The allowance that they need not be contiguous
exists because of EBCDIC. That was the only significant character set
that had this oddity when the standard was produced, and the IBM
mainframes that represented the last vestige of non-ASCII, non-Unicode
use have effectively all been converted to ASCII use.
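
The "ordered but not necessarily contiguous" rule can be illustrated with a minimal sketch in standard Pascal, assuming a processor on which the upper case letters are available. The program and variable names are illustrative only and do not come from the thread. It counts the char values in the closed range 'A'..'Z': that is 26 where the letters are contiguous, and more on a character set such as EBCDIC, where other codes sit between the letters.

```pascal
program lettergap(output);

{ Illustrative sketch, not from the thread: counts the char values
  in the closed range 'A'..'Z'. 6.4.2.2 only promises that the
  letters are ordered, so the range may contain non-letters. }

var
  ch: char;
  n: integer;

begin
  n := 0;
  for ch := 'A' to 'Z' do
    n := n + 1;
  { 26 when the letters are contiguous (e.g. ASCII); more otherwise }
  writeln(n)
end.
```

For the same reason, a test such as ch in ['A'..'Z'] is only a reliable "is a letter" test where the letters are known to be contiguous.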

Also note that the lower case letters are optional as well.

The purpose of the 6.4.2.2 d clauses is exactly what they say,
nothing more. They are not a minimum character set requirement. If
you were to make that argument, then you would have to talk about
the 6.1.2 "special symbols" as well, and explain why 6.1.1
uses lower case letters (when 6.4.2.2 d clearly specifies them
as optional).

More sensible is to say that the character set of ISO 7185 is
undefined, but that 6.4.2.2 d was felt necessary in order to
specify that ord(a)-ord('0'), where a is a char value holding a
digit, will yield a number from 0 to 9, which many, if not most,
numeric conversion routines require.
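
As a minimal sketch of the kind of conversion routine meant here, assuming only the digit guarantee of 6.4.2.2 d) 1: the function name digitvalue and the variable n are hypothetical, chosen for illustration.

```pascal
program digits(output);

{ Illustrative sketch: relies only on '0'..'9' being ordered and
  contiguous. The name digitvalue is hypothetical. }

var
  n: integer;

function digitvalue(c: char): integer;
begin
  { meaningful only when c is one of '0'..'9' }
  digitvalue := ord(c) - ord('0')
end;

begin
  { build the integer forty-two from its two digit characters }
  n := 10 * digitvalue('4') + digitvalue('2');
  writeln(n)
end.
```

No character code appears anywhere in the program, so it behaves the same on any processor that meets the digit requirement.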

My 2 cents.

Scott Moore

ashkotin

Aug 14, 2008, 4:13:46 AM

Scott,

thank you for correcting my oversight about a..z being optional.

They definitely need to say something about the
set-of-implementation-defined-characters at the very beginning of 6.1.
For me this topic is worth a special section - let's say 6.1.0 - before
speaking about any tokens.
And in this section they should speak explicitly about
implementation-dependent characters without graphic representation.
NOT in 6.4.2.2 - that is too late ;)
And in this section they should speak explicitly about the
implementation-dependent character string, the line-terminator (LT)!
As for "6.1.7 Character-strings", if we ask what requirements we have
for the string between apostrophes, we have:
- apostrophes must be doubled;
- LT, if available, is forbidden as a substring.

For example, in MS (DOS, Windows) LT = CR+LF, and I hope CR and LF
separately are possible in a character-string, and that it just depends
on my text editor or program generator to put them in.

Alex

CBFalconer

Aug 14, 2008, 8:30:34 AM

ashkotin wrote:
>
> thank you for correcting my oversight about a..z being optional.

Please do not top-post. Your answer belongs after (or intermixed
with) the quoted material to which you reply, after snipping all
irrelevant material. See the following links:

<http://www.catb.org/~esr/faqs/smart-questions.html>
<http://www.caliburn.nl/topposting.html>
<http://www.netmeister.org/news/learn2quote.html>
<http://cfaj.freeshell.org/google/> (taming google)
<http://members.fortunecity.com/nnqweb/> (newusers)

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

Scott Moore

Aug 14, 2008, 2:30:54 PM

ashkotin wrote:
> Scott,
>
> thank you for correcting my oversight about a..z being optional.
>
> They definitely need to say something about the
> set-of-implementation-defined-characters at the very beginning of 6.1.
> For me this topic is worth a special section - let's say 6.1.0 - before
> speaking about any tokens.

Why? Even today, multiple character standards are in use. Unicode is
not universally used, and ISO 8859 is a family of standards.

Having undefined elements leaves the language flexible to deal with
different conventions for specific systems. In fact, there are a lot
of things in Pascal that are quite intentionally left undefined,
starting with the value of maxint.

> And in this section they should speak explicitly about
> implementation-dependent characters without graphic representation.
> NOT in 6.4.2.2 - that is too late ;)
> And in this section they should speak explicitly about the
> implementation-dependent character string, the line-terminator (LT)!

The standard discusses this as end-of-line, and it is left undefined.

In fact, it can be treated as a condition instead of a character, which
is proper. That is, it is detected and generated by special procedures
that are not dependent on its code or length, and it is defined to be
a space when read over.

This enables the exact end of line to be implementation dependent, and
allows programs to deal with the end of line in a system independent
manner. This is an important advantage of Pascal. It is hard to refrain
from mentioning the counterexample of C, which, although it also does
not specifically require a code for end of line, did imply a length for
end of line, resulting in gross portability problems outside of C's
original base operating system. (This was "fixed" in ANSI C; the "fix"
was to specify each file on opening as text or binary data, a
requirement not unlike Pascal's.)

Note that the original implementation of Pascal didn't have a character
for end of line. It was a true condition, not a character.
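
A minimal sketch of treating end-of-line as a condition, assuming nothing beyond the required procedures and functions eoln, eof, read and readln. The program and variable names are illustrative. It counts the lines and characters on the standard input without ever naming a character code for end-of-line.

```pascal
program countlines(input, output);

{ Illustrative sketch: end-of-line is detected with eoln and
  consumed with readln, so no character code or length for it
  is assumed anywhere. }

var
  c: char;
  lines, chars: integer;

begin
  lines := 0;
  chars := 0;
  while not eof(input) do
  begin
    while not eoln(input) do
    begin
      read(input, c);
      chars := chars + 1
    end;
    readln(input);   { step over the end-of-line, whatever it is }
    lines := lines + 1
  end;
  writeln(lines, ' lines, ', chars, ' characters')
end.
```

The same source should give the same counts whether the underlying system marks line ends with one byte, two bytes, or a record length.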

> As for "6.1.7 Character-strings", if we ask what requirements we have
> for the string between apostrophes, we have:
> - apostrophes must be doubled;
> - LT, if available, is forbidden as a substring.

The way the standard is written, there is no end-of-line to read in as
a character. If you try to read it, it is space. It can only be detected
as a condition, so there is no way to input it as a character even if
you tried.

And so, there is no requirement to forbid from a string a character
that does not exist.

>
> For example, in MS (DOS, Windows) LT = CR+LF, and I hope CR and LF
> separately are possible in a character-string, and that it just depends
> on my text editor or program generator to put them in.
>
> Alex

If you place an end-of-line in a character string (between quotes), the
string is split across two lines, presumably resulting in an error.

This brings back the fundamental issue at hand. The standard describes
the minimal requirements of character set in two different ways, the
first being the character set the programs process, and the second being
the character set the implementation processes, i.e., the character
set the program is written in.

Scott

John Reagan

Aug 20, 2008, 3:32:12 PM

About the minimum chars, I think it is '0' through '9' plus 'T', 'R', 'U', 'E',
'F', 'A', 'L', 'S' in either upper or lower case.

As for what the "end of record" character is, on my systems (OpenVMS), we
don't need any special characters. We don't have stream files (at least by
default). Each record in a file has a length word prefix. The length is
known without reading the data bytes. All 8-bit values are legal in strings
on my system. There is no way to do a single WRITELN that turns into "two"
records. The standard is intentionally vague to allow systems such as mine
as well as dealing with Unix/Linux systems which traditionally use
byte-stream files with a special character(s) that ends the line.

John

ashkotin

Aug 21, 2008, 6:49:14 AM

John,

1. Nice to get a message about the only(?) OS with a smart file system ;)
You are right - with other OSes there is an agreement among
applications to use special chars as the line terminator.
And for the OS file system, a file is just a sequence of chars.

2. Well, maybe the apostrophe is not mandatory; then the minimum is 0..9.
Why do you think that [TRUEFALS] is mandatory?

Alex

winston19842005

Aug 21, 2008, 7:22:54 AM

On 8/21/08 6:49 AM, in article
89da9e76-c45c-45ce...@y21g2000hsf.googlegroups.com,
"ashkotin" <alex.s...@gmail.com> wrote:

> On Aug 20, 11:32 pm, "John Reagan" <johnrrea...@earthlink.net> wrote:
>> About the minimum chars, I think it is '0' through '9' plus 'T', 'R', 'U', 'E',
>> 'F', 'A', 'L', 'S' in either upper or lower case.
>>
>> As for what the "end of record" character is, on my systems (OpenVMS), we
>> don't need any special characters. We don't have stream files (at least by
>> default). Each record in a file has a length word prefix. The length is
>> known without reading the data bytes. All 8-bit values are legal in strings
>> on my system. There is no way to do a single WRITELN that turns into "two"
>> records. The standard is intentionally vague to allow systems such as mine
>> as well as dealing with Unix/Linux systems which traditionally use
>> byte-stream files with a special character(s) that ends the line.
>>
>> John
>
> John,
>
> 1. Nice to get a message about the only(?) OS with a smart file system ;)
> You are right - with other OSes there is an agreement among
> applications to use special chars as the line terminator.
> And for the OS file system, a file is just a sequence of chars.

Not the only one. My first micro, a TI-99, used a file system that was either
FIXED (so you knew the length of each record) or VARIABLE, where each record
was preceded by a length byte (not part of the record, just like VMS).
Actually, in the INTERNAL format in BASIC, each field of a record was preceded
by a length byte as well.

No characters were considered special or "translated", and every character
could be printed directly to the screen without translation (although many
of them did not have a "definition", meaning you might not see them).

Scott Moore

Aug 21, 2008, 9:32:55 AM

Because 'True' and 'False' are listed in the output formats for write/writeln, see 6.9.3.5.
Again, be careful, the standard talks about two wildly different kinds of
character sets, one that is required for use in the program, and one that
specifies the characters used to form the program.

I would have to differ slightly from John in that I think it is incorrect
to state that there is a minimum requirement at all. The standard nowhere
states a minimum requirement or uses wording like that, and nowhere states
a particular character set to use. Attempting to divine a minimum character
set is reading something from nothing.

Finally, see:

"
2 Normative reference

The following standard contains provisions which, through reference in this text, constitute
provisions of this International Standard. At the time of publication, the edition indicated was
valid. All standards are subject to revision, and parties to agreements based on this International
Standard are encouraged to investigate the possibility of applying the most recent edition of the
standard listed below. Members of IEC and ISO maintain registers of currently valid International
Standards.

ISO 646:1983, Information processing - ISO 7-bit coded character set for information interchange.
"

which does not mention Unicode!

Scott

John Reagan

Aug 21, 2008, 11:41:15 AM

"Scott Moore" <sam...@moorecad.com> wrote in message

>
> I would have to differ slightly from John in that I think it is incorrect
> to state that there is a minimum requirement at all. The standard nowhere
> states a minimum requirement or uses wording like that, and nowhere states
> a particular character set to use. Attempting to divine a minimum
> character set is reading something from nothing.
>

Well, sometimes divining is all we get. :-)

From reading 6.9.3, we see:

"Write(f,p), where f denotes a textfile and p is a write-parameter, shall
write a sequence of zero or more characters on the textfile f; for each
character c in the sequence, the equivalent of

begin ff^ := c; put(ff) end

where ff denotes the referenced textfile, shall be applied to the textfile
f. The sequence of characters written shall be a representation of the value
of the first expression in the write-parameter p, as specified in the
remainder of this subclause."

So each representation defined subsequently in 6.9.3 produces
representations that must fit into the character c specified above. That
ends up as '0' through '9', 'T', 'R', 'U', 'E', 'F', 'A', 'L', 'S', and '+'
and '-'. (I forgot those the first time.)
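
A minimal sketch of where those characters come from, assuming a processor that accepts the standard write-parameters; the program name is illustrative. Each call produces a character sequence defined by 6.9.3, and whether the Boolean letters appear in upper or lower case is left to the implementation.

```pascal
program writereps(output);

{ Illustrative sketch: each writeln below emits a character
  sequence defined by 6.9.3, so those characters have to exist
  in the processor's char type. }

begin
  writeln(true);    { letters of the Boolean representation }
  writeln(false);
  writeln(-42);     { a sign and decimal digits }
  writeln(1.5)      { floating-point form, with an exponent sign }
end.
```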

I do remember having a discussion about this at some meeting in the dark and
distant past. Might have been in actual meeting time with somebody asking a
serious question about exactly what characters are required or perhaps it
was in the bar with alcohol involved. Not sure.

As for Unicode, for Extended Pascal, we did discuss what, if any, changes
would need to be made to the standard for a system that had Unicode as its
underlying representation or even systems that provided multiple
representations. Given that the standard is sufficiently vague (perhaps too
much so) on any representation, we didn't see anything to add other than
perhaps how to provide entry for characters that don't have a glyph on the
compiling system. We just threw up our hands and waited for somebody to
come up with a real example/need.

John

ashkotin

Aug 23, 2008, 9:10:20 AM

> I would have to differ slightly from John in that I think it is incorrect
> to state that there is a minimum requirement at all. The standard nowhere
> states a minimum requirement or uses wording like that, and nowhere states
> a particular character set to use. Attempting to divine a minimum character
> set is reading something from nothing.
>

Well, but in 6.1.7 Character-strings we have
"string-character = one-of-a-set-of-implementation-defined-characters ."
so we may ask the implementer: what is your
set-of-implementation-defined-characters?
And I think the answer will be like this:
Look, actually we have 256 chars, and there are some funny ones among them:
- some have no graphic representation,
- some have many graphic representations,
- and be careful with such OSes as Unix, MS (DOS, Windows), Mac OS - they
don't support lines for text files, and there is a special agreement to
use some of my chars as the line terminator.

> "
> 2 Normative reference
>
> The following standard contains provisions which, through reference in this text, constitute
> provisions of this International Standard. At the time of publication, the edition indicated was
> valid. All standards are subject to revision, and parties to agreements based on this International
> Standard are encouraged to investigate the possibility of applying the most recent edition of the
> standard listed below. Members of IEC and ISO maintain registers of currently valid International
> Standards.
>
> ISO 646:1983, Information processing - ISO 7-bit coded character set for information interchange.
> "
>

It is interesting - I found only one reference to ISO 646,
in 6.1.9 Lexical alternatives:
"NOTE 1 The character UP-ARROW that appears in some national
variants of ISO 646 is regarded as identical to the
character ^ . In this International Standard, the character UP-ARROW has
been used because of its greater visibility."

Alex

Scott Moore

Aug 23, 2008, 3:03:13 PM

ashkotin wrote:
>> I would have to differ slightly from John in that I think it is incorrect
>> to state that there is a minimum requirement at all. The standard nowhere
>> states a minimum requirement or uses wording like that, and nowhere states
>> a particular character set to use. Attempting to divine a minimum character
>> set is reading something from nothing.
>>
>
> Well, but in 6.1.7 Character-strings we have
> "string-character = one-of-a-set-of-implementation-defined-characters ."
> so we may ask the implementer: what is your
> set-of-implementation-defined-characters?
> And I think the answer will be like this:
> Look, actually we have 256 chars, and there are some funny ones among them:
> - some have no graphic representation,
> - some have many graphic representations,
> - and be careful with such OSes as Unix, MS (DOS, Windows), Mac OS - they
> don't support lines for text files, and there is a special agreement to
> use some of my chars as the line terminator.
>

Ok. Why does this mean that the standard should specify anything concerning
character sets?

By the way, the definition for "implementation-defined" is:

3.3 Implementation-defined

Possibly differing between processors, but defined for any particular processor.

3.5 Processor

A system or mechanism that accepts a program as input, prepares it for execution, and executes the
process so defined with data to produce results.

If your argument is that there are clearly dependencies out in the world,
and the standard should list them, then the standard should give all possible
values of maxint as well.

The value of the Pascal standard lies both in what it does, and what it does
not define. Niklaus Wirth originally defined type equivalence as dependent on
the "quality" of the processor, so that:

    type a = array [1..10] of integer;
         b = record c: a; d: integer end;
         c = record e: array [1..10] of integer; f: integer end;

might leave b and c to be the same type. The standard defined type equivalence
absolutely, which perhaps implies that the original method of type equivalence
was too confusing to implementors. In fact and in practice it was, since several
Pascal implementors went on to misunderstand such features. This, however, does
not change the fact that the original method was clear enough, certainly clear
enough for anyone smart enough to implement a compiler. The standard's attempt
to "nail jelly to a tree" was well founded enough, but I would assert that it
failed in any case on this point, since implementors who failed to understand
type equivalence in the original definition (the J&W "report") generally ignored
the standard in total.
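
As a minimal sketch of what "absolutely" means under ISO 7185, using the three type declarations above: every variable name here (x, rb, rc, i) is illustrative, and the two assignments the standard rejects are left inside comments so that the program still compiles.

```pascal
program typeeq(output);

type
  a = array [1..10] of integer;
  b = record c: a; d: integer end;
  c = record e: array [1..10] of integer; f: integer end;

var
  x: a;
  rb: b;
  rc: c;
  i: integer;

begin
  for i := 1 to 10 do
    x[i] := i;
  rb.c := x;    { allowed: field c and x both have the named type a }
  rb.d := 0;
  { rc.e := x;    rejected: the unnamed array type of field e is a
                  different type from a, despite the same structure }
  { rc := rb;     rejected: b and c are distinct types }
  for i := 1 to 10 do
    rc.e[i] := x[i];    { copying element by element is fine }
  rc.f := rb.d;
  writeln(rc.e[10])
end.
```

Under the looser reading described above, a processor could have accepted the two commented-out assignments; the standard removes that freedom.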

Scott Moore