I am reading Standard (thanks Scott Moore) and have a question.
Section 6.1.7 contains a forward reference to 6.4.2.2, which is not nice at all, especially while we are still at the lexical level. But if we look at the requirements for char-type, they can be split into two parts: the necessary chars and the enumeration.
So the question is:
Is it true that the smallest implementation-defined set of chars is: apostrophe, 0..9, a..z?
The characters '0'..'9', 'A'..'Z', 'a'..'z' simply have special requirements of the standard, which it states:
6.4.2.2 Required simple-types
...
1) The subset of character values representing the digits 0 to 9 shall be numerically ordered and contiguous.
2) The subset of character values representing the upper case letters A to Z, if available, shall be alphabetically ordered but not necessarily contiguous.
3) The subset of character values representing the lower case letters a to z, if available, shall be alphabetically ordered but not necessarily contiguous.
It means that 'Z' (say) cannot be lower than 'A', and the letters must be in order. The "not necessarily contiguous" wording is there because of EBCDIC. That was the only significant character set with this oddity when the standard was produced, and the IBM mainframes that represented the last vestige of non-ASCII/Unicode use have effectively all been converted to ASCII.
Also note that the lower case letters are optional as well.
The purpose of the 6.4.2.2 d clauses is exactly what they say, nothing more. They are not a minimum character set requirement. If you were to make that argument, you would have to talk about the 6.1.2 "special symbols" as well, and explain why 6.1.1 uses lower case letters (when 6.4.2.2 d clearly specifies them as optional).
More sensible is to say that the character set of ISO 7185 is undefined, but that 6.4.2.2 d was felt necessary in order to specify that ord(a)-ord('0'), where a is a char value holding a digit, will yield a number from 0 to 9, which many, if not most, numeric conversion routines require.
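The digit guarantee can be illustrated with a minimal sketch of such a conversion routine. The names DigitValue, DigitToInt, and DigitString are invented for this example and do not come from the standard; the only property relied on is that '0'..'9' are ordered and contiguous.

```pascal
program DigitValue(output);

const
  Len = 4;

type
  DigitString = packed array [1..Len] of char;

var
  s: DigitString;
  i, n: integer;

{ Hypothetical helper: maps a digit character to its numeric value.
  Portable only because '0'..'9' are ordered and contiguous. }
function DigitToInt(c: char): integer;
begin
  DigitToInt := ord(c) - ord('0')
end;

begin
  s := '1987';
  n := 0;
  { Accumulate the decimal value one digit at a time. }
  for i := 1 to Len do
    n := n * 10 + DigitToInt(s[i]);
  writeln(n:1)
end.
```

The same trick applied to letters, for example ord(c)-ord('A'), would not be portable, since the letters are only required to be ordered, not contiguous.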
Thank you for correcting my oversight about a..z being optional.
They definitely need to say something about the set-of-implementation-defined-characters at the very beginning of 6.1. For me this topic is worth a special section, let's say 6.1.0, before speaking about any tokens. This section should explicitly mention implementation-dependent characters without a graphic representation. NOT in 6.4.2.2, that is too late ;) It should also explicitly mention the implementation-dependent character string that acts as line-terminator (LT)! Take "6.1.7 Character-strings" and ask: what requirements do we have for the string between apostrophes? We have:
- apostrophes must be doubled;
- LT, if available, is forbidden as a substring.
For example, in MS (DOS, Windows) LT = CR+LF, and I hope CR and LF separately are possible in a character-string, so that it just depends on my text editor or program generator whether they are put in.
Alex
On 10 Aug, 06:12, Scott Moore <sam...@moorecad.com> wrote:
> Also note that the lower case letters are optional as well.
> Thank you for correcting my oversight about a..z being optional.
Please do not top-post. Your answer belongs after (or intermixed with) the quoted material to which you reply, after snipping all irrelevant material. See the following links:
> Thank you for correcting my oversight about a..z being optional.
> They definitely need to say something about the set-of-implementation-defined-characters at the very beginning of 6.1. For me this topic is worth a special section, let's say 6.1.0, before speaking about any tokens.
Why? Even today, multiple character standards are in use. Unicode is not universally used, and ISO 8859 is a family of standards.
Having undefined elements leaves the language flexible to deal with different conventions for specific systems. In fact, there are a lot of things in Pascal that are quite intentionally left undefined, starting with the value of maxint.
> This section should explicitly mention implementation-dependent characters without a graphic representation. NOT in 6.4.2.2, that is too late ;) It should also explicitly mention the implementation-dependent character string that acts as line-terminator (LT)!
The standard discusses this as end-of-line, and it is left undefined.
In fact, it can be treated as a condition instead of a character, which is proper. That is, it is detected and generated by special procedures that are not dependent on its code or length, and it is defined to be a space when read over.
This enables the exact end of line to be implementation dependent, and allows programs to deal with the end of line in a system independent manner. This is an important advantage of Pascal. It is hard to refrain from mentioning the counter example of C, which, although it also does not specifically require a code for end of line, did imply a length of end of line, resulting in gross portability problems outside of C's original base operating system ("fixed" in ANSI, the "fix" was to specify each file on opening as text or binary data, a requirement not unlike Pascal's requirement).
Note that the original implementation of Pascal didn't have a character for end of line. It was a true condition, not a character.
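As a minimal sketch of end-of-line handled as a condition rather than a character, the following program counts lines and characters on the standard input without ever knowing how a line end is encoded. The program name CountLines and its variables are invented for this example.

```pascal
program CountLines(input, output);

var
  ch: char;
  chars, lines: integer;

begin
  chars := 0;
  lines := 0;
  while not eof(input) do
  begin
    { eoln is a condition on the file; no end-of-line code is inspected. }
    while not eoln(input) do
    begin
      read(input, ch);
      chars := chars + 1
    end;
    { readln skips the line end, whatever its representation or length. }
    readln(input);
    lines := lines + 1
  end;
  writeln('lines: ', lines:1, '  characters: ', chars:1)
end.
```

The same source should behave the same way whether the underlying system marks line ends with CR+LF, a single LF, or a record length prefix.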
> Take "6.1.7 Character-strings" and ask: what requirements do we have for the string between apostrophes? We have:
> - apostrophes must be doubled;
> - LT, if available, is forbidden as a substring.
The way the standard is written, there is no end-of-line to read in as a character. If you try to read it, it is space. It can only be detected as a condition, so there is no way to input it as a character even if you tried.
And so, there is no requirement to forbid a character from a string that does not exist.
> For example, in MS (DOS, Windows) LT = CR+LF, and I hope CR and LF separately are possible in a character-string, so that it just depends on my text editor or program generator whether they are put in.
> Alex
If you place an end-of-line in a character string (between quotes), the string is split across two lines, presumably resulting in an error.
This brings back the fundamental issue at hand. The standard describes the minimal requirements of character set in two different ways, the first being the character set the programs process, and the second being the character set the implementation processes, i.e., the character set the program is written in.
About the minimum chars, I think it is '0' thru '9' plus 'T', 'R', 'U', 'E', 'F', 'A', 'L', 'S' in either upper or lower form.
As for what the "end of record" character is, on my systems (OpenVMS), we don't need any special characters. We don't have stream files (at least by default). Each record in a file has a length word prefix. The length is known without reading the data bytes. All 8-bit values are legal in strings on my system. There is no way to do a single WRITELN that turns into "two" records. The standard is intentionally vague to allow systems such as mine as well as dealing with Unix/Linux systems which traditionally use byte-stream files with a special character(s) that ends the line.
On Aug 20, 11:32 pm, "John Reagan" <johnrrea...@earthlink.net> wrote:
> About the minimum chars, I think it is '0' thru '9' plus 'T', 'R', 'U', 'E', 'F', 'A', 'L', 'S' in either upper or lower form.
> As for what the "end of record" character is, on my systems (OpenVMS), we don't need any special characters. We don't have stream files (at least by default). Each record in a file has a length word prefix. The length is known without reading the data bytes. All 8-bit values are legal in strings on my system. There is no way to do a single WRITELN that turns into "two" records. The standard is intentionally vague to allow systems such as mine as well as dealing with Unix/Linux systems which traditionally use byte-stream files with a special character(s) that ends the line.
> John
John,
1. Nice to get a message about the only(?) OS with a smart file system ;) You are right: on other OSes there is an agreement among applications to use special chars as line-terminator, and for the OS file system a file is just a sequence of chars.
2. Well, maybe the apostrophe is not mandatory; then the minimum is 0..9. Why do you think that [TRUEFALS] is mandatory?
"ashkotin" <alex.shko...@gmail.com> wrote:
> John,
> 1. Nice to get a message about the only(?) OS with a smart file system ;) You are right: on other OSes there is an agreement among applications to use special chars as line-terminator, and for the OS file system a file is just a sequence of chars.
Not the only one. My first micro, a TI-99, used a file system where files were either FIXED (so you knew the length of each record) or VARIABLE, where each record was preceded by a length byte (not part of the record, just like VMS). In fact, in the INTERNAL format in BASIC, each field of a record was preceded by a length byte as well.
No characters were considered special or "translated", and every character could be printed directly to the screen without translation (although many of them did not have a "definition", meaning you might not see them).
ashkotin wrote:
> 2. Well, maybe the apostrophe is not mandatory; then the minimum is 0..9. Why do you think that [TRUEFALS] is mandatory?
> Alex
Because those letters appear in the output formats for write/writeln; see 6.9.3.5. Again, be careful: the standard talks about two wildly different kinds of character sets, one that is required for use in the program, and one that specifies the characters used to form the program.
I would have to differ slightly from John in that I think it is incorrect to state that there is a minimum requirement at all. The standard nowhere states a minimum requirement or uses wording like that, and nowhere states a particular character set to use. Attempting to divine a minimum character set is reading something from nothing.
Finally, see:
" 2 Normative reference
The following standard contains provisions which, through reference in this text, constitute provisions of this International Standard. At the time of publication, the edition indicated was valid. All standards are subject to revision, and parties to agreements based on this International Standard are encouraged to investigate the possibility of applying the most recent edition of the standard listed below. Members of IEC and ISO maintain registers of currently valid International Standards.
ISO 646:1983, Information processing - ISO 7-bit coded character set for information interchange. "
"Scott Moore" <sam...@moorecad.com> wrote in message
> I would have to differ slightly from John in that I think it is incorrect to state that there is a minimum requirement at all. The standard nowhere states a minimum requirement or uses wording like that, and nowhere states a particular character set to use. Attempting to divine a minimum character set is reading something from nothing.
Well, sometimes divining is all we get. :-)
From reading 6.9.3, we see:
"Write(f,p), where f denotes a textfile and p is a write-parameter, shall write a sequence of zero or more characters on the textfile f; for each character c in the sequence, the equivalent of
begin ff^ := c; put(ff) end
where ff denotes the referenced textfile, shall be applied to the textfile f. The sequence of characters written shall be a representation of the value of the first expression in the write-parameter p, as specified in the remainder of this subclause."
So each representation defined subsequently in 6.9.3 produces representations that must fit into the character c specified above. That ends up as '0' through '9', 'T', 'R', 'U', 'E', 'F', 'A', 'L', 'S', and '+' and '-'. (I forgot those the first time.)
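A small sketch of the write representations in question follows; the program name WriteForms is invented for this example. The field widths, the letter case of the Boolean words, and the exact exponent layout are all implementation-defined, so no particular output is claimed here.

```pascal
program WriteForms(output);

{ Illustrative only: each writeln below produces one of the
  representations described in 6.9.3. Padding, letter case and
  exponent layout vary between processors. }
begin
  writeln(true);   { the word true, in implementation-defined case }
  writeln(false);  { the word false, in implementation-defined case }
  writeln(-42);    { decimal digits, with '-' for a negative value }
  writeln(1.5)     { floating-point form with a signed exponent }
end.
```

Running this on a given processor shows which characters that processor must be able to put on a textfile in order to write Boolean, integer, and real values.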
I do remember having a discussion about this at some meeting in the dark and distant past. Might have been in actual meeting time with somebody asking a serious question about exactly what characters are required or perhaps it was in the bar with alcohol involved. Not sure.
As for Unicode, for Extended Pascal, we did discuss what, if any, changes would need to be made to the standard for a system that had Unicode as its underlying representation, or even for systems that provided multiple representations. Given that the standard is sufficiently vague (perhaps too much so) on any representation, we didn't see anything to add other than perhaps how to provide entry for characters that don't have a glyph on the compiling system. We just threw up our hands and waited for somebody to come up with a real example/need.
> I would have to differ slightly from John in that I think it is incorrect to state that there is a minimum requirement at all. The standard nowhere states a minimum requirement or uses wording like that, and nowhere states a particular character set to use. Attempting to divine a minimum character set is reading something from nothing.
Well, but in 6.1.7 Character-strings we have
"string-character = one-of-a-set-of-implementation-defined-characters ."
so we may ask the implementer: what is your set-of-implementation-defined-characters? And I think the answer will be like this: Look, actually we have 256 chars, and there are some funny ones among them:
- some have no graphic representation,
- some have many graphic representations,
- and be careful with such OSes as Unix, MS (DOS, Windows), Mac OS: they don't support lines for text files, and there is a special agreement to use some of my chars as line terminator.
> "
> 2 Normative reference
> The following standard contains provisions which, through reference in this text, constitute provisions of this International Standard. At the time of publication, the edition indicated was valid. All standards are subject to revision, and parties to agreements based on this International Standard are encouraged to investigate the possibility of applying the most recent edition of the standard listed below. Members of IEC and ISO maintain registers of currently valid International Standards.
> ISO 646:1983, Information processing - ISO 7-bit coded character set for information interchange.
> "
It is interesting: I found only one reference to ISO 646, in 6.1.9 Lexical alternatives: "NOTE - 1 The character UP-ARROW that appears in some national variants of ISO 646 is regarded as identical to the character ^ . In this International Standard, the character ↑ (up-arrow) has been used because of its greater visibility."
ashkotin wrote:
> Well, but in 6.1.7 Character-strings we have
> "string-character = one-of-a-set-of-implementation-defined-characters ."
> so we may ask the implementer: what is your set-of-implementation-defined-characters? And I think the answer will be like this: Look, actually we have 256 chars, and there are some funny ones among them:
> - some have no graphic representation,
> - some have many graphic representations,
> - and be careful with such OSes as Unix, MS (DOS, Windows), Mac OS: they don't support lines for text files, and there is a special agreement to use some of my chars as line terminator.
Ok. Why does this mean that the standard should specify anything concerning character sets?
By the way, the definition for "implementation-defined" is:
3.3 Implementation-defined
Possibly differing between processors, but defined for any particular processor.
3.5 Processor
A system or mechanism that accepts a program as input, prepares it for execution, and executes the process so defined with data to produce results.
If your argument is that there are clearly dependencies out in the world, and the standard should list them, then the standard should give all possible values of maxint as well.
The value of the Pascal standard lies both in what it does, and what it does not define. Niklaus Wirth originally defined type equivalence as dependent on the "quality" of the processor, so that:
type a = array [1..10] of integer;
     b = record c: a; d: integer end;
     c = record e: array [1..10] of integer; f: integer end;
might leave b and c to be the same type. The standard defined type equivalence absolutely, which perhaps implies that the original method of type equivalence was too confusing to implementors. In fact and practice it was, since several Pascal implementors went on to misunderstand such features. This, however, does not change the fact that the original method was clear enough, certainly clear enough for anyone smart enough to implement a compiler. The standard's attempt to "nail jelly to a tree" was well founded enough, but I would assert failed in any case on this point, since implementors who failed to understand type equivalence on the original standard (J&W "report") generally ignored the standard in total.
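A minimal sketch of what the standard's absolute rule means in practice, assuming an ISO 7185 conforming processor. The program name TypeNames and the variables are invented for this example. Two variables share a type only when they are declared with the same type; a second, textually identical array declaration creates a distinct type.

```pascal
program TypeNames(output);

type
  a = array [1..10] of integer;
  b = record c: a; d: integer end;

var
  x, y: a;
  p: array [1..10] of integer;  { a new array type, distinct from a }
  r: b;
  i: integer;

begin
  for i := 1 to 10 do
    x[i] := i;
  y := x;      { both are of type a: allowed }
  r.c := x;    { field c is declared with type a: allowed }
  { p := x is not assignment-compatible under ISO 7185, because the
    type of p is a different type from a, so copy element by element. }
  for i := 1 to 10 do
    p[i] := x[i];
  writeln(y[10]:1, ' ', r.c[1]:1, ' ', p[5]:1)
end.
```

The program writes 10 1 5. Some compilers are more permissive about the commented-out assignment, which is the kind of divergence the paragraph above describes.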