> Why do so many languages offer (at least) two forms of conditional > loop: one with the test at the beginning and another with the test at > the end? Why not just offer an infinite loop and a way to break out > that can be tied to any conditional?
Looking at grammars themselves, I found EBNF much easier to read and write than BNF, except for (some) repetitions. The >=0 repetition {...} cannot express >0 repetition, so one has to write list ::= some lengthy construct { "," some lengthty construct }. That's why constructs like list ::= ( some lengthy construct )/",". have been added to, e.g., Borland EBNF.
[The typo in the first list left in intentionally...]
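In a hand-written recursive-descent parser, the usual workaround is to write the item once and loop on the separator, so it cannot be misspelled the second time. This is a minimal Python sketch under stated assumptions: tokens are pre-split strings, and `parse_separated_list` and `parse_identifier` are hypothetical names. It is not Borland's implementation.

```python
def parse_separated_list(tokens, pos, parse_item, sep=","):
    """Parse `item { sep item }`: one or more items separated by `sep`.

    `parse_item(tokens, pos)` must return (value, new_pos) or raise SyntaxError.
    Returns (list_of_values, new_pos).
    """
    # The first item is mandatory: this is the ">0" part that {...} alone cannot say.
    value, pos = parse_item(tokens, pos)
    items = [value]
    # Zero or more repetitions of `sep item`, i.e. the { "," item } part.
    while pos < len(tokens) and tokens[pos] == sep:
        value, pos = parse_item(tokens, pos + 1)
        items.append(value)
    return items, pos


def parse_identifier(tokens, pos):
    # Hypothetical item parser standing in for "some lengthy construct".
    if pos < len(tokens) and tokens[pos].isidentifier():
        return tokens[pos], pos + 1
    raise SyntaxError(f"identifier expected at token {pos}")
```

For example, `parse_separated_list(["a", ",", "b"], 0, parse_identifier)` returns `(["a", "b"], 3)`, while an empty token list raises `SyntaxError`.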
Obviously the required loop constructs vary with the domain of a language. That's why IMO general purpose languages should allow for several loop constructs, which should be easy to distinguish by humans.
PL/I style keywords (as opposed to Pascal style reserved words) are an interesting topic to me, having used PL/I (as well as Pascal) and a host of other older languages. The PL/I solution seemed elegant enough that we used it in Yacc++ and documented how grammar designers could apply it to their languages. The core idea behind the PL/I solution is that there are many (or at least some) words that are only reserved in certain contexts, and that if one isn't using the word in that context there is no use in restricting the user from having it in his vocabulary (i.e. available as a user defined identifier).
From a grammar writing point of view, it is not particularly difficult to introduce PL/I style keywords to any LR grammar. The same changes should generally work for LL grammars also. If people are interested I can document them here.
The harder question one has to ask is whether they make writing correct programs easier or harder. They certainly make more programs legal and lessen the burden of remembering all the keywords for a given language.
However, if one has written an incorrect program, they might make deciphering the error message harder and may prevent the error being detected at the spot where it occurs. For example, consider the case where someone doesn't realize that a specific keyword has a language defined meaning at a particular spot and in the same spot an ordinary user defined identifier is allowed. Then, in that place if the user types what they think is their user defined identifier, but it happens to be a reserved word in that context, something bad will happen. If the user is lucky, the mistake will cause some kind of error because the keyword will have additional syntax following it that the user will not specify. In the unlucky case, the program will appear correct, but silently do the wrong thing, possibly not detected until it has caused some later catastrophic failure.
This is the same reason implicit declarations can be dangerous. Certain types of errors are easy to make and systems that make that harder provide more protection, even if they penalize those who don't make those errors.
Hope this helps, -Chris
Chris Clark, Compiler Resources, Inc., 23 Bailey Rd, Berlin, MA 01503 USA
email: christopher.f.cl...@compiler-resources.com
Web Site: http://world.std.com/~compres
voice: (508) 435-5016
twitter: @intel_chris
[IF THEN = ELSE THEN IF = ELSE; ELSE IF = THEN; -John]
> <compil...@is-not-my.name> >> I have always wondered why so many languages use English words > The answer that jumps out at me is most languages were developed in > America
Pascal was designed in Europe, and it is in English.
> However, if one has written an incorrect program, they might make > deciphering the error message harder and may prevent the error being > detected at the spot where it occurs. For example, consider the case > where someone doesn't realize that a specific keyword has a language > defined meaning at a particular spot and in the same spot an ordinary > user defined identifier is allowed. Then, in that place if the user > types what they think is their user defined identifier, but it happens > to be a reserved word in that context, something bad will happen. If > the user is lucky, the mistake will cause some kind of error because > the keyword will have additional syntax following it that the user > will not specify. In the unlucky case, the program will appear > correct, but silently do the wrong thing, possibly not detected until > it has caused some later catastrophic failure.
I don't know if the source for it was ever made public, but PL/C, a variant of PL/I written at Cornell University would be a great case study on this topic.
I'm sure many guys on the list are old enough to remember, but for those who aren't or who didn't work on IBM platforms, the idea behind PL/C was to hammer anything you handed it into a legal PL/I program and generate a working load module (executable). The program may not have done what you wanted, but it would do *something*. It was targeted at CS101 students and from what I saw sitting behind the Help Desk it did a credible job. I wish I could remember more about it.
The diagnostics ranged from helpful to hilarious. When it detected a syntax error it would correct the statement as well as it could and produce a message "PL/C USES... " and give a working syntax for the statement, which was inserted in the program at that point.
I don't know how much the thinking or logic behind it would lend itself to other languages. But it was a very interesting teaching concept.
> This is the same reason implicit declarations can be dangerous.
I'm not sure I agree with this; if it were true, FORTRAN could not have been so successful. A lot of non-optimal things in life do seem to work.
> Certain types of errors are easy to make and systems that make that > harder provide more protection, even if they penalize those who don't > make those errors.
Ada!
> [IF THEN = ELSE THEN IF = ELSE; ELSE IF = THEN; -John]
Ha! When I get a chance I may try compiling that under PL/I...! [For more info about PL/C, see http://ecommons.cornell.edu/handle/1813/5952 I don't know if the code is still around, but it'd be easy enough to ask if anyone wants to run it on Hercules. With respect to the danger of implicit declarations in Fortran, there are plenty of stories of broken code due to statements like DO 10 I = 1.10 which is an assignment, not a loop. -John]
> <compil...@is-not-my.name> > With respect to the danger of implicit declarations in Fortran, there are > plenty of stories of broken code due to statements like DO 10 I = 1.10 > which is an assignment, not a loop. -John]
This kind of thing has more to do with poor syntax than with implicit declarations.
The old (original) syntax of FORTRAN permitted spaces anywhere (or nowhere), because spaces were ignored [except in strings].
Had spaces been significant, DO 10 I would have been parsed as three separate tokens. As it was, FORTRAN parsed it as the single token "DO10I", which was a legal identifier. [That's an egregious example, but I've written plenty of buggy code where I spelled a variable name in two ways. Not really a compiler issue, though, since it's easy enough to implement either way. -John]
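To make the DO10I trap concrete, here is a toy Python classifier that, like a blank-insensitive compiler, squeezes out spaces before deciding what a statement is. It is a deliberate simplification (no nested parentheses, only DO versus assignment), not how any particular FORTRAN compiler did it. The key fact is that only the top-level comma after "=" distinguishes the loop from an assignment to a variable named DO10I.

```python
import re


def classify_fixed_form(statement):
    """Roughly classify a fixed-form FORTRAN statement after removing blanks.

    Simplified sketch: only distinguishes DO loops from assignments and
    ignores commas nested inside parentheses.
    """
    text = statement.replace(" ", "").upper()
    # DO label var = start , end  -> the comma is what makes it a loop.
    m = re.fullmatch(r"DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),(.+)", text)
    if m:
        return ("do-loop", m.group(1), m.group(2))
    # Otherwise `name = expr` is an assignment, even if the name starts with DO.
    m = re.fullmatch(r"([A-Z][A-Z0-9]*)=(.+)", text)
    if m:
        return ("assignment", m.group(1))
    return ("other",)
```

With a comma, `classify_fixed_form("DO 10 I = 1,10")` yields `("do-loop", "10", "I")`; with a period, `"DO 10 I = 1.10"` yields `("assignment", "DO10I")`.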
"robin" <robi...@dodo.com.au> writes: > Well, it did. However, FORTRAN programmers couldn't perceive that the > language was of any benefit to them.
I have seen examples of FORTRAN programmers rejecting newer languages (like Pascal, C, or C++) because of a perceived ineffectiveness of the newer language compared to FORTRAN.
While there sometimes is a real difference in effectiveness -- such as when Pascal is compiled to interpreted P-code -- the perception is often based on simple-minded experiments porting a few programs from FORTRAN to, say, C and not taking the different array layout into account: If you translate a nested loop walking over a multi-dimensional array from FORTRAN to C, you are likely to get a suboptimal order of access -- the original FORTRAN program was optimised to column-major array layout, which doesn't work well with the row-major array layout of C or Pascal.
Torben [This is an awfully long way from compilers. -John]
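Torben's array-layout point can be illustrated by computing which memory offsets a FORTRAN-style loop nest (columns outer, rows inner) touches under each layout. This Python sketch uses 0-based indices and hypothetical function names; it only shows access strides and ignores caches and compiler transformations.

```python
def column_major_offset(i, j, nrows, ncols):
    # FORTRAN layout: first index varies fastest in memory.
    return i + j * nrows


def row_major_offset(i, j, nrows, ncols):
    # C/Pascal layout: last index varies fastest in memory.
    return i * ncols + j


def strides_for_fortran_loop_order(offset, nrows, ncols):
    """Differences between successive offsets for the nest
    `for j in columns: for i in rows`, i.e. i varies fastest."""
    offsets = [offset(i, j, nrows, ncols) for j in range(ncols) for i in range(nrows)]
    return [b - a for a, b in zip(offsets, offsets[1:])]
```

For a 3x4 array, the column-major layout gives a constant stride of 1, while the same loop order over a row-major array jumps by 4 (the row length) and then back by 7, which is the "suboptimal order of access" a naive translation inherits.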
>Froma a grammar writing point of view, it is not particularly >difficult to introduce PL/I style keywords to any LR grammar. The >same changes should generally work for LL grammars also. If people >are interested I can document them here.
>The harder question one has to ask is whether they make writing >correct programs easier or harder. They certainly make more programs >legal and lessen the burden of remembering all the keywords for a >given language.
If you want this feature to work well then you have to design other parts of the grammar to accommodate it.
Explicit statement delimiters help, because most keywords can only occur at the start of a statement, or are only valid after a particular statement-introducing keyword.
If you are using keywords to delimit blocks then block-dependent end keywords are better, e.g. if..fi rather than if..end, so that the end keyword doesn't become de facto reserved because it's special in so many contexts. Or just use {..} punctuation.
>[IF THEN = ELSE THEN IF = ELSE; ELSE IF = THEN; -John]
Maybe we should re-introduce stropping using U+0332 COMBINING LOW LINE :-)
>> Now, why do some languages have DO ... UNTIL, where others have >> DO ... WHILE for "test at the end" loops?
If the test is after the loop, it makes sense. Mind you, I do not think that putting the test at the end makes sense. I want my loop control up-front and would rather see until <cond> body and just know that the body will be executed once for sure. Going through the body when I know the condition means that I may be able to desk-check much faster.
>I've always prefered languages that have both. I picked on a specific >Basic dialect because that was the example that irritated me the most. >*My* experience of Basic was mainly with another dialect that had >neither WHILE mor REPEAT - you had to use IF/GOTO or FOR.
BTDT. I loved it as much as you did.
>BBC Basic was also irritating to me because, unlike the earlier Basic >implementations that I knew on on micros, the ROM was large enough >to support both control structures, so why pick just one?
Microsoft BASIC 5 had WHILE but not UNTIL.
>So I can only guess why the implementers made that choice. To be fair, >a lot of choices made in Basic implementation of that era seem bizzarre >to me today. They seemed pretty odd to me at the time, but I learned a >lot about the pitfalls of language design by studying them, so at least >they had some value for me. It's a small design space, but that may have >helped me - at some point, language design comes down to the very >small details that will matter to programmers and implementers. I can >recommend this technique to anyone interested in language design - study >entire families of dialects, their evolution, their implementions, the costs >and trade-offs made, the context(s) and general family history.
I always look for the philosophy of a programming language.
>Looking at the subject for this thread, I might suggest starting with the >Algol family. ;)
Gene Wirchenko wrote: >> BBC Basic was also irritating to me because, unlike the earlier Basic >> implementations that I knew on on micros, the ROM was large enough >> to support both control structures, so why pick just one?
> Microsoft BASIC 5 had WHILE but not UNTIL.
Yes, that struck me as odd at the time.
> I always look for the philosophy of a programming language.
Yes, that's useful too. I like Paul Graham's question about the problems a language is intended to solve. That's part of the philosophy, I guess.
We could probably ask similar questions about compilers... Error reporting and recovery has always fascinated me, probably because I was frequently frustrated by the unhelpfulness of the error msgs given by so many tools.
My first lexer buffered text at the line level and counted the characters and lines read so far, then provided them to the error reporting code so that the line number and the line itself could be given to the user, along with a '^' under the character at which the error was detected.
This only worked in my compiler because it ran in a single pass. When I began writing multipass compilers, I had to save the line and offset numbers in the parse and/or syntax trees. Yes, it does require extra effort, but several decades later my compilers still use this technique. The line text itself, however, was only ever used in error msgs in my first compiler.
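A minimal Python sketch of the idea: the lexer records a 1-based line and column on every token, so later passes can copy those positions into tree nodes and still produce a report with the source line and a caret. The `Token` fields, `tokenize`, and `format_error` are hypothetical names, and the toy token classes (numbers, identifiers, single-character operators) are assumptions. Tabs would misalign the caret in this simple version.

```python
import re
from dataclasses import dataclass

# Toy token classes: numbers, identifiers, and any other single non-blank character.
TOKEN_RE = re.compile(r"(?P<num>\d+)|(?P<id>[A-Za-z_]\w*)|(?P<op>\S)")


@dataclass
class Token:
    kind: str
    text: str
    line: int    # 1-based line number, kept so later passes can report errors
    column: int  # 1-based column of the token's first character


def tokenize(source):
    tokens = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        pos = 0
        while True:
            # Skip blanks, then stop at the end of the line.
            while pos < len(line) and line[pos].isspace():
                pos += 1
            if pos >= len(line):
                break
            m = TOKEN_RE.match(line, pos)
            tokens.append(Token(m.lastgroup, m.group(), line_no, pos + 1))
            pos = m.end()
    return tokens


def format_error(source, token, message):
    # Show the line number, the line itself, and a caret under the token.
    line_text = source.splitlines()[token.line - 1]
    caret = " " * (token.column - 1) + "^"
    return f"line {token.line}: {message}\n{line_text}\n{caret}"
```

For the source `"x = 1\ny = = 2\n"`, the second `=` on line 2 gets position (2, 5), and `format_error` prints `y = = 2` with the caret under that `=`.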
From: <compil...@is-not-my.name> Sent: Tuesday, 18 January 2011 9:51 AM
>> > [IF THEN = ELSE THEN IF = ELSE; ELSE IF = THEN; -John]
>> Ha! When I get a chance I may try compiling that under PL/I...!
> Ok. Here it is. If I had not compiled it myself I would not have believed
> it!
>
>     LOGIC: PROC OPTIONS(MAIN);
>        DCL (IF, THEN, ELSE) CHAR;
>        BEGIN;
>           IF THEN = ELSE THEN IF = ELSE; ELSE IF = THEN;
>        END;
>     END;
In Fortran, a similar successful compilation can be obtained for :--
IF (THEN == ELSE) THEN; IF = ELSE; ELSE; IF = THEN; END IF
In Algol, a similar construct is possible, but typically reserved words are enclosed in apostrophes, so it's not quite so dramatic. [I stopped writing Fortran compilers with F77. Let me tell you, it was a pain in the patoot to tell a FORMAT statement from a statement function FORMAT(A3,I2) = A3**I2 (Yes, I know how to do it. I did it, after all.) -John]
From: "robin" <robi...@dodo.com.au> Sent: Monday, 17 January 2011 11:55 AM
>> <compil...@is-not-my.name> > Had spaces been significant, DO 10 I would have been parsed as three > separate tokens. As it was, FORTRAN parsed it as the single token > "DO10I", which was a legal identifier. > [That's an egregious example, but I've written plenty of buggy code where > I spelled a variable name in two ways. Not really a compiler issue, though, > since it's easy enough to implement either way. -John]
The reason that many mis-spellings passed by unnoticed in FORTRAN was that most compilers of the time produced only a compilation listing.
IBM PL/I compilers at that time produced not only a compilation listing, but also an attribute listing (a list of identifiers, attributes, and cross-references). That way it was easy to detect mis-spelled identifiers.
Now Fortran has not only free-form source, where blanks are significant (which prevents constructs such as DO I = 1.10 from becoming an assignment), but also an IMPLICIT NONE statement which, when employed, causes any undeclared identifiers to be reported as errors.
IBM's current PL/I compilers also have a compiler option that causes undeclared identifiers to be classified as errors.
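The cross-reference idea can be sketched in a few lines of Python: map each identifier to the lines that mention it, then flag identifiers that appear on only one line, which is typical of a misspelling. This is not IBM's listing format; the function names, the case-folding, the caller-supplied keyword set, and the "one line only" heuristic are all assumptions for illustration.

```python
import re
from collections import defaultdict

IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def cross_reference(lines, keywords=frozenset()):
    """Map each identifier (upper-cased) to the line numbers where it appears."""
    xref = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        for name in IDENT.findall(line):
            name = name.upper()  # assumption: identifiers are case-insensitive
            if name not in keywords and line_no not in xref[name]:
                xref[name].append(line_no)
    return dict(xref)


def suspicious(xref):
    """Identifiers referenced on only one line: likely misspellings."""
    return sorted(name for name, refs in xref.items() if len(refs) == 1)
```

On a four-line program that sets TOTAL but later reads TOTL, the listing shows TOTL referenced only once, which is exactly the kind of slip an attribute listing exposes at a glance.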
> From a grammar writing point of view, it is not particularly > difficult to introduce PL/I style keywords to any LR grammar. The > same changes should generally work for LL grammars also. If people > are interested I can document them here.
I'm definitely interested in seeing a description of how to do this (in my case for LL grammars). It's probably worth starting a different thread for it, though.
Peter Canning <9cn6w6...@sneakemail.com> writes: >On 1/14/2011 2:04 PM, Chris F Clark wrote: >> From a grammar writing point of view, it is not particularly >> difficult to introduce PL/I style keywords to any LR grammar. The >> same changes should generally work for LL grammars also. If people >> are interested I can document them here.
>I'm definitely interested in seeing a description of how to do this (in >my case for LL grammars).
It's pretty easy for LL(1) grammars: At each position, treat all keywords as identifiers if they don't occur in the first set. E.g., in many languages "IF" occurs only as keyword at the start of a statement, so in a PL/I-style variant of those languages "IF" could be treated as identifier everywhere else.
A disadvantage of this approach is that some syntax errors can only be discovered later. E.g., to pick up the example above, if the semicolon in front of an IF is missing, a Pascal compiler will notice a syntax error when it sees the "IF", but not necessarily a compiler for a PL/I-style language. I don't know how relevant this is in practice. If only new keywords (for language extensions beyond the original standard, where new keywords often cause problems) are treated in this way, this disadvantage is probably small compared to the benefit.
Such a scheme is probably best implemented in the parser (which knows about the first-sets), but at the interface to the scanner (so the higher levels don't need to have special treatment of such identifiers).
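Here is a small Python sketch of the scheme just described, for a toy grammar (`stmt ::= IF expr THEN stmt [ELSE stmt] | ID "=" expr ";"`, `expr ::= ID ["=" ID]`). The parser passes the set of terminals it can accept at each point to `peek`, and a keyword outside that set is handed back as an ordinary identifier. The grammar, the `Parser` class, and whitespace-separated input are assumptions. Note one consequence of the rule: IF at the start of a statement is always a keyword here, so John's `IF = ELSE;` assignment is rejected; PL/I needs more lookahead for that case.

```python
KEYWORDS = {"IF", "THEN", "ELSE"}


class Parser:
    def __init__(self, source):
        self.words = source.split()  # assumption: tokens are blank-separated
        self.pos = 0

    def peek(self, expected):
        # Classify the next word relative to what the grammar accepts here.
        if self.pos >= len(self.words):
            return "EOF"
        word = self.words[self.pos]
        if word in KEYWORDS:
            # Keyword only where the parser can accept it; identifier elsewhere.
            return word if word in expected else "ID"
        return "ID" if word.isidentifier() else word

    def expect(self, kind, expected):
        if self.peek(expected) != kind:
            raise SyntaxError(f"expected {kind} at word {self.pos}")
        word = self.words[self.pos]
        self.pos += 1
        return word

    def program(self):
        stmts = []
        while self.peek(set()) != "EOF":
            stmts.append(self.statement())
        return stmts

    def statement(self):
        first = {"IF", "ID"}  # FIRST set of stmt
        if self.peek(first) == "IF":
            self.expect("IF", first)
            cond = self.expression()
            self.expect("THEN", {"THEN"})
            then_part = self.statement()
            else_part = None
            if self.peek({"ELSE"}) == "ELSE":
                self.expect("ELSE", {"ELSE"})
                else_part = self.statement()
            return ("if", cond, then_part, else_part)
        target = self.expect("ID", first)
        self.expect("=", {"="})
        value = self.expression()
        self.expect(";", {";"})
        return ("assign", target, value)

    def expression(self):
        # "=" inside an expression is a comparison, as in PL/I.
        left = self.expect("ID", {"ID"})
        if self.peek({"="}) == "=":
            self.expect("=", {"="})
            return ("eq", left, self.expect("ID", {"ID"}))
        return left
```

With this rule, `IF THEN = ELSE THEN X = IF ; ELSE Y = THEN ;` parses, with THEN, ELSE, and IF treated as identifiers wherever no keyword is acceptable. In this particular toy grammar a missing `;` before IF is still caught, because an identifier cannot follow a complete expression.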
>We could probably ask similar questions about compilers...Error reporting >and recovery has always fascinated me, probably because I was frequently >frustrated by the unhelpfulness of the error msgs given by so many tools.
You, too? I used to like reading the appendix of error messages for a compiler. Philosophy of the compiler.
>My first lexer buffered text at the line level and counted the >characters and lines read so far, then provided them to the error >reporting code so that the line number and the line itself could be >given to the user, along with a '^' under the character at which the >error was detected.
Well, I like this sort of thing. It makes it much easier for me to find the error. An ugly counterexample: SQL statements can get very long, which makes non-specific error messages much less useful and occasionally quite frustrating.
From: "Anton Ertl" <an...@mips.complang.tuwien.ac.at> Sent: Thursday, 20 January 2011 1:45 AM
> It's pretty easy for LL(1) grammars: At each position, treat all > keywords as identifiers if they don't occur in the first set. E.g., > in many languages "IF" occurs only as keyword at the start of a > statement, so in a PL/I-style variant of those languages "IF" could be > treated as identifier everywhere else.
> A disadvantage of this approach is that some syntax errors can only be > discovered later. E.g, to pick up the example above, if the semicolon > in front of an IF is missing, a Pascal compiler will notice a syntax > error when it sees the "IF", but not necessarily a compiler for a > PL/I-style language.
A compiler for a PL/I style language will pick that up as a syntax error (i.e., a missing semicolon). IBM's PL/I will probably advise that it has assumed a semicolon and continued normally, treating the omission as a trivial error.
Such an omission is detected because there would otherwise appear to be a missing operator or some such before the "IF".
See compiler output below:--
On Thursday 13 Jan 2011 at 18:09, noitalmost <noitalm...@cox.net> wrote:
> My language solution addresses this sort of compromise. I'm providing > traditional While, infinite Loop, and Break statements. If you have a > Break, you only need one loop construct to provide pre-, post-, and > mid-test loops. The While is provided simply for programmer convenience.
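The claim that an infinite loop plus Break covers all three shapes can be sketched in Python, whose `while True` / `break` plays the role of Loop/Break. The functions are hypothetical illustrations; `mid_test` assumes its input contains a `None` sentinel.

```python
def pre_test(n):
    # Like `while n > 0 do ...`: the test comes before the body.
    out = []
    while True:
        if not n > 0:
            break
        out.append(n)
        n -= 1
    return out


def post_test(n):
    # Like `repeat ... until n <= 0`: the body always runs at least once.
    out = []
    while True:
        out.append(n)
        n -= 1
        if n <= 0:
            break
    return out


def mid_test(items):
    # Read, test in the middle, then process: the classic "loop and a half".
    out = []
    it = iter(items)
    while True:
        x = next(it)
        if x is None:  # exit from the middle of the body
            break
        out.append(x * 2)
    return out
```

The difference between the first two shows up at the boundary: `pre_test(0)` returns `[]`, while `post_test(0)` returns `[0]`.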
My language (WSL) has WHILE loops, FOR loops and loops with multi-level EXITs. A loop of the form DO ...statements... OD can only be terminated by an EXIT(n) statement, where n is an integer, not a variable or expression. EXIT(n) will terminate the "n" enclosing nested DO...OD loops.
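Python has no multi-level break, but the semantics of EXIT(n) can be emulated with an exception that each enclosing DO...OD absorbs one level of. This is only an illustrative emulation, not how WSL or FermaT implements EXIT; `Exit`, `do_od`, and `find_first` are hypothetical names.

```python
class Exit(Exception):
    """Raised by EXIT(n); each enclosing DO...OD absorbs one level."""

    def __init__(self, levels):
        super().__init__(levels)
        self.levels = levels


def do_od(body):
    """Run `body` (a zero-argument function) repeatedly, like DO ... OD,
    until an EXIT reaches this loop."""
    try:
        while True:
            body()
    except Exit as e:
        e.levels -= 1
        if e.levels > 0:
            raise  # let an outer DO...OD absorb the remaining levels


def find_first(grid, target):
    """Return (row, col) of the first occurrence of target, or None."""
    r = c = 0
    found = None

    def outer():
        nonlocal r, c
        if r >= len(grid):
            raise Exit(1)  # no more rows: leave the outer loop
        c = 0
        do_od(inner)
        r += 1

    def inner():
        nonlocal c, found
        if c >= len(grid[r]):
            raise Exit(1)  # end of this row: back to the outer loop
        if grid[r][c] == target:
            found = (r, c)
            raise Exit(2)  # leave both loops at once
        c += 1

    do_od(outer)
    return found
```

Here `find_first([[1, 2], [3, 4]], 3)` returns `(1, 0)`: the EXIT(2) skips the rest of the row and the outer loop's increment in one step, which is the double-nested case the statistics below show is occasionally useful.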
WSL also includes a restricted type of GOTO in the form of an action system and action calls. Roughly speaking, labels can only appear at the top level of the program structure. CALLs (i.e. GOTOs) can only appear within IF statements and DO...OD loops, not any other kind of loop. This means that whenever you see a WHILE loop in WSL you can guarantee that: (a) the condition is true at the top of the loop body, and (b) the condition is false on termination of the loop. A REPEAT loop ensures (b) but cannot ensure (a) since any conditions are possible on the first iteration of the loop.
One major application of WSL is for transforming unstructured code (translated from assembler code) to structured code, with the minimal reduction in efficiency and while preserving any existing structure where possible. In this context the multi-level EXITs provide a very useful intermediate stage between unstructured spaghetti code and fully structured code.
The FermaT program transformation system http://www.gkc.org.uk/fermat.html is implemented almost entirely in WSL and I have tried to use either a WHILE loop or a DO...OD loop, depending on which seemed most natural for the code in question. WSL also has two types of FOR loops: FOR v := start TO end STEP step DO -- the usual "counted repetition" and FOR v IN list DO -- iterate over the elements of a list or set. WSL also has FOREACH and ATEACH loops which iterate over the components of the current program which is being transformed.
Out of a total of 73,713 lines of WSL code there are currently:
849 FOREACH loops
174 ATEACH loops
705 FOR ... IN ... loops
102 FOR ... STEP ... loops
727 WHILE loops
578 DO...OD loops (approx)
921 EXIT(1) statements
30 EXIT(2) statements
0 EXIT(3) or higher statements
Conclusions:
(1) Domain-specific looping constructs are very useful: at least for the program transformation domain!
(2) Iterating over a list or set is used (in this system) much more than iterating over a sequence of integers
(3) WHILE loops are used more often than loops with EXITs from the middle (recall that a WSL WHILE loop cannot be terminated from the middle): the extra "analysability" of a WHILE loop versus the convenience of an exit from the middle suggests that it is useful to have both in a language.
(4) It is sometimes useful to EXIT directly from a double-nested loop, but higher levels of EXITs do not occur in my code at least.