language design implications for variant records in a pascal-like language

noitalmost

Dec 22, 2010, 3:03:54 PM

A while back, I created a toy language and compiler with a syntax similar to
Pascal or Oberon (like what you'd do in an undergrad compiler course). Now I'm
working on expanding it to a "real" language. I plan eventually to add a class
type (in the sense of Java or C++ classes), but at present I'm working on
expanding some of the "procedural language" components, such as a selection
statement, pointers, floating point type, etc.

My compiler is rather unsophisticated. It's recursive-descent with a scanner
and parser written "from scratch". It loosely follows Wirth's Oberon0 compiler
in his Compiler Construction book.

My plan is to have records remain as essentially fixed, compile-time entities,
so that they can be efficiently compiled. Classes, on the other hand, provide
polymorphism and the like. The classes are designed to provide more programmer
flexibility at the price of efficiency.

My question is this: Would variant records provide significant value to a user
of my language (given that classes were implemented)? My programming
experience has been that I frequently see unions in C code, but not very often
in C++ or Java (unless it's specifically to interface with C libraries). So if
I'm trying to keep my language (and compiler) fairly small, is there a good
case for variant records, or could everything be handled through classes
without too much inconvenience to the user programmer?

George Neuner

Dec 23, 2010, 5:11:58 PM

On Wed, 22 Dec 2010 15:03:54 -0500, noitalmost <noita...@cox.net>
wrote:

C++ forbids a union of classes ... so you won't ever see one.

With respect to POD variant record types, I guess it depends on how
you see your language being used. Packed POD records (including
variant types) are very useful for hardware interfacing in a "systems"
language, but in an "application" language the only real attraction of
POD types is that they are (usually) lighter weight and faster than an
equivalent class (though dynamic dispatch can be made O(1) if you are
serious about speed ... unfortunately very few class implementations
are that serious).

Personally, I would say add variant records if it isn't too hard and
won't cause conflict with your intended class implementation. IMO it
never hurts to give the programmer choices.

Wirth eliminated variant records from Oberon, but I think he made some
questionable choices there: he also got rid of enumerations,
subranges, displaced arrays (all array indices start at zero), and
counted loops. Doing so made (small parts of) the compiler simpler,
but made the programmer's job harder ... the wrong way to go IMO.
[Wirth reinstated counted loops in Oberon-2 ... an admission, I think,
that he was wrong about leaving them out of Oberon.]

George

Daniel Zazula

Dec 24, 2010, 6:01:10 AM

> My question is this: Would variant records provide significant value to a user
> of my language (given that classes were implemented)?

> So if

> I'm trying to keep my language (and compiler) fairly small, is there a good
> case for variant records, or could everything be handled through classes
> without too much inconvenience to the user programmer?

IMHO classes are just an evolution of records: properties are fields.
In heavily OO languages like Java and C#, all classes, including simple
types like integer and string (boxed as objects), descend from a primal
object. In that case a variant record is not needed, because you can
declare a property of that primal object type, and it will accept any
data type in the language.

Best Regards
Daniel Zazula

Robert A Duff

Dec 24, 2010, 12:50:20 PM

noitalmost <noita...@cox.net> writes:

> My question is this: Would variant records provide significant value to a user
> of my language (given that classes were implemented)?

No.

Take a look at Ada as an example of a language that has both features.
It's for historical reasons: Ada 83 had variant records (pretty much
the same as Pascal), and object-oriented features were added in the
1995 version.

A variant record is basically an upside-down class hierarchy.

Ada's variant records can do some things that classes can't. Variant
records are not extensible, so the compiler knows all the variants,
and so does the programmer. If there are four variants, the tag field
can fit in 2 bits. You can't do that with classes (unless you're
willing to do whole-program analysis). And you can do case statements
on the tag field of a variant record, and the compiler can check that
you didn't forget any of the possibilities.
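
A rough C sketch of the closed-set idea, not taken from the thread: the
type names, the four variants, and the describe function are all invented
for illustration. It keeps the tag in a 2-bit field and switches over every
variant with no default branch, so GCC or Clang with -Wall (which enables
-Wswitch) warns when an enumerator is left out. C only warns where Ada
rejects the program, and the 2-bit field saves space only when it is packed
next to other bit-fields.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical closed set of four variants; the names are illustrative. */
enum kind { KIND_NONE, KIND_INT, KIND_REAL, KIND_CHAR };

struct value {
    unsigned tag : 2;   /* holds an enum kind: four variants fit in 2 bits */
    union {
        int    i;       /* valid when tag == KIND_INT  */
        double r;       /* valid when tag == KIND_REAL */
        char   c;       /* valid when tag == KIND_CHAR */
    } u;
};

/* Writes a description of *v into out. There is deliberately no default
   branch, so a compiler with -Wswitch can report a forgotten variant. */
int describe(const struct value *v, char *out, size_t n)
{
    switch ((enum kind)v->tag) {
    case KIND_NONE: return snprintf(out, n, "none");
    case KIND_INT:  return snprintf(out, n, "int %d", v->u.i);
    case KIND_REAL: return snprintf(out, n, "real %g", v->u.r);
    case KIND_CHAR: return snprintf(out, n, "char %c", v->u.c);
    }
    return -1;  /* not reached: a 2-bit tag can only hold the four kinds */
}
```

Deleting one of the case lines and recompiling with -Wall is the quickest
way to see the check at work.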

You could decide you don't care about these "features" of variant
records. Or you could design your class feature so that the
programmer can optionally specify all the possibilities up front.
Usually, you don't want to do that, because you want extensibility.
But there are exceptions. For example, in Smalltalk, Boolean is a
class with two subclasses True and False. But the extensibility there
is an illusion: if somebody added a third subclass of Boolean, it
would break most programs. Boolean is fundamentally two-valued; it's
not at all "object oriented".

You should also check out how types are declared in OCaml.

- Bob

Dmitry A. Kazakov

Dec 25, 2010, 5:21:46 PM

On Fri, 24 Dec 2010 12:50:20 -0500, Robert A Duff wrote:

> Boolean is fundamentally two-valued; it's
> not at all "object oriented".

This is untrue. The Boolean lattice can be extended to a tri-valued lattice,
a four-valued lattice (Belnap logic), an infinitely valued lattice (fuzzy
logic), and many other logics. Yes, some properties of the Boolean lattice
can be lost (e.g. the law of excluded middle; some idempotent operations
could lose that property, etc.). But any non-trivial extension of any set
always loses some properties of the original set. Booleans are as object
oriented as anything else. The mathematical notion of a lattice is one of
the classes of which the Boolean type is an instance: very OO.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

Hans-Peter Diettrich

Dec 25, 2010, 10:19:02 AM

noitalmost schrieb:

> I'm trying to keep my language (and compiler) fairly small, is there a good
> case for variant records, or could everything be handled through classes
> without too much inconvenience to the user programmer?

Variant records can simplify memory management of polymorphic data,
eliminating additional class instances (=heap objects) when embedded in
other objects, and can reduce the number of new/additional objects
during type conversions. External libraries (databases, COM) can use
polymorphic records as arguments, so that such a feature may be a MUST
for using such APIs. Otherwise every programmer should be happy with
more type-safe polymorphic classes.

DoDi

Gene

Dec 27, 2010, 10:58:42 AM

True. And the ability to trivially extend types by adding an enumerated set
of additional values would IMO be a good feature. For example, extension with
a value "Unknown" would obviate varied ad hoc idioms: null pointers, parallel
return values, etc. To bring this back to the variant records topic, I often
encounter needs for types like

type Possibly_Valid_Float (Valid : Boolean := False) is
   record
      case Valid is
         when True  => Value : Float;
         when False => null;
      end case;
   end record;

and then replace Float with any other type. But the syntax for setting and
using these is of course completely different from that needed for Float.

Much nicer would be

type Possibly_Valid_Float is new Float with (Unknown);

X, Y : Possibly_Valid_Float := 0.0;

function "/" (Num, Den : Possibly_Valid_Float)
   return Possibly_Valid_Float is
begin
   if Num = Unknown or Den = Unknown then
      return Unknown;
   end if;
   return Num / Den;
exception
   when others => return Unknown;
end;

Etc...

Merry Christmas, all.

Robert A Duff

Dec 27, 2010, 2:03:49 PM

>> Boolean is fundamentally two-valued; it's
>> not at all "object oriented".
>
> This is untrue.

OK, you're right, but...

>...Boolean lattice can be extended to tri-valued lattice,
>...[etc]

But not without breaking programs that use ifTrue:
and whileTrue: and so forth (which is approximately
100% of them). ;-)

- Bob

noitalmost

Dec 27, 2010, 1:58:47 PM

Thanks all for your replies. They've been helpful.

Tentatively, I'm calling my language Wipl (for Wirth Inspired Programming
Language).

Supposing Wipl, Oberon, and Ada to lie in the same "nanny language" category,
Wipl sits much closer to Oberon than to Ada, though I think Oberon somewhat of
a minimalist extreme.

As for the use of Wipl, I would like to keep it in the "general purpose"
category, mainly for applications, but also potentially for OS development. I
was hoping one day to translate minix or (a subset of) linux to Wipl. This
goal is what got me thinking about variant records.

I suppose I have an initial prejudice against variant records. They seem to me
to be a potential source of hard to find bugs (for the user programmer, I mean,
not the compiler designer). Am I wrong about this? Is it possible for the
compiler to always know which type is active in the variant, like say through
a hidden compiler-generated variable in the record?

BGB

Dec 27, 2010, 3:48:03 PM

yeah.

admittedly, I am not familiar enough with Pascal-family languages to
have a good idea of whether or not supporting these as a language
feature is a good idea.

my thinking is that most cases where variability is needed, likely using
classes and inheritance is a better option.

this leaves structs/records as fixed-form, but the use of fixed-form
structs is where they are most useful IME.

providing only (reference-based) classes is likely to not be as good,
since classes hinder some of the sorts of semantics and optimizations
possible with record/struct/... types.

however, if both reference-based and value-based class semantics are
provided (more like in C++), then explicit struct/record types are
likely not much needed (a struct would be simply a non-inherited class
also lacking any virtual methods).

however, supporting the above is likely to be less trivial for a VM,
since struct/record/value-type style uses tend to imply a known physical
layout, which creates some additional issues when things like
inheritance and virtual methods are introduced.

hence, supporting only reference-based classes and value-based
records/structs may be a reasonable option, since this eliminates the
implied need that classes have a particular physical layout, or that
value-types support potentially problematic features.

for example, C# went the above route.

side note:

OTOH, I am designing (yet another) VM and language (BGBScript2 or BS2),
which uses both Class/Instance OO (single-inheritance, mostly
Java-style), and will support structs (C#-style semantics) and may
support unions (I am undecided on this, as there are pros and cons with
unions...).

the language design uses a syntax mostly based on Java and ActionScript3
with some C# elements (and possibly some elements retained from
BGBScript and Scheme).

its intention is mostly for scripting, but may also be used for
"portable components", probably with the VM supporting C, Java, C#
(likely), and BS2 as source languages (full C++ is unlikely).

the bytecode in use is mostly derived from that used in my BGBScript VM
and also by my C compiler. it will use a form of latent static-typing
(where the typesystem is statically typed, however types are not
necessarily determined until link-time or JIT).

previously I had looked some into using Java and using a JVM-based
design for my stuff, and got a largely working implementation (of a
J2ME-like subset), but changed course after concluding that the JVM is
just too ill-designed for my uses to be a worthwhile base, and decided
to design a different VM to be used here instead (for the time being,
JVM bytecode will remain, just I will halt extension efforts and likely
only use the bytecode for Java...).

I also got into a big argument with people over the matter of VM support
of struct types (at the time, still in the context of using them as a
hack onto the customized-JVM):
I was asserting that value semantics are useful and simplify VM-side
optimizations.

then a big argument ensued over whether or not a standard JVM can
perform the same optimizations with C/I objects, and over if this
actually matters in my case given I am not using a standard JVM (and so
could not likely support the same optimization).

it was my thinking that static lifetime analysis is unlikely to be
generally usable with C/I objects, as there would be far too many cases
where the lifetime of a C/I object will likely be unable to be
statically determined by the VM, meaning that likely many/most objects
being used as value-types will end up as heap garbage (whereas the
effective lifetime of a proper value-type is trivial to determine).

as another side note: the new language design also includes a 'delete'
keyword... so that, if one knows what they are doing, they can make the
object go away and not have to worry about whether or not the VM can
statically determine that it has gone away and can be reclaimed
immediately...

among this, and related issues (mostly with type semantics, declaration
scope handling, ...), I decided to just make another VM design, where I
could simply do things how I wanted without the need for design
contortions or social approval (or having to deal with likely trademark
issues).

yes, all this probably sounds like I am just recreating .NET or similar
in a different form, but IMO this is a fairly different design from .NET
as well.

in the end, maybe it is all pointless, but oh well...

Robert A Duff

Dec 28, 2010, 6:22:54 PM

Gene <gene.r...@gmail.com> writes:

> True.

Not clear what you're replying to.

> Much nicer would be
>
> type Possibly_Valid_Float is new Float with (Unknown);

OK, but you can't expect existing code that uses Float to
work properly with Possibly_Valid_Floats.

Also, you can't have Possibly_Valid_Floats other than those
two possibilities -- it's either a Float or it's not.

You can do this sort of thing in OCaml. Type "Maybe(Float)" (spelled
"float option" in OCaml itself) is either a float or nothing. It's pretty
elegant, and doesn't require any special "enumeration types" or
"enumeration type extensions".
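
For comparison, here is roughly what "a float or nothing" looks like when it
has to be built by hand in C, as a tag plus a payload. This is only a
sketch: the names maybe_float, mf_some, mf_unknown and mf_div are invented,
and treating a zero divisor as "unknown" is an assumption standing in for
the exception handler in Gene's version.

```c
#include <stdbool.h>

/* Hypothetical "maybe float": a tag plus a payload that only counts
   when the tag says so. */
struct maybe_float {
    bool  valid;   /* the tag */
    float value;   /* meaningful only when valid is true */
};

/* Wraps a known value. */
struct maybe_float mf_some(float x)
{
    struct maybe_float m = { true, x };
    return m;
}

/* Produces the "unknown" value. */
struct maybe_float mf_unknown(void)
{
    struct maybe_float m = { false, 0.0f };
    return m;
}

/* Division that propagates "unknown" from either operand and also
   yields "unknown" for a zero divisor (an assumption of this sketch). */
struct maybe_float mf_div(struct maybe_float num, struct maybe_float den)
{
    if (!num.valid || !den.valid || den.value == 0.0f)
        return mf_unknown();
    return mf_some(num.value / den.value);
}
```

Nothing stops a caller from reading value without checking valid, which is
the gap that a sum type with pattern matching closes.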

- Bob

Dmitry A. Kazakov

Dec 29, 2010, 4:30:44 AM

On Mon, 27 Dec 2010 14:03:49 -0500, Robert A Duff wrote:

>>...Boolean lattice can be extended to tri-valued lattice,
>>...[etc]
>
> But not without breaking programs that use ifTrue:
> and whileTrue: and so forth (which is approximately
> 100% of them). ;-)

Right, the logical inference becomes different. You can execute

while Condition loop
...
end loop;

at the level of certain truth, and this would mean the third value
Uncertain treated as False. Or you can execute it gullibly continuing when
Uncertain.
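
The two readings of a three-valued condition can be made concrete with a
small C sketch. The type and function names here are hypothetical; only the
two interpretations (Uncertain as False, Uncertain as True) come from the
text above, and the Kleene-style conjunction is added to show one ordinary
way such a type composes.

```c
#include <stdbool.h>

/* Hypothetical three-valued logic type, ordered so that the
   Kleene conjunction is simply the minimum of its operands. */
enum tri { TRI_FALSE, TRI_UNCERTAIN, TRI_TRUE };

/* "Certain truth": Uncertain is treated as False. */
bool tri_certainly(enum tri t)
{
    return t == TRI_TRUE;
}

/* "Gullible" reading: Uncertain is treated as True. */
bool tri_possibly(enum tri t)
{
    return t != TRI_FALSE;
}

/* Kleene conjunction: False dominates, then Uncertain, then True. */
enum tri tri_and(enum tri a, enum tri b)
{
    return a < b ? a : b;
}
```

A loop then has to say which reading it wants, for example
while (tri_certainly(cond)) or while (tri_possibly(cond)).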

BTW, specifically to Ada, it would not be a problem for if- or
while-statement, because Condition there has the type Boolean, not
Boolean'Class, so it cannot become Uncertain anyway. At best when doing
this:

while Boolean (Get_User_Input) loop
...
end loop;

here function Get_User_Input returns Boolean'Class, which might be
Tri_State_Logical. Then the required explicit conversion to Boolean would
raise Constraint_Error for Uncertain. And the programmer who writes the
program in terms of the class rather than the type Boolean must be prepared
to deal with any instance from the class. Otherwise, it is broken by
design.

But this issue is rather to be addressed to the illusions about
substitutability of types. Nothing is absolutely substitutable, not only
Tri_State_Logical for Boolean. Any extension or specialization potentially
breaks something. This sad fact cannot serve as an argument against
derivation of new types from old ones.

BartC

Dec 29, 2010, 6:42:08 AM

"noitalmost" <noita...@cox.net> wrote in message

I thought the variant record also stored a tag field so that it is possible
to determine, at run time, exactly which fields are valid in the variant
part (although I can't remember exactly how this would be used in a
language; perhaps assign to the tag first, then write the appropriate
variant field which would then need to be runtime checked).

This means the compiler wouldn't know what was going on in the record; it
would just have the headache of generating code to check the correct read
and write accesses are being done. (And I understand these variant fields
can be nested, an even bigger headache, if it's necessary to verify an
arbitrary path through a tree for any field access!)
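
The kind of check a compiler would have to emit for each variant field
access can be written out by hand in C. This is a sketch only: the number
type and its accessors are invented, and a real compiler might trap or
raise an exception instead of returning false.

```c
#include <stdbool.h>

enum num_tag { NUM_INT, NUM_REAL };

/* Hypothetical tagged record with a two-way variant part. */
struct number {
    enum num_tag tag;
    union {
        long   i;   /* valid when tag == NUM_INT  */
        double r;   /* valid when tag == NUM_REAL */
    } u;
};

/* Builds a number holding a real; tag and payload are set together. */
struct number number_from_real(double x)
{
    struct number n;
    n.tag = NUM_REAL;
    n.u.r = x;
    return n;
}

/* Checked read: succeeds only if the real variant is the active one. */
bool number_get_real(const struct number *n, double *out)
{
    if (n->tag != NUM_REAL)
        return false;
    *out = n->u.r;
    return true;
}

/* Checked read of the integer variant. */
bool number_get_int(const struct number *n, long *out)
{
    if (n->tag != NUM_INT)
        return false;
    *out = n->u.i;
    return true;
}
```

With nested variant parts, each level would add one more tag comparison
along the path to the field.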

In the case of a C union, no tag is stored; it's up to the programmer to
keep track of what shared field is currently active. But C is a low level
language and unions are one of the features it needs to support that aim.
And compiler support for unions can be minimal.

(In my own designs, I use neither variant fields nor unions. I use instead
field aliases:

record r (
int a, b
real x @ a
int c
)

so here x (64 bits) shares the same offset as a, and the same space as a and
b (32 bits each).

But this is also intended for nefarious purposes, and is an untidy feature
of a higher level language, with its own problems (how many fields are in
this record?). However compiler support is straightforward: x is simply
assigned offset 0 instead of 8.)
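
For readers more at home in C, the same layout can be approximated with
C11 anonymous members. This is a sketch under assumptions: int32_t stands
in for the 32-bit int, double is assumed to be 64 bits wide, and the type
name rec is invented. The static assertions spell out the offsets described
above.

```c
#include <stddef.h>
#include <stdint.h>

/* Rough C11 equivalent of the aliased record: x overlays a and b. */
struct rec {
    union {
        struct {
            int32_t a;   /* offset 0 */
            int32_t b;   /* offset 4 */
        };
        double x;        /* offset 0: shares the space of a and b */
    };
    int32_t c;           /* offset 8 */
};

/* Compile-time checks of the layout claimed in the comments. */
_Static_assert(sizeof(double) == 8, "sketch assumes a 64-bit double");
_Static_assert(offsetof(struct rec, x) == offsetof(struct rec, a),
               "x aliases a");
_Static_assert(offsetof(struct rec, b) == 4, "b follows a");
_Static_assert(offsetof(struct rec, c) == 8, "c follows the aliased pair");
```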

So, my feeling would be to leave them out, provided you have enough other
features to make your OS work practical.

--
Bartc

Hans-Peter Diettrich

Dec 29, 2010, 11:01:59 AM

noitalmost schrieb:

> I suppose I have an initial prejudice against variant records. They seem to me
> to be a potential source of hard to find bugs (for the user programmer, I mean,
> not the compiler designer). Am I wrong about this? Is it possible for the
> compiler to always know which type is active in the variant, like say through
> a hidden compiler-generated variable in the record?

Most (Wirthian) variant records have an (optional) Tag field, but most
implementations don't use it for the selection of one variant at
runtime, most probably for compiler simplicity and performance issues.
Ada may be more strict here (dunno).

At least there must exist rules for what happens when the Tag field is
changed. IMO the variant part should be cleared then - otherwise the Tag
is of no real value for the compiler and coder.

DoDi

Marco van de Voort

Dec 30, 2010, 12:14:41 PM

In Delphi it doesn't make sense, but from what I remember of reading ISO
Pascal sites (Scott Moore's, GPC), it made sense there, because in ISO Pascal
it is possible to dynamically allocate memory for different cases of the
variant record. And the allocated sizes of the cases could differ.

The tag field(s) were used to select the correct variant (and thus size)
using special new syntax like new(x,tag1[,tagn]);
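
The effect of a tag-selected allocation can be imitated in C by giving each
variant its own struct behind a common header. This is a sketch with
invented names (shape, point, polygon, shape_new), not a rendering of how
any Pascal compiler implements new(x, tag).

```c
#include <stdlib.h>

enum shape_tag { SHAPE_POINT, SHAPE_POLYGON };

/* Common header: every variant starts with the tag. */
struct shape   { enum shape_tag tag; };

/* Small variant. */
struct point   { struct shape hdr; double x, y; };

/* Large variant. */
struct polygon { struct shape hdr; int n; double xs[64], ys[64]; };

/* Size of the block needed for a given variant, or 0 for a bad tag. */
size_t shape_size(enum shape_tag tag)
{
    switch (tag) {
    case SHAPE_POINT:   return sizeof(struct point);
    case SHAPE_POLYGON: return sizeof(struct polygon);
    }
    return 0;
}

/* Allocates exactly enough zeroed memory for the chosen variant and
   records the tag. Returns NULL on a bad tag or allocation failure. */
struct shape *shape_new(enum shape_tag tag)
{
    size_t n = shape_size(tag);
    struct shape *s;

    if (n == 0)
        return NULL;
    s = calloc(1, n);
    if (s != NULL)
        s->tag = tag;
    return s;
}
```

As in the Pascal case, the block cannot safely change variant afterwards:
a point-sized allocation has no room for a polygon.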

George Neuner

Dec 30, 2010, 3:56:55 PM

On Tue, 28 Dec 2010 18:22:54 -0500, Robert A Duff
<bob...@shell01.TheWorld.com> wrote:

>Gene <gene.r...@gmail.com> writes:
>
>> type Possibly_Valid_Float is new Float with (Unknown);
>

>You can do this sort of thing in OCaml. Type "Maybe(Float)" is either
>a float or nothing. It's pretty elegant, and doesn't require any
>special "enumeration types" or "enumeration type extensions".

OCaml's alternation types are enumerated - it just isn't visible at
the language level. In the general case, alternation types are
implemented as variant records with compiler generated tags. When the
correct alternation can be statically determined, the compiler is
permitted to substitute the alternation's simple record type in place
of the variant type.

George

George Neuner

Dec 30, 2010, 4:06:08 PM

On Mon, 27 Dec 2010 13:58:47 -0500, noitalmost <noita...@cox.net>
wrote:

>I suppose I have an initial prejudice against variant records. They
>seem to me to be a potential source of hard to find bugs (for the
>user programmer, I mean, not the compiler designer). Am I wrong about
>this? Is it possible for the compiler to always know which type is
>active in the variant, like say through a hidden compiler-generated
>variable in the record?

It isn't always possible to know statically which alternation is
active ... but it is possible to make certain the code covers all
cases and to optimize paths for which the correct alternation is
known. The ML family of languages (ML, SML, Caml, OCaml, etc.)
guarantees that all cases are covered and (usually) can also optimize the
different alternation paths.

George

George Neuner

Dec 30, 2010, 5:57:05 PM

On Wed, 29 Dec 2010 11:42:08 -0000, "BartC" <b...@freeuk.com> wrote:

>"noitalmost" <noita...@cox.net> wrote in message
>>

>> I suppose I have an initial prejudice against variant records. They
>> seem to me to be a potential source of hard to find bugs (for the
>> user programmer, I mean, not the compiler designer). Am I wrong
>> about this? Is it possible for the compiler to always know which
>> type is active in the variant, like say through a hidden
>> compiler-generated variable in the record?
>
>I thought the variant record also stored a tag field so that it is possible
>to determine, at run time, exactly which fields are valid in the variant
>part (although I can't remember exactly how this would be used in a
>language; perhaps assign to the tag first, then write the appropriate
>variant field which would then need to be runtime checked).

The tags can be compiler generated and invisible to the programmer. A
safe implementation would have one or more tag values that mean the
record data is not valid. Assuming the tag field itself can be
atomically updated, a "safe" update would be

- set the tag field to "invalid"
- update the fields of the selected subrecord
- set the tag field to indicate which subrecord is valid

[I know this would be invalid in Pascal, Ada, etc. ... see below.]
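
The three steps can be sketched in C as follows. The type and function
names are invented, and the sketch shows only the order of the writes: with
plain stores a C compiler is free to reorder or drop the first one, so code
that is actually observed concurrently would need atomic or otherwise
ordered stores for the tag.

```c
enum vtag { VTAG_INVALID, VTAG_INT, VTAG_REAL };

/* Hypothetical variant with a reserved "invalid" tag value. */
struct variant {
    enum vtag tag;
    union {
        long   i;
        double r;
    } u;
};

/* Switches the record to the real variant using the three-step order. */
void variant_set_real(struct variant *v, double x)
{
    v->tag = VTAG_INVALID;   /* 1. mark the record as not valid      */
    v->u.r = x;              /* 2. fill in the selected subrecord    */
    v->tag = VTAG_REAL;      /* 3. publish which subrecord is valid  */
}

/* Same protocol for the integer variant. */
void variant_set_int(struct variant *v, long x)
{
    v->tag = VTAG_INVALID;
    v->u.i = x;
    v->tag = VTAG_INT;
}

/* A reader checks this before touching the variant part. */
int variant_is_valid(const struct variant *v)
{
    return v->tag != VTAG_INVALID;
}
```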

>This means the compiler wouldn't know what was going on in the record;

In general, it only means the compiler may not always know (but then
again, it might). Obviously, programmer alternation code may be buggy
or even deliberately lie to the compiler, however, the alternation may
be resolved by compiler generated code (which won't lie and is much
less likely to be buggy).

Just as in OO dispatch, variant records can be resolved at function
(or in general any scope) boundaries and the compiler then can know
the correct alternation to use within that scope and further down the
call chain.

>it would just have the headache of generating code to check the correct
>read and write accesses are being done. (And I understand these variant
>fields can be nested, an even bigger headache, if it's necessary to verify an
>arbitrary path through a tree for any field access!)

The problem here, I think, is that everyone is fixated on Pascal's
questionable implementation. IMNHO, Wirth royally screwed up by
permitting programmer access to the tag field and by requiring the
programmer to set it correctly ... particularly, in conjunction with
permitting the new() function to accept alternation tag values and
return differently sized blocks.

I can't imagine why Ada chose to perpetuate these misfeatures.

To see good implementations of variant records, take a look at
alternation (sum) types in the ML family of languages.

George

Robert A Duff

Dec 31, 2010, 8:57:37 AM

George Neuner <gneu...@comcast.net> writes:

> The problem here, I think, is that everyone is fixated on Pascal's
> questionable implementation. IMNHO, Wirth royally screwed up by
> permitting programmer access to the tag field and by requiring the
> programmer to set it correctly ... particularly, in conjunction with
> permitting the new() function to accept alternation tag values and
> return differently sized blocks.

Permitting access to the tag field is not the problem,
nor is using differently sized blocks.
Permitting the tag to be arbitrarily modified is the problem.

> I can't imagine why Ada chose to perpetuate these misfeatures.

I can't imagine that either. ;-)

Ada doesn't perpetuate the misfeatures of Pascal in this area.
In Ada, you can't change the tag fields (called "discriminants")
without filling in all the other fields. For ex:

type T (Tag: Positive := 1) is
   record
      case Tag is
         when 1 | 3 .. 10 =>
            This : Boolean;
         when 2 | 11 .. Positive'Last =>
            That : Character;
      end case;
   end record;

(Note that all Positive values must be covered.
The tag could be of an enumeration type, instead.)

X : T := (Tag => 1, This => True); -- X.That doesn't exist.
Y : T := (Tag => 2, That => 'Z');

X := Y; -- modifies Tag, but also fills in That; This disappears.
X := (Tag => 10, This => True);

But "X.Tag := ...;" is illegal (at compile time).
So you can't get garbage for This and That, as you
can in Pascal. There's no need for a special "invalid"
tag value.

It is also possible in Ada to allocate just the right amount
of space for the particular variant, in which case you can't
modify the tag at all (not even by the whole-record assignments
as shown above). This works for stack-allocated variables
as well as heap-allocated ones. You can't do that with
C unions, which is an annoying limitation. You can do
it in Pascal, but as you said, it's unsafe, because
the tag field can be set to a wrong value.

If you do a case statement on the tag:

case X.Tag is
   when 1 | 3 .. 10 =>
      X.This := not X.This;
   when 2 | 12 .. Positive'Last => -- Illegal -- 11 is missing
      X.That := 'A';
end case;

it's illegal (at compile time) if any values are missing,
as above.

> To see good implementations of variant records, take a look at
> alternation (sum) types in the ML family of languages.

Yes, the ML way is better than the Ada way, for most purposes.
One thing you can do in Ada is to control the bit-level
layout, which may be useful in interfacing to hardware
devices and the like.

- Bob

Robert A Duff

Dec 31, 2010, 9:05:40 AM

George Neuner <gneu...@comcast.net> writes:

> Personally, I would say add variant records if it isn't too hard and
> won't cause conflict with your intended class implementation. IMO it
> never hurts to give the programmer choices.

I rather strongly disagree with that last sentence. Giving the
programmer more choices means the programmer has more to learn,
and the compiler writer has more work to do. That's a good
idea only if the alternatives are sufficiently useful.

In this case, I don't think it's worth it -- I think it's possible
to design a single language feature that does everything classes
can do, and everything variant records can do. No need for
two completely different features, each with its own syntax
and semantics.

- Bob
[I have to agree. You want a language with lots and lots of choices,
look at PL/I. -John]

Florian Weimer

Dec 31, 2010, 3:06:32 PM

* George Neuner:

> C++ forbids a union of classes ... so you won't ever see one.

C++0X adds a restricted form. Unions of classes are fairly common in
C++ libraries and are currently emulated with placement news and some
hopeful magic to deal with alignment concerns.

> With respect to POD variant record types, I guess it depends on how
> you see your language being used. Packed POD records (including
> variant types) are very useful for hardware interfacing in a "systems"
> language, but in an "application" language the only real attraction of
> POD types is that they are (usually) lighter weight and faster than an
> equivalent class (though dynamic dispatch can be made O(1) if you are
> serious about speed ... unfortunately very few class implementations
> are that serious).

It is very difficult to use memory-mapped I/O without machine code
insertions with modern compilers. It's also likely that the device's
idea of memory layout differs from that of the host system. This
means that useful language support in this area is going to be very
difficult to provide.

Variant types are somewhat challenging to support if their values can
be (partially) unbounded. For instance, an object of such a type
cannot be placed directly into a record. Adding an indirection when
necessary certainly helps here. If you don't want to do this, you're
really facing an uphill battle because without additional measures, a
simple module system will hide the variable-sized nature of a type.

On the other hand, if all objects of variant type have some reasonable
upper bound on their size, things should be fairly straightforward.
The nice thing about variant types (particularly in contrast to
objects) is the exhaustiveness check: the compiler will make
sure you've covered all cases when iterating over a particular data
structure.

noitalmost

Dec 31, 2010, 5:02:23 PM

On Thursday, December 30, 2010 04:06:08 pm George Neuner wrote:
> It isn't always possible to know statically which alternation is
> active ... but it is possible to make certain the code covers all

For now, I'm not going to include variant records in my language, but I'm
going to try to keep my options open for later. So I guess I'd like an example
of where it would be nice to have a variant record. So, assuming our language
is C++ (so we eliminate cases in C where a union is used only because we don't
have inheritance), can someone give me a system programming example that
requires a union? I'm still looking for that "aha!" moment of its usefulness
if the language has classes.

Also, is alternation the general term for this type of construct, or is that
just the way one says union in ML?

glen herrmannsfeldt

Jan 2, 2011, 12:57:24 AM

Robert A Duff <bob...@shell01.theworld.com> wrote:
(snip)

> I rather strongly disagree with that last sentence. Giving the
> programmer more choices means the programmer has more to learn,
> and the compiler writer has more work to do. That's a good
> idea only if the alternatives are sufficiently useful.

(snip)

> [I have to agree. You want a language with lots and lots of choices,
> look at PL/I. -John]

Part of the design of PL/I was that you wouldn't have to learn all of
it to use it. That was used as the reason for not having reserved
words. (You have to at least know the word to know not to use it.)
Also, the huge list of reserved words for COBOL may have been on the
mind of PL/I designers.

But in many cases PL/I does better than modern Fortran as far as
giving a reasonable number of useful choices.

To keep compatible with older programs, many features from older
versions of Fortran have been kept. PL/I only allows passing arrays
by descriptor, but Fortran has assumed shape (descriptor), and assumed
size (left over, pretty much pass by reference).

-- glen

Gene

Jan 2, 2011, 5:05:18 PM

On Friday, December 31, 2010 5:02:23 PM UTC-5, noitalmost wrote:
> For now, I'm not going to include variant records in my language, but I'm
> going to try to keep my options open for later. So I guess I'd like an example
> of where it would be nice to have a variant record. So, assuming our language
> is C++ (so we eliminate cases in C where a union is used only because we don't
> have inheritance), can someone give me a system programming example that
> requires a union? I'm still looking for that "aha!" moment of its usefulness
> if the language has classes.
>
> Also, is alternation the general term for this type of construct, or is that
> just the way one says union in ML?

You'd use a union or VR anywhere a block of memory must serve multiple
purposes in different contexts. One example is a buffer manager. A buffer
block can be either in use or free. When in use, as much space as possible
should be usable as buffer space. When free, that same space can be used
entirely for free list pointers and other administrative data. The VR tag
indicates the current use.
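
As a rough illustration of the buffer-block idea (not from the original post), here is a minimal sketch in Python using the standard ctypes module, which exposes C-style unions. The names BLOCK_SIZE, FreeInfo, Payload and Block, and the field layout, are assumptions made up for the example.

```python
import ctypes

BLOCK_SIZE = 64  # hypothetical payload size in bytes


class FreeInfo(ctypes.Structure):
    # Administrative data kept in the block while it sits on the free list.
    _fields_ = [
        ("next_free", ctypes.c_uint32),  # index of the next free block
        ("prev_free", ctypes.c_uint32),  # index of the previous free block
    ]


class Payload(ctypes.Union):
    # Both members start at offset 0 and share the same bytes.
    _fields_ = [
        ("data", ctypes.c_ubyte * BLOCK_SIZE),  # view used while in use
        ("free", FreeInfo),                     # view used while free
    ]


class Block(ctypes.Structure):
    _fields_ = [
        ("in_use", ctypes.c_uint8),  # the tag: selects which view is valid
        ("body", Payload),
    ]


block = Block()

# Free: the storage holds free-list links.
block.in_use = 0
block.body.free.next_free = 7

# Allocated: the very same bytes now hold user data.
block.in_use = 1
for i, byte in enumerate(b"hello"):
    block.body.data[i] = byte

# The old link has been overwritten, which is why the tag must be consulted.
link_after_reuse = block.body.free.next_free
```

The union costs no more space than its largest member, so the administrative fields are free of charge while the block is on the free list.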

Disclaimer: I'm familiar with but far from an expert in ML.

Alternation types (ATs) in ML--I've also heard them called disjunctive
types--and unions/VRs in Pascal-like languages do serve a common logical
purpose of programming language abstraction. When you want a name to represent
one of a set of other types, give it an AT or VR respectively.

They are not identical, though. Or rather, their semantics differ along with
the storage models of the languages in which they're embedded. On one hand are
Pascal-like languages (including Ada, C, and C++ in the current context), where
names refer to blocks of memory. On the other are the Lisp descendants and
functional languages (like Common Lisp, ML, Ruby, and in part Java, Python,
Perl, PHP, and others), where names correspond to references to blocks of
memory, i.e. pointers. At the implementation level of the latter model,
alternation types amount to a pointer being able to refer to objects of
different types and sizes. (I am ignoring "unboxing" optimizations; they lead
to multiple types being able to occupy the same 4- or 8-byte block in lieu of
a pointer.)

In other words, unions/VRs in Pascal-like languages are often used explicitly
to allow a single block of storage to serve multiple purposes. Since the
lisp-like languages take the programmer out of the storage management
business, this purpose can't be served by their disjunctive types. In the
buffer manager example, ML might overlay admin data over buffer space in the
same block of memory, or it might use blocks of different sizes for the
different purposes, creating garbage every time a name takes on a different
variant value. The programmer doesn't have explicit control.

Happy New Year, all.

Torben Ægidius Mogensen

Jan 4, 2011, 6:05:37 AM

noitalmost <noita...@cox.net> writes:

> My question is this: Would variant records provide significant value
> to a user of my language (given that classes were implemented)?

You can also reverse the question: Given that you have records and
(tagged) unions (== sum types), will classes provide significant value?

If I had to choose either sum types or classes, my choice would be sum
types -- especially if you have ML-style pattern matching.

For example, syntax trees are much easier to represent using sum types
than using classes.

You can define subtyping without introducing classes:

A record Y is a subtype of a record X if for every field f of type t in
X there is a field f of type t' in Y such that t' is a subtype of t.

A sum type Y is a subtype of a sum type X if for every tag (constructor)
c in Y with type t there is a tag c in X with type t' such that t is a
subtype of t'.

Note the symmetry: Subtypes of a sum have fewer tags and subtypes of a
record have more fields. The empty record can be seen as the top type
in the subtype relation while the empty sum (no possible value) is the
bottom type.
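
The two rules can be written down as a short checker. This is only a sketch under assumptions of my own: base types are plain strings that are subtypes only of themselves, and records and sums are encoded as ("record", {field: type}) and ("sum", {tag: type}). None of this encoding comes from the post.

```python
def is_subtype(y, x):
    """Return True if type y is a subtype of type x under the two rules."""
    if isinstance(y, str) or isinstance(x, str):
        return y == x  # base types: only the reflexive case is modelled
    kind_y, parts_y = y
    kind_x, parts_x = x
    if kind_y != kind_x:
        return False
    if kind_y == "record":
        # Every field of X must appear in Y with a subtype.
        return all(f in parts_y and is_subtype(parts_y[f], t)
                   for f, t in parts_x.items())
    if kind_y == "sum":
        # Every tag of Y must appear in X with a supertype.
        return all(c in parts_x and is_subtype(t, parts_x[c])
                   for c, t in parts_y.items())
    return False


point2d = ("record", {"x": "int", "y": "int"})
point3d = ("record", {"x": "int", "y": "int", "z": "int"})
shape = ("sum", {"Circle": "int", "Square": "int"})
circle_only = ("sum", {"Circle": "int"})

more_fields_is_subtype = is_subtype(point3d, point2d)    # record with more fields
fewer_tags_is_subtype = is_subtype(circle_only, shape)   # sum with fewer tags
```

The empty record ("record", {}) accepts every record as a subtype, and the empty sum ("sum", {}) is a subtype of every sum, matching the top and bottom remarks.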

Torben

Jean-Marc Bourguet

Jan 5, 2011, 8:57:36 AM

tor...@diku.dk (Torben Ægidius Mogensen) writes:

> You can also reverse the question: Given that you have records and
> (tagged) unions (== sum types), will classes provide significant value?

Classes provide modular addition of types, but non-modular addition of
operations.

Unions provide modular addition of operations, but non-modular addition of
types.

I choose one or the other depending on which evolution is more needed.

The problem is when both kinds are probable. Some call that the
expression problem (and last time I looked, it had no good solution).
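
A tiny sketch of the trade-off, in Python (my own illustration, with made-up shape names): the class style on top, the union style underneath.

```python
import math
from dataclasses import dataclass


# Class style: each type carries its own operations.
class Shape:
    def area(self) -> float:
        raise NotImplementedError


class Circle(Shape):
    def __init__(self, r: float):
        self.r = r

    def area(self) -> float:
        return math.pi * self.r ** 2


class Square(Shape):  # a new type: one new class, nothing else is edited
    def __init__(self, side: float):
        self.side = side

    def area(self) -> float:
        return self.side ** 2
# Adding perimeter() here would mean editing Shape, Circle and Square.


# Union style: plain data, operations are free functions over all cases.
@dataclass
class CircleV:
    r: float


@dataclass
class SquareV:
    side: float


def area(s) -> float:
    if isinstance(s, CircleV):
        return math.pi * s.r ** 2
    if isinstance(s, SquareV):
        return s.side ** 2
    raise TypeError(s)


def perimeter(s) -> float:  # a new operation: one new function
    if isinstance(s, CircleV):
        return 2 * math.pi * s.r
    if isinstance(s, SquareV):
        return 4 * s.side
    raise TypeError(s)
# Adding a TriangleV would mean editing both area() and perimeter().
```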

Yours,

--
Jean-Marc

George Neuner

Jan 6, 2011, 3:45:23 PM

On Tue, 04 Jan 2011 03:43:15 -0500, George Neuner
<gneu...@comcast.net> wrote:

>On Fri, 31 Dec 2010 08:57:37 -0500, Robert A Duff
><bob...@shell01.TheWorld.com> wrote:
>
>>George Neuner <gneu...@comcast.net> writes:
>>
>>> The problem here, I think, is that everyone is fixated on Pascal's
>>> questionable implementation. IMNHO, Wirth royally screwed up by
>>> permitting programmer access to the tag field and by requiring the
>>> programmer to set it correctly ... particularly, in conjunction with
>>> permitting the new() function to accept alternation tag values and
>>> return differently sized blocks.
>>
>>Permitting access to the tag field is not the problem,
>>nor is using differently sized blocks.
>>Permitting the tag to be arbitrarily modified is the problem.

That's what I thought I said. The problem was that the tag could be
modified independently and so might not correspond to the layout
selected at allocation.

>>> I can't imagine why Ada chose to perpetuate these misfeatures.
>>
>>I can't imagine that either. ;-)
>>
>>Ada doesn't perpetuate the misfeatures of Pascal in this area.
>>In Ada, you can't change the tag fields (called "discriminants")
>>without filling in all the other fields. For ex: ...

Yes, I goofed in saying Ada ... I was thinking about the differences
between variant handling in Pascal's follow-ons Modula 2 and Ada, and
then I typed the wrong one.

Modula 2 handled tagged variants just as did Pascal and added the
untagged variant for bad measure. The compiler would complain if you
accessed a member not in the selected alternation, but it did not
enforce setting all the members.

I do agree that Ada made exposed tags safe(r). Even so, I personally
don't like tags to be exposed even if the compiler does enforce
modifying all the variant members together.
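
For what it's worth, the "tag and members change together" rule is easy to mock up. The following Python sketch is only an illustration of that discipline, not Ada semantics; the class Variant, its layouts table and the method names are all hypothetical.

```python
class Variant:
    """Tagged record whose tag can only change together with its payload."""

    # Hypothetical alternations: tag -> field names of that alternation.
    _layouts = {"circle": ("radius",), "rect": ("width", "height")}

    def __init__(self, tag, **fields):
        self.assign(tag, **fields)

    @property
    def tag(self):
        return self._tag  # readable, but there is deliberately no setter

    def assign(self, tag, **fields):
        # The only way to switch alternation: supply every member at once.
        expected = self._layouts[tag]
        if set(fields) != set(expected):
            raise ValueError(f"{tag} needs exactly the fields {expected}")
        self._tag = tag
        self._fields = dict(fields)

    def get(self, name):
        # Refuse access to members outside the selected alternation.
        if name not in self._layouts[self._tag]:
            raise AttributeError(f"{name} is not in alternation {self._tag}")
        return self._fields[name]


shape = Variant("circle", radius=1.0)
shape.assign("rect", width=2, height=3)  # tag and members replaced together
```

Assigning to shape.tag directly raises AttributeError, so the tag can never drift away from the stored layout.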

>>> To see good implementations of variant records, take a look at
>>> alternation (sum) types in the ML family of languages.
>>
>>Yes, the ML way is better than the Ada way, for most purposes.
>>One thing you can do in Ada is to control the bit-level
>>layout, which may be useful in interfacing to hardware
>>devices and the like.

Yes. ML wasn't designed with device control in mind and hidden tags
(for any purpose) create problems for overlaying a record schema on an
arbitrary address range. But that can be handled with a memory
management scheme that allows the tag(s) to be stored separately from
the record.

George

George Neuner

Jan 6, 2011, 3:45:23 PM

On Fri, 31 Dec 2010 21:06:32 +0100, Florian Weimer <f...@deneb.enyo.de>
wrote:

>* George Neuner:
>
>> C++ forbids a union of classes ... so you won't ever see one.
>
>C++0X adds a restricted form.

I hadn't noticed that, thanks!

>Unions of classes are fairly common in C++ libraries and are currently
>emulated with placement news and some hopeful magic to deal with
>alignment concerns.

Nothing forbids doing this, AFAICS, but the results would be
implementation dependent. I would have to think about it for a while to
convince myself it would be safe, but just offhand I would be worried
about issues with classes having multiple inheritance.

>> With respect to POD variant record types, I guess they depends on how
>> you see your language being used. Packed POD records (including
>> variant types) are very useful for hardware interfacing in a "systems"
>> language, but in an "application" language the only real attraction of
>> POD types is that they are (usually) lighter weight and faster than an
>> equivalent class (though dynamic dispatch can be made O(1) if you are
>> serious about speed ... unfortunately very few class implementations
>> are that serious).
>
>It is very difficult to use memory-mapped I/O without machine code
>insertions with modern compilers. It's also likely that the device's
>idea of memory layout differs from that of the host system. This
>means that useful language support in this area is going to be very
>difficult to provide.

Not really. I've done quite a bit of device programming and I can
say from experience that it isn't that hard.

PIO registers are fairly simple to deal with, typically needing only
additional barrier instructions to guarantee back-to-back ordered
accesses. Marking the PIO range non-cacheable makes reads more
efficient, but you trade that against slightly slower writes if the
out-going data is static or slow changing.

It is not uncommon for a device's memory to be bit reversed vs the
host's memory ... I might have this backwards (it's been a while since
I did any hardware debugging) but, IIRC, DRAM address decoders are
little-endian, so you need to reverse the bus connections to a
big-endian CPU. That means data is stored backwards relative to the
CPU, but it doesn't matter because it is reversed again on the read
back. Every device I've seen that shared memory with the host had the
external bus connected so that the data appeared in the correct
bit-order for the host.

So really there are only three issues of consequence: the device
having byte endianness opposite the host, the device having memory of
a different width, and address translation between device and host.

Byte endianness is simple to deal with ... just agree on a transfer
format and decide which side will reverse multi-byte data.
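
A quick sketch of that agreement, using Python's standard struct module. Big-endian is picked as the transfer format purely as an assumption for the example.

```python
import struct

value = 0x12345678

# Agreed transfer format (assumed here): big-endian, 32 bits.
wire = struct.pack(">I", value)

# What a little-endian side would hold natively for the same value.
little = struct.pack("<I", value)

# The side whose native order matches the wire format just reads it ...
decoded = struct.unpack(">I", wire)[0]

# ... and a side that reads bytes in the wrong order sees them reversed,
# so one of the two sides has to do the swap.
misread = int.from_bytes(little, "big")
```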

Differing memory width can be a minor problem. Many DSPs, for
example, are VLIW (e.g., 48 bit instructions) and thus have oddly
sized memories which can be accessed at different widths (typically in
multiples of 16-bits). Downloading code, then, is a matter of
formatting it correctly (endianness) and then, when copying it,
accessing the device memory using a width that allows the whole
odd-sized word to be filled.
Data transfers usually aren't a problem because data will be in
standard 8/16/32/64 bit widths.
[Many DSPs have a DMA channel that can repack data on the fly - for
example, 3 32-bit words into 2 48-bit words (or the reverse). Usually
repacking DMA also will take care of byte endianness so DMA often is
preferable to having shared memory.]
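
In software, the same repacking is just bit shuffling. A sketch in Python, assuming (my assumption, not a property of any particular DSP) that the most significant 32-bit word comes first in each group of three; the function names are invented.

```python
MASK32 = 0xFFFFFFFF
MASK48 = 0xFFFFFFFFFFFF


def pack_32_to_48(words32):
    """Repack each group of three 32-bit words into two 48-bit words."""
    if len(words32) % 3:
        raise ValueError("need a multiple of three 32-bit words")
    out = []
    for i in range(0, len(words32), 3):
        a, b, c = words32[i:i + 3]
        bits = (a << 64) | (b << 32) | c  # 96 bits in total
        out.append(bits >> 48)            # upper 48 bits
        out.append(bits & MASK48)         # lower 48 bits
    return out


def unpack_48_to_32(words48):
    """Inverse of pack_32_to_48: two 48-bit words back into three 32-bit words."""
    if len(words48) % 2:
        raise ValueError("need a multiple of two 48-bit words")
    out = []
    for i in range(0, len(words48), 2):
        hi, lo = words48[i:i + 2]
        bits = (hi << 48) | lo
        out.extend([bits >> 64, (bits >> 32) & MASK32, bits & MASK32])
    return out


packed = pack_32_to_48([0x11111111, 0x22222222, 0x33333333])
```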

Address translation shouldn't be an issue unless the device is a DMA
bus master. With shared memory, the device and the host can access
the block using their own private addresses and, hopefully, the data
will not contain relative addresses (offsets are fine, though). For
DMA, the target side has to provide the address and a transfer size
has to be agreed on as well as data repacking (if relevant). If the
host will handle the DMA, it can easily translate a device relative
address by adding/subtracting the device's base address. For a
bus-mastering device, however, the device needs complete host target
addresses because the device typically does not know where it lives in
the host's memory map.

>Variant types are somewhat challenging to support if their values can
>be (partially) unbounded. For instance, an object of such a type
>cannot be placed directly into a record. Adding an indirection when
>necessary certainly helps here. If you don't want to do this, you're
>really facing an uphill battle because without additional measures, a
>simple module system will hide the variable-sized nature of a type.

Yes. Although you can use a facility similar to Ada's discriminated
records in which the variably sized members become fixed at
allocation.

The problem is that you must either fix the alternation at allocation
(making the variant case immutable) or somehow allocate for the
largest alternation. I can't say I ever tried it, but AFAIK, Ada
doesn't have any way to describe multiple discriminated cases when
allocating a variant record so that the compiler can allocate
sufficient space to handle any of them. I think in such a case the
programmer has to figure out which alternation case will be the
largest, allocate that case (which will be fixed) and then cast the
result back into a mutable record.

George

George Neuner

Jan 6, 2011, 3:45:23 PM

On Fri, 31 Dec 2010 09:05:40 -0500, Robert A Duff
<bob...@shell01.TheWorld.com> wrote:

Rearranged a bit for continuity:

>George Neuner <gneu...@comcast.net> writes:
>
>> Personally, I would say add variant records if it isn't too hard and
>> won't cause conflict with your intended class implementation. IMO it
>> never hurts to give the programmer choices.
>
>I rather strongly disagree with that last sentence. Giving the
>programmer more choices means the programmer has more to learn,
>and the compiler writer has more work to do. That's a good
>idea only if the alternatives are sufficiently useful.
>
>- Bob
>[I have to agree. You want a language with lots and lots of choices,
>look at PL/I. -John]

Why do so many languages offer (at least) two forms of conditional
loop: one with the test at the beginning and another with the test at
the end? Why not just offer an infinite loop and a way to break out
that can be tied to any conditional?

You're absolutely right that a language doesn't need 10 ways to
accomplish the same thing ... I fully agree that having too many
equivalent choices is needless waste. But apparently redundant
features can be justified by programmer convenience as well as for
unique uses.

>In this case, I don't think it's worth it -- I think it's possible
>to design a single language feature that does everything classes
>can do, and everything variant records can do. No need for
>two completely different features, each with its own syntax
>and semantics.

What classes can do depends on your frame of reference. Most OO
languages have object implementations that are quite limited in
expressiveness.

In any event, I agree that it's possible to provide a single feature
that has both POD and polymorphic behavior using the same syntax. In
fact it has already been done.

When OO was new, some early Pascal implementations had no distinct
class type but rather offered extended record types (including
variants) which permitted "member" functions/procedures to be
logically attached to the type and called using the field selection
operator: e.g., "x = rec.func()". Sans user defined pointer fields,
these extended records still were POD types so they could be used
either as records or as objects.

George

George Neuner

Jan 6, 2011, 3:45:23 PM

On Fri, 31 Dec 2010 17:02:23 -0500, noitalmost <noita...@cox.net>
wrote:

>For now, I'm not going to include variant records in my language, but I'm
>going to try to keep my options open for later. So I guess I'd like an example
>of where it would be nice to have a variant record. So, assuming our language
>is C++ (so we eliminate cases in C where a union is used only because we don't
>have inheritance), can someone give me a system programming example that
>requires a union? I'm still looking for that "aha!" moment of its usefulness
>if the language has classes.

The prototypical system programming case is network packet
translation: a packet having an unknown format is read into a buffer
and then various "templates" are overlaid on the buffer to decode the
data.

And yes, this can be done without using variant records, but the
coding is somewhat simpler using them. Because network packet data
contains identifying tags, when using variant records the programmer
doesn't have to manually locate and examine the tags and select a
static record format to overlay.
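
A minimal sketch of the overlay idea, in Python with the standard ctypes module. The packet format is invented for the example (a one-byte kind field followed by one of two bodies), and every field is a single byte so that byte order does not come into it.

```python
import ctypes

KIND_PING = 1  # hypothetical tag values
KIND_DATA = 2


class Ping(ctypes.Structure):
    _fields_ = [("seq", ctypes.c_uint8), ("ttl", ctypes.c_uint8)]


class Data(ctypes.Structure):
    _fields_ = [("length", ctypes.c_uint8), ("payload", ctypes.c_ubyte * 3)]


class Body(ctypes.Union):
    # The two "templates" share the same bytes of the buffer.
    _fields_ = [("ping", Ping), ("data", Data)]


class Packet(ctypes.Structure):
    _fields_ = [("kind", ctypes.c_uint8), ("body", Body)]


def decode(raw: bytes):
    """Overlay the Packet layout on raw bytes and read the selected body."""
    pkt = Packet.from_buffer_copy(raw)
    if pkt.kind == KIND_PING:
        return ("ping", pkt.body.ping.seq, pkt.body.ping.ttl)
    if pkt.kind == KIND_DATA:
        n = pkt.body.data.length
        return ("data", bytes(pkt.body.data.payload[:n]))
    raise ValueError(f"unknown packet kind {pkt.kind}")


ping = decode(bytes([KIND_PING, 9, 64, 0, 0]))
data = decode(bytes([KIND_DATA, 3, 0x61, 0x62, 0x63]))
```

Here the tag still has to be tested by hand; a language with checked variant records would make that selection for you, which is the simplification described above.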

On the application side, variant records can be useful for FFI
(foreign function interfaces) and for serialization. Generally, any
kind of tagged data transfer may be fodder for a variant record. It
doesn't matter whether the data is packetized or streamed - in the
stream case you just need to make sure you've got a whole record
before you try to do something with the data.

>Also, is alternation the general term for this type of construct, or is that
>just the way one says union in ML?

"Alternation" is a type theoretic term referring to the set of members
that make up a particular variant case. I use the term "alternation"
preferentially because, depending on the language under discussion,
the terms "variant" and "case" may be overloaded and, without
qualification everywhere, the discussion can quickly become confusing.

ML calls its variant records "algebraic" or "sum" types. In ML the
variant tags are hidden from the programmer and managed by the
compiler. The correct alternation of a sum type is selected by
pattern matching against the (maybe partial) list of field names to be
accessed.

George

Robert A Duff

Jan 6, 2011, 5:24:01 PM

glen herrmannsfeldt <g...@ugcs.caltech.edu> writes:

> Part of the design of PL/I was that you wouldn't have to learn all of
> it to use it.

A worthy goal.

>...That was used as the reason for not having reserved
> words. (You have to at least know the word to know not to use it.)

I don't see that. I mean, you don't actually have to memorize the
list of reserved words. You get an error message, saying you used a
reserved word as an identifier, so you fix it and recompile. After a
while, you've memorized all the reserved words you're likely to use
incorrectly. Failure to know some obscure reserved word won't cause a
(run-time) bug.

At least, that's true if the compiler gives good error messages about
obscure reserved words. "Reserved word 'until' used as an
identifier.", as opposed to "Compiler saw 'until', and is now
hopelessly confused...<cascade of junk messages>". That requires the
language syntax to make such good error messages feasible -- which is
more-or-less the same sort of syntax that can allow a
reserved-word-free syntax.

> Also, the huge list of reserved words for COBOL may have been on the
> mind of PL/I designers.

Sounds likely. In other words, "COBOL has too many reserved words,
therefore reserved words are evil, therefore let's eliminate reserved
words." Overreaction?

- Bob
[Back when PL/I was designed, "recompile" meant resubmitting a deck of
cards and potentially waiting several hours. To avoid that problem,
COBOL programmers did wacky things like starting all variable names
with a digit, just to avoid the reserved words. These days I agree,
reserved words are less of a big deal, but modern languages don't have
400 reserved words like COBOL does. -John]

Hans-Peter Diettrich

Jan 8, 2011, 2:36:40 PM

George Neuner schrieb:

> Why do so many languages offer (at least) two forms of conditional
> loop: one with the test at the beginning and another with the test at
> the end? Why not just offer an infinite loop and a way to break out
> that can be tied to any conditional?

There exist languages with a single loop statement, allowing a break at
either end :-)

do [while...] [until...]
...
loop [while...] [until...]

Breaking out of loops should always be possible, even if some people
don't like that, so that your suggestion should work with almost every
language. More critical are Continue statements, that allowed my C
decompiler to distinguish between For and While loops.

DoDi

robin

Jan 9, 2011, 9:00:53 PM

"Robert A Duff" <bob...@shell01.TheWorld.com> wrote:

> glen herrmannsfeldt <g...@ugcs.caltech.edu> writes:
>
>> Part of the design of PL/I was that you wouldn't have to learn all of
>> it to use it.
>
> A worthy goal.
>
>>...That was used as the reason for not having reserved
>> words. (You have to at least know the word to know not to use it.)
>
> I don't see that. I mean, you don't actually have to memorize the
> list of reserved words. You get an error message, saying you used a
> reserved word as an identifier, so you fix it and recompile.

That's assuming that the compiler actually tells you. The classic
case was a CDC COBOL compiler that didn't. It crashed, giving no
indication of what was wrong. It turned out that the first user word
was a reserved word.

However, for COBOL, the list was around 300 reserved words. Having to
waste a day's time (or a week's time in some sites) to get a message
back with error messages relating to a reserved word that had been
used inadvertently as an identifier was frustrating, to say the least.

> After a while, you've memorized all the reserved words you're
> likely to use incorrectly. Failure to know some obscure reserved
> word won't cause a (run-time) bug.

When there are hundreds of them, it gets tedious after a while.

That was part of the reason. The other reason was to accommodate
possible future additions to the language (or even alterations to the
language). If words are not reserved, then such changes can be made
without affecting existing programs. That has been justified time and
time again in both Fortran and PL/I which have been extended over the
past 40 or so years.
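
A present-day analogue, offered only as an illustration and not as a claim about PL/I or Fortran: Python added its match statement in 3.10 as a context-sensitive ("soft") keyword, so existing programs that used match as an ordinary name kept working. The standard keyword module shows the difference.

```python
import keyword

# "while" is a reserved word: it can never be used as an identifier.
reserved = keyword.iskeyword("while")

# "match" introduces a statement in newer Pythons, but it is not reserved,
# so it remains legal as a variable name in old and new code alike.
match = "still a valid variable name"
not_reserved = not keyword.iskeyword("match")
```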

>> Also, the huge list of reserved words for COBOL may have been on the
>> mind of PL/I designers.
>
> Sounds likely. In other words, "COBOL has too many reserved words,
> therefore reserved words are evil, therefore let's eliminate reserved
> words." Overreaction?

Not at all. Just good compiler design. AFAIK, at that time FORTRAN
did not have reserved words. Neither did ALGOL. (Typical
implementations of ALGOL used underlining of keywords, or enclosed
keywords in apostrophes, so that keywords as such were not an issue.)

glen herrmannsfeldt

Jan 10, 2011, 3:37:40 AM

Robert A Duff <bob...@shell01.theworld.com> wrote:

(previously I wrote)

>> Part of the design of PL/I was that you wouldn't have to learn all of
>> it to use it.

> A worthy goal.

>>...That was used as the reason for not having reserved
>> words. (You have to at least know the word to know not to use it.)

> I don't see that. I mean, you don't actually have to memorize the
> list of reserved words. You get an error message, saying you used a
> reserved word as an identifier, so you fix it and recompile.

I usually try to fix more than one error between compiles, though
sometimes it is just one.

But when computer time was $100/CPU hour, and the time between job
submission and when you saw the results was hours or days, then it
mattered more.

> After a while, you've memorized all the reserved words you're likely
> to use incorrectly. Failure to know some obscure reserved word
> won't cause a (run-time) bug.

You hope that all misused reserved words cause a fatal compilation
error, but I am not sure that is always true.

> At least, that's true if the compiler gives good error messages
> about obscure reserved words. "Reserved word 'until' used as an
> identifier.", as opposed to "Compiler saw 'until', and is now

I believe disallowing implicit declaration is pretty important to be
sure that misused reserved words are detected, but then that is usual
in newer languages.

But reserved words also complicate adding new features, along with
new reserved words, to an existing language. Well, unless someone plans
ahead and reserves some words for future use, such as Java reserving
goto just in case they some year decide to add it.

>> Also, the huge list of reserved words for COBOL may have been on the
>> mind of PL/I designers.

> Sounds likely. In other words, "COBOL has too many reserved words,
> therefore reserved words are evil, therefore let's eliminate reserved
> words." Overreaction?

Counting only English words, C has a fairly small number of reserved
words. (I don't usually use strlen in conversation, but I might use
length.) Even so, once in a while I will find one that I otherwise
would have used.

I do remember in some Fortran programs years ago using FORMAT as the
name of an array for storing a run-time format into. Also, one might
want if as the variable between ie and ig.

> [Back when PL/I was designed, "recompile" meant resubmitting a deck
> of cards and potentially waiting several hours. To avoid that
> problem, COBOL programmers did wacky things like starting all
> variable names with a digit, just to avoid the reserved words. These
> days I agree, reserved words are less of a big deal, but modern
> languages don't have 400 reserved words like COBOL does. -John]

I have still not written any COBOL, though some day maybe just to
say that I did. I do remember the stories, though, of having to
keep the list nearby to avoid using any of the words.

I didn't know you could start with a digit. Verilog also allows
for that, and I did have to do it once.

-- glen

Hans Aberg

Jan 10, 2011, 4:26:02 AM

On 2011/01/06 21:45, George Neuner wrote:
> Why do so many languages offer (at least) two forms of conditional
> loop: one with the test at the beginning and another with the test at
> the end? Why not just offer an infinite loop and a way to break out
> that can be tied to any conditional?

The reason for the different forms is merely that they help structuring
code in common programming.

Semantically, the C form
    do a; while ( p );
is equivalent (when a is an expression) to
    while (a, p) ;
but the latter is less expressive. And
    while ( p ) a;
is equivalent to
    for (; p;) a;

comp...@is-not-my.name

Jan 10, 2011, 1:23:33 PM

> Why do so many languages offer (at least) two forms of conditional
> loop: one with the test at the beginning and another with the test at
> the end? Why not just offer an infinite loop and a way to break out
> that can be tied to any conditional?

Because industry listened to academics, and they shouldn't have,
ever. I always said the two worst things that ever happened to the
software industry were Wirth and Dijkstra. Object COBOL?! Where will
it end?

Remember the good old days when there was no conditional loop? Either
you checked a condition and did a GO TO or you didn't have a
loop. Life was good and simple. And yes, I still prefer my source code
in all caps.

[Yeah, I remember those days, making drum cards for the keypunch where
I was fixing my Fortran programs. Can't say I miss them much. -John]

Martin Ward

Jan 12, 2011, 6:51:09 AM

On Thursday 06 Jan 2011 at 22:24, John wrote:
> To avoid that problem,
> COBOL programmers did wacky things like starting all variable names
> with a digit, just to avoid the reserved words.

The method used at one shop was:

(1) All variable names must include at least one hyphen;

(2) The list of "reserved words containing hyphens" (which is substantial,
but dramatically smaller than the total list of all reserved words)
was printed out in a large font on a big sheet of paper and
pasted to the wall.

Essentially, *every* non-hyphenated word was treated as reserved.

--
Martin

STRL Reader in Software Engineering and Royal Society Industry Fellow
mar...@gkc.org.uk http://www.cse.dmu.ac.uk/~mward/ Erdos number: 4
G.K.Chesterton web site: http://www.cse.dmu.ac.uk/~mward/gkc/
Mirrors: http://www.gkc.org.uk and http://www.gkc.org.uk/gkc

Martin Ward

Jan 12, 2011, 7:31:39 AM

On Monday 10 Jan 2011 at 18:23, comp...@is-not-my.name wrote:
> Because industry listened to academics, and they shouldn't have,
> ever. ...

> Remember the good old days when there was no conditional loop? Either
> you checked a condition and did a GO TO or you didn't have a
> loop. Life was good and simple. And yes, I still prefer my source code
> in all caps.

Suppose you are reading some code from the "good old days"
and you encounter a GO TO. Is there a loop or not?
First, you need to find the label that is the target of the GO TO:
if it is before the GO TO, then you have a loop, right?
Well, not necessarily. Only if there exists a control flow
path from the label back to the GO TO. Similarly, a forward GO TO
might still be a loop if there is a path back to the GO TO
from the label.

Suppose you have found out that there *is* a loop involved.
Now you want to know which statements are in the loop body
and which are not. To do that you need to construct the control
flow graph, find the dominator tree, the loop header nodes,
find all the nodes dominated by the loop header...
But that involves listening to those pesky academics!
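
For the curious, the analysis fits in a few dozen lines. This is a textbook-style sketch in Python, not anyone's production tool: the control flow graph is assumed to be a dict mapping every node to its list of successors, dominators are computed by simple iteration, and a loop is reported for each edge whose target dominates its source. Loops that GO TOs can form with more than one entry point (irreducible ones) are not found this way.

```python
def dominators(cfg, entry):
    """dom[n] = {n} united with the intersection of dom[p] over preds p."""
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            ps = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*ps) if ps else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom, preds


def natural_loops(cfg, entry):
    """Return {header: body} for every back edge tail -> header."""
    dom, preds = dominators(cfg, entry)
    loops = {}
    for tail, succs in cfg.items():
        for header in succs:
            if header in dom[tail]:  # back edge: target dominates source
                body = {header, tail}
                work = [tail] if tail != header else []
                while work:  # walk backwards from the tail up to the header
                    n = work.pop()
                    for p in preds[n]:
                        if p not in body:
                            body.add(p)
                            work.append(p)
                loops.setdefault(header, set()).update(body)
    return loops


# A backward GO TO that really is a loop: body jumps back to test.
looping = {"entry": ["test"], "test": ["body", "exit"],
           "body": ["test"], "exit": []}

# A GO TO to a textually earlier label with no path back: not a loop.
# (entry jumps forward to "after", which jumps back to "label".)
straight = {"entry": ["after"], "label": ["exit"],
            "after": ["label"], "exit": []}

found = natural_loops(looping, "entry")
none_found = natural_loops(straight, "entry")
```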

--
Martin

STRL Reader in Software Engineering and Royal Society Industry Fellow
mar...@gkc.org.uk http://www.cse.dmu.ac.uk/~mward/ Erdos number: 4
G.K.Chesterton web site: http://www.cse.dmu.ac.uk/~mward/gkc/
Mirrors: http://www.gkc.org.uk and http://www.gkc.org.uk/gkc

[Hey, it wasn't all academics. Some of it was IBM. But I take your
point. -John]

Hans-Peter Diettrich

Jan 12, 2011, 8:30:29 AM

glen herrmannsfeldt schrieb:

> I have still not written any COBOL, though some day maybe just to
> say that I did. I do remember the stories, though, of having to
> keep the list nearby to avoid using any of the words.

Interestingly only English speakers have such problems. In one of my
first IS lessons we've been encouraged to use natural (German) words
for identifiers all over, to prevent such problems. Obeying this
simple rule I never noticed even the existence of the keyword problem,
in all my COBOL code :-)

Now I wonder why APL then wasn't the big breakthrough, eliminating any
possible clashes with natural languages ;-)

DoDi

Martin Rodgers

Jan 12, 2011, 12:54:26 PM

George Neuner wrote:

> Why do so many languages offer (at least) two forms of conditional
> loop: one with the test at the beginning and another with the test at
> the end? Why not just offer an infinite loop and a way to break out
> that can be tied to any conditional?

Because they behave differently. When you test at the entry point to
a loop, the loop may run zero or more times. When you test at the
end of the loop, just before the jump back to the beginning, you have
a loop that will run one or more times.

This is why I shuddered with horror when I read that BBC Basic had
REPEAT/UNTIL but no WHILE/WEND. I envisaged a host of bugs. Sure,
you can write a REPEAT/UNTIL loop and put an IF/ENDIF around the
code within it, but that's extra code and my cynical observation, even
back then, was that most programmers would wish to make their code
as short as possible. This was especially important in a token-based
interpreter, as most Basic implementations on small machines like the
BBC Micro were. Every token was expensive, not just in memory terms, but
also in run-time. Code optimisations would therefore include rearranging
the order of the lines, placing frequently run code at the start of the
program, to give them low line numbers, and using short variable names.
Removing a line completely was the ultimate optimisation.

So I further imagined a generation of programmers being encouraged to
write code like this in *any* language, on *any* machine, out of habit.
Ironically, I saw GOTO as the least destructive feature of the language.

The old argument for leaving out certain language features was the finite
amount of memory available, in this case in the BBC Micro's ROM. I later
read about some other features that I'd have happily sacrificed to make
room for WHILE/WEND support.

Another small point: almost every conditional loop I'd written up to that
point had been a WHILE loop, not a REPEAT loop. This was true even when
I wrote code in assembly language. It wasn't decided by any choice of
language features; it was simply the right control structure for the code.
Anything else would've been a bug. Several decades later, I still find that
I'm writing zero-or-more loops, and *very* rarely one-or-more.

WHILE/WEND:
"Is there any work to be done? If so, do that work."

REPEAT/UNTIL:
"Do some work, then see if there's any more to be done."

This simple distinction helped me write much better code, even in
assembly language. ;) Imagine the task of picking items off a linked
list and processing the items in sequence. If you fail to use the
right control structure here, your code will likely crash or do
something very odd when the list is empty. Also consider code that
reads text from a file - what will happen when the file is empty? Will
your code cope? How will it fail, gracefully or painfully?
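
The linked-list case in a few lines of Python, as a sketch only. Python has no test-at-the-end loop, so the REPEAT/UNTIL form is emulated with an infinite loop plus a break; the list encoding (a chain of (value, rest) pairs ending in None) and the function names are made up for the example.

```python
def visit_while(node):
    """Test first: runs zero or more times, so an empty list is fine."""
    seen = []
    while node is not None:
        value, node = node
        seen.append(value)
    return seen


def visit_repeat(node):
    """Test last: the body always runs once before anything is checked."""
    seen = []
    while True:
        value, node = node  # fails with TypeError when the list is empty
        seen.append(value)
        if node is None:
            break
    return seen


items = (1, (2, (3, None)))  # the list 1 -> 2 -> 3
empty = None

both_agree = visit_while(items) == visit_repeat(items)
handles_empty = visit_while(empty) == []
```

On a non-empty list the two agree; only the test-first version copes with the empty one.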

I don't believe this is an academic language design issue. It has
serious real-world effects, including bugs. I first read about it in
the Software Tools books (both of them). It also turns up in a lot of
academic literature, but that doesn't make it a purely academic issue
- it only means that it's so serious that even academics are aware of
it. ;)

robin

Jan 12, 2011, 8:12:21 PM

"Peter Dassow" <z8...@arcor.de> wrote in message news:10-1...@comp.compilers...

| I have to compile some very old Fortran-66 programs,
| Are there any other PC compatible Fortran-66 compilers
| out there ?

You are likely to have much more success with a modern compiler,
such as F90 or F95, which will probably have many of the
F66 extensions as standard features.
As well as that, some compilers still support old features.
Try the Silverfrost Fortran 95 compiler, which is one such
compiler.

As well as that, I think that you can still obtain their F77 compiler.

glen herrmannsfeldt

Jan 13, 2011, 2:31:01 AM

Hans-Peter Diettrich <DrDiet...@aol.com> wrote:
(snip on remembering all the reserved words for COBOL)

> Interestingly only English speakers have such problems. In one of my
> first IS lessons we've been encouraged to use natural (German) words
> for identifiers all over, to prevent such problems. Obeying this
> simple rule I never noticed even the existence of the keyword problem,
> in all my COBOL code :-)

I have always wondered why so many languages use English words
for keywords, even though programmers may speak other languages.

Now, with C one could use the preprocessor to replace keywords
from other languages with the appropriate C keyword.
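
As a rough analogue of that preprocessor trick, here is a Python sketch that rewrites keyword-like names token by token before running the code. It uses the standard `tokenize` module; the German spellings in `KEYWORDS`, the `translate` function and the `SOURCE` sample are all invented for this example. Like the C preprocessor, it leaves string literals alone because it works on tokens rather than raw text.

```python
import io
import tokenize

# Hypothetical mapping from German spellings to Python keywords.
KEYWORDS = {
    "funktion": "def",
    "solange": "while",
    "wenn": "if",
    "zurueck": "return",
}


def translate(source):
    """Replace NAME tokens found in KEYWORDS; leave everything else as is."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        text = tok.string
        if tok.type == tokenize.NAME:
            text = KEYWORDS.get(text, text)
        out.append((tok.type, text))
    # With 2-tuples, untokenize regenerates valid (if oddly spaced) source.
    return tokenize.untokenize(out)


SOURCE = """\
funktion zaehle(n):
    schritte = 0
    solange n > 0:
        n = n - 1
        schritte = schritte + 1
    zurueck schritte
"""
```

Running `exec(translate(SOURCE), namespace)` defines an ordinary Python function `zaehle` in `namespace`. The obvious cost is the one discussed in this thread: every translated word becomes unavailable as an identifier.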

I do remember when first learning Fortran, having to learn
how to spell EQUIVALENCE, as it was a word that I rarely used
otherwise. I also remember people from other parts of the US
trying to get Fortran to accept the INTERGER statement.

I don't remember ever knowing a non-native-English speaker who
wanted a programming language with keywords in another language.

> Now I wonder why APL then wasn't the big breakthrough,
> eliminating any possible clashes with natural languages ;-)

Many symbols to learn, and they are harder to remember even
than keywords in a non-native language.

-- glen
[Long ago I saw versions of languages like Fortran with the keywords
translated into other languages. They weren't very popular. -John]

glen herrmannsfeldt

Jan 13, 2011, 2:59:55 AM

Martin Rodgers <m...@wildcard.demon.co.uk> wrote:
> George Neuner wrote:

>> Why do so many languages offer (at least) two forms of conditional
>> loop: one with the test at the beginning and another with the test at
>> the end? Why not just offer an infinite loop and a way to break out
>> that can be tied to any conditional?

> Because they behave differently. When you test at the entry point to
> a loop, the loop may run zero or more times. When you test at the
> end of the loop, just before the jump back to the beginning, you have
> a loop that will run one or more times.

The latter is like Fortran DO loops on many systems before Fortran 77
(even though the standard didn't require, or even allow for, it).

> This is why I shuddered with horror when I read that BBC Basic had
> REPEAT/UNTIL but no WHILE/WEND. I envisiged a host of bugs. Sure,
> you can write a REPEAT/UNTIL loop and put an IF/ENDIF around the
> code within it, but that's extra code and my cynical observation,

(snip)

> Another small point: almost every conditional loop I'd written up to that
> put had been a WHILE loop, not a REPEAT loop. This was true even when
> I wrote code in assembly language.

Yes "test at the top" loops are more common, but once in a while the
"test at the bottom" loop is needed. If you only have WHILE, then
you either code the whole loop contents once, outside the loop,
and then again inside the loop (if it is small), or use a variable
whose only purpose is to get into the loop the first time.

I might even have done something in C like:

for(i=0; i==0 || (some other test) ; i++) {

where there is no other test on i, but that i might be needed
as a loop counter. (That is, there is no for with a test at the end.)
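
The same trick carries over to any language that only has a test-at-the-top loop. A small Python sketch, with `count_digits` as a made-up example: the counter `i` forces the first pass, and the real test (`n > 0`) only matters from the second pass on.

```python
def count_digits(n):
    """Count decimal digits of a non-negative integer; 0 has one digit."""
    i = 0
    # "i == 0 or ..." gets us into the body once, whatever n is.
    while i == 0 or n > 0:
        n //= 10
        i += 1
    return i
```

Here `i` doubles as the result, which matches the remark above that the counter might be needed anyway. With a plain `while n > 0` the function would return 0 for an input of 0.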

But otherwise, my first assembly programs were for S/360, where
the easiest loop instruction is BCT, something like "subtract one
and branch if not zero." That makes for a very easy "test at the end"
loop, using only one register. (But only if you can count backwards.)

That is much easier than BXH, which requires three registers, and
I would have to look up which one was which each time. There is
also the complementary BXLE, convenient for "test at the end" loops.
(That is, Branch if indeX High, and Branch if indeX Low or Equal.)

Even more, it has been found that on ESA/390 processors with
branch prediction, BXH predicts not taken and BXLE predicts
taken. It does that even for backward branch BXH and forward
branch BXLE.

> Several decades later, I still find that
> I'm writing zero-or-more loops, and *very* rarely one-or-more.

Rarely, but when they occur, you really want the right one.

(snip)

Now, why do some languages have DO ... UNTIL, where others have
DO ... WHILE for "test at the end" loops?

-- glen

comp...@is-not-my.name

Jan 13, 2011, 6:41:16 AM

> On Monday 10 Jan 2011 at 18:23, comp...@is-not-my.name wrote:
> > Because industry listened to academics, and they shouldn't have,
> > ever. ...
> > Remember the good old days when there was no conditional loop? Either
> > you checked a condition and did a GO TO or you didn't have a
> > loop. Life was good and simple. And yes, I still prefer my source code
> > in all caps.
>
> Suppose you are reading some code from the "good old days"
> and you encounter a GO TO. Is there a loop or not?
> First, you need to find the label that is the target of the GO TO:
> if it is before the GO TO, then you have a loop, right?
> Well, not necessarily. Only if there exists a control flow
> path from the label back to the GO TO. Similarly, a forward GO TO
> might still be a loop if there is a path back to the GO TO
> from the label.

As interesting as that idea is on paper, there just isn't any practical
issue. IBM COBOL provided enough information in the listings to cross
reference and index things so that working on very large programs was not
difficult and most people coded things (at least in the early days up
through the 1980s) reasonably enough that the logic, which anyway was not
that complicated, could be followed. The reality is that not having control
structures in COBOL was not and is not an issue. It may have been an issue
if you pushed COBOL outside its envelope, but when used for what it was for,
it was an effective and simple tool and doesn't really lack anything. And
that statement goes for COBOL 20 years ago as well. It's ironic but a major
problem area in COBOL performance and complexity *is* the PERFORM verb which
does implement DO WHILE/DO UNTIL type processing. I understand that C
programmers are more comfortable working with PERFORM than GOTO when they
write COBOL or look at a COBOL application, but in my view PERFORM is more
trouble than it's worth. Some of that has to do with the realities of
virtual storage management for very large applications systems and some of
it has to do with the uncanny ability of people to abuse even things that
are designed to prevent abuse in the first place.

I don't suggest C coders use GOTOs, and C coders should stop suggesting
COBOL coders *not* use GOTOs. A place for everything, and everything in its
place...

> Suppose you have found out that there *is* a loop involved.
> Now you want to know which statements are in the loop body
> and which are not. To do that you need to construct the control
> flow graph, find the dominator tree, the loop header nodes,
> find all the nodes dominated by the loop header...

You only need to do that if you are a compiler. If you are a programmer, you
don't ;-)
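
For readers who have not seen what the compiler side of that looks like, here is a small Python sketch of the analysis the quoted text describes: compute dominators, find back edges, and collect the body of each natural loop. The graph representation (a dict from node name to successor list, with every node present as a key) and the function names are assumptions made for this example. It only finds natural loops, so an irreducible tangle of GO TOs would not be reported.

```python
def predecessors(cfg):
    """Invert the successor map. Every node must appear as a key of cfg."""
    preds = {node: [] for node in cfg}
    for src, targets in cfg.items():
        for dst in targets:
            preds[dst].append(src)
    return preds


def dominators(cfg, entry):
    """Iterative dominator sets: dom[n] = {n} | intersection of dom[preds]."""
    nodes = set(cfg)
    preds = predecessors(cfg)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def natural_loops(cfg, entry):
    """Map each loop header to the set of nodes in its loop body."""
    dom = dominators(cfg, entry)
    preds = predecessors(cfg)
    loops = {}
    for src, targets in cfg.items():
        for header in targets:
            if header not in dom[src]:
                continue                  # not a back edge
            body = {header, src}
            stack = [src]
            while stack:                  # walk backwards until the header
                node = stack.pop()
                if node == header:
                    continue
                for p in preds[node]:
                    if p not in body:
                        body.add(p)
                        stack.append(p)
            loops.setdefault(header, set()).update(body)
    return loops
```

For a graph such as `{"start": ["L10"], "L10": ["work"], "work": ["test"], "test": ["L10", "done"], "done": []}` the branch from `test` back to `L10` is a back edge, so `L10` heads a loop containing `L10`, `work` and `test`. A backward GO TO with no path back to itself produces no entry at all, which is the "not necessarily" case in the quoted paragraph.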

I have personally written and worked on tens or hundreds of millions of
lines of COBOL code and millions of lines of FORTRAN. I'm not a COBOL or
FORTRAN programmer, but in the early days I spent a few years working on
large applications systems so I got a lot of exposure fast and wrote a ton
of COBOL. Later on I got involved in software performance and we looked at
and analyzed a lot of HLL application code, mostly COBOL and PL/I. I also
worked in tech support and had to diagnose and fix failures in applications
that used services our products provided. In that time I do not remember
ever having had a problem because of a lack of control structures available
for my own coding nor have I had a problem understanding and diagnosing and
repairing bugs in other people's code. In context there are only so many
types of loops in COBOL and they're all pretty obvious and standard. Read a
record, do some processing, produce report headers and totals, etc. There's
nothing here that requires control structures. At the end of the day, over
50 years and billions of lines of code later, people get by. You can say
it's not perfect, but you can't say it doesn't work. The proof is in the
pudding. It does work.

You could argue FORTRAN would have been more readable with control
structures, especially since it *is* heavily used in problem domains
where those constructs make a lot of sense. I guess in that case it
was "you can't miss what you never had." I understand people who
started coding in the nineties don't have that view, but for those of
us who started decades earlier, nobody can tell us we can't get the
job done without all that, because the simple fact is that we
did. It's interesting that PL/I, which is a really nice language and
was available since the early 1960s never caught on, nor did it
replace any COBOL or FORTRAN, even though it had support for control
structures from the beginning and many other useful features. It turns
out that things other than what's hot in academia drive industry. The
most important factor is how productive people are with the tools they
have. Just like a bumble bee doesn't look like it should be able to
fly but it does, those old, simple languages got used and solved real
problems despite their lack of glitter, complexity, and academic
appeal and despite the vitriol and disgrace Wirth and Dijkstra tried
to heap on them ;-)

My post was a little tongue in cheek but not entirely. Everything has
to be understood in context. Things that make sense for certain
languages and problem domains don't for others. There isn't any "one
true way" as the academics would have you believe, and they tend to
focus on problems that aren't real problems or areas of computer
science that don't necessarily have much practical application. "Those
who can, do, those who can't, teach." With very few exceptions I found
this to be true. There are a few notable exceptions, some that come to
mind are Drs. Dewar and Stroustrup but aside from a small group we
don't find academia producing much of anything except textbooks and
mayhem.

I don't argue that control structures aren't worthwhile or necessary
for certain problem domains and languages, I'm just reminding
everybody that two of the world's most successful commercial languages
(COBOL and FORTRAN) got by for many decades without things that are
deemed essential in modern languages, and they still get the job done
today. They put men on the moon and built atomic bombs all without C++
or Java...
[Hey, they put men on the moon in Jovial, and built atomic bombs
with punch cards and accounting machines. -John]

noitalmost

Jan 13, 2011, 1:09:24 PM

On Thursday, January 06, 2011 03:45:23 pm George Neuner wrote:
> Why do so many languages offer (at least) two forms of conditional
> loop: one with the test at the beginning and another with the test at
> the end? Why not just offer an infinite loop and a way to break out
> that can be tied to any conditional?
>
> You're absolutely right that a language doesn't need 10 ways to
> accomplish the same thing ... I fully agree that having too many
> equivalent choices is needless waste. But apparently redundant
> features can be justified by programmer convenience as well as for
> unique uses.

My language solution addresses this sort of compromise. I'm providing
traditional While, infinite Loop, and Break statements. If you have a Break,
you only need one loop construct to provide pre-, post-, and mid-test loops.
The While is provided simply for programmer convenience.

    while x < y :
        x := getAnother();
    end;

    loop :
        if x >= y :
            break
        end;
        x := getAnother()
    end;

I'm trying to work out whether I can uniformly provide a
keyword : body end
construct and then provide shorthands in which eliminating the colon also
indicates a single-statement body with no end
if x >= y break;
I'm trying to make it so the compiler can give good error messages for common
errors (missing/extra colon, missing/extra end, etc).

As another convenience, I'm also considering an Unless statement (kind of like
in Perl).
unless x < y : break
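
The claim that one loop construct plus Break covers pre-, post- and mid-test loops can be checked in any language with an infinite loop and a break. A Python sketch follows, where `while True` stands in for the proposed Loop statement; the three function names and the counting and sentinel examples are invented for illustration.

```python
def pre_test(x, y):
    """Test first: the body may run zero times."""
    steps = 0
    while True:
        if x >= y:
            break
        x += 1
        steps += 1
    return steps


def post_test(x, y):
    """Test last: the body always runs at least once."""
    steps = 0
    while True:
        x += 1
        steps += 1
        if x >= y:
            break
    return steps


def mid_test(items, sentinel):
    """Test in the middle: fetch, decide, then process."""
    collected = []
    source = iter(items)
    while True:
        item = next(source, sentinel)   # first half of the body
        if item == sentinel:
            break
        collected.append(item)          # second half of the body
    return collected
```

The only difference between the three is where the `break` sits, which is the point of the design: While then becomes pure convenience syntax for the first shape.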

comp...@is-not-my.name

Jan 14, 2011, 5:12:50 AM

> I have always wondered why so many languages use English words
> for keywords, even though programmers may speak other languages.

The answer that jumps out at me is most languages were developed in
America and so were most of the hardware and operating
systems. FORTRAN and COBOL set the early standards by using English to
make it simpler for their target audience to use the languages in a
natural way as opposed to what came before: machine language,
autocoder, assembler, etc. I can't explain LISP, but maybe it's easier
for non-native English speakers to master. I can tell you the keywords
in LISP make no sense and might just as well have been in German or Chinese
or Swedish ;-)

I don't know that there was any specific plot to exclude anybody, but
I do know the global view that's common today was not very common in
the times those languages were developed. I'm not sure anybody had any
idea how far things would go.
[The keywords in Lisp make perfect sense if you know IBM 704 assembler. -John]

Martin Rodgers

Jan 14, 2011, 5:50:29 AM

glen herrmannsfeldt wrote:

> Now, why do some languages have DO ... UNTIL, where others have
> DO ... WHILE for "test at the end" loops?

I've always preferred languages that have both. I picked on a specific
Basic dialect because that was the example that irritated me the most.
*My* experience of Basic was mainly with another dialect that had
neither WHILE nor REPEAT - you had to use IF/GOTO or FOR.

BBC Basic was also irritating to me because, unlike the earlier Basic
implementations that I knew on micros, the ROM was large enough
to support both control structures, so why pick just one?

So I can only guess why the implementers made that choice. To be fair,
a lot of choices made in Basic implementations of that era seem bizarre
to me today. They seemed pretty odd to me at the time, but I learned a
lot about the pitfalls of language design by studying them, so at least
they had some value for me. It's a small design space, but that may have
helped me - at some point, language design comes down to the very
small details that will matter to programmers and implementers. I can
recommend this technique to anyone interested in language design - study
entire families of dialects, their evolution, their implementations, the costs
and trade-offs made, the context(s) and general family history.

Looking at the subject for this thread, I might suggest starting with the
Algol family. ;)

robin

Jan 14, 2011, 6:20:21 AM

From: "Hans-Peter Diettrich" <DrDiet...@aol.com>

> Now I wonder why APL then wasn't the big breakthrough, eliminating any
> possible clashes with natural languages ;-)

APL is a language that's far too cryptic.
It's very difficult to work out from code precisely what
the writer intended. Hence it's difficult for someone else to debug.
It's often jokingly referred to as "write once and throw away".
[It was too hard to stick all those little labels on the keycaps.
Reading APL isn't too hard if you know how to look for idioms, but
it takes a while to learn. -John]

robin

Jan 14, 2011, 6:31:28 AM

From: "glen herrmannsfeldt" <g...@ugcs.caltech.edu>

> I have always wondered why so many languages use English words
> for keywords, even though programmers may speak other languages.

With PL/I, it has always been possible to substitute (for a keyword)
any suitable word in the user's natural language.

Often, programmers in countries whose national language is not
English also have command of English, so writing the programs
using the English keywords is not a problem.

robin

Jan 14, 2011, 7:03:24 AM

From: <comp...@is-not-my.name>
Sent: Thursday, 13 January 2011 10:41 PM

> It's interesting that PL/I, which is a really nice language and
> was available since the early 1960s never caught on, nor did it
> replace any COBOL or FORTRAN,

Well, it did. However, FORTRAN programmers couldn't perceive that the
language was of any benefit to them.

In general, programming was out of their depth.
They failed to see any advantage in the fact that when
their PL/I program crashed they could get the statement number
where it crashed (instead of a hex error number). (Or perhaps they were
ignorant of that.)
Three great facilities in PL/I were not seen as improvements
over FORTRAN, namely,
(1) dynamic arrays ;
(2) variable field widths in formatted output; and
(3) character strings.

From published code, it is evident that Fortran programmers went to
extraordinary lengths to make their code portable and flexible, in an
attempt to emulate dynamic arrays, for example, often doubling the
size of the code in the process, and even then the finished product
did not come close to what could be done with PL/I in terms of (1)
portability, (2) bullet-proofing, and (3) ability to update.
[PL/I suffered from much less mature compilers than Fortran. Back in the
late 1960s, Fortran H produced great code, PL/I F produced pretty bad code.
By the mid 70s the PL/I optimizing and checkout compilers were a great
improvement, but nobody was interested in switching to PL/I then. -John]

Hans-Peter Diettrich

Jan 14, 2011, 3:43:11 AM

George Neuner schrieb:

> Why do so many languages offer (at least) two forms of conditional
> loop: one with the test at the beginning and another with the test at
> the end? Why not just offer an infinite loop and a way to break out
> that can be tied to any conditional?

With a look at grammars themselves, I found EBNF much easier to read and
write than BNF, except for (some) repetitions. The >=0 repetition {...}
cannot express a >0 repetition, so one has to write
list ::= some lengthy construct { "," some lengthty construct }.
That's why constructs like
list ::= ( some lengthy construct )/",".
have been added to e.g. Borland EBNF.

[The typo in the first list left in intentionally...]
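
In a hand-written recursive-descent parser the same "item, then zero or more separator-plus-item" shape shows up as a helper like the one below. This is only a Python sketch: `parse_separated`, `parse_name` and the token-list representation are assumptions for the example, not anything from Borland's EBNF tools.

```python
def parse_name(tokens, pos):
    """Item parser for the demo: accept one identifier-like token."""
    if pos >= len(tokens) or not tokens[pos].isidentifier():
        raise SyntaxError(f"expected a name at token {pos}")
    return tokens[pos], pos + 1


def parse_separated(tokens, pos, parse_item, separator=","):
    """Parse item { separator item }: one or more items."""
    item, pos = parse_item(tokens, pos)          # the mandatory first item
    items = [item]
    while pos < len(tokens) and tokens[pos] == separator:
        item, pos = parse_item(tokens, pos + 1)  # skip the separator
        items.append(item)
    return items, pos
```

Writing the lengthy construct once, as the `parse_item` argument, gives the same benefit as the `( ... )/","` notation: there is no second copy to mistype.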

Obviously the required loop constructs vary with the domain of a
language. That's why IMO general purpose languages should allow for
several loop constructs, which should be easy to distinguish by
humans.

DoDi

Chris F Clark

Jan 14, 2011, 5:04:05 PM

PL/I style keywords (as opposed to Pascal style reserved words) are an
interesting topic to me, having used PL/I (as well as Pascal) and a
host of other older languages. The PL/I solution seemed elegant
enough that we used it in Yacc++ and documented how grammar designers
could apply it to their languages. The core idea behind the PL/I
solution is that there are many (or at least some) words that are only
reserved in certain contexts, and that if one isn't using the word in
that context there is no use in restricting the user from having it in
his vocabulary (i.e. available as a user defined identifier).

From a grammar writing point of view, it is not particularly
difficult to introduce PL/I style keywords to any LR grammar. The
same changes should generally work for LL grammars also. If people
are interested I can document them here.
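
As a rough illustration of the idea (and not the Yacc++ mechanism, which is not shown in this post), here is a toy recursive-descent parser in Python in which IF, THEN and ELSE are keywords only where the grammar expects them. The two-statement mini-language, the one-token lookahead rule and all names are assumptions made for the sketch; real PL/I uses a more involved rule to tell an assignment from a keyword statement.

```python
import re


def split_tokens(source):
    """Split into words, '=' and ';'. No token is reserved at this level."""
    return re.findall(r"[A-Za-z_]\w*|[=;]", source)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self, offset=0):
        i = self.pos + offset
        return self.tokens[i] if i < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expect(self, word):
        tok = self.next()
        if tok != word:
            raise SyntaxError(f"expected {word!r}, got {tok!r}")

    def ident(self):
        tok = self.next()
        if tok in ("=", ";"):
            raise SyntaxError(f"expected an identifier, got {tok!r}")
        return tok

    def condition(self):
        left = self.ident()
        if self.peek() == "=":            # '=' is comparison here
            self.next()
            return ("eq", left, self.ident())
        return ("var", left)

    def statement(self):
        # IF is a keyword only at statement start and only if no '=' follows.
        if self.peek() == "IF" and self.peek(1) != "=":
            self.next()
            cond = self.condition()
            self.expect("THEN")           # THEN is a keyword only here
            then_branch = self.statement()
            else_branch = None
            if self.peek() == "ELSE" and self.peek(1) != "=":
                self.next()
                else_branch = self.statement()
            return ("if", cond, then_branch, else_branch)
        target = self.ident()             # anything else is an assignment
        self.expect("=")
        value = self.ident()
        self.expect(";")
        return ("assign", target, value)
```

Fed the moderator's puzzle statement from the end of this post, `Parser(split_tokens("IF THEN = ELSE THEN IF = ELSE; ELSE IF = THEN;")).statement()` yields an `if` node whose condition compares the variables THEN and ELSE and whose branches assign to the variable IF.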

The harder question one has to ask is whether they make writing
correct programs easier or harder. They certainly make more programs
legal and lessen the burden of remembering all the keywords for a
given language.

However, if one has written an incorrect program, they might make
deciphering the error message harder and may prevent the error being
detected at the spot where it occurs. For example, consider the case
where someone doesn't realize that a specific keyword has a language
defined meaning at a particular spot and in the same spot an ordinary
user defined identifier is allowed. Then, in that place if the user
types what they think is their user defined identifier, but it happens
to be a reserved word in that context, something bad will happen. If
the user is lucky, the mistake will cause some kind of error because
the keyword will have additional syntax following it that the user
will not specify. In the unlucky case, the program will appear
correct, but silently do the wrong thing, possibly not detected until
it has caused some later catastrophic failure.

This is the same reason implicit declarations can be dangerous.
Certain types of errors are easy to make and systems that make that
harder provide more protection, even if they penalize those who don't
make those errors.

Hope this helps,
-Chris

******************************************************************************
Chris Clark email: christoph...@compiler-resources.com
Compiler Resources, Inc. Web Site: http://world.std.com/~compres
23 Bailey Rd voice: (508) 435-5016
Berlin, MA 01503 USA twitter: @intel_chris
------------------------------------------------------------------------------
[IF THEN = ELSE THEN IF = ELSE; ELSE IF = THEN; -John]

robin

Jan 15, 2011, 7:15:50 AM

> <comp...@is-not-my.name>

>> I have always wondered why so many languages use English words

> The answer that jumps out at me is most languages were developed in
> America

Pascal was designed in Europe, and it is in English.

comp...@is-not-my.name

Jan 16, 2011, 6:02:27 AM

> However, if one has written an incorrect program, they might make
> deciphering the error message harder and may prevent the error being
> detected at the spot where it occurs. For example, consider the case
> where someone doesn't realize that a specific keyword has a language
> defined meaning at a particular spot and in the same spot an ordinary
> user defined identifier is allowed. Then, in that place if the user
> types what they think is their user defined identifier, but it happens
> to be a reserved word in that context, something bad will happen. If
> the user is lucky, the mistake will cause some kind of error because
> the keyword will have additional syntax following it that the user
> will not specify. In the unlucky case, the program will appear
> correct, but silently do the wrong thing, possibly not detected until
> it has caused some later catastrophic failure.

I don't know if the source for it was ever made public, but PL/C, a
variant of PL/I written at Cornell University, would be a great case
study on this topic.

I'm sure many guys on the list are old enough to remember, but for
those who aren't or who didn't work on IBM platforms, the idea behind
PL/C was to hammer anything you handed it into a legal PL/I program
and generate a working load module (executable). The program may not
have done what you wanted, but it would do *something*. It was
targeted at CS101 students and from what I saw sitting behind the Help
Desk it did a credible job. I wish I could remember more about it.

The diagnostics ranged from helpful to hilarious. When it detected a syntax
error it would correct the statement as well as it could and produce a
message "PL/C USES... " and give a working syntax for the statement, which
was inserted in the program at that point.

I don't know how much the thinking or logic behind it would lend
itself to other languages. But it was a very interesting teaching
concept.

> This is the same reason implicit declarations can be dangerous.

I'm not sure I agree with this or else FORTRAN couldn't have been very
successful. A lot of non-optimal things in life do seem to work.

> Certain types of errors are easy to make and systems that make that
> harder provide more protection, even if they penalize those who don't
> make those errors.

Ada!

> [IF THEN = ELSE THEN IF = ELSE; ELSE IF = THEN; -John]

Ha! When I get a chance I may try compiling that under PL/I...!
[For more info about PL/C, see http://ecommons.cornell.edu/handle/1813/5952
I don't know if the code is still around, but it'd be easy enough to
ask if anyone wants to run it on Hercules.
With respect to the danger of implicit declarations in Fortran, there are
plenty of stories of broken code due to statements like DO 10 I = 1.10
which is an assignment, not a loop. -John]

glen herrmannsfeldt

Jan 16, 2011, 3:18:18 PM

robin <rob...@dodo.com.au> wrote:
(snip)

> Pascal was designed in Europe, and it is in English.

As I understand it, and as it says in Wikipedia, ALGOL-W was
developed at Stanford. (While Wirth was visiting.)

Pascal seems to be at least partially based on ALGOL-W.

While it may have been published in Europe, the design may
have taken years, and may have been done in different places.

I do wonder, though, if people in the US would ever accept a
programming language with keywords from a non-English language.

-- glen
[Well, there's APL, where the operators tend toward Greek. -John]

robin

Jan 16, 2011, 7:55:33 PM

> <comp...@is-not-my.name>

> With respect to the danger of implicit declarations in Fortran, there are
> plenty of stories of broken code due to statements like DO 10 I = 1.10
> which is an assignment, not a loop. -John]

This kind of thing has more to do with poor syntax than with implicit
declarations.

The old (original) syntax of FORTRAN permitted spaces anywhere (or none
of them) because spaces were ignored [except in strings].

Had spaces been significant, DO 10 I would have been parsed as three
separate tokens. As it was, FORTRAN parsed it as the single token
"DO10I", which was a legal identifier.
[That's an egregious example, but I've written plenty of buggy code where
I spelled a variable name in two ways. Not really a compiler issue, though,
since it's easy enough to implement either way. -John]
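
The blank-insensitive reading described in this post can be mimicked in a few lines of Python. This is a deliberately crude sketch: `classify` and its two regular expressions are invented for the example and cover only the two statement forms under discussion, not real FORTRAN.

```python
import re


def classify(statement):
    """Classify a statement the way a blank-ignoring FORTRAN scanner might."""
    text = statement.replace(" ", "").upper()    # blanks carry no meaning
    loop = re.fullmatch(r"DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),([^,]+)", text)
    if loop:
        label, var, first, last = loop.groups()
        return ("loop", label, var, first, last)
    assign = re.fullmatch(r"([A-Z][A-Z0-9]*)=(.+)", text)
    if assign:
        return ("assign", assign.group(1), assign.group(2))
    raise ValueError(f"not handled by this sketch: {statement!r}")
```

With a comma, `DO 10 I = 1,10` is recognised as a loop over `I`. With a period, the only reading left is an assignment of `1.10` to a variable named `DO10I`, and implicit declaration then lets that variable spring into existence without complaint.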

Torben Ægidius Mogensen

Jan 17, 2011, 6:35:17 AM

"robin" <rob...@dodo.com.au> writes:
> Well, it did. However, FORTRAN programmers couldn't perceive that the
> language was of any benefit to them.

I have seen examples of FORTRAN programmers rejecting newer languages
(like Pascal, C, or C++) because of a perceived ineffectiveness of the
newer language compared to FORTRAN.

While there sometimes is a real difference in effectiveness -- such as
when Pascal is compiled to interpreted P-code -- the perception is often
based on simple-minded experiments porting a few programs from FORTRAN
to, say, C and not taking the different array layout into account: If
you translate a nested loop walking over a multi-dimensional array from
FORTRAN to C, you are likely to get a suboptimal order of access -- the
original FORTRAN program was optimised to column-major array layout,
which doesn't work well with the row-major array layout of C or Pascal.
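
The layout difference is easy to see from the address arithmetic alone. The Python sketch below uses zero-based indices for both layouts to keep the comparison simple (FORTRAN itself is one-based); the function names are made up for the example.

```python
def row_major_offset(i, j, rows, cols):
    """C/Pascal layout: elements of one row are adjacent."""
    return i * cols + j


def col_major_offset(i, j, rows, cols):
    """FORTRAN layout: elements of one column are adjacent."""
    return j * rows + i


def strides(offset, rows, cols):
    """Distance in elements when i advances and when j advances."""
    base = offset(0, 0, rows, cols)
    return (offset(1, 0, rows, cols) - base,
            offset(0, 1, rows, cols) - base)
```

For a 3 by 4 array, `strides(col_major_offset, 3, 4)` is `(1, 3)` while `strides(row_major_offset, 3, 4)` is `(4, 1)`. A loop nest with `i` innermost therefore walks memory one element at a time in FORTRAN but jumps a whole row per step after a literal translation to C, unless the loops are swapped.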

Torben
[This is an awfully long way from compilers. -John]

Tony Finch

Jan 17, 2011, 12:28:25 PM

Chris F Clark <c...@shell01.TheWorld.com> wrote:
>
>Froma a grammar writing point of view, it is not particularly
>difficult to introduce PL/I style keywords to any LR grammar. The
>same changes should generally work for LL grammars also. If people
>are interested I can document them here.
>
>The harder question one has to ask is whether they make writing
>correct programs easier or harder. They certainly make more programs
>legal and lessen the burden of remembering all the keywords for a
>given language.

If you want this feature to work well then you have to design other
parts of the grammar to accommodate it.

Explicit statement delimiters help, because most keywords can only
occur at the start of a statement, or are only valid after a
particular statement-introducing keyword.

If you are using keywords to delimit blocks then block-dependent end
keywords are better, e.g. if..fi rather than if..end, so that the end
keyword doesn't become de facto reserved because it's special in so
many contexts. Or just use {..} punctuation.

>[IF THEN = ELSE THEN IF = ELSE; ELSE IF = THEN; -John]

Maybe we should re-introduce stropping using U+0332 COMBINING LOW LINE :-)

Tony.
--
f.anthony.n.finch <d...@dotat.at> http://dotat.at/

Gene Wirchenko

Jan 17, 2011, 5:36:15 PM

On Fri, 14 Jan 2011 10:50:29 +0000, Martin Rodgers
<m...@wildcard.demon.co.uk> wrote:

>glen herrmannsfeldt wrote:
>
>> Now, why do some languages have DO ... UNTIL, where others have
>> DO ... WHILE for "test at the end" loops?

If the test is after the loop, it makes sense. Mind you, I do
not think that putting the test at the end makes sense. I want my
loop control up-front and would rather see
until <cond>
body
and just know that the body will be executed once for sure. Going
through the body when I know the condition means that I may be able to
desk-check much faster.

>I've always prefered languages that have both. I picked on a specific
>Basic dialect because that was the example that irritated me the most.
>*My* experience of Basic was mainly with another dialect that had
>neither WHILE mor REPEAT - you had to use IF/GOTO or FOR.

BTDT. I loved it as much as you did.

>BBC Basic was also irritating to me because, unlike the earlier Basic
>implementations that I knew on on micros, the ROM was large enough
>to support both control structures, so why pick just one?

Microsoft BASIC 5 had WHILE but not UNTIL.

>So I can only guess why the implementers made that choice. To be fair,
>a lot of choices made in Basic implementation of that era seem bizzarre
>to me today. They seemed pretty odd to me at the time, but I learned a
>lot about the pitfalls of language design by studying them, so at least
>they had some value for me. It's a small design space, but that may have
>helped me - at some point, language design comes down to the very
>small details that will matter to programmers and implementers. I can
>recommend this technique to anyone interested in language design - study
>entire families of dialects, their evolution, their implementions, the costs
>and trade-offs made, the context(s) and general family history.

I always look for the philosophy of a programming language.

>Looking at the subject for this thread, I might suggest starting with the
>Algol family. ;)

Sincerely,

Gene Wirchenko

comp...@is-not-my.name

Jan 17, 2011, 5:51:04 PM

> > [IF THEN = ELSE THEN IF = ELSE; ELSE IF = THEN; -John]
>
> Ha! When I get a chance I may try compiling that under PL/I...!

Ok. Here it is. If I had not compiled it myself I would not have believed
it!

OPTIONS SPECIFIED

OBJECT,NODECK;

OPTIONS USED

INSOURCE NOAGGREGATE NOCOMPILE(S)
LMESSAGE NOATTRIBUTES CMPAT(V2)
OBJECT NODECK FLAG(I)
OPTIONS NOESD LANGLVL(OS,NOSPROG)
SOURCE NOGONUMBER LINECOUNT(55)
STMT NOGOSTMT MARGINS(2,72,0)
NOGRAPHIC SEQUENCE(73,80)
NOIMPRECISE SIZE(4137144)
NOINCLUDE NOSYNTAX(S)
NOINTERRUPT SYSTEM(MVS)
NOLIST
NOMACRO
NOMAP
NOMARGINI
NOMDECK
NONEST
NONUMBER
NOOFFSET
NOOPTIMIZE
NOSTORAGE
NOTERMINAL
NOTEST
NOXREF
5688-235 IBM PL/I for MVS & VM LOGIC: PROC OPTIONS(MAIN); PAGE 2

SOURCE LISTING

STMT

1 LOGIC: PROC OPTIONS(MAIN); 00040005
2 DCL (IF, THEN, ELSE) CHAR; 00050005
00060005
3 BEGIN; 00070005
4 IF THEN = ELSE THEN IF = ELSE; ELSE IF = THEN; 00080005
6 END; 00090005
7 END; 00100000
5688-235 IBM PL/I for MVS & VM LOGIC: PROC OPTIONS(MAIN); PAGE 3

NO MESSAGES PRODUCED FOR THIS COMPILATION

COMPILE TIME 0.00 MINS SPILL FILE: 0 RECORDS, SIZE 4051

NUMBER OF TEMPORARY VARIABLES USED: 3. NUMBER AVAILABLE: 65532

END OF COMPILATION OF LOGIC

> [For more info about PL/C. see
> http://ecommons.cornell.edu/handle/1813/5952

Thanks for the link. It looks like a good doc resource!

> I don't know if the code is still around, but it'd be easy enough to
> ask if anyone wants to run it on Hercules.

I have not seen it available, although IBM's early PL/I compilers are. If
anyone does find a copy I'd be interested in installing it.

Martin Rodgers

Jan 18, 2011, 9:20:45 AM

Gene Wirchenko wrote:

>> BBC Basic was also irritating to me because, unlike the earlier Basic
>> implementations that I knew on micros, the ROM was large enough
>> to support both control structures, so why pick just one?
>
> Microsoft BASIC 5 had WHILE but not UNTIL.

Yes, that struck me as odd at the time.

> I always look for the philosophy of a programming language.

Yes, that's useful too. I like Paul Graham's question about the problems a
language is intended to solve. That's part of the philosophy, I guess.

We could probably ask similar questions about compilers... Error reporting
and recovery has always fascinated me, probably because I was frequently
frustrated by the unhelpfulness of the error msgs given by so many tools.

My first lexer buffered text at the line level and counted the
characters and lines read so far, then provided them to the error
reporting code so that the line number and the line itself could be
given to the user, along with a '^' under the character at which the
error was detected.

This only worked in my compiler because it ran in a single pass. When
I began writing multipass compilers, I had to save the line and offset
numbers in the parse and/or syntax trees. Yes, it does require extra
effort, but several decades later my compilers still use this
technique. The line text itself, however, was only ever used in error
msgs in my first compiler.
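
To make the technique concrete, here is a minimal sketch in Python of what the reporting side might look like once a line and offset have been saved in a tree node. The names SourcePos and format_error are made up for illustration, and the sketch assumes the whole source text is still available when the error is reported (unlike the later compilers described above, which did not reprint the line text).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcePos:
    """Position saved in a tree node: 1-based line, 0-based column offset."""
    line: int
    column: int


def format_error(source: str, pos: SourcePos, message: str) -> str:
    """Return the message, the offending line, and a '^' under the error column."""
    lines = source.splitlines()
    text = lines[pos.line - 1] if 0 < pos.line <= len(lines) else ""
    # Keep tabs in the padding so the caret lines up when the line contains tabs.
    padding = "".join(ch if ch == "\t" else " " for ch in text[:pos.column])
    return f"line {pos.line}: {message}\n{text}\n{padding}^"


if __name__ == "__main__":
    program = "x := 1;\ny := x + ;\n"
    print(format_error(program, SourcePos(line=2, column=9), "expression expected"))
```

Storing only two integers per node keeps the trees small; the line text is looked up again only when an error actually has to be printed.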

robin

Jan 18, 2011, 4:13:29 PM

From: <comp...@is-not-my.name>
Sent: Tuesday, 18 January 2011 9:51 AM

>> > [IF THEN = ELSE THEN IF = ELSE; ELSE IF = THEN; -John]
>>
>> Ha! When I get a chance I may try compiling that under PL/I...!
>
> Ok. Here it is. If I had not compiled it myself I would not have believed
> it!

> 1 LOGIC: PROC OPTIONS(MAIN);

> 2 DCL (IF, THEN, ELSE) CHAR;

> 3 BEGIN;

> 4 IF THEN = ELSE THEN IF = ELSE; ELSE IF = THEN;

> 6 END;
> 7 END;

In Fortran, a similar successful compilation can be obtained for :--

IF (THEN == ELSE) THEN; IF = ELSE; ELSE; IF = THEN; END IF

In Algol, a similar construct is possible, but typically
reserved words are enclosed in apostrophes, so it's not quite so dramatic.
[I stopped writing Fortran compilers with F77. Let me tell you, it
was a pain in the patoot to tell a FORMAT statement from a statement
function FORMAT(A3,I2) = A3**I2
(Yes, I know how to do it. I did it, after all.) -John]

robin

Jan 18, 2011, 9:34:26 AM

From: "robin" <rob...@dodo.com.au>
Sent: Monday, 17 January 2011 11:55 AM

>> <comp...@is-not-my.name>

> Had spaces been significant, DO 10 I would have been parsed as three
> separate tokens. As it was, FORTRAN parsed it as the single token
> "DO10I", which was a legal identifier.
> [That's an egregious example, but I've written plenty of buggy code where
> I spelled a variable name in two ways. Not really a compiler issue, though,
> since it's easy enough to implement either way. -John]

The reason that many mis-spellings passed by unnoticed in FORTRAN was
that most compilers of the time produced only a compilation listing.

IBM PL/I compilers at that time produced not only a compilation
listing, but also an attribute listing (a list of identifiers,
attributes, and cross-references). That way it was easy to detect
mis-spelled identifiers.

Now, Fortran has not only free-form source, where blanks are significant --
which avoids constructs such as DO I = 1.10 becoming an assignment --
but also an IMPLICIT NONE statement which, when employed,
reports as errors any undeclared identifiers.
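
For readers who have not met the fixed-form problem, the following Python sketch shows why a single mistyped character changes the meaning of the statement. It is only an illustration under simplifying assumptions: the two regular expressions and the function classify_fixed_form are invented here, and they ignore details a real FORTRAN front end must handle, such as commas nested inside parentheses.

```python
import re

# Assumed, simplified shapes once all blanks have been removed:
#   DO <label> <variable> = <expr> , <expr>
#   <variable> = <expr>
# A real compiler must also skip commas nested inside parentheses.
DO_LOOP = re.compile(r"DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),(.+)")
ASSIGNMENT = re.compile(r"([A-Z][A-Z0-9]*)=(.+)")


def classify_fixed_form(statement: str):
    """Classify a statement the way a blank-insensitive scanner would see it."""
    squeezed = statement.replace(" ", "").upper()
    m = DO_LOOP.fullmatch(squeezed)
    if m:
        return ("do", m.group(1), m.group(2))      # label, loop variable
    m = ASSIGNMENT.fullmatch(squeezed)
    if m:
        return ("assign", m.group(1), m.group(2))  # target, right-hand side
    return ("other", squeezed)


if __name__ == "__main__":
    print(classify_fixed_form("DO 10 I = 1,10"))   # a loop over I
    print(classify_fixed_form("DO 10 I = 1.10"))   # an assignment to DO10I
```

With the comma the statement is a loop; with the period it is an assignment to an implicitly declared variable DO10I, which is exactly the kind of identifier IMPLICIT NONE would flag.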

IBM's current PL/I compilers also have a compiler option that causes
undeclared identifiers to be classified as errors.

Peter Canning

Jan 18, 2011, 11:27:24 PM

On 1/14/2011 2:04 PM, Chris F Clark wrote:
> From a grammar writing point of view, it is not particularly

> difficult to introduce PL/I style keywords to any LR grammar. The
> same changes should generally work for LL grammars also. If people
> are interested I can document them here.

I'm definitely interested in seeing a description of how to do this (in
my case for LL grammars). It's probably worth starting a different thread
for it though.

- Peter Canning

Anton Ertl

Jan 19, 2011, 9:45:28 AM

Peter Canning <9cn6...@sneakemail.com> writes:
>On 1/14/2011 2:04 PM, Chris F Clark wrote:
>> From a grammar writing point of view, it is not particularly
>> difficult to introduce PL/I style keywords to any LR grammar. The
>> same changes should generally work for LL grammars also. If people
>> are interested I can document them here.
>
>I'm definitely interested in seeing a description of how to do this (in
>my case for LL grammars).

It's pretty easy for LL(1) grammars: At each position, treat all
keywords as identifiers if they don't occur in the first set. E.g.,
in many languages "IF" occurs only as keyword at the start of a
statement, so in a PL/I-style variant of those languages "IF" could be
treated as identifier everywhere else.

A disadvantage of this approach is that some syntax errors can only be
discovered later. E.g., to pick up the example above, if the semicolon
in front of an IF is missing, a Pascal compiler will notice a syntax
error when it sees the "IF", but not necessarily a compiler for a
PL/I-style language. I don't know how relevant this is in practice.
If only new keywords (for language extensions beyond the original
standard, where new keywords often cause problems) are treated in this
way, this disadvantage is probably small compared to the benefit.

Such a scheme is probably best implemented in the parser (which knows
about the first-sets), but at the interface to the scanner (so the
higher levels don't need to have special treatment of such
identifiers).
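
A minimal sketch of the idea in Python, for a made-up toy grammar (stmt ::= IF expr THEN stmt [ELSE stmt] | ident = expr ; and expr ::= ident [= ident]). The class and method names are hypothetical. Words are never classified by the scanner; the parser asks for a keyword only at positions where that keyword is in the first set, and accepts any word as an identifier everywhere else.

```python
import re

TOKEN = re.compile(r"\s*([A-Za-z_]\w*|[=;])")


def tokenize(text):
    """Split the input into words and the punctuation '=' and ';'."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def at_keyword(self, word):
        """Call only at positions where `word` is in the first set."""
        return self.peek() == word

    def expect(self, symbol):
        if self.peek() != symbol:
            raise SyntaxError(f"expected {symbol!r}, found {self.peek()!r}")
        self.i += 1

    def ident(self):
        """Accept any word as an identifier, including IF, THEN and ELSE."""
        tok = self.peek()
        if tok is None or not (tok[0].isalpha() or tok[0] == "_"):
            raise SyntaxError(f"expected identifier, found {tok!r}")
        self.i += 1
        return tok

    def expr(self):
        left = self.ident()
        if self.peek() == "=":
            self.i += 1
            return ("eq", left, self.ident())
        return ("var", left)

    def stmt(self):
        if self.at_keyword("IF"):          # IF is in FIRST(stmt): a keyword here
            self.i += 1
            cond = self.expr()
            self.expect("THEN")            # the only keyword expected here
            then_part = self.stmt()
            else_part = None
            if self.at_keyword("ELSE"):    # ELSE may follow the THEN branch
                self.i += 1
                else_part = self.stmt()
            return ("if", cond, then_part, else_part)
        target = self.ident()              # any other word starts an assignment
        self.expect("=")
        value = self.expr()
        self.expect(";")
        return ("assign", target, value)

    def program(self):
        stmts = []
        while self.peek() is not None:
            stmts.append(self.stmt())
        return stmts


if __name__ == "__main__":
    print(Parser("IF THEN = ELSE THEN X = IF; ELSE X = THEN;").program())
```

Note that this one-token scheme is weaker than real PL/I: a statement that begins with IF is always taken as an IF statement, so "IF = THEN;" is rejected here even though PL/I accepts it.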

- anton
--
M. Anton Ertl
an...@mips.complang.tuwien.ac.at
http://www.complang.tuwien.ac.at/anton/

Gene Wirchenko

Jan 19, 2011, 2:11:42 PM

On Tue, 18 Jan 2011 14:20:45 +0000, Martin Rodgers
<m...@wildcard.demon.co.uk> wrote:

[snip]

>We could probably ask similar questions about compilers... Error reporting
>and recovery has always fascinated me, probably because I was frequently
>frustrated by the unhelpfulness of the error msgs given by so many tools.

You, too? I used to like reading the appendix of error messages
for a compiler. Philosophy of the compiler.

>My first lexer buffered text at the line level and counted the
>characters and lines read so far, then provided them to the error
>reporting code so that the line number and the line itself could be
>given to the user, along with a '^' under the character at which the
>error was detected.

Well, I like this sort of thing. It makes it much easier for me
to find the error. An ugly counterexample: SQL statements can get
very long, and it can make non-specific error messages not very useful
and occasionally quite frustrating.

[snip]

Sincerely,

Gene Wirchenko

robin

Jan 22, 2011, 12:40:25 AM

From: "Anton Ertl" <an...@mips.complang.tuwien.ac.at>
Sent: Thursday, 20 January 2011 1:45 AM

> It's pretty easy for LL(1) grammars: At each position, treat all
> keywords as identifiers if they don't occur in the first set. E.g.,
> in many languages "IF" occurs only as keyword at the start of a
> statement, so in a PL/I-style variant of those languages "IF" could be
> treated as identifier everywhere else.
>
> A disadvantage of this approach is that some syntax errors can only be
> discovered later. E.g., to pick up the example above, if the semicolon
> in front of an IF is missing, a Pascal compiler will notice a syntax
> error when it sees the "IF", but not necessarily a compiler for a
> PL/I-style language.

A compiler for a PL/I style language will pick that up as a syntax
error (i.e., a missing semicolon). IBM's PL/I will probably advise
that it has assumed a semicolon and continued normally, treating the
omission as a trivial error.

Such an omission is detected because there would otherwise appear to be
a missing operator or some such before the "IF".

See compiler output below:--
______________________

5639-D65 IBM(R) VisualAge(TM) PL/I for Windows(R) V2.R1.00 2011.01.22 16:31:10 Page 3
Line.File

3.1 test: procedure options (main);
4.1 declare (a, b) fixed;

6.1 a = b
7.1 if a > b then b = a;

9.1 end test;

5639-D65 IBM(R) VisualAge(TM) PL/I for Windows(R) V2.R1.00 2011.01.22 16:31:10 Page 4

Compiler Messages

Message Line.File Message Description

IBM1122I W 7.1 Missing ; assumed before IF.

Martin Ward

Jan 24, 2011, 7:05:03 AM

On Thursday 13 Jan 2011 at 18:09, noitalmost <noita...@cox.net> wrote:
> My language solution addresses this sort of compromise. I'm providing
> traditional While, infinite Loop, and Break statements. If you have a
> Break, you only need one loop construct to provide pre-, post-, and
> mid-test loops. The While is provided simply for programmer convenience.

My language (WSL) has WHILE loops, FOR loops and loops with
multi-level EXITs. A loop of the form DO ...statements... OD can only
be terminated by an EXIT(n) statement, where n is an integer, not a
variable or expression. EXIT(n) will terminate the "n" enclosing
nested DO...OD loops.
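
For readers unfamiliar with multi-level exits, the semantics can be imitated in Python with an exception that carries the number of loops still to be left. This is only a sketch and not how WSL is implemented: the names Exit, do_od and find are invented here, and the level is an ordinary run-time value rather than the compile-time integer that WSL requires.

```python
class Exit(Exception):
    """Signals EXIT(n): terminate the n innermost enclosing do_od loops."""

    def __init__(self, levels: int):
        if levels < 1:
            raise ValueError("EXIT level must be a positive integer")
        super().__init__(levels)
        self.levels = levels


def do_od(body):
    """Run body() repeatedly, like DO ... OD; only Exit can end the loop."""
    while True:
        try:
            body()
        except Exit as e:
            if e.levels > 1:
                # Still more loops to leave: pass the request outward.
                raise Exit(e.levels - 1) from None
            return


def find(grid, wanted):
    """Return (row, col) of the first cell equal to wanted, or None."""
    state = {"row": 0, "col": 0, "found": None}

    def outer():
        if state["row"] >= len(grid):
            raise Exit(1)                  # no rows left: leave the outer loop
        state["col"] = 0

        def inner():
            if state["col"] >= len(grid[state["row"]]):
                raise Exit(1)              # end of row: leave the inner loop only
            if grid[state["row"]][state["col"]] == wanted:
                state["found"] = (state["row"], state["col"])
                raise Exit(2)              # leave both loops at once
            state["col"] += 1

        do_od(inner)
        state["row"] += 1

    do_od(outer)
    return state["found"]


if __name__ == "__main__":
    print(find([[1, 2], [3, 4]], 3))
```

Because n is fixed in the program text in WSL, the target of every EXIT can be determined statically, which the exception-based imitation above does not capture.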

WSL also includes a restricted type of GOTO in the form of an action
system and action calls. Roughly speaking, labels can only appear at
the top level of the program structure. CALLs (i.e. GOTOs) can only
appear within IF statements and DO...OD loops, not any other kind of
loop. This means that whenever you see a WHILE loop in WSL you can
guarantee that: (a) the condition is true at the top of the loop body,
and (b) the condition is false on termination of the loop. A REPEAT
loop ensures (b) but cannot ensure (a) since any conditions are
possible on the first iteration of the loop.

One major application of WSL is for transforming unstructured code
(translated from assembler code) to structured code, with minimal
reduction in efficiency while preserving any existing structure
where possible. In this context the multi-level EXITs provide a very
useful intermediate stage between unstructured spaghetti code and
fully structured code.

The FermaT program transformation system
http://www.gkc.org.uk/fermat.html is implemented almost entirely in
WSL and I have tried to use either a WHILE loop or a DO...OD loop,
depending on which seemed most natural for the code in question. WSL
also has two types of FOR loops: FOR v := start TO end STEP step DO --
the usual "counted repetition" and FOR v IN list DO -- iterate over
the elements of a list or set. WSL also has FOREACH and ATEACH loops
which iterate over the components of the current program which is
being transformed.

Out of a total of 73,713 lines of WSL code there are currently:

849 FOREACH loops
174 ATEACH loops
705 FOR ... IN ... loops
102 FOR ... STEP ... loops
727 WHILE loops
578 DO...OD loops (approx)
921 EXIT(1) statements
30 EXIT(2) statements
0 EXIT(3) or higher statements

Conclusions:

(1) Domain-specific looping constructs are very useful: at least
for the program transformation domain!

(2) Iterating over a list or set is used (in this system) much more
than iterating over a sequence of integers.

(3) WHILE loops are used more often than loops with EXITs from the middle
(recall that a WSL WHILE loop cannot be terminated from the middle):
the extra "analysability" of a WHILE loop versus the convenience of an exit
from the middle suggests that it is useful to have both in a language.

(4) It is sometimes useful to EXIT directly from a double-nested loop,
but higher levels of EXITs do not occur in my code at least.

--
Martin

STRL Reader in Software Engineering and Royal Society Industry Fellow
mar...@gkc.org.uk http://www.cse.dmu.ac.uk/~mward/ Erdos number: 4
G.K.Chesterton web site: http://www.cse.dmu.ac.uk/~mward/gkc/