New coding standards: use underscores, hyphens or mixed case in command (and identifier) names

James Harris

Jul 17, 2004, 8:17:03 AM

Before I embark on a new long-term language project I'd appreciate your advice on how to
split up long names. I would like to keep the standards for command or instruction names
the same as that for variable and type names, if possible. Looking at the examples below,
which ones seem better?

Straight names
echoclient
lastcharoffset
helloworld

Internal underscores
echo_client
last_char_offset
hello_world

I could also use embedded hyphens as my minus sign must be surrounded by whitespace
(please suspend disbelief while looking at these. I know they will look unfamiliar!)
echo-client
last-char-offset
hello-world

Mixed case
EchoClient
LastCharOffset
HelloWorld

Initial lower case then mixed
echoClient
lastCharOffset
helloWorld

In some ways I like the mixed case versions using an initial capital, especially as I may
want to prefix some names with a code for an abstract data type, which, when present,
could begin with a lower case. Is this getting too Microsoft-ish? Is it getting too
Hungarian? Is Hungarian bad when used with abstract data types rather than inbuilt ones?

Advice on which is or is not thought to be acceptable would be much appreciated. Please
bear in mind that I intend these names for commands/instructions as well as variables and
types. Constants would be in all caps.

--
Thanks,
James
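
The five styles listed in the post above can be compared mechanically. What follows is a minimal sketch in Python (not the language under discussion); `split_words` and `render` are hypothetical helper names chosen for illustration. It splits a name into words and renders the words in each style. It also shows one practical difference between the styles: a "straight" name such as echoclient carries no word boundaries, so it cannot be split again.

```python
import re


def split_words(name):
    """Split a name written in any of the five styles into lowercase words."""
    words = []
    # Break on underscores and hyphens first, then on case changes.
    for part in re.split(r"[_\-]+", name):
        words.extend(re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", part))
    return [w.lower() for w in words]


def render(words, style):
    """Join lowercase words in one of the styles from the original post."""
    if style == "straight":
        return "".join(words)
    if style == "underscore":
        return "_".join(words)
    if style == "hyphen":
        return "-".join(words)
    if style == "mixed":
        return "".join(w.capitalize() for w in words)
    if style == "lower_mixed":
        return words[0] + "".join(w.capitalize() for w in words[1:])
    raise ValueError("unknown style: " + style)


if __name__ == "__main__":
    words = split_words("last_char_offset")
    for style in ("straight", "underscore", "hyphen", "mixed", "lower_mixed"):
        print(style, render(words, style))
    # A straight name has lost its boundaries and comes back as one word.
    print(split_words("lastcharoffset"))
```

Every style except the straight one round-trips through `split_words`, which is one argument for keeping some visible separator.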

Marco van de Voort

Jul 17, 2004, 9:13:27 AM

On 2004-07-17, James Harris <> wrote:
> EchoClient
> LastCharOffset
> HelloWorld
>
> Initial lower case then mixed
> echoClient
> lastCharOffset
> helloWorld
>
> In some ways I like the mixed case versions using an initial capital,
> especially as I may want to prefix some names with a code for an abstract
> data type, which, when present, could begin with a lower case. Is this
> getting too Microsoft-ish? Is it getting too Hungarian? Is Hungarian bad
> when used with abstract data types rather than inbuilt ones?

Hungarian notation is not bad or good. The point is, do you need that bit of
extra security of Hungarian notation? If you have a strongly typed language
with good error messages you don't need it.

E.g. in Pascal nobody ever uses Hungarian notation, unless forced
externally. The compiler directly snaps at you that it wanted type x, while
you gave y. This can even be done before the compilation stage, by the syntax
highlighter (e.g. Delphi uses the compiler binary for syntax highlighting
and Intellisense, which is much more powerful than a simple parser as highlighter).

Delphi coding style btw does use some Hungarian notation. Mostly enumeration
elements and class fields have a prefix. The latter is because of properties
occupying the same namespace, which is a valid reason. I'm not sure about the
exact reason for the former.

So saying that you _want_ Hungarian notation doesn't make sense. If you want
to do it right, you need proper motivation _why_ the programmer has to go
through the extra burden to do that extra type administration.

> Advice on which is or is not thought to be acceptable would be much
> appreciated. Please bear in mind that I intend these names for
> commands/instructions as well as variables and types. Constants would be
> in all caps.

Some other questions:

- Is your language case-sensitive? Complex capitalisation hurts more if a
slightly different capitalisation fails to compile.
- The Delphi style guide, for example, has different rules for native code
and for imported code (header/API units). Do you plan such a thing?

Howard Ding <hading@hading.dnsalias.com>

Jul 17, 2004, 1:20:00 PM

"James Harris" <no.email.please> writes:

> I could also use embedded hyphens as my minus sign must be surrounded by whitespace
> (please suspend disbelief while looking at these. I know they will look unfamiliar!)
> echo-client
> last-char-offset
> hello-world
>

Why? They look perfectly familiar to anyone who programs Lisp.

--
Howard Ding
<had...@hading.dnsalias.com>

Marcin 'Qrczak' Kowalczyk

Jul 17, 2004, 5:38:37 PM

On Sat, 17 Jul 2004 13:17:03 +0100, James Harris wrote:

> Before I embark on a new long-term language project I'd appreciate your
> advice on how to split up long names.

http://groups.google.com/groups?selm=pan.2004.06.20.12.55.58.551616%40knm.org.pl
http://groups.google.com/groups?selm=pan.2003.10.28.12.31.07.163797%40knm.org.pl

--
   __("<    Marcin Kowalczyk
   \__/     qrc...@knm.org.pl
    ^^      http://qrnik.knm.org.pl/~qrczak/

cr88192

Jul 17, 2004, 11:00:09 PM

"James Harris" <no.email.please> wrote in message
news:40f918a1$0$7807$db0f...@news.zen.co.uk...

>
> Before I embark on a new long-term language project I'd appreciate your
> advice on how to split up long names. I would like to keep the standards
> for command or instruction names the same as that for variable and type
> names, if possible. Looking at the examples below, which ones seem better?
>

depends...

> Straight names
> echoclient
> lastcharoffset
> helloworld
>

terminal part of a callback function.

void *pdscr_threads_makethread(PDSCR0_Context *ctx, void **args, int n);
also for local variable names.

> Internal underscores
> echo_client
> last_char_offset
> hello_world
>

some other cases, typically internal functions.

> I could also use embedded hyphens as my minus sign must be surrounded by
> whitespace (please suspend disbelief while looking at these. I know they
> will look unfamiliar!)
> echo-client
> last-char-offset
> hello-world
>

yes, that works, but can't be done in c and friends.
scheme, for example, uses a lot of other symbols in names like this:
(set! x y)

(if* (list? x) :then (display (car x)) :else (display x))
yes, non-standard, but this is to illustrate a point...

...

in general:
! is used for destructive operations;
? is used for predicates (they only return true or false);
* is typically used for alternate forms of something;
...

:foo is a keyword, basically meaning that it just evaluates to itself.

> Mixed case
> EchoClient
> LastCharOffset
> HelloWorld
>

typical of normal functions or types.

> Initial lower case then mixed
> echoClient
> lastCharOffset
> helloWorld
>

personally I don't like this one as much.

> In some ways I like the mixed case versions using an initial capital,
> especially as I may want to prefix some names with a code for an abstract
> data type, which, when present, could begin with a lower case. Is this
> getting too Microsoft-ish? Is it getting too Hungarian? Is Hungarian bad
> when used with abstract data types rather than inbuilt ones?
>
> Advice on which is or is not thought to be acceptable would be much
> appreciated. Please bear in mind that I intend these names for
> commands/instructions as well as variables and types. Constants would be
> in all caps.
>

yes.

I use hungarian sometimes, usually it is to deal with "name clashes", or
different types of whatever referring to the same thing.

James Harris

Jul 20, 2004, 5:33:04 PM

"Marco van de Voort" <mar...@stack.nl> wrote in message
news:slrncfi9fn....@toad.stack.nl...

> Hungarian notation is not bad or good. The point is, do you need that bit
> of extra security of Hungarian notation? If you have a strongly typed
> language with good error messages you don't need it.

My intention is that the language will be strongly typed - but will include
a Variant type (and something like a TypeOf operator).

> So saying that you _want_ Hungarian notation doesn't make sense. If you
> want to do it right, you need proper motivation _why_ the programmer has
> to go through the extra burden to do that extra type administration.

Having seen a particular description of Hungarian I agree with you. I
didn't like what I saw!

I would like the language to show clarity of statement. I know we all want
that. I am thinking of something like this counterexample,

result = left + right

Does this add two integers and assign the result to another integer? Does
it add reals, ratios, complex numbers? It may do none of these. It may, in
fact, concatenate strings. Does it concatenate bit fields? None of this can
be determined without looking up the types of the three variables.

Here's another question which is also relevant: does the output data type
permit the result of the operation to be assigned without coercion? If the
operation is multiply, is there a danger of me losing high-order bits
because the output type is the same width as both inputs? Here is an
alternative:

iResult = iLeft + iRight

though I must confess I don't much care for the look of that either. :-(

> - Is your language case-sensitive? Complex capitalisation hurts more if a
> slightly different capitalisation fails to compile.

At this time the language is intended to be case sensitive. For familiarity
to users who are not computer-literate the hyphenated version is growing on
me, eg the user typing hello-world.

> - e.g. Delphi styleguide has different rules for native code and e.g.
> imported code (header/api units). Do you plan such a thing?

No. I particularly don't want the language to enforce or require different
rules for different circumstances. I want the language to prescribe as
little as possible, leaving the system designer to choose the
representation where possible.

--
James

James Harris

Jul 20, 2004, 5:45:00 PM

"cr88192" <cr8...@protect.hotmail.com> wrote in message
news:BIlKc.494$Qn6...@fe07.usenetserver.com...

> > echo-client
> > last-char-offset
> > hello-world
> >
> yes, that works, but can't be done in c and friends.
> eg, scheme had used a lot of other symbols in names like this, eg:
> (set! x y)
>
> (if* (list? x) :then (display (car x)) :else (display x))
> yes, non-standard, but this is to illustrate a point...
>
> ...
>
> in general:
> ! is used for destructive operations;
> ? is used for predicates (they only return true or false);
> * is typically used for alternate forms of something;
> ...
>
> :foo is a keyword, basically meaning that it just evaluates to itself.

I deliberately intend not to distinguish variable names from function
names. Then source code can stay unchanged if the type changes. This is
particularly for melding arrays and functions, e.g. result =
factorial.(input-value). I'd like to be able to extend that to functions
that do not require input.
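
One way to picture the array and function melding described above is a wrapper that hides whether a lookup is backed by stored data or by computation. This is only a sketch in Python, under the assumption that melding means "same call-site syntax for both"; the class name `Lookup` and the table contents are hypothetical and not part of the proposed language.

```python
import math


class Lookup:
    """Wraps either a sequence or a function behind one lookup syntax."""

    def __init__(self, source):
        self._source = source

    def __call__(self, key):
        # A function is called; anything else is indexed.
        if callable(self._source):
            return self._source(key)
        return self._source[key]


# The same name can be backed by a precomputed table ...
factorial = Lookup([1, 1, 2, 6, 24, 120])
table_result = factorial(4)

# ... or by a function, and the using code does not change.
factorial = Lookup(math.factorial)
function_result = factorial(4)

print(table_result, function_result)
```

The call site `factorial(4)` stays the same in both cases, which is the property the post is after when it says the source can stay unchanged if the type changes.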

> I use hungarian sometimes, usually it is to deal with "name clashes", or
> different types of whatever referring to the same thing.

Do you mean, for example, to distinguish the number realSalary from the
printable representation stringSalary?

--
James

James Harris

Jul 20, 2004, 5:46:34 PM

<had...@hading.dnsalias.com> wrote in message
news:m3wu121...@frisell.localdomain...

> "James Harris" <no.email.please> writes:
>
> > I could also use embedded hyphens as my minus sign must be surrounded by
> > whitespace (please suspend disbelief while looking at these. I know they
> > will look unfamiliar!)
> > echo-client
> > last-char-offset
> > hello-world
> >
>
> Why? They look perfectly familiar to anyone who programs Lisp.

I think I could get used to these. Perhaps variable names are one of the
more readable bits of Lisp...... :-)

James Harris

Jul 20, 2004, 5:53:37 PM

"Marcin 'Qrczak' Kowalczyk" <qrc...@knm.org.pl> wrote in message
news:pan.2004.07.17....@knm.org.pl...

Marcin, thanks. I had seen your good comments before posting.

I still didn't understand your signature, though. Is it a duck, a baby in a
pram? No, perhaps a Kogut...? ;-)

--
Cheers,
James

Marcin 'Qrczak' Kowalczyk

Jul 20, 2004, 6:13:13 PM

On Tue, 20 Jul 2004 22:33:04 +0100, James Harris wrote:

> result = left + right
>
> Does this add two integers and assign the result to another integer. Does
> it add reals, ratios, complex numbers? It may do none of these.

Perhaps it does not matter. If it adds two amounts of time, I should not
care at this point if it's an integer or a rational.

Of course sometimes it does matter. But Hungarian notation is a poor
solution: it doesn't scale to many types (and representing almost
everything by integers is not a good idea, better introduce types
which more accurately describe the data), and looks ugly.

> It may, in fact, concatenate strings. Does it concatenate bit fields?

Assuming that the language uses + for concatenation. Some languages and
some people, including me, don't use + for anything other than addition
of numbers. Precisely because it makes it hard to infer what a piece of
code means without knowing a larger context.

> Does the output data type permit the result of the operation to be
> assigned without coercion? If the operation is multiply is there a
> danger of me losing high-order bits because the output type is the same
> width as both inputs?

Ah, so you first create a language with a broken addition, and then use
a naming convention to remind people each time that it's broken :-)

My favorite approach to adding integers is that it does not overflow and
is not coerced. If a programmer writes 'let result = left + right', and
both left and right are non-negative integers, then the result is a
non-negative integer, period. This is how addition works.

I accept "wrong" results only for floating point numbers, because people
haven't invented a representation of real numbers which would be able to
replace floating point, so unfortunately here the limitations of possible
implementation must influence the behavior. But implementation of integers
limited only by memory is a known issue with known solutions, there are
free libraries for this (e.g. GMP), etc.
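
The difference between the two integer models discussed above can be shown with a small Python sketch. Python's built-in integers are limited only by memory, so the fixed-width behaviour is simulated here by masking; `mul_wrapped` is a hypothetical helper and the 32-bit width is an assumption made for the example.

```python
def mul_wrapped(left, right, bits=32):
    """Multiply as an unsigned fixed-width word would: high-order bits are lost."""
    mask = (1 << bits) - 1
    return (left * right) & mask


left = right = 100_000

exact = left * right                 # unbounded integer arithmetic
wrapped = mul_wrapped(left, right)   # result squeezed into 32 bits

print(exact)    # 10000000000
print(wrapped)  # 1410065408

# Addition of two non-negative integers stays non-negative and exact,
# even past the range of a signed 64-bit word.
big = 2**63 - 1
print(big + big == 2**64 - 2)  # True
```

With unbounded integers the question of whether the output type is wide enough does not arise for addition or multiplication; with fixed widths it has to be answered by the type system, a range check, or the programmer.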

Marcin 'Qrczak' Kowalczyk

Jul 20, 2004, 6:14:48 PM

On Tue, 20 Jul 2004 22:53:37 +0100, James Harris wrote:

> I still didn't understand your signature, though. Is it a duck, a baby in a
> pram? No, perhaps a Kogut...? ;-)

It's meant to be a chicken :-)
"Kurczak" is "chicken" in Polish.

cr88192

Jul 20, 2004, 10:54:01 PM

"James Harris" <no.email.please> wrote in message

news:40fd925a$0$7117$db0f...@news.zen.co.uk...

>
> "cr88192" <cr8...@protect.hotmail.com> wrote in message
> news:BIlKc.494$Qn6...@fe07.usenetserver.com...
>
> > > echo-client
> > > last-char-offset
> > > hello-world
> > >
> > yes, that works, but can't be done in c and friends.
> > eg, scheme had used a lot of other symbols in names like this, eg:
> > (set! x y)
> >
> > (if* (list? x) :then (display (car x)) :else (display x))
> > yes, non-standard, but this is to illustrate a point...
> >
> > ...
> >
> > in general:
> > ! is used for destructive operations;
> > ? is used for predicates (they only return true or false);
> > * is typically used for alternate forms of something;
> > ...
> >
> > :foo is a keyword, basically meaning that it just evaluates to itself.
>
> I'm intending deliberately not distinguishing variable names and function
> names. Then source code can be unchanged if the type changes. This is
> particularly for array and function melding. E.g. result =
> factorial.(input-value). I'd like to be able to extend that to functions
> that do not require input.
>

hmm, I am not totally sure of this one...

not all function names have a character appended on.
this was, I guess, a replacement for the p, n, ... postfixes in cl (common lisp).

>
> > I use hungarian sometimes, usually it is to deal with "name clashes", or
> > different types of whatever referring to the same thing.
>
> Do you mean, for example, to distinguish the number realSalary from the
> printable representation stringSalary?
>

well, just as an example:
int foo_function();
int (*foo_function_p)();

by default, foo_function_p points to foo_function, but may be overridden, so
&foo_function is not generally usable in this case.

similar is when one is the real function, and the other is an interpreter
builtin referring to the real function.

similarly, I like to stick to a convention of typically one-letter
base names for local variables, and sometimes the names clash, so they get
renamed:
void **a, **b;
double *fa, *fb; //renamed since a is already in use
NetParse_Node *n0; //would have been n if 'int n' weren't around
int n;
int i, j, k, l; //these ones are most of the time ints
int ai, bi, ci;
float af, bf, cf;
void *p, *q, *r; //these are quite often void pointers
char *s, *t, *u, *v; //often strings
float s, t; //main contenders for s and t space
int t; //also often a contender for t
...

as can maybe be noted, types are prefixed in the case of pointers and
postfixed in the case of normal vars.

a lot has to do with order as well, as usually later added variables will
more often be renamed than earlier ones, ...

Marco van de Voort

Jul 21, 2004, 5:36:28 AM

On 2004-07-20, James Harris <> wrote:
>
>> Hungarian notation is not bad or good. The point is, do you need that bit
>> of extra security of Hungarian notation? If you have a strongly typed
>> language with good error messages you don't need it.
>
> My intention is that the language will be strongly typed - but will include
> a Variant type (and something like a TypeOf operator).

Then I wouldn't, or at least not for basic types.

You could still use it for some other things. Details depend on your language,
but I can give some examples from delphi:

- In Delphi, nearly all identifiers (types, consts, vars etc.) share the
same (compilation unit) namespace. Types are therefore commonly
prefixed with a 'T'.
- Resourcestrings are commonly prefixed too, with S (probably for string),
probably only to indicate that the string can be edited afterwards
without recompiling.
- Exceptions are classes, but are sometimes prefixed with E.
- Interfaces use I as (type) prefix.
- Classes can have both properties (with RTTI) and corresponding internal
fields. Therefore, class fields are commonly prefixed with "f".

>> want to do it right, you need proper motivation _why_ the programmer has
>> to go through the extra burden to do that extra type administration.
>
> Having seen a particular description of Hungarian I agree with you. I
> didn't like what I saw!

Also keep in mind that an easily parsable language with a decent import
system makes it easier for editors to show types etc. in tooltip-like popups.

> I would like the language to show clarity of statement. I know we all want
> that. I am thinking of something like this counterexample,
>
> result = left + right
>
> Does this add two integers and assign the result to another integer. Does
> it add reals, ratios, complex numbers? It may do none of these. It may, in
> fact, concatenate strings. Does it concatenate bit fields? None of this can
> be determined without looking up the types of the three variables.

Yes. But must it be encoded in the identifiers?
That has two serious problems:
- it must be done by the programmer (as opposed to the IDE)
- programmers sometimes don't update it if the type changes.

> Here's another question which is also relevant, Does the output data type
> permit the result of the operation to be assigned without coercion? If the
> operation is multiply is there a danger of me losing high-order bits
> because the output type is the same width as both inputs? Here is an
> alternative

Better stuff this in compiler warnings etc. Or better, have decent runtime
range checking. (but that might be my Pascal bias)

>> - Is your language case-sensitive? Complex capitalisation hurts more if a
>> slightly different capitalisation fails to compile.
>
> At this time the language is intended to be case sensitive. For familiarity
> to users who are not computer-literate the hyphenated version is growing on
> me, eg the user typing hello-world.

Problem with hyphen is that it is the same char as minus. That can be
confusing. (and cause parser problems)

>> - e.g. Delphi styleguide has different rules for native code and e.g.
>> imported code (header/api units). Do you plan such a thing?
>
> No. I particularly don't want the language to enforce or require different
> rules for different circumstances

This was a style guide; Delphi enforces nothing, though the default editor
encourages it a bit.

> I want the language to prescribe as little as possible, leaving the system
> designer to choose the representation where possible.

Of course. But keep in mind that the convention that you devise for the
initial system is the one that usually sticks.

Delphi has different recommendations for imported header units, since those
are essentially translated C headers, while the normal styleguide is for native
Pascal code.

Marco van de Voort

Jul 21, 2004, 5:39:07 AM

On 2004-07-20, James Harris <> wrote:

>> I use hungarian sometimes, usually it is to deal with "name clashes", or
>> different types of whatever referring to the same thing.
>
> Do you mean, for example, to distinguish the number realSalary from the
> printable representation stringSalary?

Note that the seriousness and likelihood of name clashes is dependent on
your scoping and import precedence rules. A good module system reduces the
likelihood of clashes.

Howard Ding <hading@hading.dnsalias.com>

Jul 21, 2004, 8:47:50 AM

"James Harris" <no.email.please> writes:

>
> I think I could get used to these. Perhaps variable names are one of the
> more readable bits of Lisp...... :-)
>

That's true. Along with the rest of it.

--
Howard Ding
<had...@hading.dnsalias.com>

cr88192

Jul 21, 2004, 11:46:02 AM

<had...@hading.dnsalias.com> wrote in message
news:m3smbly...@frisell.localdomain...

> "James Harris" <no.email.please> writes:
>
> >
> > I think I could get used to these. Perhaps variable names are one of the
> > more readable bits of Lisp...... :-)
> >
>
> That's true. Along with the rest of it.
>

this is something I have heard claims of a lot but don't really agree
with...

a problem with lisp syntax is that it is too regular, and many syntactic
constructs don't "stand out" as much as in other languages (the use of
special syntax for some forms helps a little, but given the common
"character chaining" approaches they tend to be cryptic).

this factor also interferes with the ability to skim the source and get a
rough idea of the structure.

things also seem to end up a little more nested than one would hope, but
this is more a language issue than a syntax one (where in many languages a
nesting of 3 or 4 levels is pretty deep, much deeper nestings seem to
accumulate a lot faster).

not to mention the factor that the syntax has a lot of "scaring off of the
newbies" type powers. yes, any unfamiliar language has this property, but
imo lisp and smalltalk are more so than normal, c-like syntax is closer to
average but is much more common, and pascal syntax would probably be a
little less so (though I am not that fond of pascal syntax either...).
forth is especially bad (and also has the added bonus that once one stops
looking at it they start forgetting the structure, and re-familiarizing
oneself with one's own code is extra difficult).

and yes, xml is worse than s-exps, but only rarely have I heard anyone
pushing direct use of xml as a programming language syntax...

I will argue that s-exps' power is their expressiveness, and not their
readability.

a possible issue here though is that they tend to interfere some with the
"behind the scenes" abilities of the parser, leading to the occurrence of
"syntax objects" and such to try to deal with the issues.

also, the workings of the compiler may find themselves constrained, and one
can't really change things around as much without interfering with the code
being compiled.
this is in contrast to languages like c where lots of weird crap can be done
in the parser, and where things can be changed around readily, at the cost
that the syntax is far less expressive and powerful macro systems are
largely eliminated...

or something...

James Harris

Jul 21, 2004, 5:36:54 PM

"Marcin 'Qrczak' Kowalczyk" <qrc...@knm.org.pl> wrote in message

news:pan.2004.07.20....@knm.org.pl...

> On Tue, 20 Jul 2004 22:33:04 +0100, James Harris wrote:
>
> > result = left + right

<snip>

> > It may, in fact, concatenate strings. Does it concatenate bit fields?
>
> Assuming that the language use + for concatenation. Some languages and
> some people, including me, don't use + for anything other than addition
> of numbers. Precisely because it makes hard to infer what a piece of
> code means without knowing a larger context.

What I have in mind will allow the programmer to define meanings for words
or symbols - and to match these depending on context. The plus sign is
normally overloaded, representing integer addition of various precisions,
real addition and possibly others. I probably won't provide the
concatenation of strings per se but nor will I prevent "+" being used as a
method on new data types.

> > Does the output data type permit the result of the operation to be
> > assigned without coercion? If the operation is multiply is there a
> > danger of me losing high-order bits because the output type is the same
> > width as both inputs?
>
> Ah, so you first create a language with a broken addition, and then use
> a naming convention to remind people each time that it's broken :-)

LOL. As I say, the programmer can choose.

> My favorite approach to adding integers is that it does not overflow and
> is not coerced. If a programmer writes 'let result = left + right', and
> both left and right are non-negative integers, then the result is a
> non-negative integer, period. This is how addition works.

Well, I did mention multiplication rather than addition, but taking your
comment, wouldn't MOSTPOS + MOSTPOS be too wide to be assigned to an
integer? Perhaps you mean a negative and a non-negative....?

James Harris

Jul 21, 2004, 5:46:19 PM

"Marco van de Voort" <mar...@stack.nl> wrote in message

news:slrncfse8s...@toad.stack.nl...
<snip>

> > I would like the language to show clarity of statement. I know we all
> > want that. I am thinking of something like this counterexample,
> >
> > result = left + right
> >
> > Does this add two integers and assign the result to another integer?
> > Does it add reals, ratios, complex numbers? It may do none of these. It
> > may, in fact, concatenate strings. Does it concatenate bit fields? None
> > of this can be determined without looking up the types of the three
> > variables.
>
> Yes. But must it be encoded in the identifiers?

No. I'm not suggesting it /must/ be for anyone using the language. I am
wondering, however, whether to use some form of type identification in my
own code written in the language.

> That has two serious problems:
> - it must be done by the programmer (opposed to the IDE)
> - programmers sometimes don't update it if the type changes.

Isn't your second argument the converse of your first? If your IDE can
identify types could it not rename identifiers? Incidentally a hover-help
IDE is a good idea. I'm wanting the source to be preparable on simple
terminals, though.

> > Here's another question which is also relevant, Does the output data
> > type permit the result of the operation to be assigned without coercion?
> > If the operation is multiply is there a danger of me losing high-order
> > bits because the output type is the same width as both inputs? Here is
> > an alternative
>
> Better stuff this in compiler warnings etc. Or better, have decent
> runtime range checking. (but that might be my Pascal bias)

The jury is out on this one. I may permit the programmer to specify whether
an operation is to be checked or not.

> >> - Is your language case-sensitive? Complex capitalisation hurts more
> >> if a slightly different capitalisation fails to compile.
> >
> > At this time the language is intended to be case sensitive. For
> > familiarity to users who are not computer-literate the hyphenated
> > version is growing on me, eg the user typing hello-world.
>
> Problem with hyphen is that it is the same char as minus. That can be
> confusing. (and cause parser problems)

I agree it is unfamiliar. Allowing it as a hyphen in identifier names would
require the minus sign to be separated from neighbouring character strings
by whitespace. Not everyone would like this.
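
The whitespace rule described above can be made concrete with a toy tokenizer. This is a sketch in Python under assumed lexical rules (identifiers start with a letter and may contain embedded hyphens; a minus is only recognised when it is not glued to an identifier); the `TOKEN` pattern and the `tokenize` function are hypothetical and do not describe the actual language.

```python
import re

TOKEN = re.compile(r"""
    (?P<name>[A-Za-z][A-Za-z0-9]*(?:-[A-Za-z0-9]+)*)  # identifier, hyphens inside
  | (?P<number>\d+)                                   # unsigned integer literal
  | (?P<op>[-+*/=()])                                 # single-character operators
  | (?P<space>\s+)                                    # whitespace, discarded
""", re.VERBOSE)


def tokenize(text):
    """Return (kind, text) pairs; whitespace separates but is not kept."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if match.lastgroup != "space":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


if __name__ == "__main__":
    print(tokenize("gap = last-char-offset - first-char-offset"))
    # Without the spaces this is one long identifier, not a subtraction.
    print(tokenize("last-first"))
```

The second call shows the cost of the rule: `last-first` silently becomes a single name, so a forgotten space changes the meaning instead of producing an error, unless the compiler later reports the name as undefined.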

James Harris

Jul 21, 2004, 5:57:08 PM

"Marco van de Voort" <mar...@stack.nl> wrote in message

news:slrncfsedr...@toad.stack.nl...

Marco, I'm not sure what you mean by the last comment. Could you add more?

I am thinking generally of very small modules, each defining an op-code and
its context. That op-code is then to be usable as any inbuilt op-code.

Where packages of op-codes are needed I was thinking of having the package
define a prefix of any length (from zero), so that all op-codes in that
package would carry the prefix of the package.

For example, say I wanted to define a package of operations on
extended-length words, I could say:

type ExtendedWord is <whatever>
package e : prefix "e."
to negate (ExtendedWord value)
value = (- value)
endto negate

then use the new negate function in this way,

ExtendedWord myExtendedWord
e.negate myExtendedWord

where the e. is the prefix from the package definition.
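
The prefix scheme can be modelled as a table of op-codes keyed by full name. The following Python sketch is an illustration only; the class `Package`, the `opcodes` table and the duplicate-name check are assumptions, not features stated for the proposed language.

```python
class Package:
    """Registers operations under a shared prefix in a table of op-codes."""

    def __init__(self, prefix, table):
        self.prefix = prefix  # may be empty, as a zero-length prefix is allowed
        self.table = table

    def define(self, name, func):
        full_name = self.prefix + name
        if full_name in self.table:
            raise KeyError(f"op-code {full_name!r} is already defined")
        self.table[full_name] = func
        return full_name


opcodes = {}

# Package "e" contributes e.negate ...
e = Package("e.", opcodes)
e.define("negate", lambda value: -value)

# ... and does not clash with an unprefixed negate from another package.
core = Package("", opcodes)
core.define("negate", lambda value: not value)

print(sorted(opcodes))
print(opcodes["e.negate"](5))
```

With a zero-length prefix the package's names land directly in the shared table, so that is where clashes between packages would show up.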

James Harris

Jul 21, 2004, 6:03:50 PM

"cr88192" <cr8...@protect.hotmail.com> wrote in message

news:yUkLc.4957$Qn6....@fe07.usenetserver.com...

<snip>

> > I'm intending deliberately not distinguishing variable names and
> > function names. Then source code can be unchanged if the type changes.
> > This is particularly for array and function melding. E.g. result =
> > factorial.(input-value). I'd like to be able to extend that to
> > functions that do not require input.
> >
> hmm, I am not totally sure of this one...

Nor am I....! Maybe it gives too much flexibility to the programmer, making
another's code hard to understand. On the other hand maybe the programmer
should have this flexibility and then de facto standards can arise to keep
code clear. I'm not sure which way to go on this yet.

<snip>

> int ai, bi, ci;
> float af, bf, cf;

Interesting. Makes the variable types clear, doesn't it?

cr88192

Jul 21, 2004, 10:22:50 PM

"James Harris" <no.email.please> wrote in message

news:40fee844$0$7129$db0f...@news.zen.co.uk...

>
> "cr88192" <cr8...@protect.hotmail.com> wrote in message
> news:yUkLc.4957$Qn6....@fe07.usenetserver.com...
>
> <snip>
>
> > > I'm intending deliberately not distinguishing variable names and
> > > function names. Then source code can be unchanged if the type changes.
> > > This is particularly for array and function melding. E.g. result =
> > > factorial.(input-value). I'd like to be able to extend that to
> > > functions that do not require input.
> > >
> > hmm, I am not totally sure of this one...
>
> Nor am I....! Maybe it gives too much flexibility to the programmer,
> making another's code hard to understand. On the other hand maybe the
> programmer should have this flexibility and then de facto standards can
> arise to keep code clear. I'm not sure which way to go on this yet.
>

as far as I understand it, you were suggesting allowing functions to be
evaluated without an args list, or be used as fake objects.

depending on implementation, this can cause "weird" semantic issues (say, a
practical inability to usefully work with first-class functions).

of course, it could be generalized in another way:
all operations are implicitly application, so, for example, a normal
function can be used as an array or object, and can accept method calls and
slot assignments.

dunno your syntax, using my own.

function foo(var, val) {...}
foo.bar=baz;
which would be equivalent to a call:
foo(#bar, baz);

the issue, however, is when you pick up the "function with no arguments is
equivalent to its return value" type semantics.

function bar() 3;
x=bar;
x => 3

this eliminates a lot of possible uses of functions (eg: passing them around
and calling them from elsewhere).
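
[A small Python sketch of why this matters. Python keeps "the function" and "the result of calling it" distinct; lookup_autocall is a hypothetical helper that imitates the "zero-argument function equals its value" rule so the difference is visible. It assumes every callable it meets takes no arguments.]

```python
def bar():
    return 3

def call_later(callback):
    # Only works because `callback` arrives as a function, not as its result.
    return callback() + 1

# Ordinary semantics: the name denotes the function itself.
later = call_later(bar)

def lookup_autocall(env, name):
    """Hypothetical name lookup under 'zero-argument function == its value'."""
    value = env[name]
    return value() if callable(value) else value

env = {"bar": bar}
x = lookup_autocall(env, "bar")   # x is 3; the function itself is unreachable
```

[Under the second rule there is no expression that yields bar itself, so it cannot be handed to call_later.]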

> <snip>
>
> > int ai, bi, ci;
> > float af, bf, cf;
>
> Interesting. Makes the variable types clear, doesn't it?
>

yes, but the main point is that often I exhaust my supply of short
local-variable names, and need to do something about it.

I also stick closely to conventions about how the variables are used...

I don't like longer names, mostly because they take more effort to type and
require me to actually think up a good var name, and they interfere with my
ability to copy and paste chunks of code between functions with only minor
(and sometimes no) alterations.

Marco van de Voort

Jul 22, 2004, 4:25:57 AM

On 2004-07-21, James Harris <> wrote:
>> > be determined without looking up the types of the three variables.
>>
>> Yes. But must it be encoded in the identifiers?
>
> No. I'm not suggesting it /must/ be for anyone using the language. I am
> wondering, however, whether to use some form of type identification in my
> own code written in the language.

I wouldn't for a strongly typed language with a module system. HN (Hungarian
notation) is a kludge for systems that don't have that.

>> That has two serious problems:
>> - it must be done by the programmer (opposed to the IDE)
>> - programmers sometimes don't update it if the type changes.
>
> Isn't your second argument the converse of your first? If your IDE can
> identify types could it not rename identifiers?

No, since it might not have access to all occurrences: parts may be
precompiled-only, people might not always use the IDE, etc.

I don't like languages that are _only_ editable via their own IDE.

> Incidentally a hover-help
> IDE is a good idea. I'm wanting the source to be preparable on simple
> terminals, though.

Could be done on terminals too. Simply identify the identifier the cursor is
on, and display its type in the status bar.

Our own textmode IDE is of the Turbo Vision type (like the Turbo Pascal IDE and
DOS edit), but extended to have a symbol browser, some IntelliSense-like
features etc.

Most GUI IDE concepts translate quite well to textmode too, especially
since TV (Turbo Vision) is also event-driven.

>> > because the output type is the same width as both inputs? Here is an
>> > alternative
>>
>> Better stuff this in compiler warnings etc. Or better, have decent runtime
>> range checking. (but that might be my Pascal bias)
>
> The jury is out on this one. I may permit the programmer to specify whether
> an operation is to be checked or not.

That's what Pascal does also. Please also allow it to be disabled/enabled
_locally_, and not just globally for the project or module.

> growing on
>> > me, eg the user typing hello-world.
>>
>> Problem with hyphen is that it is the same char as minus. That can be
>> confusing. (and cause parser problems)
>
> I agree it is unfamiliar. Allowing it as a hyphen in identifier names would
> require the minus sign to be separated from neighbouring character strings
> by whitespace. Not everyone would like this.

I wouldn't like it. But I don't like any significant meaning placed on
whitespace. Call me old fashioned :-)
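
[To make the whitespace rule concrete, here is a minimal tokenizer sketch in Python. The token grammar is an assumption for illustration only, not James's actual lexer: a name may contain internal hyphens, so a minus sign is only seen as an operator when it is set apart from the name.]

```python
import re

TOKEN = re.compile(r"""
    (?P<ident>[A-Za-z][A-Za-z0-9]*(?:-[A-Za-z0-9]+)*)   # hyphens allowed inside a name
  | (?P<number>\d+)
  | (?P<op>[-+*/=])
""", re.VERBOSE)

def tokenize(source):
    # Whitespace is skipped; everything else must match one of the token kinds.
    return [m.group() for m in TOKEN.finditer(source)]

spaced = tokenize("last-char-offset - 1")   # name, minus, number
fused = tokenize("last-char-offset-1")      # one long name, no subtraction
```

[The second call shows the confusion being discussed: without the spaces, "-1" is swallowed into the identifier.]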

Marco van de Voort

Jul 22, 2004, 4:50:52 AM

On 2004-07-21, James Harris <> wrote:
>

> "Marco van de Voort" <mar...@stack.nl> wrote in message
> news:slrncfsedr...@toad.stack.nl...
>> On 2004-07-20, James Harris <> wrote:
>>
>> >> I use hungarian sometimes, usually it is to deal with "name clashes", or
>> >> different types of whatever referring to the same thing.
>> >
>> > Do you mean, for example, to distinguish the number realSalary from the
>> > printable representation stringSalary?
>>
>> Note that the seriousness and likelihood of name clashes is dependent on
>> your scoping and import precedence rules. A good module system reduces the
>> likelihood of clashes.
>
> Marco, I'm not sure what you mean by the last comment. Could you add more?

The scope of an identifier is the region of the source where the identifier
can be used. So:
- for a variable inside a procedure, the scope is the procedure;
- for a global in a module, the scope is the module itself, and if the global is
  also exported, the scope extends to all other modules that import the
  module.

However, some languages have multiple ways of importing a module. E.g. Modula2
can import symbols from a module in two ways:
(my M2 is a bit rusty, others please don't point out mistakes, it is for the idea
only)

MODULE xxx;

FROM yyy IMPORT a,b,c,d,e;
IMPORT zzz,ooo;

END xxx.

The FROM line imports identifiers a,b,c,d,e from yyy. These
identifiers can be used without the module name, so simply as a,b,c,d,e.

The IMPORT zzz line imports module zzz (and ooo). Identifiers from zzz can
_only_ be used qualified with the module name, so zzz.someident.

The second way of importing avoids name clashes, because in xxx, zzz.bla is
totally different from ooo.bla. Moreover, in Modula2 users will actually be
biased towards the second (IMPORT) way, because that way they don't have to
name all the identifiers they want locally, as the FROM way requires.

So how you allow identifiers to pass from module to module, and how
many ways you provide to hide identifiers (local procedures, local modules),
affects the likelihood of name clashes.

Note that the M2 system also lets the compiler barf as soon as two identifiers
with the same name are imported (using the FROM syntax above), without
actually having to look at the other module's source, since the same identifier
is then listed in two FROM .. IMPORT statements.
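
[The same two import styles exist in Python, which may make the contrast easier to try out. This is a Python analogue, not Modula2: os.path and shlex are real standard-library modules that both happen to define join (shlex.join assumes Python 3.8 or later). Unlike the M2 behaviour described above, Python does not complain about the clash; the later import silently wins.]

```python
import os.path
import shlex

# Qualified use: both modules define `join`, but the prefix keeps them apart.
path = os.path.join("usr", "lib")
command = shlex.join(["echo", "hello world"])

# Unqualified import: the second line silently rebinds the name `join`.
from os.path import join
from shlex import join

unqualified = join(["echo", "hello world"])   # this is shlex.join now
```

[So the qualified form avoids the clash entirely, while the unqualified form depends on import order.]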

> I am thinking generally of very small modules, each defining an op-code and
> its context. That op-code is then to be usable as any inbuilt op-code.

.. treating "foreign" identifiers the same as locally defined ones increases
the likelihood of name clashes (as opposed to e.g. calling with a module-name
qualifier). That doesn't have to be bad, as long as you are aware of it.

>
> type ExtendedWord is <whatever>
> package e : prefix "e."

[...]

> where the e. is the prefix from the package definition.

If this prefixing is mandatory, that is pretty much what I meant.

Also think about nesting modules. Can be fun :-)

I got hooked on it using Modula2, and miss it in Pascal sometimes.

Lasse Hillerøe Petersen

Jul 22, 2004, 5:44:19 AM

In article <slrncfuvvc....@toad.stack.nl>,

Marco van de Voort <mar...@stack.nl> wrote:

> However some languages have multiple ways of importing a module. E.g. modula2
> can import symbols from a module in two ways:
> (my M2 is a bit rusty, others please don't point out mistakes, it is for the
> idea only)
>
> MODULE xxx;
>
> FROM yyy IMPORT a,b,c,d,e;
> IMPORT zzz,ooo;
>
> END xxx.

This is much like Perl modules.

use Yyy qw(a b c d e);
use Zzz qw();
use Ooo qw();

Except that Perl gives the module writer a convenient way to "default" a
list of names to be imported locally. This default set is then imported
by simply:
use Mmmm;

Of course Perl gives you access to symbol tables, so all this can be
hacked in every imaginable (and unimaginable) way anyhow.

Eiffel has one feature which I don't think is found in any other
language I know (although it could be achieved with Perl, I suppose.)
Of course in Eiffel, the only way to import is to inherit from some
other class, but inherited features can be *renamed* which gives the
inherited feature a different name locally.

-Lasse

Marcin 'Qrczak' Kowalczyk

Jul 22, 2004, 6:17:16 AM

On Wed, 21 Jul 2004 22:36:54 +0100, James Harris wrote:

> What I have in mind will allow the programmer to define meanings for words
> or symbols - and to match these depending on context. The plus sign is
> normally overloaded, representing integer addition of various precisions,
> real addition and possibly others. I probably won't provide the
> concatenation of strings per se but nor will I prevent "+" being used as a
> method on new data types.

I don't see a problem. This is fine. It means that looking at "x + y" you
don't know which implementation of addition will be used - so what? They
are all supposed to do analogous things to various types. They are all
supposed to give equal answers when applied to equal numbers represented
differently (be sure to distinguish integer division from real division),
modulo rounding errors. As I said, the behavior of floating point is the
only case when I accept a wrong answer motivated by ease of implementation.

> Well, I did mention multiplication rather than addition, but taking your
> comment, wouldn't MOSTPOS + MOSTPOS be too wide to be assigned to an
> integer. Perhaps you mean a negative and a non-negative....?

There is no such thing as the most positive integer, in any other sense
than the longest string. Sure, a billion of digits might not fit in
memory; when someone uses numbers *that* big, he starts getting
out-of-memory errors. Errors are better than a wrong answer, and in this case
errors are unavoidable. But 12345678901234567890 times 9876543210987654321
is 121932631137021795223746380111126352690, no problem.
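
[This is easy to check in any language with unbounded integers. A quick Python illustration follows; the wrapped value is an added comparison showing what a 64-bit unsigned multiply would keep, and is not something claimed in the post.]

```python
a = 12345678901234567890
b = 9876543210987654321

exact = a * b               # Python ints grow as needed
wrapped = exact % 2**64     # the low 64 bits, i.e. the "wrong answer" case

print(exact)   # 121932631137021795223746380111126352690
```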

Marco van de Voort

Jul 22, 2004, 7:57:42 AM

On 2004-07-22, Lasse Hillerøe Petersen <lhp+...@toft-hp.dk> wrote:
>> idea
>> only)
>>
>> MODULE xxx;
>>
>> FROM yyy IMPORT a,b,c,d,e;
>> IMPORT zzz,ooo;
>>
>> END xxx.
>
> This is much like Perl modules.
>

> Except that Perl gives the module writer a convenient way to "default" a
> list of names to be imported locally. This default set is then imported
> by simply:
> use Mmmm;

Modula2 has that too IIRC. But the not entirely standard compiler that I had
didn't implement that (IIRC, it could also be that I simply missed it, I was
a beginner then).

cr88192

Jul 22, 2004, 12:52:31 PM

"Marco van de Voort" <mar...@stack.nl> wrote in message

news:slrncfuvvc....@toad.stack.nl...

something similar is possible in my lang per-se, though I don't explicitly
have a module system...
the idea is that a module is an object containing all code and variables for
a module. objects can be used as toplevels, albeit I am lacking in
(currently completed) means of usefully exploiting this feature.

my lang involves a "delegation" system, with the possibility of using this
for creating code in toplevels where it is not possible to do certain things
or access certain data (eg: because it is not visible in the scope of the
code being run). (better security may be needed eventually, but this may be
good as a basic system).
as a result, "modules" are also relative to the point of execution.

assuming there is a module of "system" features, with an "io" submodule,
with a function called openfile, one has a few ways to reference it:
system.io.openfile //a full reference

var io=system.io;
io.openfile //a shorter reference

var _io=system.io;
openfile //because now "self" delegates to system.io (note the '_')

and finally:
var openfile=system.io.openfile;
openfile //because the binding was imported directly

of course, this may be cumbersome. new syntax could be used, or a hack based
on a function:
function import(module, vars...)
{
    local i;
    for(i=0; vars[i]; i++)
        self[vars[i]]=module[vars[i]];
}

now, if I wanted, I could write:
import(system.io, #openfile);

(imo this is one cool feature of not using lexical scoping...).
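
[A rough Python rendering of the same import hack, for comparison. The names import_names, system and toplevel are stand-ins invented for this sketch; a plain dictionary plays the role of "self" and SimpleNamespace fakes the system.io module.]

```python
from types import SimpleNamespace

def import_names(scope, module, *names):
    """Copy the named bindings from `module` into the `scope` dictionary."""
    for name in names:
        scope[name] = getattr(module, name)

# Hypothetical stand-in for the system.io module in the post.
system = SimpleNamespace(
    io=SimpleNamespace(openfile=lambda path: f"<file {path}>")
)

toplevel = {}
import_names(toplevel, system.io, "openfile")
handle = toplevel["openfile"]("notes.txt")
```

[The binding is copied, not wrapped, so the imported name refers to the very same function object.]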

or something (then I remember that file io in my lang sucks at present:
I just have a few special read/write functions and bytevectors, and
bytevectors are stupid for io in the absence of c-style type casting, ...).

of course, a lot of this is defeated by a few issues at present:
the real toplevel is currently used as a dumping ground for builtins, and
better organization would be needed to make security more manageable;
my special "object" syntax has not been fully completed/tested (I just have
the "dictionary" syntax, but this may actually make more sense for toplevels
anyways, eg, since a lot more control is given and expressions are evaluated
in the context of the creator);
I would need to add forms like 'load(script, toplevel)';
...

var user_only=[_user:=user, _console:=system.io.console];
this would create a toplevel which delegates only to "user" and
"system.io.console", but doesn't include anything else.

this could be useful, eg, with:
load("untrusted.bs", user_only);

but these are all minor.

James Harris

Jul 22, 2004, 4:00:49 PM

"Marco van de Voort" <mar...@stack.nl> wrote in message

news:slrncfuugl....@toad.stack.nl...
<snip>

> > The jury is out on this one. I may permit the programmer to specify whether
> > an operation is to be checked or not.
>
> That's what Pascal does also. Please also allow to _locally_ disable/enable
> it, and not just global to the project or module.

Agreed.

> > growing on
> >> > me, eg the user typing hello-world.
> >>
> >> Problem with hyphen is that it is the same char as minus. That can be
> >> confusing. (and cause parser problems)
> >
> > I agree it is unfamiliar. Allowing it as a hyphen in identifier names would
> > require the minus sign to be separated from neighbouring character strings
> > by whitespace. Not everyone would like this.
>
> I wouldn't like it. But I don't like any significant meaning placed on
> whitespace. Call me old fashioned :-)

OK. You're old fashioned. :-)

James Harris

Jul 22, 2004, 5:06:37 PM

"cr88192" <cr8...@protect.hotmail.com> wrote in message

news:ZxFLc.5196$Qn6....@fe07.usenetserver.com...
<snip>

> as far as I understand it, you were suggesting allowing functions to be
> evaluated without an args list, or be used as fake objects.
>
> depending on implementation, this can cause "weird" semantic issues (say, a
> practical inability to usefully work with first-class functions).
>
> of course, it could be generalized in another way:
> all operations are implicitly application, so, for example, a normal
> function can be used as an array or object, and can accept method calls and
> slot assignments.
>
> dunno your syntax, using my own.
>
> function foo(var, val) {...}
> foo.bar=baz;
> which would be equivalent to a call:
> foo(#bar, baz);
>
>
> the issue, however, is when you pick up the "function with no arguments is
> equivalent to its return value" type semantics.
>
> function bar() 3;
> x=bar;
> x => 3
>
> this eliminates a lot of possible uses of functions (eg: passing them around
> and calling them from elsewhere).

Good point. I was thinking of this syntax

x y = func arg1 arg2

where func would be matched against its input types and output types. The
question is, could I replace arg1 with another function and could I do the same
with result x or y? I am thinking that all values to the left of the equals sign
would be lvalues (i.e. addresses of) and those to the right rvalues (i.e. the
values of). Since func expects two arguments if I replace arg1 I have to replace
it with another argument of the same type. This means the replacement would
either be a single identifier or need parentheses. Taking the simple identifier
first, replacing y with function fred and arg1 with function joe,

x fred = func joe arg2

where fred is another function would pass the second result of func - that was
previously assigned to y - to the function fred. Fred would be expected to
consume the value passed to it, given the above syntax. Now for function joe. It
would have to emit its value.

Now the case where fred and joe take parameters. Say they are arrays or are
functions modelling arrays. Would this work?

x (fred.3) = func (joe.9) arg2

In this case the second result of func should be allocated to element 3 of fred.
Strictly, fred would be passed two values, the index 3 and the second result
from func. Element 9 of joe would be used in the calculation. Joe would be
passed one value, 9, and emit a result of whatever type it was specified to
return.

Hmm. At the moment I think you are right. I don't have a way to pass the
function joe itself to function func. The syntax is somewhat more defined than
the simple examples above suggest. I'll see if I can fit in a means of passing
functions.

--
Cheers,
James

James Harris

Jul 22, 2004, 5:23:12 PM

"Marco van de Voort" <mar...@stack.nl> wrote in message

news:slrncfuvvc....@toad.stack.nl...
<snip>

> .. treating "foreign" identifiers the same as locally defined increases
> the likelihood of name clashes (opposed to e.g. calling with a modulename
> qualifier). That doesn't have to be bad, as long as you realise it.
> >
> > type ExtendedWord is <whatever>
> > package e : prefix "e."
> [...]
> > where the e. is the prefix from the package definition.
>
> If this prefixing is mandatory, that is pretty much what I meant.
>
> Also think about nesting modules. Can be fun :-)
>
> I got hooked on it using Modula2, and miss it in Pascal sometimes.

Thanks for explaining about the identifier imports. I've snipped it from the
above but followed your reasoning.

Yes, the prefixing is intended to be mandatory but a) is only for procedure
names, member functions if you like, and b) all variables will be local. I'm
intending all procedures to run as separate communicating processes. All
communication between them will be in defined interfaces. That said there will
be the option of collecting procedures in packages along with data.

To refer to a procedure name in a given package the package prefix will be
mandatory but that prefix can be of zero length, which means that the member
function names will stand alone. I'm intending that each process or package is
compiled separately. Compiled functions will form new instructions or data
types. To use these they will be referred to like any inbuilt instructions or data
types. This brings in the question of linking. I won't go into the details here
as it is off topic but it is to be a form of lazy linking. The requirement to
say we are ready to start process A is that all processes A refers to are
locatable and have themselves been confirmed as ready to start.

I haven't given nested modules much thought yet!

James Harris

Jul 22, 2004, 5:29:07 PM

"cr88192" <cr8...@protect.hotmail.com> wrote in message

news:khSLc.5946$Qn6....@fe07.usenetserver.com...

> var _io=system.io;
> openfile //because now "self" delegates to system.io (note the '_')

I followed the others but this one defeated me. How does this work? I mean that
I can understand

var _=system.io;
openfile //etc

uses _ for "self" but aren't you showing _io as self...? I am assuming self is a
context for procedure names and hence openfile is found in self's context.

cr88192

Jul 22, 2004, 5:57:40 PM

"James Harris" <no.email.please> wrote in message

news:41002c57$0$7132$db0f...@news.zen.co.uk...

all this is just weird and I don't follow it that well...

is the idea that you just have raw function calls and ones that work like
assignment or something?

I have pattern matching that can be used vaguely similar:
{#x, #y}=foo(3, 4);

the idea in this case is that foo is expected to return an array with 2
values, which will be bound to x and y.
(in this case, I leave it undefined whether the left hand side is evaluated
at compile time or runtime, but it should be treated like a compile-time
operation...).

however, it is not possible to "filter" things like that.
as far as I can tell, your language also has implicit currying? (eg: a
function can take some of the args directly, in which case it creates a
function expecting more of the args, delaying evaluation until all args are
received?).

personally, I am not a fan of implicit currying as it can have both
implementation and semantic consequences, instead I like currying to be done
explicitly...
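
[For anyone unsure of the distinction, this is what explicit currying looks like in Python, using functools.partial from the standard library. func and its arithmetic are arbitrary placeholders; the point is only that the partially applied function must be asked for by name, and that supplying too few arguments is otherwise an error.]

```python
from functools import partial

def func(arg1, arg2):
    return arg1 * 10 + arg2

# Explicit currying: the programmer asks for the partially applied function.
with_first = partial(func, 4)
result = with_first(2)

# Without implicit currying, supplying too few arguments is simply an error.
try:
    func(4)
    under_applied = "returned a function"
except TypeError:
    under_applied = "error"
```

[With implicit currying, func(4) would instead quietly produce a new function, which is the behaviour being objected to.]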

> In this case the second result of func should be allocated to element 3 of fred.
> Strictly, fred would be passed two values, the index 3 and the second result
> from func. Element 9 of joe would be used in the calculation. Joe would be
> passed one value, 9, and emit a result of whatever type it was specified to
> return.
>
> Hmm. At the moment I think you are right. I don't have a way to pass the
> function joe itself to function func. The syntax is somewhat more defined than
> the simple examples above suggest. I'll see if I can fit in a means of passing
> functions.
>

dunno.

I like use of first class functions (eg: being able to dynamically pass them
around and call them, stuffing them in objects, ...).

of course, from what I can see the languages are clearly somewhat different
(mine inherits a lot from c and javascript, and some from scheme and
self...).

cr88192

Jul 22, 2004, 6:32:28 PM

"James Harris" <no.email.please> wrote in message

news:4100319c$0$7133$db0f...@news.zen.co.uk...

self is always implicitly referenced if a variable can't be found in the
current lexical scope.

self can be referenced directly via the pseudo-variable self.

'_' is not in itself a name, but is intended to be a NULL pseudo-variable
(eg: any attempts to bind or assign it have no effect, and any attempts to
reference it are viewed as invalid).
it is intended to signify "don't care" spots in patterns.

however, as a special bit of semantics, all variables beginning with '_' are
used as "delegates" (excepting those beginning with '__', which are
'special', this includes variables which are generally intended to be
set/interpreted by the implementation or used for special features, but not
to be really used by general code).

eg:
'_io' means "a delegate variable named 'io'".
whatever is assigned to io will be searched for references if they can't be
found in self or any of the previous delegates.
the reason a name is given is so that it is possible to reference or
re-assign them if needed (though this will be discouraged as later it may be
allowed to cause a performance hit).

there are a number of possible delegates. a few 'default' ones are '_parent'
and '_toplevel', which will typically take precedence over others (this can
be controlled more finely by creating objects with dictionary syntax, where
the exact precedence order is under the creators control).

also cool:
delegation graphs are also allowed to be cyclic, as otherwise delegating the
toplevel to system.io, assuming system.io delegated to the toplevel, would
lead to an infinite loop whenever a variable could not be found in any child
scope.

by default the toplevel delegates to itself. the idea is thus that
'_toplevel' can be used from pretty much anywhere to refer to the current
toplevel.

eg: _toplevel._io=system.io; is yet another possible way to do an import,
and will also work from child scopes (assuming that _toplevel doesn't
delegate somewhere weird along the way...).
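
[A compact Python model of the delegation lookup described above, cycles included. Everything here is an assumption made for illustration: the Scope class, the use of a visited set to cut cycles, and searching delegates in insertion order (the post says the real precedence rules are finer than that).]

```python
class Scope:
    """A scope whose '_'-prefixed slots act as delegates."""

    def __init__(self, **slots):
        self.slots = dict(slots)

    def lookup(self, name, _seen=None):
        seen = set() if _seen is None else _seen
        if id(self) in seen:            # already visited: break the cycle
            raise KeyError(name)
        seen.add(id(self))
        if name in self.slots:
            return self.slots[name]
        for slot, target in self.slots.items():
            # Slots starting with '_' (but not '__') are treated as delegates.
            is_delegate = slot.startswith("_") and not slot.startswith("__")
            if is_delegate and isinstance(target, Scope):
                try:
                    return target.lookup(name, seen)
                except KeyError:
                    pass
        raise KeyError(name)


io = Scope(openfile=lambda path: f"<file {path}>")
toplevel = Scope(_io=io)
toplevel.slots["_toplevel"] = toplevel   # the toplevel delegates to itself
io.slots["_toplevel"] = toplevel         # and io points back: a cycle

found = toplevel.lookup("openfile")("a.txt")
```

[A lookup for a name that exists nowhere walks each scope once and then fails, instead of looping forever.]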

all for now.

James Harris

Jul 22, 2004, 6:34:35 PM

"cr88192" <cr8...@protect.hotmail.com> wrote in message

news:dLWLc.5972$Qn6....@fe07.usenetserver.com...

<snip>

> > Now the case where fred and joe take parameters. Say they are arrays or are
> > functions modelling arrays. Would this work?
> >
> > x (fred.3) = func (joe.9) arg2
> >
> all this is just weird and I don't follow it that well...

I know. Someone else's syntax suddenly thrown into the pot without explanation
is hard to follow. Using more conventional syntax:

fred.3 is an array reference, more usually referred to as fred[3]
joe.9 is an array reference joe[9]

The /type/ of fred.3 is the type of an element of fred. If the C language
returned a tuple (and using a non-C "let" for clarity of meaning) the above
could be seen as

let x, fred[3] = func (joe[9], arg2)

where func reads the two values on its right and produces the two values on its
left.
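
[Python happens to allow almost exactly this form, which may help as a reference point. The body of func and the sample data are invented for the sketch; only the shape of the assignment matters.]

```python
def func(a, b):
    # Returns two results, like the two values to the left of '=' above.
    return a + b, a * b

joe = list(range(10, 20))   # joe[9] == 19
fred = [0] * 5
arg2 = 2

# One result goes to a plain variable, the other into an array element.
x, fred[3] = func(joe[9], arg2)
```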

> is the idea that you just have raw function calls and ones that work like
> assignment or something?

Everything to the right of the function name, "func", is passed to func.
Everything to the left of the equals sign is returned from it.

In the case of joe, it too can be seen as a function. It has the number 9 to its
right so is passed the number 9. In this case, as an array, it returns the value
of element 9, which replaces (joe[9]) in the arguments passed to func.

Fred, on the other hand, is to the left of the equals sign. It is passed the
number 3 (the index) and the second result from func. It will then do what an
array does, store the second result from func in element 3. As a simpler
example,

fred[3] = 7

is expressed as

fred.3 = 7

or

fred.(3) = 7

where parentheses serve no purpose other than to gather arguments, in this case,
only one of them. The latter is reminiscent of ocaml arrays.

In all the above, and key to what I am intending, we don't need to know in the
source code whether fred or joe are true arrays or process abstractions behaving
as arrays. This allows me to change the implementation of these components
without changing the source code that uses them. A more definite example, joe,
for speed and simplicity, could be an array in the same memory space as the
process being defined. On the other hand joe could be a separate process running
on the computer. Further, joe could be either an array stored on or a process
running on a different machine somewhere out on the network. In all cases, as
long as joe behaves as required and the language implements the communication,
the source code of the program that uses joe does not change. I like that!
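
[A sketch of that substitution in Python, assuming nothing about the real implementation. ComputedArray, bump_element and the dictionary standing in for remote state are all hypothetical; the point is that the client function is written once and works with either a true array or something that merely behaves like one.]

```python
class ComputedArray:
    """Looks like an array to callers but forwards each access elsewhere."""

    def __init__(self, read, write):
        self._read = read
        self._write = write

    def __getitem__(self, index):
        return self._read(index)

    def __setitem__(self, index, value):
        self._write(index, value)


def bump_element(arr, index):
    # Client code: identical whether `arr` is a list or a ComputedArray.
    arr[index] = arr[index] + 1
    return arr[index]


plain = [0, 10, 20, 30]
store = {}   # stand-in for state held by another process or machine
remote_like = ComputedArray(lambda i: store.get(i, i * 10), store.__setitem__)

from_list = bump_element(plain, 3)
from_proxy = bump_element(remote_like, 3)
```

[Swapping plain for remote_like changes where the data lives, not the source of bump_element.]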

> I have pattern matching that can be used vaguely similar:
> {#x, #y}=foo(3, 4);
>
> the idea in this case is that foo is expected to return an array with 2
> values, which will be bound to x and y.

Yes, this part looks to be the same. My returns are just individual values (of
any type) at the moment. I have yet to do the work to cover variable numbers of
values, generators and the like.

> (in this case, I leave it undefined whether the left hand side is evaluated
> at compile time or runtime, but it should be treated like a compile-time
> operation...).
>
> however, it is not possible to "filter" things like that.
> as far as I can tell, your language also has implicit currying? (eg: a
> function can take some of the args directly, in which case it creates a
> function expecting more of the args, delaying evaluation until all args are
> received?).
>
> personally, I am not a fan of implicit currying as it can have both
> implementation and semantic consequences, instead I like currying to be done
> explicitly...

Thanks for the comment. As I say, I have yet to work this stuff out.

> I like use of first class functions (eg: being able to dynamically pass them
> around and call them, stuffing them in objects, ...).
>
> of course, from what I can see the languages are clearly somewhat different
> (mine inherits a lot from c and javascript, and some from scheme and
> self...).

I think there is also a strong C influence in mine - but, as you have noticed,
not the syntax...!

cr88192

Jul 22, 2004, 8:08:03 PM

"James Harris" <no.email.please> wrote in message

news:410040f4$0$7125$db0f...@news.zen.co.uk...

>
> "cr88192" <cr8...@protect.hotmail.com> wrote in message
> news:dLWLc.5972$Qn6....@fe07.usenetserver.com...
>
> <snip>
>
> > > Now the case where fred and joe take parameters. Say they are arrays or are
> > > functions modelling arrays. Would this work?
> > >
> > > x (fred.3) = func (joe.9) arg2
> > >
> > all this is just weird and I don't follow it that well...
>
> I know. Someone else's syntax suddenly thrown into the pot without explanation
> is hard to follow. Using more conventional syntax
>
> fred.3 is an array reference, more usually referred to as fred[3]
> joe.9 is an array reference joe[9]
>
> The /type/ of fred.3 is the type of an element of fred. If the C language
> returned a tuple (and using a non-C "let" for clarity of meaning) the above
> could be seen as
>
> let x, fred[3] = func (joe[9], arg2)
>
> where func reads the two values on its right and produces the two values on its
> left.
>

ok, this makes sense now.

> > is the idea that you just have raw function calls and ones that work like
> > assignment or something?
>
> Everything to the right of the function name, "func", is passed to func.
> Everything to the left of the equals sign is returned from it.
>

yes, ok.

yes, this is cool.

>
> > I have pattern matching that can be used vaguely similar:
> > {#x, #y}=foo(3, 4);
> >
> > the idea in this case is that foo is expected to return an array with 2
> > values, which will be bound to x and y.
>
> Yes, this part looks to be the same. My returns are just individual values (of
> any type) at the moment. I have yet to do the work to cover variable numbers of
> values, generators and the like.
>

ok.

the idea though is that multiple return values are generated by returning an
array.

eg:
function haar1d(a, b) ({a+b, b-a});
//parens needed for syntactic reasons

function haar2d(a, b, c, d)
{
    {#a1, #b1}=haar1d(a, b);
    {#c1, #d1}=haar1d(c, d);
    {#a2, #c2}=haar1d(a1, c1);
    {#b2, #d2}=haar1d(b1, d1);

    {a2, b2, c2, d2}
}
ok, this would have been more elegant with raw substitution, but this is
just to show a point...
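
[The same pair of functions transcribed into Python, where multiple return values are tuples. This assumes the last step is meant to combine b1 and d1, the two difference terms from the first stage.]

```python
def haar1d(a, b):
    # Sum and difference of a pair, returned together as a tuple.
    return a + b, b - a

def haar2d(a, b, c, d):
    a1, b1 = haar1d(a, b)
    c1, d1 = haar1d(c, d)
    a2, c2 = haar1d(a1, c1)
    b2, d2 = haar1d(b1, d1)
    return a2, b2, c2, d2

print(haar2d(1, 2, 3, 4))   # (10, 2, 4, 0)
```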

misc note:
functions in my lang can be used in both functional and imperative styles,
eg:
tail expressions have implicit return and tail-optimizing, like above;
I can use return manually, eg, to return from wherever or force
tail-optimization.

it is also possible to fold arrays as well, eg:
{#x, #y...}={1, 2, 3, 4};
x => 1
y => {2, 3, 4}
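
[Python's starred assignment does the same split, shown here only as a familiar point of comparison.]

```python
x, *y = [1, 2, 3, 4]
print(x)   # 1
print(y)   # [2, 3, 4]
```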

generators are not possible in my case at present though, though in my
personal experience I haven't come up with a strong reason for generators
anyways.
a hack could be done, eg:
function fib_gen(x, y)
    fun() ({x, fib_gen(y, x+y)...});
var fib=fib_gen(1, 1)();

this could either produce a (faked) infinitely long array, or form a nested
array:
{1, {1, {2, {3, ...}}}}

this is assuming I could come up with a good reason for them.
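
[For contrast, this is what a real generator looks like in Python. The sketch uses the ordinary Fibonacci step x+y, which is an assumption about the intent of fib_gen, and itertools.islice to take a finite prefix of the endless sequence.]

```python
from itertools import islice

def fib_gen(x, y):
    # Yields x, then y, then x+y, ... without building the whole sequence.
    while True:
        yield x
        x, y = y, x + y

first = list(islice(fib_gen(1, 1), 6))
print(first)   # [1, 1, 2, 3, 5, 8]
```

[One practical reason for generators is exactly this: the consumer decides how much of an unbounded sequence is ever computed.]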

this also reminds me of once having ideas of doing symbolic algebra in a
programming language. it would require a few hacks but it would be possible.
dunno if there is any possible value in this either though...

a weaker hack:
var tri=[a:=3, b:=4];
var pyth=[c:=expr(sqrt((a*a)+(b*b)))];
println("c=", (tri&pyth).c());

where expr would define an "expression" that can be forced to evaluate by
calling it, in which case it will replace itself with its value (sort of
like delay, but maybe more constrained in that it will not be bytecoded
until final evaluation and thus may be rewritten beforehand, and will have
special behavior wrt operations).
(this makes me wonder if this is just a kludge for my absence of lists...).

one thing though:
use of array folding and multiple return values like this will have memory
costs at present (one of the many things that eliminates
constant-memory-ness).

I may fix this later though.

> > (in this case, I leave it undefined whether the left hand side is evaluated
> > at compile time or runtime, but it should be treated like a compile-time
> > operation...).
> >
> > however, it is not possible to "filter" things like that.
> > as far as I can tell, your language also has implicit currying? (eg: a
> > function can take some of the args directly, in which case it creates a
> > function expecting more of the args, delaying evaluation until all args are
> > received?).
> >
> > personally, I am not a fan of implicit currying as it can have both
> > implementation and semantic consequences, instead I like currying to be done
> > explicitly...
>
> Thanks for the comment. As I say, I have yet to work this stuff out.
>

yes, however, with the semantics from before, implicit currying may be
required...

>
> > I like use of first class functions (eg: being able to dynamically pass them
> > around and call them, stuffing them in objects, ...).
> >
> > of course, from what I can see the languages are clearly somewhat different
> > (mine inherits a lot from c and javascript, and some from scheme and
> > self...).
>
> I think there is also a strong C influence in mine - but, as you have noticed,
> not the syntax...!
>

yes, mine has a lot of syntactic influence from c, but a lot of semantic
influence from scheme and self.

Marco van de Voort

Jul 22, 2004, 10:29:12 PM

On 2004-07-22, James Harris <> wrote:
>>
>> If this prefixing is mandatory, that is pretty much what I meant.
>>
>> Also think about nesting modules. Can be fun :-)
>>
>> I got hooked on it using Modula2, and miss it in Pascal sometimes.
>
> Thanks for explaining about the identifier imports. I've snipped it from the
> above but followed your reasoning.
>
> Yes, the prefixing is intended to be mandatory but a) is only for procedure
> names, member functions if you like, and b) all variables will be local.

Types, constants ?

(I'll read the rest later when I'm not dead tired)

> I haven't given nested modules much thought yet!

For a more OOP equivalent, see inner classes in Java.

Marco van de Voort

Jul 22, 2004, 10:29:54 PM

On 2004-07-23, Marco van de Voort <mar...@stack.nl> wrote:

(as said dead tired)

>> above but followed your reasoning.
>>
>> Yes, the prefixing is intended to be mandatory but a) is only for procedure
>> names, member functions if you like, and b) all variables will be local.
>
> Types, constants ?
>
> (I'll read the rest later when I'm not dead tired)
>
>> I haven't given nested modules much thought yet!
>
> For a more OOP eq, see inner-classes in Java.

... but with a little bit more control over import/export.

Don Groves

Jul 29, 2004, 2:02:47 AM

"James Harris" <no.email.please> wrote in message news:<40f918a1$0$7807$db0f...@news.zen.co.uk>...
> Before I embark on a new long-term language project I'd appreciate your advice on how to
> split up long names. I would like to keep the standards for command or instruction names
> the same as that for variable and type names, if possible. Looking at the examples below,
> which ones seem better?

>
> I could also use embedded hyphens as my minus sign must be surrounded by whitespace
> (please suspend disbelief while looking at these. I know they will look unfamiliar!)
> echo-client
> last-char-offset
> hello-world

This gets my vote. Easy and fast to type (no shifted chars) and easy to read,
especially for lispers and schemers. Others will get used to it quickly.
--
dg

cody

Jul 30, 2004, 3:53:30 AM

It depends on the language you are using.
All your given conventions are used by languages, you cannot say which one
is better because they are all conventions that are used.

echoClient -> Java
EchoClient -> VB, C#, Delphi, VC++ (most languages used under Windows)
echo-client -> Scheme and some other strange languages
echo_client -> plain C, C++

My advice is that you should adapt to the conventions of
the language/framework you are using.
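
To make the four conventions concrete, here is a small Python sketch that converts a name between them. The function names are invented for this example. Note that a "straight" name such as echoclient carries no word boundaries, so it cannot be split again without a dictionary:

```python
import re


def split_words(name):
    """Split an identifier into lowercase words, whatever convention it uses."""
    words = []
    # Break on underscores and hyphens first, then on case boundaries.
    for part in re.split(r"[_\-]+", name):
        words.extend(re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", part))
    return [w.lower() for w in words]


def to_snake(name):
    return "_".join(split_words(name))


def to_kebab(name):
    return "-".join(split_words(name))


def to_pascal(name):
    return "".join(w.capitalize() for w in split_words(name))


def to_camel(name):
    pascal = to_pascal(name)
    return pascal[:1].lower() + pascal[1:]


for original in ("echo_client", "last-char-offset", "HelloWorld"):
    print(to_snake(original), to_kebab(original),
          to_pascal(original), to_camel(original))
```

Going from any delimited or mixed-case form to any other is mechanical, which supports the point that the choice is mostly about the surrounding conventions.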

--
cody

Freeware Tools, Games and Humour
http://www.deutronium.de.vu || http://www.deutronium.tk
"James Harris" <no.email.please> wrote in message
news:40f918a1$0$7807$db0f...@news.zen.co.uk...

> Before I embark on a new long-term language project I'd appreciate your advice on how to
> split up long names. I would like to keep the standards for command or instruction names
> the same as that for variable and type names, if possible. Looking at the examples below,
> which ones seem better?
>
> Straight names
> echoclient
> lastcharoffset
> helloworld
>
> Internal underscores
> echo_client
> last_char_offset
> hello_world
>
> I could also use embedded hyphens as my minus sign must be surrounded by whitespace
> (please suspend disbelief while looking at these. I know they will look unfamiliar!)
> echo-client
> last-char-offset
> hello-world
>
> Mixed case
> EchoClient
> LastCharOffset
> HelloWorld
>
> Initial lower case then mixed
> echoClient
> lastCharOffset
> helloWorld
>
> In some ways I like the mixed case versions using an initial capital, especially as I may
> want to prefix some names with a code for an abstract data type, which, when present,
> could begin with a lower case. Is this getting too Microsoft-ish? Is it getting too
> Hungarian? Is Hungarian bad when used with abstract data types rather than inbuilt ones?
>
> Advice on which is or is not thought to be acceptable would be much appreciated. Please
> bear in mind that I intend these names for commands/instructions as well as variables and
> types. Constants would be in all caps.
>
> --
> Thanks,
> James

cody

Jul 30, 2004, 4:01:46 AM

Sorry, I reread your posting: you aren't planning a project, you are planning
a new language.

So it depends on which platform you want to support. For primarily Windows I'd
suggest PascalCase (EchoClient).
You can also use camelCase (echoClient) like in Java.
If your language uses minus signs for subtraction I'd strongly suggest that you
not allow hyphens in identifiers.
Conventions like echo_client do not allow differentiation between
variable and class names.
Therefore I consider PascalCase and camelCase the best ones.

--
cody

Freeware Tools, Games and Humour
http://www.deutronium.de.vu || http://www.deutronium.tk

"cody" <deutr...@web.de> wrote in message
news:2mudppF...@uni-berlin.de...

Lasse Hillerøe Petersen

Jul 31, 2004, 3:02:01 AM

In article <2mue7iF...@uni-berlin.de>, "cody" <deutr...@web.de>
wrote:

> Sorry I reread you posting, you aren't planning a project, you are planning
> a new Language.
>
> So it depends which platform your want to support. For primarily windows I'd
> suggest pascalcase (EchoClient).
> You can also use camelcase (echoClient) like in Java.
> If your language uses minus signs for subtraction I'd strongly suggest you
> not to allow hyphens in identifiers.
> conventions like (echo_client) do not allow differentiations between
> variable and classnames.

Au contraire. Apart from the (perhaps less) obvious method of using
boldface for types/classnames and regular italic for variables, and
permitting spaces in names, you can still use a lower/uppercase
convention combined with underscore. I believe this is the style
recommended for Eiffel by Bertrand Meyer. Although I'd choose
bold/italic/space if possible, I'd pick Eiffel-style otherwise, except
when using a language already having some other convention.

-Lasse

cr88192

Jul 31, 2004, 6:05:28 AM

"Lasse Hillerøe Petersen" <lhp+...@toft-hp.dk> wrote in message
news:lhp+news-C8E508...@news.tele.dk...

I wonder if your source is still in plaintext...
in most normal conditions bold and italic are not usable in programming
languages based on the fact that text editors don't support them, or the
compiler doesn't accept the formats for which that style of formatting is
allowed.

unless of course you are using an ide or such that handles all of this, or
you have an editor that does syntax highlighting or changing the style...

if, of course, formatting could be used in a language, this brings up
interesting ideas, like, eg, using a bold '.' to mean dot-product, or an
italic 'X' for cross product, ...

James Harris

Jul 31, 2004, 8:52:34 AM

"Marco van de Voort" <mar...@stack.nl> wrote in message

news:slrncfuugl....@toad.stack.nl...

>
> I wouldn't like it. But I don't like any significantly meaning placed on
> whitespace. Call me old fashioned :-)

I already replied but I've been wondering about this statement. As a programmer
for many years I've agreed with this but I'm finding my views changing.

When programming we use various items of punctuation to separate elements in the
code but we don't expect users to use the punctuation-laden syntax when invoking
our code from the command line. They use whitespace. Compare these fictitious
statements,

write ("Hello", username, "\n");

write "Hello" username "\n"

and the second - which could be the syntax used when invoking the "write"
program - doesn't need the parens or the comma. This syntax DOES presume
grouping of the command with its parameters which is not necessary in this
example as there is nothing with which to group. In a more complex example,
given the functions "max" and "min" which return the largest and smallest of
their parameters, the more popular,

highest = max (A, B, C)

could be replaced by,

highest = max A B C

then the function would need to be grouped - delimited from any context - so,

range = max(A, B, C, D, E) - min(A, B, C, D, E)

would be replaced by

range = (max A B C D E) - (min A B C D E)

which provides the grouping required. How does that look?

--
Cheers,
James

Marcin 'Qrczak' Kowalczyk

Jul 31, 2004, 9:07:32 AM

On Sat, 31 Jul 2004 13:52:34 +0100, James Harris wrote:

> When programming we use various items of punctuation to separate elements in the
> code but we don't expect users to use the punctuation-laden syntax when invoking
> our code from the command line. They use whitespace. Compare these fictitious
> statements,
>
> write ("Hello", username, "\n");
>
> write "Hello" username "\n"

It's not that fictitious. In my language Kogut you write

Write "Hello " username "\n";

or better

WriteLine "Hello " username;

where the semicolon is needed if this statement is followed by other
statements.

> highest = max (A, B, C)
>
> could be replaced by,
>
> highest = max A B C

let highest = Max A B C;

> then the function would need to be grouped - delimited from any context - so,
>
> range = max(A, B, C, D, E) - min(A, B, C, D, E)
>
> would be replaced by
>
> range = (max A B C D E) - (min A B C D E)

Named function application binds stronger than operator application,
so this is

let range = Max A B C D E - Min A B C D E;
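
The precedence rule can be pictured with a tiny evaluator, sketched here in Python rather than Kogut. It is only an illustration under simplifying assumptions: the only functions are Max and Min, the only operator is a whitespace-delimited minus, and all helper names are invented. Because application is parsed one level below the operator, the unparenthesised form gives the same result as the parenthesised one:

```python
FUNCS = {"Max": max, "Min": min}


def tokenize(src):
    return src.replace("(", " ( ").replace(")", " ) ").split()


def parse_expr(tokens, pos, env):
    """expr := app ('-' app)*   (infix minus is the loosest level)"""
    value, pos = parse_app(tokens, pos, env)
    while pos < len(tokens) and tokens[pos] == "-":
        rhs, pos = parse_app(tokens, pos + 1, env)
        value -= rhs
    return value, pos


def parse_app(tokens, pos, env):
    """app := NAME atom+ | atom   (a named function takes every following atom)"""
    tok = tokens[pos]
    if tok in FUNCS:
        args = []
        pos += 1
        while pos < len(tokens) and tokens[pos] not in ("-", ")"):
            arg, pos = parse_atom(tokens, pos, env)
            args.append(arg)
        return FUNCS[tok](args), pos
    return parse_atom(tokens, pos, env)


def parse_atom(tokens, pos, env):
    """atom := NUMBER | VARIABLE | '(' expr ')'"""
    tok = tokens[pos]
    if tok == "(":
        value, pos = parse_expr(tokens, pos + 1, env)
        return value, pos + 1  # step over the closing ')'
    if tok in env:
        return env[tok], pos + 1
    return int(tok), pos + 1


def evaluate(src, env):
    value, _ = parse_expr(tokenize(src), 0, env)
    return value


env = {"A": 3, "B": 9, "C": 4, "D": 1, "E": 7}
print(evaluate("Max A B C D E - Min A B C D E", env))
print(evaluate("(Max A B C D E) - (Min A B C D E)", env))
```

A nested call used as an argument still needs parentheses in this sketch, e.g. Max A (Min B C).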

cr88192

Jul 31, 2004, 10:12:40 AM

"James Harris" <no.email.please> wrote in message

news:410b960e$0$7125$db0f...@news.zen.co.uk...

>
> "Marco van de Voort" <mar...@stack.nl> wrote in message
> news:slrncfuugl....@toad.stack.nl...
>
> >
> > I wouldn't like it. But I don't like any significantly meaning placed on
> > whitespace. Call me old fashioned :-)
>
> I already replied but I've been wondering about this statement. As a programmer
> for many years I've agreed with this but I'm finding my views changing.
>

well, there are reasons to use common syntax, and reasons why it is not
necessary to do such.

<snip>

> range = (max A B C D E) - (min A B C D E)
>
> which provides the grouping required. How does that look?
>

you are half-way there in reinventing a hybrid of lisp and logo style
syntax...

lisp style:
(= range (- (max A B C D E) (min A B C D E)))

logo style (as far as I remember anyways):
= range [- [max A B C D E] [min A B C D E]]

now compare this with:

range = max(A, B, C, D, E) - min(A, B, C, D, E)

it is worth noting that in many cases newbies will likely be scared away by
lisp style syntax, mostly as things can happen like:
parens can build up to large numbers;
it is not terribly obvious how to break up/indent things;
a lot of the visual cues are missing;
...

which is sad really, but not that much can be done (though many people would
rather deny the issue, or say that it is unimportant). yes, s-exps do allow
cool features, but at the costs listed above.

or maybe all this was somehow a historical accident, and the only reason
people use c-style syntax is because of history.
in any case for the time being it is the most common, and programmers often
seem comfortable with it.

afaik there is probably somewhere a balance between opposing forces:
punctuation vs whitespace;
symbols vs words;
blocks/statements vs. expressions;
...

I don't know.

my lang seems to be going in the general direction of being fairly loose
about some things, but there are limits, and often particularly odd syntax
for things...

now, why in my lang did I make it so that:
(a)=3;
causes 'a' to evaluate to a value that is used like a target/pattern?

dunno exactly, but at least I can do proxy assignment (among other things):

var foo, bar;
foo=#bar;
(foo)="baz";
bar => "baz"

and at least I can have basic pattern decomposition based on this...

James Harris

Jul 31, 2004, 10:29:12 AM

"Marcin 'Qrczak' Kowalczyk" <qrc...@knm.org.pl> wrote in message
news:pan.2004.07.31...@knm.org.pl...

> Write "Hello " username "\n";
> WriteLine "Hello " username;

> let highest = Max A B C;

> let range = Max A B C D E - Min A B C D E;

I like the format and this set me to look more into your web site. It's great
to see some overlapping of ideas - i.e. someone else having gone along some of
the same thought processes, though there are more differences.

Some questions:
1) Have you written a starter-guide - something to 'sell' the language including
simple examples?
2) The FAQ says that whitespace is insignificant. Aren't you using whitespace to
separate parameters?

Marcin 'Qrczak' Kowalczyk

Jul 31, 2004, 10:38:08 AM

On Sat, 31 Jul 2004 15:29:12 +0100, James Harris wrote:

> 1) Have you written a starter-guide - something to 'sell' the language
> including simple examples?

Not yet, unfortunately, and the language reference is incomplete.
You know, this is less fun than implementing and using a language :-)

I will make better references to existing examples, and I'm making PLEAC
entries <http://pleac.sourceforge.net/>.

> 2) The FAQ says that whitespace is insignificant. Aren't you using
> whitespace to separate parameters?

There are places where some whitespace or a comment is required to separate
tokens, but the amount and shape of whitespace (i.e. whether it's spaces
or newlines, or how large the indent is) is insignificant - like in many
languages, unlike Python, Ruby, Haskell, Unix shell, and C preprocessor.

James Harris

Jul 31, 2004, 11:10:04 AM

"cr88192" <cr8...@protect.hotmail.com> wrote in message

news:IMNOc.5763$4%2.3...@fe07.usenetserver.com...

> you are half-way there in reinventing a hybrid of lisp and logo style
> syntax...
>
> lisp style:
> (= range (- (max A B C D E) (min A B C D E)))

<snip>

> it is worth noting that in many cases newbies will likely be scared away by
> lisp style syntax, mostly as things can happen like:
> parens can build up to large numbers;
> it is not terribly obvious how to break up/indent things;
> a lot of the visual cues are missing;

Agreed. I don't want to put people off. I've come to see the Lisp prefix example
as very flexible and logical from the point of view of the language designer;
however, while it is more reasonable for a function such as Max, above, it is
much less familiar to most programmers than an infix version (if such is
possible). Infix falls down in suggesting exactly two operands. I'm (currently)
intending to allow both, hence the hybrid,

range = (max A B C D E) - (min A B C D E)

which uses both forms and is the clearest way I can think to write this. The
infix notation is just syntactic sugar for the more general prefix notation so
the above is really

range = - (max A B C D E) (min A B C D E)

but I find the former clearer in source code, and it has more of the visual
cues you mention and which are important. A program is read many more times
than it is written!

Parentheses will build up a little partly because I (currently) intend to
require the order of operations to be explicitly specified. They won't build up
in the same way as they do in Lisp because a) assignment will not appear within
an expression, b) the compiler will allow infix as mentioned, c) functions will
not require parameters to be in parens. These three reasons should reduce the
paren count.

<snip>

> or maybe all this was somehow a historical accident, and the only reason
> people use c-style syntax is because of history.

I think it goes back further than C, to school. We are taught to write 'sums'
such as 3+4. No bad thing, though.

<snip>

> now, why in my lang did I make it so that:
> (a)=3;
> causes 'a' to evaluate to a value that is used like a target/pattern?
>
> dunno exactly, but at least I can do proxy assignment (among other things):
>
> var foo, bar;
> foo=#bar;
> (foo)="baz";
> bar => "baz"

Does the third example assign "baz" to bar? I'm assuming foo has been set to a
reference to bar.

James Harris

Jul 31, 2004, 11:47:34 AM

"Marcin 'Qrczak' Kowalczyk" <qrc...@knm.org.pl> wrote in message

news:pan.2004.07.31....@knm.org.pl...

> I will make better references to existing examples, and I'm making PLEAC
> entries <http://pleac.sourceforge.net/>.

Interesting. I'd seen the updated Shootout site
http://shootout.alioth.debian.org/ but this one is new to me.

> > 2) The FAQ says that whitespace is insignificant. Aren't you using
> > whitespace to separate parameters?
>
> There are places where some whitespace or comments is required to separate
> tokens, but the amount and shape of whitespace (i.e. whether it's spaces
> or newlines, or how large the indent is) is insignificant - like in many
> languages, unlike Python, Ruby, Haskell, Unix shell, and C preprocessor.

I would guess whitespace within tokens is invalid. Overall I would say this
regards whitespace as significant! The classic example of a language for which
whitespace is insignificant is this code, from Fortran,

DO3I = 1,10

which starts off looking like an assignment but it is, in fact, a loop construct
(DO 3 I = 1,10) whereas,

DO 3 I=1

is an assignment statement assigning 1 to variable DO3I. The compiler can ignore
whitespace because of the use of punctuation such as equal-signs and commas. In
Kogut and in what I have in mind whitespace is required to delimit tokens.
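
As a rough illustration of how punctuation alone can settle the Fortran case, here is a Python sketch that first throws away every blank and only then decides what the statement is. It is a deliberate simplification (the regular expressions and the name classify are assumptions for this example): a real fixed-form Fortran lexer must also cope with commas inside parentheses, string constants and so on.

```python
import re

# Simplified shapes: a DO statement has a label, a variable, '=' and a comma;
# an assignment has a name, '=' and no top-level comma.
DO_LOOP = re.compile(r"^DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),(.+)$")
ASSIGN = re.compile(r"^([A-Z][A-Z0-9]*)=(.+)$")


def classify(statement):
    squeezed = statement.replace(" ", "").upper()  # blanks carry no meaning
    match = DO_LOOP.match(squeezed)
    if match:
        label, var, start, rest = match.groups()
        return ("do-loop", label, var, start, rest)
    match = ASSIGN.match(squeezed)
    if match:
        return ("assignment", match.group(1), match.group(2))
    raise ValueError("unrecognised statement: " + statement)


print(classify("DO3I = 1,10"))
print(classify("DO 3 I=1"))
```

The comma after the equal-sign is the only thing that separates the loop from the assignment to DO3I; the spacing plays no part in the decision.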

I'm intrigued by the use of semicolon to end a statement. How much would
treating newlines as significant affect your language? I don't think I need the
semicolon to terminate statements. Mostly context will show the need e.g.,

struct {
    int var1
    int var2
} myStruct

with no need for a terminating semicolon. I am thinking to use semicolon as a
statement separator so as to allow multiple statements on a line such as,

sum = a + b; diff = a - b

but not require them at the ends of lines as I think it looks neater and that
they are unnecessary. Your language structure is similar. Are there strong
reasons as to why you require semicolons to terminate statements?

--
Cheers,
James

cr88192

Jul 31, 2004, 12:10:00 PM

"James Harris" <no.email.please> wrote in message

news:410bb644$0$7131$db0f...@news.zen.co.uk...

yes, at least you thought about it some.

a lot of what I post is intermediate, and, thus, subject to change...

> <snip>
> > or maybe all this was somehow a historical accident, and the only reason
> > people use c-style syntax is because of history.
>
> I think it goes back further than C, to school. We are taught to write 'sums'
> such as 3+4. No bad thing, though.
>

yes, this is the case for infix.

what about c syntax in general?

c syntax generally resembles a lot of algebra in various ways (ok, this is
debatable).
it has generally defeated many more wordy syntaxes (eg: pascal, cobol, ...);
it is not threatened that much by those with much larger amounts of symbols
(eg: perl).

I think it might be balanced here.

making general code structure exist in terms of large-expressions feels a
little weird, and maybe is a little less natural to many people than
sequences of commands.

...

of course, all of this may have been historical circumstance as well...

> <snip>
> > now, why in my lang did I make it so that:
> > (a)=3;
> > causes 'a' to evaluate to a value that is used like a target/pattern?
> >
> > dunno exactly, but at least I can do proxy assignment (among other things):
> >
> > var foo, bar;
> > foo=#bar;
> > (foo)="baz";
> > bar => "baz"
>
> Does the third example assign "baz" to bar? I'm assuming foo has been set to a
> reference to bar.
>

yes.

foo is assigned a symbol.
(foo)="baz";
evaluates foo, notes it is a symbol, and binds "baz" to the named slot.

similarly, patterns can be passed in vars like this:

pat={#x, #y, #z};
(pat)={1, 2, 3};
x => 1
y => 2
z => 3

this may later have other uses as well.
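
A Python sketch of the same idea, for readers who want to see the mechanics. Symbol, assign and the slots dictionary are names made up for this example (they stand in for #name, the "(target)=value" form and the variable environment); this is not the actual implementation:

```python
class Symbol:
    """Stands in for a quoted name such as #bar."""

    def __init__(self, name):
        self.name = name


def assign(slots, target, value):
    """Bind value according to target: a Symbol names a slot, a list is a pattern."""
    if isinstance(target, Symbol):
        slots[target.name] = value
    elif isinstance(target, list):
        if len(target) != len(value):
            raise ValueError("pattern and value differ in length")
        for sub_target, sub_value in zip(target, value):
            assign(slots, sub_target, sub_value)
    else:
        raise TypeError("target is neither a symbol nor a pattern")


slots = {}

# foo=#bar; (foo)="baz";
slots["foo"] = Symbol("bar")
assign(slots, slots["foo"], "baz")
print(slots["bar"])

# pat={#x, #y, #z}; (pat)={1, 2, 3};
slots["pat"] = [Symbol("x"), Symbol("y"), Symbol("z")]
assign(slots, slots["pat"], [1, 2, 3])
print(slots["x"], slots["y"], slots["z"])
```

Because assign recurses into lists, nested patterns fall out of the same rule without extra work.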

Marcin 'Qrczak' Kowalczyk

Jul 31, 2004, 12:16:07 PM

On Sat, 31 Jul 2004 16:47:34 +0100, James Harris wrote:

> Overall I would say this regards whitespace as significant! The classic
> example of a language for which whitespace is insignificant is this
> code, from Fortran,

Ok, this could be phrased differently.

Anyway, Fortran is an exception: I don't know of any other language which
doesn't require some whitespace between identifiers, numbers and keywords.
Most syntaxes fall into three groups:
1. Newlines and indentation are significant (Python, Haskell, Clean).
2. Newlines are significant, indentation is not, except some constructs
like string literals (Ruby, Unix shell, Visual Basic).
3. Neither is significant, except that sometimes some whitespace is needed
to separate tokens, and except some constructs like string literals and
to-end-of-line comments (most languages).

> I'm intriguged by the use of semicolon to end a statement. How much would
> treating newlines as significant affect your language?

I tried 1 at the beginning, then 2.

The problem is that there are too many cases where a newline doesn't end
a definition or statement. Even not counting cases which are obvious from
the context, e.g. after an operator or before a ")".

One could use \ or something to mark a newline as insignificant, but with
too many \'s it's uglier than with explicit semicolons. It's hard for a
human to see whether he is allowed to split a line in a particular place.

This is especially bad if arguments are separated by spaces, because you
can't rely on a newline after a comma being ignored. So every function
application which doesn't fit in one line needs a \.

Significant newlines constrained other parts of my syntax to use only such
combinations of tokens that can be nicely split into lines, without the
need of many explicit \'s. These constraints were too strong, so I abandoned
significant newlines.
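
To show the kind of rule involved, here is a Python sketch of a statement splitter for a hypothetical language with significant newlines. A newline ends the statement unless the line ends in an operator or comma, a parenthesis is still open, or the line ends in an explicit \ mark. The operator list and the function name are assumptions, and string literals are not treated specially:

```python
# Tokens that, at the end of a line, make it obvious the statement goes on.
CONTINUATION_ENDINGS = ("+", "-", "*", "/", "=", ",")


def split_statements(source):
    """Split source into statements, treating a newline as a terminator
    unless the context shows that the statement continues."""
    statements, current, depth = [], [], 0
    for line in source.splitlines():
        text = line.strip()
        if not text:
            continue
        explicit = text.endswith("\\")  # explicit continuation mark
        if explicit:
            text = text[:-1].rstrip()
        if text:
            current.append(text)
        depth += text.count("(") - text.count(")")
        if explicit or depth > 0 or text.endswith(CONTINUATION_ENDINGS):
            continue  # this newline is not significant
        statements.append(" ".join(current))
        current = []
    if current:
        statements.append(" ".join(current))
    return statements


source = "\n".join([
    "let range = Max A B C D E -",
    "    Min A B C D E",
    'WriteLine "Hello " \\',
    "    username",
])
for statement in split_statements(source):
    print(statement)
```

The first statement continues for free because of the trailing operator, but the space-separated call has nothing to signal continuation and needs the explicit mark, which is the case described above.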

Lasse Hillerøe Petersen

Jul 31, 2004, 3:32:44 PM

In article <Z8KOc.5755$4%2.4...@fe07.usenetserver.com>,

"cr88192" <cr8...@protect.hotmail.com> wrote:
> >
> I wonder if your source is still in plaintext...

;-)

> in most normal conditions bold and italic are not usable in programming
> languages based on the fact that text editors don't support them, or the
> compiler doesn't except the formats for which that style of formatting is
> allowed.

Some ten years ago, I used Think Pascal, for the Macintosh. It did
syntax checking on the fly, and rendered programs with keywords in bold;
very nice. The Script Editor for AppleScript did much the same, taking
the approach a bit further. I know that this is not quite the same, as
typeface was not used to distinguish symbols in syntax, only as a pretty
rendering; but the step is not far. This is one thing I'd really love
support for in Algol68g, some way to write programs using a special
stropping instead of uppercase, and a way to prettyprint and edit on an
xterm using boldface for modes and keywords.

> unless of course you are using an ide or such that handles all of this, or
> you have an editor that does syntax highlighting or changing the style...

Some IDE is necessary, but bold and underline is available even with a
simple xterm. So it doesn't necessarily have to be a "window-based" IDE.

> if, of course, formatting could be used in a language, this brings up
> interesting ideas, like, eg, using a bold '.' to mean dot-product, or and
> italic 'X' for cross product, ...

I'd rather use proper symbols for such things.

-Lasse

cr88192

Jul 31, 2004, 9:21:21 PM

"Lasse Hillerøe Petersen" <lhp+...@toft-hp.dk> wrote in message

news:lhp+news-8C089B...@news.tele.dk...

> In article <Z8KOc.5755$4%2.4...@fe07.usenetserver.com>,
> "cr88192" <cr8...@protect.hotmail.com> wrote:
> > >
> > I wonder if your source is still in plaintext...
>
> ;-)
>
> > in most normal conditions bold and italic are not usable in programming
> > languages based on the fact that text editors don't support them, or the
> > compiler doesn't accept the formats for which that style of formatting is
> > allowed.
>
> Some ten years ago, I used Think Pascal, for the Macintosh. It did
> syntax checking on the fly, and rendered programs with keywords in bold;
> very nice. The Script Editor for AppleScript did much the same, taking
> the approach a bit further. I know that this is not quite the same, as
> typeface was not used to distinguish symbols in syntax, only as a pretty
> rendering; but the step is not far. This is one thing I'd really love
> support for in Algol68g, some way to write programs using a special
> stropping instead of uppercase, and a way to prettyprint and edit on an
> xterm using boldface for modes and keywords.
>

dunno. rendering text special, in general, does not seem that useful (sure,
it can add visual distinctiveness, but it is mostly just an editor feature).

> > unless of course you are using an ide or such that handles all of this, or
> > you have an editor that does syntax highlighting or changing the style...
>
> Some IDE is necessary, but bold and underline is available even with a
> simple xterm. So it doesn't necessarily have to be a "window-based" IDE.
>

yes.

I was just meaning, eg, you can't write code in notepad...
one possibility could be binding an editor to the language (not my preferred
approach, but it is possible).
your code could be represented largely as a glob of xml with a lot of text
stuffed in there as well...

a custom binary format could be used as well, which might be simpler.

or, another weird thought:
a variation of ansi codes could be used for various features as well...

> > if, of course, formatting could be used in a language, this brings up
> > interesting ideas, like, eg, using a bold '.' to mean dot-product, or an
> > italic 'X' for cross product, ...
>
> I'd rather use proper symbols for such things.
>

yes.

doing anything weird would eliminate sending the code as plaintext as well
anyways.
this largely means that little can be done, people are stuck with a limited
character range.

of course, one could use unicode (assuming it gets a lot more common), and
maybe with special editors (that or doing a more complicated multi-lingual
setup or such) one could, eg, also use greek letters and various other
symbols in code or such...

I don't know.

Howard Ding <hading@hading.dnsalias.com>

Aug 1, 2004, 12:59:16 AM

"cr88192" <cr8...@protect.hotmail.com> writes:

> it is worth noting that in many cases newbies will likely be scared away by
> lisp style syntax, mostly as things can happen like:
> parens can build up to large numbers;
> it is not terribly obvious how to break up/indent things;
> a lot of the visual cues are missing;
> ...
>

I think a lot here depends on whether you're talking about "newbies to
programming" or "newbies to this particular language who have
experience in other languages". The HtDP people (www.htdp.org and
www.teach-scheme.org) have a lot of experience teaching Scheme, and
the former group of people, according to their experience, don't
really seem to have many problems picking up a Lisp; it's those
bringing in experience from other languages that seem to struggle.

--
Howard Ding
<had...@hading.dnsalias.com>

cr88192

Aug 1, 2004, 2:03:38 AM

<had...@hading.dnsalias.com> wrote in message
news:m3fz77p...@frisell.localdomain...

hmm, this is probably true...

I was working roughly under the assumption that people looking at it would
have already worked with other languages, and would have just encountered
it for whatever reason.

"newbies to programming" tend more often to be drawn to java apparently (and
then later get taught vb as it is most common for classes), and a lot of
them don't seem to know the difference between "machine language" and "c++"
anyways...
thus, the only way most of them will encounter programming will be through a
class, and when it is a class anything remotely reasonable can be taught
without complaint (except for those who object because it is not java or
whatever other language is overly hyped at the time...).
under this assumption most of them are unlikely to encounter lisp or scheme,
but if they did learning it would be no big deal.

or something...

Wilhelm B. Kloke

Aug 1, 2004, 4:22:45 AM

In article <lhp+news-8C089B...@news.tele.dk>,

Lasse Hillerøe Petersen <lhp+...@toft-hp.dk> wrote:
>
>Some ten years ago, I used Think Pascal, for the Macintosh. It did
>syntax checking on the fly, and rendered programs with keywords in bold;
>very nice. The Script Editor for AppleScript did much the same, taking
>the approach a bit further. I know that this is not quite the same, as
>typeface was not used to distinguish symbols in syntax, only as a pretty
>rendering; but the step is not far. This is one thing I'd really love
>support for in Algol68g, some way to write programs using a special
>stropping instead of uppercase, and a way to prettyprint and edit on an
>xterm using boldface for modes and keywords.

It should be easy to add a new stropping regime for Algol68 (or your
favourite other language) to allow
for constructs like "{\bf while}" (TeX) or <b>while</b> (HTML) or
your other favourite RTF (rich text format) to represent the
keyword WHILE. In this case pretty printing is easy: Just use TeX/
your browser/wordprocessor. The textual representation seems clumsy, but
this may be facilitated by the use of editor/browser features.
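
A minimal Python sketch of the "tool in front of the compiler" variant, assuming bold words are marked with HTML-style <b>...</b> tags and that the target is upper-case stropping. The function name is made up, and the sample line is only Algol-like text for illustration:

```python
import re

# Assumed convention: a bold word is written as <b>word</b> in the source text.
BOLD = re.compile(r"<b>(.*?)</b>")


def to_upper_stropping(source):
    """Rewrite every bold-marked word as an upper-case keyword."""
    return BOLD.sub(lambda match: match.group(1).upper(), source)


marked_up = "<b>while</b> i < 10 <b>do</b> i := i + 1 <b>od</b>"
print(to_upper_stropping(marked_up))
```

The reverse direction (upper-case keywords to bold markup for display) would be the pretty-printing half of the same round trip.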
--
Dipl.-Math. Wilhelm Bernhard Kloke
Institut fuer Arbeitsphysiologie an der Universitaet Dortmund
Ardeystrasse 67, D-44139 Dortmund, Tel. 0231-1084-257

cr88192

Aug 1, 2004, 5:06:20 AM

"Wilhelm B. Kloke" <w...@arb-phys.uni-dortmund.de> wrote in message
news:1091349913.914106@vestein...

> In article <lhp+news-8C089B...@news.tele.dk>,
> Lasse Hillerøe Petersen <lhp+...@toft-hp.dk> wrote:
> >
> >Some ten years ago, I used Think Pascal, for the Macintosh. It did
> >syntax checking on the fly, and rendered programs with keywords in bold;
> >very nice. The Script Editor for AppleScript did much the same, taking
> >the approach a bit further. I know that this is not quite the same, as
> >typeface was not used to distinguish symbols in syntax, only as a pretty
> >rendering; but the step is not far. This is one thing I'd really love
> >support for in Algol68g, some way to write programs using a special
> >stropping instead of uppercase, and a way to prettyprint and edit on an
> >xterm using boldface for modes and keywords.
>
> It should be easy to add a new stropping regime for Algol68 (or your
> favourite other language) to allow
> for constructs like "{\b while}" (TeX) or <bold>while<\bold> (HTML) or
> your other favourite RTF (rich text format) to represent the
> keyword WHILE. In this case pretty printing is easy: Just use TeX/
> your browser/wordprocessor. The textual representation seams clumsy, but
> this may be facilitated by the use of edito/browser features.

hmm, yes, one just needs a compiler that supports it (or a tool for ripping
out the formatting).
for the rtf format, on windows wordpad is decent but lacks a line number
status or any way to jump to a specific line, which could be annoying.

there may exist other rtf editors that have such features though
(actually, I think ms word has both a page number and line number, but I am
not sure, and the fragmentation into pages is not that helpful either...).

similarly, many other kinds of editors lack a line number status as well...
one may get used to coding without it though, but I find it quite helpful...

Marco van de Voort

Aug 1, 2004, 12:35:45 PM

On 2004-07-31, James Harris <> wrote:
>
> "Marco van de Voort" <mar...@stack.nl> wrote in message
> news:slrncfuugl....@toad.stack.nl...
>
>>
>> I wouldn't like it. But I don't like any significantly meaning placed on
>> whitespace. Call me old fashioned :-)
>
> I already replied but I've been wondering about this statement. As a programmer
> for many years I've agreed with this but I'm finding my views changing.
>
> When programming we use various items of punctuation to separate elements in the
> code but we don't expect users to use the punctuation-laden syntax when invoking
> our code from the command line. They use whitespace. Compare these fictitious
> statements,
>
> write ("Hello", username, "\n");
>
> write "Hello" username "\n"
>
> and the second - which could be the syntax used when invoking the "write"
> program - doesn't need the parens or the comma. This syntax DOES presume
> grouping of the command with its parameters which is not necessary in this
> example as there is nothing with which to group.

Yes but the goal of programming is to make programs fast and speedily. Not
to minimize the number of characters typed, since that is not limiting.

> would be replaced by
>
> range = (max A B C D E) - (min A B C D E)
>
> which provides the grouping required. How does that look?

Slightly better, depending on taste. But usually the reason for changing
language features is not the cases where something looks better, but
avoiding of unnecessary errors.

Now, your spacing "proposal" is not as bad as e.g. the block indentation of
Python. However the gains are IMHO near zero.

Marco van de Voort

unread,

Aug 1, 2004, 12:37:39 PM8/1/04

to

On 2004-07-31, Marcin 'Qrczak' Kowalczyk <qrc...@knm.org.pl> wrote:
> On Sat, 31 Jul 2004 13:52:34 +0100, James Harris wrote:
>
>> When programming we use various items of punctuation to separate elements in the
>> code but we don't expect users to use the punctuation-laden syntax when invoking
>> our code from the command line. They use whitespace. Compare these fictitious
>> statements,
>>
>> write ("Hello", username, "\n");
>>
>> write "Hello" username "\n"
>
> It's not that fictitious. In my language Kogut you write
>
> Write "Hello " username "\n";

Why do you keep such archaic, strained and mangled notation for special characters in
Kogut? Why not something like

write "Hello " username #linefeed

or something?

Marcin 'Qrczak' Kowalczyk

unread,

Aug 1, 2004, 1:53:41 PM8/1/04

to

On Sun, 01 Aug 2004 16:37:39 +0000, Marco van de Voort wrote:

>> Write "Hello " username "\n";
>
> Why do you keep such archaic, strained and mangled notation for special characters in
> Kogut? Why not something like
>
> write "Hello " username #linefeed
>
> or something?

Why would it be better?

The C-like notation is compact and well known, used by many languages.

Well, I don't follow it exactly because C rules for specifying characters
by number are poorly designed, they don't extend to Unicode. So here you
write e.g. \xFFFD; or \255; or \o177; - with the semicolon.

You don't use it often explicitly anyway. As I mentioned, the preferred
way to write a newline is to use WriteLine function, and ReadLine() strips
the "\n".
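The semicolon-terminated escapes mentioned above can be sketched in a few lines of Python. This is only an illustration, not Kogut's actual lexer: the function name `decode_escapes` is made up, and reading a bare number such as \255; as decimal is an assumption drawn from the three examples in the post.

```python
import re

# Assumed escape forms, based only on the three examples in the post:
#   \xHHHH;  hexadecimal    \oOOO;  octal    \DDD;  decimal (assumption)
_ESCAPE = re.compile(r"\\(?:x([0-9A-Fa-f]+)|o([0-7]+)|([0-9]+));")

def decode_escapes(text):
    """Replace each semicolon-terminated numeric escape with its code point."""
    def replace(match):
        hex_digits, oct_digits, dec_digits = match.groups()
        if hex_digits is not None:
            value = int(hex_digits, 16)
        elif oct_digits is not None:
            value = int(oct_digits, 8)
        else:
            value = int(dec_digits, 10)
        return chr(value)  # raises ValueError above U+10FFFF
    return _ESCAPE.sub(replace, text)
```

Because the semicolon ends the number, the escape can have any length, which is what lets the notation reach code points beyond what a fixed-width C escape covers.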

James Harris

unread,

Aug 1, 2004, 4:43:47 PM8/1/04

to

"Arthur J. O'Dwyer" <a...@nospam.andrew.cmu.edu> wrote in message
news:Pine.LNX.4.60-041....@unix40.andrew.cmu.edu...

> > I am thinking to use semicolon
> > as a statement separator so as to allow multiple statements on a line
> > such as,
> >
> > sum = a + b; diff = a - b
> >
> > but not require them at the ends of lines as I think it looks neater and
> > that they are unnecessary. Your language structure is similar. Are there
> > strong reasons as to why you require semicolons to terminate statements?
>

> I foresee your language's Classic Newbie Mistake being
>
> if (a > b) then sum = a + b; diff = a - b
>
> when the programmer meant either
>
> if (a > b) then
>     sum = a + b
>     diff = a - b
>
> or
>
> if (a > b) then
>     sum = a + b
> diff = a - b
>
> (whichever /doesn't/ have the same semantics as the newbie's code).
> Don't mix end-of-statement semicolons and significant newlines; it's
> just asking for confusion.
>
> (I am assuming your language will have an imperative-style 'if'
> construct, or at least something with similar syntax. If not, then
> I'll retract this complaint until I come up with a better one. :)
>
> my $.02,
> -Arthur

Your comments - complaints and all - are welcome. It is not a point which
is essential but I was thinking of the former semantics, i.e.,

if (a > b) sum=a + b; diff = a - b

meaning carry out both assignments if a is greater than b. (I don't have a
"then" clause as conditions must be enclosed in parentheses and thus the
end of the condition is clearly seen.)

What was your comment about this? Is it just the potential for confusion?
If the semantics were clear and unambiguous would you still have an
objection to this construct? If so can you explain to me what you see as a
problem? To be clear I'll list the two options I have in mind (ei is
elseif),

If-statement syntax Option 1,
if (<condition>)
    action
ei (<condition>)
    action
else
    action
endif

Option 2,
if (<condition>) action
ei (<condition>) action
else action

i.e. if the action is on the same line as the if statement no endif is
required. For Option 2 the action can be a compound statement so,

if (<condition>) {
    action
}
ei (<condition>) {
    action
}
else {
    action
}

is valid - actions effectively begin on the same line as the if statement
and so do not require endif.
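For comparison, Python happens to give the one-line form the "former" semantics described earlier in this post: every statement that follows the condition on the same line belongs to the if. A small check, with illustrative names only (`one_line_if`, `total`):

```python
def one_line_if(a, b):
    """Return (total, diff); both stay None unless a > b."""
    total = diff = None
    # Both assignments sit on the 'if' line, so both are conditional.
    if a > b: total = a + b; diff = a - b
    return total, diff
```

This is the opposite of the C reading, where only the first statement after the condition would be guarded.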

--
TIA,
James

James Harris

unread,

Aug 1, 2004, 5:03:45 PM8/1/04

to

"Marco van de Voort" <mar...@stack.nl> wrote in message

news:slrncgq6v1....@toad.stack.nl...
<snip>

> Yes but the goal of programming is to make programs fast and speedily. Not
> to minimize the number of characters typed, since that is not limiting.

I'm not sure if you mean to make the programs execute quickly or to be
written quickly. I agree that we are not trying to reduce typing per se but
it is good if a program source clearly expresses the intention of the
algorithm. I am suggesting two expression options - see below.

>
> Slightly better, depending on taste. But usually the reason for changing
> language features is not the cases where something looks better, but
> avoiding of unnecessary errors.
>
> Now, your spacing "proposal" is not as bad as e.g. the block indentation of
> Python. However the gains are IMHO near zero.

I'm not suggesting anything new. I /do/ want to give the programmer
options. Frankly, I'm not prescriptive about the syntax. The semantics are
what I am driving at and I wouldn't be put out if someone came up with
another way to express the meaning. The semantics, though, are another
topic. Suffice to say that I foresee two imperative syntax options in the
language,

Option 1 - assembler variant
opcode operand operand operand etc

Option 2 - algebra variant
result = operand opcode operand

For large parts of what I primarily intend to write the former will be more
regular. For higher level parts Option 2 will be more natural. Option 1 is
more flexible and can, I think, be applied in all cases. I think then that
Option 2 can be reformed to Option 1. In other words Option 2 is syntactic
sugar. In keeping with my desire to give the programmer flexibility both
options will be usable by the programmer - in the same code if desired.
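The claim that Option 2 can be reformed to Option 1 can be sketched as a simple rewrite. The opcode names and the choice to put the result first among the operands are assumptions made for illustration; the post defines neither.

```python
# Hypothetical opcode names; the post does not define any.
OPCODES = {"+": "add", "-": "sub", "*": "mul", "/": "div"}

def to_opcode_form(statement):
    """Rewrite 'result = a op b' (Option 2) as 'opcode result a b' (Option 1).

    Operators must be surrounded by whitespace, and only a single
    binary operation is handled.
    """
    target, expression = (part.strip() for part in statement.split("=", 1))
    left, operator, right = expression.split()
    return f"{OPCODES[operator]} {target} {left} {right}"
```

For instance, to_opcode_form("sum = a + b") gives "add sum a b". Nested expressions would need temporaries or nested groups, which this sketch does not attempt.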

James Harris

unread,

Aug 1, 2004, 5:10:50 PM8/1/04

to

"Marcin 'Qrczak' Kowalczyk" <qrc...@knm.org.pl> wrote in message

news:pan.2004.07.31...@knm.org.pl...

I have to regretfully say that I agree with you. The lack of punctuation
does make it hard to detect a multiline context. And, yes, I was thinking
of using the backslash (\) as a line continuation because of its
familiarity to C etc programmers - though my interpretation would not be
exactly the same.

There must be another way round this. (You can guess I'm tenacious about
this point!) I'll have to come back to this when I've written some more
code in the language.

Thanks for your explanation.

--
Cheers,
James

Richard Harter

unread,

Aug 1, 2004, 5:33:26 PM8/1/04

to

On Sun, 1 Aug 2004 22:10:50 +0100, "James Harris" <no.email.please>
wrote:

An alternative that I am using in San is to have block statements,
e.g.,

begin statement
// long winded statement crossing many lines
end statement

Slice and dice to meet your language's syntax.

Richard Harter, c...@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com
Whoever said money can't buy happiness
didn't know where to shop.

James Harris

unread,

Aug 1, 2004, 5:54:23 PM8/1/04

to

"Richard Harter" <c...@tiac.net> wrote in message
news:410d60ec....@news.sbtc.net...

> An alternative that I am using in San is to have block statements,
> e.g.,
>
> begin statement
> // long winded statement crossing many lines
> end statement
>
> Slice and dice to meet your language's syntax.

Do you use 'begin' and 'end' to bracket a /single/ statement or a set of
statements?

As an example, say there is a complex windowing call that takes 12
parameters (remembering, in Marcin's language and my current plan
parameters are NOT separated by commas). The call could be,

ComplexWindowCall Parameter1 Parameter2 Parameter3 Parameter4
Parameter5 Parameter6 Parameter7 Parameter8
Parameter9 Parameter10 Parameter11 Parameter12

Does San, then, use 'begin' and 'end' to group these three lines into one
statement? If so how do you indicate a compound statement?

I could maybe express it this way, using parentheses as 'begin' and 'end'
on a single statement,

(ComplexWindowCall Parameter1 Parameter2 Parameter3 Parameter4
Parameter5 Parameter6 Parameter7 Parameter8
Parameter9 Parameter10 Parameter11 Parameter12)

or

(ComplexWindowCall
Parameter1 Parameter2 Parameter3 Parameter4
Parameter5 Parameter6 Parameter7 Parameter8
Parameter9 Parameter10 Parameter11 Parameter12)

without a line-continuation backslash or a statement-terminating semicolon
in sight. Hmm.....
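One way to read the parenthesised form is that a statement simply stays open while it has unclosed parentheses. A minimal sketch of such a line grouper follows; `split_statements` is an illustrative name, and string literals containing parentheses are not handled.

```python
def split_statements(source):
    """Group physical lines into statements.

    A statement stays open while it contains more '(' than ')'.
    Blank lines are skipped; string literals are not inspected.
    """
    statements, pending, depth = [], [], 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        pending.append(stripped)
        depth += stripped.count("(") - stripped.count(")")
        if depth <= 0:  # parentheses balanced: the statement is complete
            statements.append(" ".join(pending))
            pending, depth = [], 0
    if pending:  # unbalanced input: return what was collected
        statements.append(" ".join(pending))
    return statements
```

Under this rule a one-line statement needs no terminator at all, and a long call costs only the two surrounding parentheses.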

Richard Harter

unread,

Aug 2, 2004, 12:54:53 AM8/2/04

to

On Sun, 1 Aug 2004 22:54:23 +0100, "James Harris" <no.email.please>
wrote:

>"Richard Harter" <c...@tiac.net> wrote in message

>news:410d60ec....@news.sbtc.net...
>
>> An alternative that I am using in San is to have block statements,
>> e.g.,
>>
>> begin statement
>> // long winded statement crossing many lines
>> end statement

Mea culpa. In the above "begin", "end", and "statement" are all
keywords. It didn't occur to me until too late that the text was
ambiguous.

>>
>> Slice and dice to meet your language's syntax.
>
>Do you use 'begin' and 'end' to bracket a /single/ statement or a set of
>statements?

All blocks are initiated with 'begin' and terminated with 'end'.
Thus, for example,

begin while i lt? n
    begin init
        prod = x
        i = 2
        end
    prod = prod * x
    end

>
>As an example, say there is a complex windowing call that takes 12
>parameters (remembering, in Marcin's language and my current plan
>parameters are NOT separated by commas). The call could be,
>
>ComplexWindowCall Parameter1 Parameter2 Parameter3 Parameter4
>Parameter5 Parameter6 Parameter7 Parameter8
>Parameter9 Parameter10 Parameter11 Parameter12

begin statement

ComplexWindowCall Parameter1 Parameter2 Parameter3 Parameter4
Parameter5 Parameter6 Parameter7 Parameter8
Parameter9 Parameter10 Parameter11 Parameter12

end

>
>Does San, then, use 'begin' and 'end' to group these three lines into one
>statement? If so how do you indicate a compound statement?

San uses 'begin statement' to group lines into one statement. Remove
'statement' from 'begin statement' and it becomes three statements.

>
>I could maybe express it this way, using parentheses as 'begin' and 'end'
>on a single statement,
>
>(ComplexWindowCall Parameter1 Parameter2 Parameter3 Parameter4
>Parameter5 Parameter6 Parameter7 Parameter8
>Parameter9 Parameter10 Parameter11 Parameter12)
>
>or
>
>(ComplexWindowCall
>Parameter1 Parameter2 Parameter3 Parameter4
>Parameter5 Parameter6 Parameter7 Parameter8
>Parameter9 Parameter10 Parameter11 Parameter12)
>
>without a line-continuation backslash or a statement-terminating semicolon
>in sight. Hmm.....

Something like that should work. In most languages surrounding
multi-line statements by parentheses works; it wouldn't in lisp
though.

Marcin 'Qrczak' Kowalczyk

unread,

Aug 2, 2004, 2:03:54 AM8/2/04

to

On Mon, 02 Aug 2004 04:54:53 +0000, Richard Harter wrote:

> begin while i lt? n
>     begin init
>         prod = x
>         i = 2
>         end
>     prod = prod * x
>     end

Why do you indent "end" like the contents of the block rather than like
the corresponding "begin"? This is against a widely accepted convention,
and unreadable for me. This would be better:

| begin while i lt? n
|     begin init
|         prod = x
|         i = 2
|     end
|     prod = prod * x
| end

Not to mention that this notation is a bit too heavy, and that I would
prefer "<" to "lt?" as almost all languages use.

Dr A. N. Walker

unread,

Aug 2, 2004, 8:17:50 AM8/2/04

to

In article <lhp+news-8C089B...@news.tele.dk>,
Lasse Hillerøe Petersen <lhp+...@toft-hp.dk> wrote:
> [...] This is one thing I'd really love

>support for in Algol68g, some way to write programs using a special
>stropping instead of uppercase, and a way to prettyprint and edit on an
>xterm using boldface for modes and keywords.

I suppose I'm slightly curious as to why you think this is
something A68G/Marcel "ought" to do? It is, for example, easy to
write a program [in Algol or not, as you choose] that will take a
program text [ditto, I suppose] as input and produce as output a
representation of that text as PostScript or as TeX or Troff [or
whatever] in which upper case has been lowered and boldened and
other letters have been italicised. And a program that would take
a "reserved word" stropped program text and turn it into a prefix
dot stropped text or whatever.

Personally, I would much prefer Marcel to use his talents
to improve the compiler [eg to implement as many as possible of the
proposed extensions to Algol so that we can try them in practice],
which the rest of us can't do without investing heavily in first
studying his code. Pretty-printing, IDEs and other editing ideas
are stand-alone things that "anyone" can do. Likewise producing
useful libraries.

--
Andy Walker, School of MathSci., Univ. of Nott'm, UK.
a...@maths.nott.ac.uk

MJSR

unread,

Aug 2, 2004, 10:55:26 AM8/2/04

to

In message news:<pan.2004.07.31...@knm.org.pl>,

Marcin 'Qrczak' Kowalczyk <qrc...@knm.org.pl> wrote:
> On Sat, 31 Jul 2004 13:52:34 +0100, James Harris wrote:
>
> > When programming we use various items of punctuation to separate elements in the
> > code but we don't expect users to use the punctuation-laden syntax when invoking
> > our code from the command line. They use whitespace. Compare these fictitious
> > statements,
> >
> > write ("Hello", username, "\n");
> >
> > write "Hello" username "\n"
>
> It's not that fictitious. In my language Kogut you write
>
> Write "Hello " username "\n";
>

> or better
>
> WriteLine "Hello " username;
>
> where the semicolon is needed if this statement is followed by other
> statements.

I am a little bothered by examples like:
WriteLine "three numbers:" 1 2 3
WriteLine "and an error?" 0 -1 -2
(If I understand the web reference, the latter would be like
(WriteLine "and an error?" 0)-1-2
)
For some purposes, I like the less verbose form; e.g., grammars:
assignment = lhs ':=' rhs ;
versus
assignment = lhs, ':=', rhs ;
where the commas get in the way of understanding the productions.
But I am uncomfortable with the potential consequences of interaction
with "traditional" math expression notations, as above.

--
MJSR

Marcin 'Qrczak' Kowalczyk

unread,

Aug 2, 2004, 11:07:46 AM8/2/04

to

On Mon, 02 Aug 2004 07:55:26 -0700, MJSR wrote:

> I am a little bothered by examples like:
> WriteLine "three numbers:" 1 2 3
> WriteLine "and an error?" 0 -1 -2
> (If I understand the web reference, the latter would be like
> (WriteLine "and an error?" 0)-1-2
> )

Yes. Operator applications as arguments of named functions must be
parenthesized:

WriteLine "three numbers:" 0 (-1) (i + 1);

(Also this doesn't print spaces between numbers.)

> But I am uncomfortable with the potential consequences of interaction
> with "traditional" math expression notations, as above.

I don't have an answer for that. An operator and a named function will
require *some* parentheses, one way or the other, to disambiguate two
possible meanings.

f (x, y) used to be allowed with the same meaning as f x y, but no longer.
Now comma has its own meaning (makes a pair).
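The parse MJSR describes falls out of any grammar in which juxtaposition (function application) binds tighter than binary operators. The toy parser below shows the effect. It is not Kogut's grammar: it knows only numbers, names, strings, parentheses and minus, and every function name in it is illustrative.

```python
import re

TOKEN = re.compile(r'\s*("[^"]*"|\d+|\w+|[-()])')

def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(text):
    tree, rest = parse_expression(tokenize(text))
    if rest:
        raise SyntaxError(f"unexpected token {rest[0]!r}")
    return tree

def parse_expression(tokens):
    """expression := application ('-' application)*   (left associative)"""
    left, tokens = parse_application(tokens)
    while tokens and tokens[0] == "-":
        right, tokens = parse_application(tokens[1:])
        left = ("-", left, right)
    return left, tokens

def parse_application(tokens):
    """application := atom atom*   (binds tighter than binary '-')"""
    head, tokens = parse_atom(tokens)
    args = []
    while tokens and tokens[0] not in ("-", ")"):
        arg, tokens = parse_atom(tokens)
        args.append(arg)
    return (("call", head, *args) if args else head), tokens

def parse_atom(tokens):
    if not tokens or tokens[0] == ")":
        raise SyntaxError("expected an operand")
    token = tokens[0]
    if token == "-":  # unary minus, only reachable where an operand is expected
        operand, rest = parse_atom(tokens[1:])
        return ("neg", operand), rest
    if token == "(":
        inner, rest = parse_expression(tokens[1:])
        if not rest or rest[0] != ")":
            raise SyntaxError("missing ')'")
        return inner, rest[1:]
    return (int(token) if token.isdigit() else token), tokens[1:]
```

With this grammar the unparenthesised line parses as a subtraction whose left operand is the whole call, while a parenthesised (-1) is passed as an argument.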

Lasse Hillerøe Petersen

unread,

Aug 2, 2004, 11:52:17 AM8/2/04

to

In article <celbde$688$1...@oyez.ccc.nottingham.ac.uk>,

a...@maths.nott.ac.uk (Dr A. N. Walker) wrote:

> I suppose I'm slightly curious as to why you think this is
> something A68G/Marcel "ought" to do?

I don't! I just couldn't find stropping info in the docs at first, and
my first attempt of experimenting, using 'begin' instead of 'BEGIN'
failed. Without quote- or dot-stropping it requires some work to leave
uppercase within strings alone I suppose, but with stropping it is
indeed a trivial matter (a one-liner, given a sufficiently wide terminal
window):
HTML:
(echo "<pre>" ; perl -pe 's/</&lt;/g; s#'\''([A-Z]+)'\''#"<b>".lc($1)."</b>"#ge' ; echo "</pre>") <foo.a68 >foo_a68.html
ANSI:
bold=`tput 'md'` ; res=`tput 'me'` ; perl -pe 's#'\''([A-Z]+)'\''#"'$bold'".lc($1)."'$res'"#ge' <foo.a68 >/dev/tty
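The HTML one-liner above can also be written out as a small Python function, which may be easier to adapt. This is a sketch assuming quote stropping exactly as in the Perl version ('BEGIN' marks the keyword begin); the function name is made up, and unlike the Perl it escapes & and > as well as <.

```python
import html
import re

# Quote stropping as in the one-liner: 'BEGIN' marks the keyword begin.
STROPPED = re.compile(r"'([A-Z]+)'")

def stropped_to_html(source):
    """Render quote-stropped keywords as lower-case bold inside <pre>."""
    escaped = html.escape(source, quote=False)  # escapes &, < and >
    body = STROPPED.sub(lambda m: "<b>" + m.group(1).lower() + "</b>", escaped)
    return "<pre>" + body + "</pre>"
```

As with the Perl, a quoted upper-case word inside a string literal would also be turned bold; telling the two apart needs a real tokenizer.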

> Personally, I would much prefer Marcel to use his talents
> to improve the compiler [eg to implement as many as possible of the
> proposed extensions to Algol so that we can try them in practice],

I absolutely agree with you!

-Lasse

Wilhelm B. Kloke

unread,

Aug 2, 2004, 1:16:42 PM8/2/04

to

In article <lhp+news-D6099F...@news.tele.dk>,

Lasse Hillerøe Petersen <lhp+...@toft-hp.dk> wrote:

>In article <celbde$688$1...@oyez.ccc.nottingham.ac.uk>,
> a...@maths.nott.ac.uk (Dr A. N. Walker) wrote:
>
>> I suppose I'm slightly curious as to why you think this is
>> something A68G/Marcel "ought" to do?
>
>I don't! I just couldn't find stropping info in the docs at first, and
>my first attempt of experimenting, using 'begin' instead of 'BEGIN'
>failed. Without quote- or dot-stropping it requires some work to leave
>uppercase within strings alone I suppose, but with stropping it is
>indeed a trivial matter (a one-liner, given a sufficiently wide terminal
>window):
>HTML:
>(echo "<pre>" ; perl -pe 's/</&lt;/g; s#'\''([A-Z]+)'\''#"<b>".lc($1)."</b>"#ge' ; echo "</pre>") <foo.a68 >foo_a68.html
>ANSI:
>bold=`tput 'md'` ; res=`tput 'me'` ; perl -pe 's#'\''([A-Z]+)'\''#"'$bold'".lc($1)."'$res'"#ge' <foo.a68 >/dev/tty

In case, someone really cares, I could tidy up my TeX macros (and
Algol fonts) to do the pretty printing. Together with the indent68
program, this provides a pretty printer in the traditional Algol
style (as used in the Algol60 and Algol68 reports): bold keywords and
italic identifiers etc. The main restriction is that only brief
comment style is easily supported; no other changes to correct programs are
needed; further goodie: translation to visually marked blank characters
in strings.

Marcel van der Veer

unread,

Aug 2, 2004, 4:40:35 PM8/2/04

to

"Wilhelm B. Kloke" wrote:

> It should be easy to add a new stropping regime for Algol68 (or your
> favourite other language) to allow
> for constructs like "{\b while}" (TeX) or <bold>while<\bold> (HTML) or
> your other favourite RTF (rich text format) to represent the
> keyword WHILE. In this case pretty printing is easy: Just use TeX/
> your browser/wordprocessor. The textual representation seems clumsy, but
> this may be facilitated by the use of editor/browser features.

It should indeed be easy to implement RTF/HTML-stropping next to
bold/quote-stropping in a68g. On the other hand there is (IIRC) the
syntax-colouring facility from vim which could serve as an intermediate
solution.

--
Marcel van der Veer

"Algol 68 Genie - An Algol 68 subset interpreter" is at
http://www.xs4all.nl/~jmvdveer/algol.html

James Harris

unread,

Aug 2, 2004, 2:39:30 PM8/2/04

to

"Richard Harter" <c...@tiac.net> wrote in message

news:410dc29...@news.sbtc.net...

> San uses 'begin statement' to group lines into one statement. Remove
> 'statement' from 'begin statement' and it becomes three statements.

OK. That's novel - and I can see it would work. The indented 'end' is OK
too! I could live with that.

--
Cheers,
James

Wilhelm B. Kloke

unread,

Aug 2, 2004, 4:25:35 PM8/2/04

to

In article <410EA6C3...@xs4all.nl>,
Marcel van der Veer <jmvd...@xs4all.nl> wrote:

>"Wilhelm B. Kloke" wrote:
>
>> keyword WHILE. In this case pretty printing is easy: Just use TeX/
>> your browser/wordprocessor. The textual representation seems clumsy, but
>> this may be facilitated by the use of editor/browser features.
>
>It should indeed be easy to implement RTF/HTML-stropping next to
>bold/quote-stropping in a68g. On the other hand there is (IIRC) the
>syntax-colouring facility from vim which could serve as an intermediate
>solution.

The difference being: My proposal is essentially programming language
independent (but browser/editor dependent). The vim solution needs a language
dependent plug-in for vim.

Of course, this is not really an argument in favour of my proposal. It would be,
if there were a really universal rich text format available. I can't
count RTF as such.

Richard Harter

unread,

Aug 2, 2004, 6:07:46 PM8/2/04

to

On Mon, 02 Aug 2004 08:03:54 +0200, Marcin 'Qrczak' Kowalczyk
<qrc...@knm.org.pl> wrote:

>On Mon, 02 Aug 2004 04:54:53 +0000, Richard Harter wrote:
>
>> begin while i lt? n
>>     begin init
>>         prod = x
>>         i = 2
>>         end
>>     prod = prod * x
>>     end
>
>Why do you indent "end" like the contents of the block rather than like
>the corresponding "begin"? This is against a widely accepted convention,
>and unreadable for me. This would be better:

Er, ah, you find a minor variation in code layout makes it unreadable
for you? How strange. How very strange considering the wide
variation in layout and syntax that occur in programming languages and
in practice.

As it happens it is my opinion that the "widely accepted convention"
is "the wrong thing to do", although this depends on what the block
delimiters actually signify. In my preferred style each line at a
given level of indentation represents an executable statement or the
"descriptive title" of a block. The "end" is part of the block;
placing it at the same indentation level as the "begin" clutters the
code. If you were creating an outline you wouldn't do something like

1.0 introduction
    blah blah blah
end 1.0

would you? Maybe you would, and if you would I applaud you for your
unconventionality, but it's not a standard format. However all of
this turns on subtle questions of what block delimiters actually
signify, and whether we are dealing with an expression based or
statement based language. C, for example, is expression based.
People write things like

if (foo)
{
    big_humungous_block_of_code
}

What is going on here is that a block containing many statements is a
component of an expression. Since "if (foo)" can be followed by
either a statement or a block the format used should (not "must" but
"should") distinguish between whether the action is a statement or a
block.

Much the same trick is used in some statement based languages. IIRC
PL/I is statement based in that "begin" and "end" are actual
statements that must be terminated with semicolons, e.g.,

if (foo) begin;
big_humungous_block_of_code
end;
or

if (foo)
begin;
big_humungous_block_of_code
end;

although my recollection may be at fault; corrections accepted.

San, however, is a bit more radical, although the technique should
commend itself to those who do lisp like languages. In San one would
write

begin if (foo)
    big_humungous_block_of_code
    end

(but see below). The key point is that the block is not part of a
statement; instead the initial block delimiter, the "begin" is
qualified with the conditions under which it will be executed. In C
both statements and blocks can contain both statements and blocks
whereas in San statements cannot contain blocks, which is to say that
in San the hierarchy implied by the indentation is respected by the
language whereas in C it is not (and C is representative in this
regard.)

>| begin while i lt? n
>|     begin init
>|         prod = x
>|         i = 2
>|     end
>|     prod = prod * x
>| end

Now suppose that we want to look at the execution within a given level
of indentation. Under your preferred scheme we would see

begin init

end
prod = prod * x

What purpose does that end (and there will be one for every 'begin')
serve? Why, none whatsoever. It is just a bit of stuttering. It is
even worse with C et al. What we get is:

if (foo)
{
}

The merits of this I will leave to such as might appreciate them.
Under the scheme I prefer we would have:

begin init

prod = prod * x
end

One might well say, "but you still have an 'end', you've just moved
it." However there is only one 'end', not one for each sub-block, and
it occurs where one would expect it, at the end.

Be all of that as it may, I am aware that different people have
different preferred styles. Among the features of the language are
style statements that specify such things, e.g., placement of 'end'
and indentation style. The thought is that you may use whatever style
you like, but you must specify the style, and having specified it, you
must stick to it. The choices are such that one style can be
mechanically transformed into another.

>
>Not to mention that this notation is a bit too heavy, and that I would
>prefer "<" to "lt?" as almost all languages use.

I sympathize. However it seems that you have no trouble understanding
what 'lt?' means, heavy or not, and San has other uses for <>. That,
however, is a topic for another time.

Marcin 'Qrczak' Kowalczyk

unread,

Aug 2, 2004, 6:44:09 PM8/2/04

to

On Mon, 02 Aug 2004 17:06:38 -0400, Arthur J. O'Dwyer wrote:

> I don't remember which semantics were used by MS BASIC (the BASICA that
> shipped with IBM clones back in the day); did
>
> IF A>B THEN SUM=A+B:DIFF=A-B
>
> mean to do both assignments conditionally, or not? (I'll check
> tomorrow and let you know.)

Commodore 64 Basic meant to do both assignments conditionally.

In Kogut { } around 'if' branches is mandatory. { } is never a no-op;
ordinarily it makes a 0-parameter function, so 'if' looks more as
if it were an ordinary function, where { } emphasizes that the code is
executed conditionally.

Most syntactic constructs which execute something conditionally use { }.
Since multiple statements and local definitions in a subexpression are
useful only if they are executed conditionally (otherwise they could be
moved before the expression they occur in), it follows that sequences of
definitions and statements separated by semicolons occur almost always
inside { } or at the top level of a module, so they don't need another
grouping syntax. In very rare cases they occur in other positions, e.g.
after & or |, and then they are wrapped in ordinary ( ).
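The idea that braces make a 0-parameter function, so that 'if' can look like an ordinary function, can be imitated in Python with lambdas standing in for { }. This is only an analogy, not Kogut's semantics; `if_` and `log` are illustrative names.

```python
def if_(condition, then_branch, else_branch=lambda: None):
    """An 'if' written as an ordinary function.

    Each branch is a zero-argument function, so only the chosen
    branch is ever executed.
    """
    return then_branch() if condition else else_branch()

log = []
if_(3 > 2, lambda: log.append("then"), lambda: log.append("else"))
# Only the first branch ran, so log now holds a single entry.
```

Passing the branches unwrapped, as plain expressions, would evaluate both before the call, which is why the delaying syntax has to be mandatory.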

Marcin 'Qrczak' Kowalczyk

unread,

Aug 2, 2004, 7:16:43 PM8/2/04

to

On Mon, 02 Aug 2004 22:07:46 +0000, Richard Harter wrote:

>>Why do you indent "end" like the contents of the block rather than like
>>the corresponding "begin"? This is against a widely accepted convention,
>>and unreadable for me. This would be better:
>
> Er, ah, you find a minor variation in code layout makes it unreadable
> for you?

Ok, "hard to read".

> In my preferred style each line at a given level of indentation
> represents an executable statement or the "descriptive title" of a block.

The "end" is neither of these.

> If you were creating an outline you wouldn't do something like
>
> 1.0 introduction
>     blah blah blah
> end 1.0
>
> would you?

I would use a horizontal line at the end, if anything.

> The key point is that the block is not part of a statement; instead the
> initial block delimiter, the "begin" is qualified with the conditions
> under which it will be executed.

It doesn't justify or explain indenting "end" with the contents of the
block.

> In C both statements and blocks can contain both statements and blocks
> whereas in San statements cannot contain blocks,

I don't understand.

> which is to say that in San the hierarchy implied by the indentation
> is respected by the language whereas in C it is not (and C is
> representative in this regard.)

Do you mean that in C the indentation has no meaning for the compiler?

Because in practice it obviously *is* consistent with the meaning -
it's only not enforced.

I have nothing against significant indentation per se; Haskell has a very
nice syntax, and Python is not bad too. But it has its costs, because of
subtle interactions with other syntactic assumptions. The most important
cost is that it forces many constructs to be builtin syntax instead of
being functions or macros with a generic call syntax, otherwise it's too
hard to make them play well with indentation.

> Under the scheme I prefer we would have:
>
> begin init
> prod = prod * x
> end
>
> One might well say, "but you still have an 'end', you've just moved
> it." However there is only one 'end', not one for each sub-block, and
> it occurs where one would expect it, at the end.

It doesn't occur where I expect it, because I expect a leaf block to not
include its own end marker. I expect the last indented line to be the last
statement or expression. And I expect the line with "begin" to be indented
by the same amount as the line with "end", so the *contents* of the block
is emphasized by indentation - not contents plus half of the boundary.

Besides, if you make indentation significant, why have a separate end
marker? Or, to put it differently, if you have an explicit end marker, why
make indentation significant?

Marcin 'Qrczak' Kowalczyk

unread,

Aug 3, 2004, 2:31:37 AM8/3/04

to

On Tue, 03 Aug 2004 05:43:39 +0000, Andrew Nicholson wrote:

> Using the semicolon to terminate the numbers would allow you to
> have variable length literals but I think you'll find most people
> would prefer to know the number of bytes it will expand to.
>
> when I use "\257;" did I really want 2 bytes or did I
> mistype "\247;" which would be 1 byte.

It doesn't expand to bytes. It expands to a character (strictly speaking
a code point).

> Also remember that the Unicode standard specifies the byte order of
> the 16 bits (MSB first).

Again, this is a matter of encoding characters in a byte stream, not of
characters themselves. If they are later encoded as UTF-8 or translated
to some encoding like ISO-8859-2, there are no byte order issues.

Semantically a Kogut string is a sequence of code points, isomorphic
to integers from the range 0..0x10FFFF. I don't see a need of rejecting
nonexistent code points at this stage; they are filtered at I/O by
particular encoding schemes.

Internally in programs compiled by my compiler a string is represented in
ISO-8859-1 if all characters are between U+0000 and U+00FF, and in UTF-32
otherwise, but this is an implementation detail - from the user's point of
view they are code points up to U+10FFFF. Mutable and resizable character
arrays are represented purely in UTF-32.

There is no separate character data type, characters are represented by
strings of size 1.
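The two-representation scheme described above can be sketched as a single check on the largest code point. The function name and the little-endian byte order for UTF-32 are assumptions; the post specifies neither.

```python
def internal_representation(text):
    """Return (encoding_name, raw_bytes) for a string of code points.

    ISO-8859-1 is used when every character fits in one byte,
    UTF-32 otherwise. Byte order here is an arbitrary choice.
    """
    if all(ord(ch) <= 0xFF for ch in text):
        return "iso-8859-1", text.encode("iso-8859-1")  # 1 byte per character
    return "utf-32", text.encode("utf-32-le")           # 4 bytes per character
```

Either way indexing stays O(1), because each representation uses a fixed number of bytes per code point.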

* * *

I'm considering stating that UTF-16 as a *user-visible* representation is
valid too, in case someone will implement Kogut on .NET or JVM and prefers
to use their native strings directly, and doesn't want to make UTF-16
appear as UTF-32 on the fly, which would require scanning the string on
indexing operations and would turn O(1) operations into O(N).

While presenting the user with UTF-16 is wrong from the purity point of
view - UTF-16 is a variable-width encoding, so it's poorly suited for
internal string processing - this might be better pragmatically for
interop reasons.

In such a language variant Size "\x100000;" == 2, and by indexing the string
character by character you get UTF-16 code units. A program would have to
take this into account if it e.g. examined character properties of successive
characters of a string, or it will break for code points above U+FFFF.
It's not my fault that they used UTF-16.
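
The size difference is easy to check outside Kogut. A small Python sketch (Python strings are indexed by code point, so the UTF-16 view has to be computed explicitly; `utf16_units` is a helper invented for this example):

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


s = chr(0x100000)            # one code point above U+FFFF
units = utf16_units(s)       # a surrogate pair: two code units

print(len(s), len(units))    # code-point size vs. UTF-16 size
```

Under a code-point view the string has size 1; under a UTF-16 view it has size 2, and neither unit on its own is a character whose properties can be examined.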

Marcel van der Veer

Aug 3, 2004, 4:19:18 PM

"Wilhelm B. Kloke" wrote:

> ..., It were,
> if there were a really universal rich text format available. I can't
> count RTF as such.

Agreed. Also, it seems that RTF is not fully standardised between applications.

James Harris

Aug 7, 2004, 7:27:07 AM

"Arthur J. O'Dwyer" <a...@nospam.andrew.cmu.edu> wrote in message

news:Pine.LNX.4.60-041....@unix48.andrew.cmu.edu...

<snip>
> Then you ought to be aware that C, C++, and Java use the exact same
> syntax (modulo a required semicolon which is either optional or
> prohibited in your language, I'm not sure which), but with the exact
> opposite semantics. That is, in C-oids,
>
> if (a > b) sum=a + b; diff = a - b;
>
> carries out only the /first/ assignment conditionally, and the second
> assignment always.

I think this in itself makes a point about the use of C semicolons: they
can be used or abused. On the other hand perhaps it makes a more
substantial point about programmers, i.e. since programmers don't code that
way as a rule we must give them credit for producing readable code - if
only they have the tools available to express their thoughts in a readable
and logical style.

<snip>
> I would suggest the use of colon : as in BASIC, rather
> than the easily-confused-with-C-oids semicolon, but that's just a
> humble suggestion. Depends on your audience.

It's possible but then it might look like Basic. :-(

I do have an issue in that if I use semicolon as a statement separator I am
really using it to tie statements together rather than separate them. In
other words, using the construct above,

a=1; b=2; c=3; d=4

is equivalent to the C

{ a=1; b=2; c=3; d=4; }

so that if flow of control gets to a=1 the rest of the assignments will be
executed also. The issue arises with how to express the start, condition
and end of a for-type loop. In C a semicolon is used to segregate the
components. Hmm. Could this multiple assignment be expressed in C as

a=1, b=2, c=3, d=4;
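
For comparison, Python (not one of the languages discussed in this thread) is an existing language in which semicolons on an 'if' line tie statements together in the way described above, i.e. the opposite of the C reading. A minimal sketch; the function names are invented for the example:

```python
def one_line_if(a, b):
    total = diff = None
    if a > b: total = a + b; diff = a - b   # both assignments belong to the 'if'
    return total, diff


def c_style(a, b):
    total = diff = None
    if a > b:
        total = a + b
    diff = a - b        # always executed, as in the C fragment quoted earlier
    return total, diff
```

With a=1, b=2 the first function leaves both values unset, while the second still computes the difference.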

<snip>
> Why not have "elseif" be "elseif"? Wouldn't that make for a shorter
> user's manual? You could remove the sentence explaining that "ei" was
> "elseif". :)

That's another question with a two-part answer. Would you accept one part
for now: that it clearly distinguishes between "elseif" and "else" followed
by another "if" statement?

<snip>
> So in your language, you are thinking of having
>
> if (a > b) {
>     sum = a + b;
> }
>
> and
>
> if (a > b)
> {
>     sum = a + b;
> }
>
> do different things?! Or would the latter be a syntax error?

Er, the intention is to express block-components either as

or,

I am thinking of allowing braces or parens to group statements. If I do
then the first of your examples would be OK apart from the trailing
semicolon on the assignment. I think the latter - if permissible - would
expect an endif - and also no semicolon.

> (Syntax errors are good. Any compile-time feedback is good. The
> problem with my "newbie mistake" code is that it has two perfectly
> reasonable meanings, one of them is wrong, /and/ there's no feedback to
> say "hey, this is ambiguous and/or just plain wrong."

Fully agreed. I'll have to watch this as I intend to allow options. I have
in mind a principle, right or wrong, that the best languages are designed
for self-use, rather than thinking of features to add for someone else to
use.

> It would be a cool
> feature if you simply disallowed ambiguous-looking constructs and had the
> compile-time system suggest one or two alternative phrasings. For
> example: "Semicolon after a control structure is ambiguous. Did
> you mean 'if (a < b) { sum=a+b; diff=a - b }'?" Just an idea.)

Not sure about this. It is a bit ahead of where I am. Perhaps when I get to
that point there will be enough context to help choose which to do.

> I dunno... it looks pitfallish to me. I think my aversion to it is
> due mainly to the mixture of BASIC-ish and C-ish syntax rules. I
> think you ought to pick /one/ of those languages to "look like," if
> you must look like something. Your language, as described, sounds
> to me like it's going to confuse both C programmers and BASIC
> programmers. :(

I think you are right that there is an element of influence from both. I
recognise the need for the code of others to be readable but am disinclined
to be too prescriptive in how this is laid out. C (which I think gets more
things right than any other language I know) frustrates me in that a) I
cannot define a new pointer type, a longer integer type etc, and b) I
cannot change the way procedure calls are expressed.

In the language I have so far I have the option of assembler syntax, and
the option of high level language syntax. I don't expect programmers to mix
the two per se in a single routine but I'd like to allow the two for where
they are appropriate. I also want to be able as a programmer to define new
instructions and new data types that can be used in exactly the same way as
any of the built-in versions.

Final part of the philosophy - which leads on from the last paragraph - is
to get to something that works and then strip out as much as possible -
perhaps common data structures such as arrays and maybe even integers!! -
from the language core. These stripped out parts would be provided as
libraries. The challenge here is to make these data structures as fast as
they would be if they were intrinsic.

> Good luck, though! I'll keep the complaints coming! ;)
>
> -Arthur

Appreciate your comments.

--
Cheers,
James

Richard Harter

Aug 9, 2004, 4:10:48 PM

On Tue, 03 Aug 2004 01:16:43 +0200, Marcin 'Qrczak' Kowalczyk
<qrc...@knm.org.pl> wrote:

>On Mon, 02 Aug 2004 22:07:46 +0000, Richard Harter wrote:
>
>>>Why do you indent "end" like the contents of the block rather than like
>>>the corresponding "begin"? This is against a widely accepted convention,
>>>and unreadable for me. This would be better:
>>
>> Er, ah, you find a minor variation in code layout makes it unreadable
>> for you?
>
>Ok, "hard to read".

Good. We are making progress. You haven't given any good reason for
it being hard to read, other than it is something to which you are not
accustomed. Mind you, that can be sufficient reason.

>
>> In my preferred style each line at a given level of indentation
>> represents an executable statement or the "descriptive title" of a block.
>
>The "end" is neither of these.

This may or may not be true, depending upon the interpretation chosen.
The "begin" and "end" can be thought of as markers, and not statements
at all. On the other hand they can be thought of as executable
statements. In particular, the "end" says "goto the continuation of
the block" where the continuation is determined by the "begin"
statement, which in turn says, "under such and such conditions execute
the following block, else goto the next statement after the block".

[snip]

>> The key point is that the block is not part of a statement; instead the
>> initial block delimiter, the "begin" is qualified with the conditions
>> under which it will be executed.
>
>It doesn't justify or explain indenting "end" with the contents of the
>block.

But it does - it changes the semantics. The "end" is the last
statement of the block, stating that the block is done.

>
>> In C both statements and blocks can contain both statements and blocks
>> whereas in San statements cannot contain blocks,
>
>I don't understand.

Evidently. Let me try again:

This is a statement in C:

if (foo)
{
    blah, blah, blah
}

Inside this statement is a container called a block that contains
other statements. In San I would write:

begin if (foo)
    blah, blah, blah
    end

(You can indent the "end" to the same level as the "begin" if you
like; just specify that you are using that style.)

The block contains statements; but there is no statement containing a
block.

"Contains" here is used in the same sense as when we say that a book
contains chapters, chapters contain paragraphs, and paragraphs contain
sentences.

>
>> which is to say that in San the hierarchy implied by the indentation
>> is respected by the language whereas in C it is not (and C is
>> representative in this regard.)
>
>Do you mean that in C the indentation has no meaning for the compiler?

I didn't mean any such thing at all. See above.
[snip]

>> Under the scheme I prefer we would have:
>>
>> begin init
>>     prod = prod * x
>>     end
>>
>> One might well say, "but you still have an 'end', you've just moved
>> it." However there is only one 'end', not one for each sub-block, and
>> it occurs where one would expect it, at the end.
>
>It doesn't occur where I expect it, because I expect a leaf block to not
>include its own end marker. I expect the last indented line to be the last
>statement or expression. And I expect the line with "begin" to be indented
>by the same amount as the line with "end", so the *contents* of the block
>is emphasized by indentation - not contents plus half of the boundary.

Well, yes, I understand that it isn't where you expect it; you've
reiterated your desires. One can (and I do) take the view that the leaf
terminator is part of the leaf. The consideration that you avoid
considering is that putting the begin/end statements/markers at the
outer level adds clutter. Consider the following:

flow_control_struct_1
{
    body_1
}
flow_control_struct_2
{
    body_2
}
... More of the same

If we only view the outer level of indentation we have:

flow_control_struct_1
{
}
flow_control_struct_2
{
}

Whereas the equivalent in San would be

begin flow_control_struct_1
    body_1
    end
begin flow_control_struct_2
    body_2
    end

When we view the outer level we have:

begin flow_control_struct_1
begin flow_control_struct_2

One third as many statements but containing the same information,
and no clutter.
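
The "outer level" view can be reproduced mechanically by keeping only the lines that start in column 0. The following Python sketch assumes the two layouts are indented as the discussion describes (braces at the outer level in one, 'end' indented with the body in the other); `outer_level`, `BRACES` and `SAN` are names made up for the example.

```python
def outer_level(source):
    """Keep only the non-blank lines that start in column 0."""
    return [line for line in source.splitlines()
            if line and not line[0].isspace()]


BRACES = """\
flow_control_struct_1
{
    body_1
}
flow_control_struct_2
{
    body_2
}
"""

SAN = """\
begin flow_control_struct_1
    body_1
    end
begin flow_control_struct_2
    body_2
    end
"""

print(len(outer_level(BRACES)), len(outer_level(SAN)))
```

The brace layout leaves six outer-level lines, four of them lone braces; the San layout leaves two.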

>
>Besides, if you make indentation significant, why have a separate end
>marker? Or, to put it differently, if you have an explicit end marker, why
>make indentation significant?

You misapprehend. The requirement isn't for indentation being
significant; it is for the indentation style being consistent and
meaningful. One has choices as to which style one uses within a file;
that's what the style attributes are for. Perhaps you do not see
style and layout consistency as being important; I do. I have seen
far too many messes made by people who insist on using their own pet
formatting in code that they are modifying.

Richard Harter, c...@tiac.net
http://home.tiac.net/~cri, http://www.varinoma.com

Tragedy is living to an old age
and not having any enemies to outlive.

Marcin 'Qrczak' Kowalczyk

Aug 9, 2004, 5:57:49 PM

On Mon, 09 Aug 2004 20:10:48 +0000, Richard Harter wrote:

> The "begin" and "end" can be thought of as markers, and not statements
> at all. On the other hand they can be thought of as executable
> statements.

Since "end" occurs at the end of every block, and nowhere else, I consider
it a part of the block syntax without a semantics on its own.

> Inside this statement is a container called a block that contains
> other statements. In San I would write:
>
> begin if (foo)
>     blah, blah, blah
>     end

Why do you prefer it over the traditional style of 'begin' or '{' put
directly before a group of statements?

I don't like that every compound statement starts the same. The first word
of something is the natural primary distinguishing label. But here the
first word is "syntactic noise" and the essence is in the middle of a line.
Not to mention that it's against the tradition of almost all languages.

How do you write an 'if' with an 'else' part?

How do you write an equivalent of Lisp 'cond', i.e. an 'if' with several
conditions tested in order until one of them is true, executing the branch
corresponding to that condition? Languages whose if/then/else syntax ends
with the 'else' statement don't need a separate construct, they just write
'if COND1 then STAT1 else if COND2 then STAT2 else ... else STATN'. But
languages which have some end marker after the 'else' statement do need
something to avoid accumulating end markers at the end, and they usually
have some elseif / elsif / elif keyword.
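
As an illustration of the 'cond' idea (conditions tried in order, first true one wins), here is a small Python sketch. Python itself belongs to the "elif keyword" family; the `cond` helper and `sign` function are hypothetical names written only for this example.

```python
def cond(*clauses):
    """Evaluate (test, branch) pairs in order; run the first branch whose test holds."""
    for test, branch in clauses:
        if test():
            return branch()
    return None   # no clause matched


def sign(x):
    return cond(
        (lambda: x > 0, lambda: "positive"),
        (lambda: x == 0, lambda: "zero"),
        (lambda: True, lambda: "negative"),   # plays the role of the final 'else'
    )
```

Tests and branches are passed as functions so that only the selected branch is evaluated, which is what distinguishes 'cond' from an ordinary function call.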

Does an 'if' construct form an expression? If not, how do you write an
expression-'if'?

> (You can indent the "end" to the same level as the "begin" if you
> like; just specify that you are using that style.)

Can I write the whole 'if' in a single line somehow?

>>> which is to say that in San the hierarchy implied by the indentation
>>> is respected by the language whereas in C it is not (and C is
>>> representative in this regard.)
>>
>>Do you mean that in C the indentation has no meaning for the compiler?
>
> I didn't mean any such thing at all. See above.

I still don't understand what you mean by this.

Calum Grant

Aug 9, 2004, 7:27:41 PM

Here's another idea. [Disclaimer: I'm not saying a /good/ idea...]

Why bother with statement-delimiters at all? Sure, it makes
error-recovery easier, but they are actually not needed. What about a
; to mean end-of-block, not end-of-statement?

So you can quite happily imagine a language that allows

if x>2 print(y) print(z);

Spread out over several lines (the indentation is not part of the syntax
but makes things more readable)

if x>2
    print(y)
    print(z)
else
    print("oops");

Some bright spark will probably point out that this is Algol or
something ;-)

Marcin 'Qrczak' Kowalczyk

Aug 9, 2004, 7:18:17 PM

On Mon, 09 Aug 2004 23:27:41 +0000, Calum Grant wrote:

> Why bother with statement-delimiters at all? Sure, it makes
> error-recovery easier, but they are actually not needed. What about a
> ; to mean end-of-block, not end-of-statement?

print(a)
print(b)
if c>0 print(c);
print(d)
print(e)

Technically ';' is paired with 'if', but it doesn't look paired with
anything, so I'm afraid it would be confusing. It's easy to add too few or
too many semicolons, and the error might not be caught until much further down.

if x>0
    print "positive"
else if x==0
    print "zero"
else
    print "negative";;

The double semicolon needed (or more with more nested ifs) is definitely bad.
It would be avoided with 'elseif'.
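
To make the semicolon counting concrete, here is a toy parser for the proposed syntax, written in Python. It is a sketch under simplifying assumptions that are not part of the proposal: tokens are separated by whitespace (so the ';' must stand alone), a condition is a single token, and every other token counts as one simple statement. All function names are invented for the example.

```python
def parse_block(tokens, pos):
    """Collect statements until 'else', ';' or the end of input."""
    block = []
    while pos < len(tokens) and tokens[pos] not in ("else", ";"):
        if tokens[pos] == "if":
            node, pos = parse_if(tokens, pos)
            block.append(node)
        else:
            block.append(tokens[pos])   # any other token is one simple statement
            pos += 1
    return block, pos


def parse_if(tokens, pos):
    """Parse: if COND block [else block] ;"""
    if pos + 1 >= len(tokens):
        raise SyntaxError("missing condition after 'if'")
    condition = tokens[pos + 1]
    then_part, pos = parse_block(tokens, pos + 2)
    else_part = []
    if pos < len(tokens) and tokens[pos] == "else":
        else_part, pos = parse_block(tokens, pos + 1)
    if pos >= len(tokens) or tokens[pos] != ";":
        raise SyntaxError("missing ';' to close 'if'")
    return ("if", condition, then_part, else_part), pos + 1


def parse(text):
    tokens = text.split()
    program, pos = parse_block(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError("unexpected " + tokens[pos])
    return program
```

Parsing `if x>0 a else if x==0 b else c ; ;` succeeds and yields a nested 'if' inside the outer else part; with only one ';' the outer 'if' is never closed and the parser reports the error only at the end of the input, which is the late-detection problem mentioned above.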

MJSR

Aug 11, 2004, 12:21:11 PM

In message news:<NLTRc.881$0d6...@newsfe2-gui.ntli.net>,

Calum Grant <inv...@see.sig> wrote:
>
> Here's another idea. [Disclaimer: I'm not saying a /good/ idea...]
>
> Why bother with statement-delimiters at all? Sure, it makes
> error-recovery easier, but they are actually not needed. What about a
> ; to mean end-of-block, not end-of-statement?
>
> So you can quite happily imagine a language that allows
>
> if x>2 print(y) print(z);
>
> Spread out over several lines (the indentation is not part of the syntax
> but makes things more readable)
>
> if x>2
>     print(y)
>     print(z)
> else
>     print("oops");
>
> Some bright spark will probably point out that this is Algol or
> something ;-)

At the risk of being only a dull ember, I think this is COBOL
with semicolon in place of period:

    IF X > 2 THEN
        DISPLAY Y
        DISPLAY Z
    ELSE
        DISPLAY 'oops'.

The period closes the IF (and any other scopes, I believe, so that
nowadays you would probably prefer to use END-IF); so

    IF X > 2 THEN
        DISPLAY Y.
        DISPLAY Z.

would always display Z no matter what X is, despite the indentation.

--
MJSR

Richard Harter

Aug 12, 2004, 11:30:12 PM

On Mon, 09 Aug 2004 23:57:49 +0200, Marcin 'Qrczak' Kowalczyk
<qrc...@knm.org.pl> wrote:

>On Mon, 09 Aug 2004 20:10:48 +0000, Richard Harter wrote:
>
>> The "begin" and "end" can be thought of as markers, and not statements
>> at all. On the other hand they can be thought of as executable
>> statements.
>
>Since "end" occurs at the end of every block, and nowhere else, I consider
>it a part of the block syntax without a semantics on its own.

Well, yes, your opinion is abundantly clear. It remains that one
doesn't have to interpret things the way you wish to interpret them.

>
>> Inside this statement is a container called a block that contains
>> other statements. In San I would write:
>>
>> begin if (foo)
>>     blah, blah, blah
>>     end
>
>Why do you prefer it over the traditional style of 'begin' or '{' put
>directly before a group of statements?
>
>I don't like that every compound statement starts the same. The first word
>of something is the natural primary distinguishing label. But here the
>first word is "syntactic noise" and the essence is in the middle of a line.
>Not to mention that it's against the tradition of almost all languages.

For the moment clear your mind of this idea that there are compound
statements. In many languages there are compound statements. In San
there are not. There are statements, sequences of statements,
sequences of sequences of statements, etc. Some statements have the
effect of starting a sequence and some of ending it, but they are
still statements. You need not like it and you need not use languages
in which there are no compound statements, but you ought to at least
understand the idea if you are going to discuss it.

You have my sympathy about your plaint about the ubiquitous initial
'begin'. Initially I rebelled myself. However I have been persuaded
that it is the right thing to do. It has the distinct merit of making
it clear that the line in question starts a block.

It remains one can arrange the syntax to suit one's fancy. However
there are practical considerations. In San end of line is a
terminator. When we put the 'begin' after the conditional where do we
put it? Experience says that putting it on the same line as the
conditional is error prone. So you would say, put it on the next
line, e.g.,

if (foo)
begin
    blah blah
end

Now we are back where we started from because the 'begin' and its
train are all part of a larger globby mess that has mixed levels of
indentation. There are also the usual confusions involving statements
after the flow control construct, i.e., variations of

if (foo) alpha;
{
beta;
}

which should never happen but never-the-less do.

>
>How do you write an 'if' with an 'else' part?

That is a good question. I am torn between being conventional and allowing
elses as a special case, e.g.,

begin if (foo)
    blah, blah
    end
begin else
    gobble, gobble
    end

with single statement forms; being radical and abolishing the
option of an else (using switches/conds instead); or taking the
then/else road.

>
>How do you write an equivalent of Lisp 'cond', i.e. an 'if' with several
>conditions tested in order until one of them is true, executing the branch
>corresponding to that condition? Languages whose if/then/else syntax ends
>with the 'else' statement don't need a separate construct, they just write
>'if COND1 then STAT1 else if COND2 then STAT2 else ... else STATN'. But
>languages which have some end marker after the 'else' statement do need
>something to avoid accumulating end markers at the end, and they usually
>have some elseif / elsif / elif keyword.

There's no elif but there is a cond equivalent - my current choice is
'switch' but 'cond' might be a better choice. It can take parameters
in the outer line; the choices are cases. Here is an example

begin switch foo bar
    case 42: print 42 // 42 is compared with the first arg
    begin case %1 eq? %2 // foo is equal to bar
        print arguments are equal
        print value is %1
        end
    else: print Hello World
    end switch

Upon reflection I rather like cond; it doesn't carry the baggage that
'switch' and 'select' have.

>
>Does an 'if' construct form an expression? If not, how do you write an
>expression-'if'?

I'm not sure what you are counting as an 'if' construct and what you
count as an expression. There are two forms of the 'if', single
statement and conditional block. Thus

if (foo) bagel = dorf
and
begin if (foo)
    bagel = dorf
    mimsy = borogroves
    end

(I will entertain a motion to dispense with the first form entirely.
:-))

Does this answer your question?

>
>> (You can indent the "end" to the same level as the "begin" if you
>> like; just specify that you are using that style.)
>
>Can I write the whole 'if' in a single line somehow?

Only if it is not a block. You can't say

begin if (foo) alpha end

>
>>>> which is to say that in San the hierarchy implied by the indentation
>>>> is respected by the language whereas in C it is not (and C is
>>>> representative in this regard.)
>>>
>>>Do you mean that in C the indentation has no meaning for the compiler?
>>
>> I didn't mean any such thing at all. See above.
>
>I still don't understand what you mean by this.

I think the confusion lies in the notion of compound statements. In
many languages statements can contain blocks, thus

if (foo) {alpha; beta;}

This is a single statement in C. The only way it can be written using
multiple lines and a single level of indentation is if it is flat,
i.e., if we write it as

if (foo)
{
alpha;
beta;
}

which, of course, is not what we want. When the block contents are
indented we now have a single statement with multiple levels of
indentation. San does not have compound statements. In the
equivalent

begin if (foo)
    alpha
    beta
end // Just for you

there are four separate statements; each statement has a single level
of indentation.

Richard Harter

Aug 12, 2004, 11:39:56 PM

On Mon, 9 Aug 2004 17:03:06 -0400 (EDT), "Arthur J. O'Dwyer"
<a...@nospam.andrew.cmu.edu> wrote:

>
>On Mon, 9 Aug 2004, Richard Harter wrote:

>>
>> Marcin 'Qrczak' Kowalczyk wrote:
>>>>> Why do you indent "end" like the contents of the block rather than like
>>>>> the corresponding "begin"? This is against a widely accepted convention,
>>>>> and unreadable for me. This would be better:
>>>>
>>>> Er, ah, you find a minor variation in code layout makes it unreadable
>>>> for you?
>>>
>>> Ok, "hard to read".
>>
>> Good. We are making progress. You haven't given any good reason for
>> it being hard to read, other than it is something to which you are not
>> accustomed. Mind you, that can be sufficient reason.
>

> I would put it more strongly: It is something to which nobody I
>know is accustomed. It is something I'd never seen, nor even considered,
>before you brought it up. I've seen plenty of dubious styles that
>Aren't Mine, but the "indented 'end' statement" style Isn't Anyone's
>But Yours, as far as I know. ;)
> (IOW: I don't think the style is intrinsically evil, but it's a bit
>more than just "it's not a mirror of my own personal style." It really
>is objectively /weird/.)

My experience may be broader (or at least different) than yours. I
have even (albeit a long time ago) heard someone argue that code was
not properly structured *unless* the 'end' was indented. I never
quite understood the basis for the argument but it was made. I recall
it being argued for in comp.lang.c (in the context of K&R style
placement of '{'), albeit many years ago. What seems to have happened
is that C and C derived languages have become so pervasive that many
people don't know that there is any other way to do things or think
about things.

Be that as it may, I grant that if one has never seen an alternative to a
conventional style then it may well feel weird.

>
>> This may or may not be true, depending upon the interpretation chosen.
>> The "begin" and "end" can be thought of as markers, and not statements
>> at all.
>

> This is AFAIK the usual interpretation.

It is the usual interpretation in languages coming out of the Algol
tradition. In other languages, PL/I being the most notable, they are
statements.

>
>> On the other hand they can be thought of as executable
>> statements. In particular, the "end" says "goto the continuation of
>> the block" where the continuation is determined by the "begin"
>> statement, which in turn says, "under such and such conditions execute
>> the following block, else goto the next statement after the block".
>

> This makes sense; a HLL-oriented assembly programmer might write
> [...]
> XOR B, B
> @top_of_loop:
> CMP B, #42
> JE @end_of_loop
> CALL @_foo
> INC B
> JMP @top_of_loop
> @end_of_loop:
> [...]
>Here, the 'JMP @top_of_loop' instruction acts semantically as an 'end'
>marker; yet it is syntactically "inside" the loop, and indented as
>such.

Or
[...]
XOR B, B
@top_of_loop:
CALL @_foo
INC B
CMP B, #42
JNE @top_of_loop
[...]

The standard thing to do in assembly (and an optimization that
compilers do routinely) is to put the termination test at the end
of the loop. You jump to it if the loop is a while loop and not if it
is an until loop. All of that is beside the point, of course; I just
thought I would mention it.
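
The three loop shapes can be sketched in Python, which has no goto, so the initial jump to the test is modelled by a guard. This is an illustration only; the function names are invented, and the counter increment stands in for the CALL in the assembly above.

```python
def while_loop(limit):
    """Test at the top: the body may run zero times."""
    b = calls = 0
    while b < limit:
        calls += 1            # stands in for CALL @_foo
        b += 1
    return calls


def rotated_while(limit):
    """Same behaviour with the test at the bottom, entered through a guard."""
    b = calls = 0
    if b < limit:             # plays the role of the initial jump to the test
        while True:
            calls += 1
            b += 1
            if not b < limit:  # one conditional branch per iteration
                break
    return calls


def until_loop(limit):
    """No guard: the body always runs at least once."""
    b = calls = 0
    while True:
        calls += 1
        b += 1
        if not b < limit:
            break
    return calls
```

The first two agree for every limit; the third differs only when the limit is zero, where it still runs the body once.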
>
> However, this is still /not/ the usual convention in HLLs with respect
>to 'begin' and 'end'.

As noted above, it is not the usual convention in a large class of
languages, including C and C derivatives, which many people believe to
be the only languages that exist. Many is not all.

Be that as it may, it is quite feasible to take begin/end as
statements, and I've chosen to do so. Note that "begin" also is a
statement in the semantics that I am using. Also note that I am using
lispish syntax in that everything is surrounded by delimiters.

>
>[...]

>> But it does - it changes the semantics. The "end" is the last
>> statement of the block, stating that the block is done.
>

> What happens if you use the 'end' statement outside a block?
>Is it like the C 'continue' statement, which simply has no meaning
>outside loops (i.e., produces a syntax error)?

Yes.

>And Marcin: would
>your objections evaporate if Richard were to rename the statement
>from 'end' to 'continue'? :) (That's not an entirely facetious
>suggestion, either, Richard.)

That's an interesting thought; the problem is that all terms are
contaminated by their usage in other languages. "Continue" means
different things in Fortran and in C.

>
>[...]

>> The consideration that you avoid considering is that putting the
>> begin/end statements/markers at the outer level adds clutter.
>> Consider the following:
>>
>> flow_control_struct_1
>> {
>>     body_1
>> }
>> flow_control_struct_2
>> {
>>     body_2
>> }
>> ... More of the same
>>
>> If we only view the outer level of indentation we have:
>>
>> flow_control_struct_1
>> {
>> }
>> flow_control_struct_2
>> {
>> }
>

> (1) Not using the One True Brace Style, we haven't. ;) and (2) to
>an experienced C/C++/Java programmer, those { and } lines are
>essentially blank. If he "only views the outer level," he sees
>only the /non-trivial/ outer level... which in the above example
>doesn't include the (trivial) braces. Same way with Pascal/Algol/...
>programmers and 'begin'/'end', I'm sure.

Quite so. One simply doesn't see clutter; selective
blindness cuts in automatically. None-the-less it is still there.

>
>
>>> Besides, if you make indentation significant, why have a separate end
>>> marker? Or, to put it differently, if you have an explicit end marker, why
>>> make indentation significant?
>>
>> You misapprehend. The requirement isn't for indentation being
>> significant; it is for the indentation style being consistent and
>> meaningful. One has choices as to which style one uses within a file;
>> that's what the style attributes are for. Perhaps you do not see
>> style and layout consistency as being important; I do. I have seen
>> far too many messes made by people who insist on using their own pet
>> formatting in code that they are modifying.
>

> This is a noble goal. :) You just have to worry about how to codify
>all the possible styles out there, and who gets to decide which styles
>are represented and which aren't. Certainly a similar "style code"
>for an existing free-form language like C or C++ would be next to
>impossible (see
>http://www.contrib.andrew.cmu.edu/~ajo/workshop.html#smartindent
>for the little I have to say on why GNU 'indent' sucks).

I sympathize; there really is no way to decently format C. In fact
.. never mind.

> OTOH, you [Richard] are starting from a clean slate, with a new
>language, where you make the style as you go along. So you have a
>fighting chance. :)

Just so. I suspect that it is not really possible to have
controllable style in a free form language.

Marcin 'Qrczak' Kowalczyk

Aug 13, 2004, 7:12:48 AM

On Fri, 13 Aug 2004 03:30:12 +0000, Richard Harter wrote:

>>I don't like that every compound statement starts the same. The first word
>>of something is the natural primary distinguishing label. But here the
>>first word is "syntactic noise" and the essence is in the middle of a line.
>>Not to mention that it's against the tradition of almost all languages.
>
> For the moment clear your mind of this idea that there are compound
> statements.

You can't dismiss a complaint by calling things by different names.
It doesn't matter what it's called; it matters how it behaves and looks.

> It has the distinct merit of making it clear that the line in question
> starts a block.

It's more important that it computes something conditionally, than that
the subcomputation is written using several statements.

Especially since my preferred view of computation goes along impure
functional programming style:

- There is no distinction between statements and expressions.

- There is a kind of expression consisting of two subexpressions which
means "evaluate the first, ignore its result, evaluate the second".

Now the arguments of 'if' are simply expressions. There are no compound
statements any more than a function application is compound because
function arguments are denoted by subexpressions. The 'if' itself is an
expression and its arguments are expressions. It's a minor detail whether
the expressions are written in braces, or between 'then' and 'else', or
whatever. The point is that it's just an arbitrary expression - possibly
a sequence of statements, or another 'if'.

In this setting it makes no sense to talk about an 'if' being a block
itself. There is just one kind of 'if' no matter whether its arguments are
expressions, statements, or sequences of statements, because all these are
examples of expressions. Simple.
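
A rough Python rendering of this view, with sequencing and 'if' both written as ordinary value-returning functions. This is a sketch, not Kogut: the names `seq` and `if_` are made up, and subexpressions are passed as zero-argument functions so that they are evaluated only when needed.

```python
def seq(first, second):
    """Evaluate the first, ignore its result, evaluate the second."""
    first()
    return second()


def if_(condition, then, otherwise):
    """An 'if' that is itself an expression: it yields the chosen branch's value."""
    return then() if condition else otherwise()


log = []
x, y = 3, 5

# The 'then' branch is a sequence of two expressions; the whole 'if' has a value.
biggest = if_(x < y,
              lambda: seq(lambda: log.append("took then-branch"), lambda: y),
              lambda: x)
```

Here the branch passed to `if_` may be a single value, a sequence, or another `if_`; the caller does not need to distinguish the cases.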

> In San end of line is a terminator. When we put the 'begin' after the
> conditional where do we put it? Experience says that putting it on the
> same line as the conditional is error prone.

Well, tons of programmers put '{' at the end of a line and like it.
I don't find that error prone.

I used to put '{' under the 'if', but later I realized that the advantage
of saving a line is bigger than the disadvantage of not having '{' directly
above the '}'. Now I put '{' at the end of a line even in C function
definitions.

In my Kogut, like in Perl, the branches of 'if' are always inside { },
so the argument about being uncertain whether there is an '{' at the end
of a wide line is moot; there always is.

My preferred indentation style of 'if' with 'else' is like this:

if condition {
    first case
}
else {
    second case
}

and if cases are short, they are written as {case} on one line.

>>Does an 'if' construct form an expression? If not, how do you write an
>>expression-'if'?
>
> I'm not sure what you are counting as an 'if' construct and what you
> count as an expression.

Something that has a value.
Example: let max = if (x < y) {y} else {x};

> if (foo) {alpha; beta;}
>
> This is a single statement in C. The only way it can be written using
> multiple lines and a single level of indentation is if it is flat, i.e.,
> if we write it as
>
> if (foo)
> {
> alpha;
> beta;
> }

But *why* would you write it in a single level of indentation?
Nobody does it. It's written indented, e.g.

if (foo) {
    alpha;
    beta;
}

What is the problem with that?

> When the block contents are indented we now have a single statement with
> multiple levels of indentation.

What is the problem with that?

> San does not have compound statements. In the equivalent
>
> begin if (foo)
>     alpha
>     beta
> end // Just for you
>
> there are four separate statements; each statement has a single level of
> indentation.

So what? I don't understand what this has to do with "hierarchy
implied by indentation", whatever that means, and why calling "begin if
(foo)" a statement as a whole has any advantage (for me it's not a
statement, as it doesn't have a meaning without the matching 'end'
and statements between them).

Dr A. N. Walker

Aug 13, 2004, 10:19:27 AM8/13/04

to

In article <pan.2004.08.13....@knm.org.pl>,
Marcin 'Qrczak' Kowalczyk <qrc...@knm.org.pl> wrote:

>On Fri, 13 Aug 2004 03:30:12 +0000, Richard Harter wrote:
>> For the moment clear your mind of this idea that there are compound
>> statements.

[See below for my "take" on this.]

>You can't dismiss a complaint by calling things using different names.
>It doesn't matter how it's called, it matters how it behaves and looks.

Isn't it really a mixture? "Look and feel" is important,
but so is "name", as words like "begin" carry so much baggage.
Why, for example, do we have "for" loops? In my first serious
programming, loops were cycles: CYCLE i = 1,1,10; ...; REPEAT.
It seemed quite strange to move to a language where I had to
write "FOR" instead. In parts of the UK, "while" means what the
rest of us mean by "until", so that the traffic sign "WAIT HERE
WHILE LIGHTS ARE RED" was extremely confusing and had to be
re-worded; "while (foo) { ... };" must be just as confusing.
But today, "every" language uses "for ..." and "while ..." in
much the same way, and it would be folly to use them otherwise.

[...]

>Especially since my preferred view of computation goes along impure
>functional programming style: [...]

Right. But your preferences, no matter how interesting,
do not invalidate Richard's. All you are really saying is that
you don't like [the style of] SAN; that much is perfectly valid,
and I even think I agree with you. But it's less reasonable to
claim that SAN is "wrong" -- Richard is surely entitled, in his
own programming language, to be the only person in step?

>In this setting it makes no sense to talk about an 'if' being a block
>itself.

But in older settings it made perfect sense. In my own
early experiences, "BEGIN" and "END" were statements; there were
no "compound statements" -- the only use for "BEGIN ... END" was
to mark out a range of statements within which declarations were
valid; if you needed what we now call a compound statement, then
you had to use a jump: "GOTO 27 IF x = 0; ...; 27: ...". Later,
"BEGIN ... END" and "IF ... THEN" and other constructs evolved,
into their "modern" [ie 30+ year old!] form, to the point where
it's hard to imagine how novel, even exciting, some of them were
at the time. I doubt whether Richard can turn the clock back,
but it's interesting to see someone try.

> [...]. It's written indented, [...]

Another historical notelet: we can be influenced more than
we realise by historical baggage. When I started, we used paper tape
for all programming. Each character took 0.1 seconds to pass through
the Flexowriter in order to obtain program listings -- or to obtain
clean paper tapes for further [hand] editing. So many programmers
used no indentation at all. Looking back at my own code, each line
is either flush left [all main directives, labels, declarations, etc]
or exactly one tab indented [0.1s, for most "statements"]. I must
have been very advanced! Indenting stuff halfway across the line
just to indicate structure would have been very annoying to the queue
of people waiting to use the Flexowriter, so we didn't do it. Those
who used punched cards instead learned different habits. Later, when
indentation was virtually "free" [to the programmer, anyway], and took
no real time to display on a monitor, it acquired different baggage.

>I don't understand [...] why calling "begin if
>(foo)" a statement as a whole has any advantage

Does it have to have? In SAN, it *is* a statement, the
equivalent in most common languages is *not*; but it's a matter
of taste.

> (for me it's not a
>statement, as it doesn't have a meaning without the matching 'end'
>and statements between them).

Would you feel the same about [eg] compiler directives?
Eg, again in my own early experience, there were instructions for
switching on/off array bound checking, overflow checking, and
various other things, either statically or dynamically, on the
lines of "OVERFLOW CHECK OFF UNLESS i < 5". "OVERFLOW CHECK OFF"
only has meaning in the context of a [roughly] matching "... ON"
and statements between that get checked; but the language defined
it as a statement.

--
Andy Walker, School of MathSci., Univ. of Nott'm, UK.
a...@maths.nott.ac.uk

Marcin 'Qrczak' Kowalczyk

Aug 13, 2004, 11:24:45 AM8/13/04

to

On Fri, 13 Aug 2004 14:19:27 +0000, Dr A. N. Walker wrote:

>>Especially since my preferred view of computation goes along impure
>>functional programming style: [...]
>
> Right. But your preferences, no matter how interesting,
> do not invalidate Richard's.

Fine. I only use them to map the concepts of other programming languages
to concepts I can understand and feel.

For example Python syntax clearly distinguishes expressions and statements,
and doesn't have "block statements" (any construct which has a statement
as its part permits a sequence of statements there), yet I have no trouble
in mapping this syntax to a familiar semantics.

A Python expression maps to my expression. A Python statement maps to my
expression whose result is ignored. There is a slight complication with
'return', but it's solvable, especially as most languages work similarly:
a function definition implicitly introduces an aborting continuation, and
a 'return' statement enters that continuation of the innermost enclosing
function. As an optimization 'return' as the last statement can be
translated without the use of the continuation.

It's obvious that in an abstract syntax tree there would be an 'if' node
with three arguments: the condition, the "true" part, and the "false" part.
Nobody calls 'if condition:' alone a statement. If it were to be treated as
a statement, its meaning would no longer easily map to my model, even with
continuations available. Perhaps there is some smart encoding, but it's
not obvious.
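
A minimal sketch of that model, written in Python for illustration: every construct is a tree node that yields a value, a "statement" is a node whose value is ignored, and 'if' is a node with three parts. The class names (Const, Var, Assign, Seq, If) and the evaluate function are hypothetical and are not taken from any real compiler.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class Const:
    value: Any


@dataclass
class Var:
    name: str


@dataclass
class Assign:
    name: str
    expr: Any


@dataclass
class Seq:
    body: List[Any]  # a sequence of statements is itself one node


@dataclass
class If:
    cond: Any    # the condition
    then: Any    # the "true" part
    orelse: Any  # the "false" part


def evaluate(node, env):
    """Evaluate a node; a 'statement' is just a node whose value is ignored."""
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Var):
        return env[node.name]
    if isinstance(node, Assign):
        env[node.name] = evaluate(node.expr, env)
        return None
    if isinstance(node, Seq):
        result = None
        for child in node.body:
            result = evaluate(child, env)
        return result
    if isinstance(node, If):
        branch = node.then if evaluate(node.cond, env) else node.orelse
        return evaluate(branch, env)
    raise TypeError(f"unknown node: {node!r}")


env = {"x": 0}
program = If(Var("x"), Assign("y", Const("nonzero")), Assign("y", Const("zero")))
evaluate(program, env)
print(env["y"])
```

In this encoding there is no node that could stand for 'if condition:' on its own, because the meaning of If is defined only in terms of all three children.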

>>In this setting it makes no sense to talk about an 'if' being a block
>>itself.
>
> But in older settings it made perfect sense. In my own
> early experiences, "BEGIN" and "END" were statements; there were
> no "compound statements" -- the only use for "BEGIN ... END" was
> to mark out a range of statements within which declarations were
> valid;

Sorry, I knew nothing about programming before 1985, so this model
is foreign to me. Apart from C64 Basic most languages did have some
hierarchical structure, without BEGIN statement or END statement.

>>I don't understand [...] why calling "begin if
>>(foo)" a statement as a whole has any advantage
>
> Does it have to have? In SAN, it *is* a statement, the
> equivalent in most common languages is *not*; but it's a matter
> of taste.

Not only taste.

In Haskell terms a Python expression has type
IO Value
and a Python statement has type
IO ()
and a Scheme expression has type
([Value] -> IO ()) -> IO ()
and a Lisp expression has type
IO [Value]
(they could be more informative if environment and state were separated
from the implicit IO monad and made explicit). I don't know what
type to give to San statements such that 'begin if (foo)' and 'end'
make sense as standalone statements.

>> (for me it's not a
>>statement, as it doesn't have a meaning without the matching 'end'
>>and statements between them).
>
> Would you feel the same about [eg] compiler directives?

If you mean something like C pragmas, they are not called statements nor
definitions. They are some metalevel construct which statically influences
the interpretation of other constructs. They are not "executed" themselves.

Dr A. N. Walker

Aug 13, 2004, 1:52:02 PM8/13/04

to

In article <pan.2004.08.13....@knm.org.pl>,
Marcin 'Qrczak' Kowalczyk <qrc...@knm.org.pl> wrote:

>It's obvious that in an abstract syntax tree there would be an 'if' node
>with three arguments: the condition, the "true" part, and the "false" part.
>Nobody calls 'if condition:' alone a statement.

Not *now*, and not in C/Algol-like languages. But there
were plenty of early languages in which, somewhat following machine
code, there was no "if condition then foo else bar" statement.
Machine code usually only has conditional jumps, so that was what
many languages did: "if condition label" or the equivalent. No
compound statements, as there was no need to group. Even "else"
had to be learned.
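
As a rough illustration of the "if condition label" style, here is a small Python sketch of a flat program in which the only control flow is a jump to a label. The instruction names (set, goto_if, goto, label) and the run function are invented for this example and do not reproduce any particular historical language.

```python
def run(program, env):
    """Run a flat list of instructions; control flow is by jumps to labels only."""
    labels = {ins[1]: i for i, ins in enumerate(program) if ins[0] == "label"}
    pc = 0
    while pc < len(program):
        ins = program[pc]
        op = ins[0]
        if op == "set":            # ("set", name, function of env)
            env[ins[1]] = ins[2](env)
        elif op == "goto_if":      # ("goto_if", condition, label)
            if ins[1](env):
                pc = labels[ins[2]]
                continue
        elif op == "goto":         # ("goto", label)
            pc = labels[ins[1]]
            continue
        elif op != "label":        # a label does nothing when reached
            raise ValueError(f"unknown instruction: {op}")
        pc += 1
    return env


# Flat equivalent of: if x == 0 then y = "zero" else y = "nonzero"
program = [
    ("goto_if", lambda e: e["x"] == 0, "zero"),
    ("set", "y", lambda e: "nonzero"),
    ("goto", "done"),
    ("label", "zero"),
    ("set", "y", lambda e: "zero"),
    ("label", "done"),
]
print(run(program, {"x": 0})["y"])
print(run(program, {"x": 5})["y"])
```

Note that the "else" part exists only as a convention: it is the code that falls between the conditional jump and the unconditional one.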

>Sorry, I knew nothing about programming before 1985, so this model
>is foreign to me. [...]

You could have much pleasure in store! Looking at languages
of the '50s [before even my time, so this is history to me as well],
there are amazingly many blind alleys and gropings; and also some
nifty ideas that died or were forgotten for no terribly good reason.

> [...]. I don't know what
>type to give to San statements such that 'begin if (foo)' and 'end'
>make sense as standalone statements.

What type should you give to "stop" [dynamic end of program]?
What to the empty statement? And why do you expect all statements
to be "standalone"? You are pre-supposing a model of computation
which is/was simply not always appropriate.

>> Would you feel the same about [eg] compiler directives?
>If you mean something like C pragmas, they are not called statements nor
>definitions. They are some metalevel construct which statically influences
>the interpretation of other constructs. They are not "executed" themselves.

But sometimes they are [or were]. A statement such as
"OVERFLOW CHECK OFF" generated some machine code [to influence the
operation of the CPU] which was obeyed dynamically if and when that
statement was reached. "END" too could, in many languages, generate
machine code [eg to unwind the stack], and there were even a few
languages in which "BEGIN ... END" [or the moral equivalent] did
not have to match -- you could do something like [schematically]

PROC a BEGIN ... GOTO label
PROC b BEGIN ... label: ... END
PROC c BEGIN ... IF foo END ... IF bar END ... END

[in which "END" is more like C's "return;"].

Marcin 'Qrczak' Kowalczyk

Aug 13, 2004, 2:53:58 PM8/13/04

to

On Fri, 13 Aug 2004 17:52:02 +0000, Dr A. N. Walker wrote:

>>Sorry, I knew nothing about programming before 1985, so this model
>>is foreign to me. [...]
>
> You could have much pleasure in store!

To be clear, I meant: before 1985 I knew nothing about programming, not:
I know nothing about how programming looked before 1985. Of course I don't
know too much because I didn't experience it myself.

Anyway, I think programs are better understood as hierarchical trees than
linear sequences of statements.

>> [...]. I don't know what
>>type to give to San statements such that 'begin if (foo)' and 'end'
>>make sense as standalone statements.
>
> What type should you give to "stop" [dynamic end of program]?

The IO monad includes mutable state and exceptions, and an exception caught
at the toplevel is enough to model ending the program, so it conforms to
the IO () type of statements. All statements have to have the same type.

It would probably be clearer to make some capabilities of the IO monad
explicit, e.g. the store and exceptions, but something of IO must remain
to model actual I/O. I don't like making the store explicit because the
encoding doesn't reflect that memory is used in the single threaded way
(unless we adopt some Clean-style uniqueness typing).

If ending the program has a different semantics than throwing an exception
which is caught at the toplevel, the encoding must be further complicated.
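
The idea of modelling "stop" as an exception caught at the toplevel can be sketched in Python as follows. The Stop class and the run_program, stop and step_* functions are hypothetical names chosen for this example; the point is only that every statement keeps the same shape (a callable taking the shared state), including the one that ends the program.

```python
class Stop(Exception):
    """Hypothetical exception used to model a 'stop' statement."""


def stop():
    raise Stop()


def run_program(statements):
    """Run statements in order; a Stop caught here ends the program."""
    log = []
    try:
        for statement in statements:
            statement(log)
    except Stop:
        log.append("stopped")
    return log


def step_one(log):
    log.append("one")


def step_stop(log):
    stop()


def step_two(log):
    log.append("two")  # never reached in the run below


print(run_program([step_one, step_stop, step_two]))
```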

> What to the empty statement?

No problem with that, every sensible encoding of statements includes
an encoding of the empty statement.

> And why do you expect all statements to be "standalone"?

It's easier to understand the language if constructs have semantics of
their own, which are then composed, than if larger parts of the program
must be considered as a whole.

> You are pre-supposing a model of computation which is/was simply not
> always appropriate.

It should be appropriate for something as simple and fundamental as a
conditional, unless the language really uses some very non-standard
evaluation model. For example Icon and Prolog model conditionals
differently than passing around boolean values. But I suppose SAN does
what most languages do, so its description should easily accommodate that.

> But sometimes they are [or were]. A statement such as
> "OVERFLOW CHECK OFF" generated some machine code [to influence the
> operation of the CPU] which was obeyed dynamically if and when that
> statement was reached.

Ok, if it's interpreted dynamically rather than statically, then it's
indeed a statement which changes some global variables, which are examined
by arithmetic.

Encoding conditionals in this way would be painful because they can be
nested. We would have to model the evaluation stack. It should be easier
if we don't model everything as sequential evaluation of statements:
- do something with the condition
- then do the statements to be executed if true,
- then do something corresponding to 'end'
which requires an explicit stack for pending locations to jump to,
but instead we compose the semantics of substatements into larger
statements, having conditionals with nested statements available in
the target language.
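
To make the role of the explicit stack concrete, here is a small Python sketch that takes a flat list of statements, with separate "begin if" and "end" entries, and recovers the nested tree. The tuple formats and the nest function are assumptions made for this illustration and are not San's actual grammar.

```python
def nest(flat):
    """Turn a flat list of statements into a nested tree.

    ("begin_if", cond) opens a conditional, ("end",) closes it, and
    anything else is treated as an ordinary statement.
    """
    root = []
    stack = [root]  # explicit stack of the bodies that are still open
    for stmt in flat:
        if stmt[0] == "begin_if":
            body = []
            stack[-1].append(("if", stmt[1], body))
            stack.append(body)
        elif stmt[0] == "end":
            if len(stack) == 1:
                raise SyntaxError("'end' without matching 'begin if'")
            stack.pop()
        else:
            stack[-1].append(stmt)
    if len(stack) != 1:
        raise SyntaxError("missing 'end'")
    return root


flat = [
    ("begin_if", "foo"),
    ("call", "alpha"),
    ("begin_if", "bar"),
    ("call", "beta"),
    ("end",),
    ("end",),
    ("call", "gamma"),
]
tree = nest(flat)
print(tree)
```

Once the tree is built, the semantics of each conditional can be composed from its parts, and the stack is no longer needed at evaluation time.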

In order to understand a language it's not necessary to convert it to
assembly. Any understandable language will do. It can be quite high level.
My brain works in terms of conditionals, not in terms of goto.

Richard Harter

Aug 13, 2004, 4:17:33 PM8/13/04

to

On Fri, 13 Aug 2004 13:12:48 +0200, Marcin 'Qrczak' Kowalczyk
<qrc...@knm.org.pl> wrote:

>On Fri, 13 Aug 2004 03:30:12 +0000, Richard Harter wrote:
>
>>>I don't like that every compound statement starts the same. The first word
>>>of something is the natural primary distinguishing label. But here the
>>>first word is "syntactic noise" and the essence is in the middle of a line.
>>>Not to mention that it's against the tradition of almost all languages.
>>
>> For the moment clear your mind of this idea that there are compound
>> statements.
>
>You can't dismiss a complaint by calling things using different names.
>It doesn't matter how it's called, it matters how it behaves and looks.

Do you know, I don't care about your complaint as such; it amounts to
little more than a plaint that things aren't being done the way you
would do them. Be that as it may, the point is that the model of
computation is a model in which one has compound statements in the
sense that you use the term.

[snip discussion of 'if' as an expression]
[snip discussion of indentation style]

>> I'm not sure what you are counting as an 'if' construct and what you
>> count as an expression.
>
>Something that has a value.
>Example: let max = if (x < y) {y} else {x};

Okay. I'm not persuaded that the ternary form as such has enough
value that it warrants special treatment. In San one could always
have a function to do the same thing. Still, there are some
possibilities that I will explore.

[snip]

Richard Harter

Aug 13, 2004, 4:21:34 PM8/13/04

to

On Fri, 13 Aug 2004 20:17:33 GMT, c...@tiac.net (Richard Harter) wrote:

>
>Do you know, I don't care about your complaint as such; it amounts to
>little more than a plaint that things aren't being done the way you
>would do them. Be that as it may, the point is that the model of
>computation is a model in which one has compound statements in the
>sense that you use the term.

Er, the model of San is a model in which one DOES not have...