
Common operators


James Harris

Nov 21, 2021, 1:55:05 PM
This is to follow up on comments from another thread. It's prompted by a
picture of a calculator which Bart posted:

https://upload.wikimedia.org/wikipedia/commons/9/9c/Hp35s_Calculator.jpg

I think the issue being discussed was what operators should be provided
by a language. I'd suggest almost none!

Don't get me wrong! AISI there should be standard operator /symbols/
such as +, *, >>, etc (and possibly some reserved operator words such as
"and" and "or"), and they should have precedences and associativities,
but they should be semantically meaningless. Instead, they would be
given meaning by the data types to which they are applied.

That's not as strange as it may sound. In

A + B

the meaning of + will depend on whether A and B are ints or floats - or,
possibly, strings or some user-defined type.

If the meaning depends on the types of the operands then it doesn't have
to be built in but can be defined by code /for the requisite type/. For
example, if A and B are floats and there's a definition for the type
'float' which includes the following

type float
function "+"(float self, float other)
... code for float addition ...

then the code therein could carry out the necessary floating-point
addition. (I'm being deliberately vague about how as this topic is more
about invocation.)
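
By way of a rough sketch (Python here, purely for illustration; the Vec
type is invented), this is the same division of labour: the language
fixes the symbol, the type supplies the code:

    # The symbol "+" is fixed by the language; its meaning for Vec
    # comes entirely from the type's own definition.
    class Vec:
        def __init__(self, x, y):
            self.x, self.y = x, y

        def __add__(self, other):          # "code for + on this type"
            return Vec(self.x + other.x, self.y + other.y)

        def __repr__(self):
            return f"Vec({self.x}, {self.y})"

    print(Vec(1, 2) + Vec(3, 4))           # Vec(4, 6)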

The first point is that type float can be an external module and not
part of the language. The same is possibly true also of signed and
unsigned integers!?!

IOW the /language/ would define the precedences and associativity of
operator symbols or standard names but the meaning of those operators
would be defined /externally/.
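
A minimal sketch of that split, in Python for concreteness (the table
contents are invented): the parser owns precedence and associativity,
while each symbol's meaning is deferred to whatever the operand types
define:

    import operator

    PREC = {"+": 1, "*": 2}                            # fixed by the "language"
    MEANING = {"+": operator.add, "*": operator.mul}   # deferred to the types

    def parse(tokens, min_prec=1):
        # Precedence climbing: the structure comes from PREC alone.
        lhs = tokens.pop(0)
        while tokens and tokens[0] in PREC and PREC[tokens[0]] >= min_prec:
            op = tokens.pop(0)
            rhs = parse(tokens, PREC[op] + 1)   # +1 makes it left-associative
            lhs = MEANING[op](lhs, rhs)         # semantics from the operands
        return lhs

    print(parse([2, "+", 3, "*", 4]))    # 14: * binds tighter, int meaning
    print(parse(["ab", "+", "cd"]))      # abcd: the same +, string meaning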

The second point is that named prefix operations such as ABS and SIN -
and many like them - would not need to be part of the language. For one
thing, they could be defined in external modules and, for another, they
would appear best as functions rather than as operators. For example, in

abs(y)

the abs function referred to would depend on whether y was an int or a
float, and a call such as

sin(x)

would work for x being a float but the name "sin" would be undefined if
x were an integer.
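
As a hedged illustration of that lookup-by-argument-type, Python's
functools.singledispatch does something close to it for free functions
(the error for integers is my reading of the intended semantics):

    import math
    from functools import singledispatch

    @singledispatch
    def sin(x):
        # No definition registered for this type: the name is "undefined".
        raise TypeError(f"sin is not defined for {type(x).__name__}")

    @sin.register
    def _(x: float):
        return math.sin(x)

    print(sin(0.5))      # fine: a float definition exists
    # sin(3) would raise: TypeError: sin is not defined for int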



Going back to the calculator: aside from common operations, surely most
of the more obscure actions one could find on a calculator, such as
statistical and financial operations, would be most clearly expressed in
code by

op_name(args)

Further, that 'named prefix' form allows any number of functions to be
apply-able to the types of its args. Possibly most of an advanced
calculator's functions are unary, anyway, so the prefix form would be
more appropriate.

With the function form there's no ambiguity about how any operations are
applied and the names give a clue as to the meaning. For example,

factorial(x)

gives a big clue as to what it might do....!



In sum, perhaps it would be better to build common operators into type
libraries and not make them part of the language itself.


--
James Harris

Bart

Nov 21, 2021, 2:45:50 PM
I think I've made my view known on this!

You need a more complicated implementation, and one that implements more
advanced optimisations, which are going to be slower, to get code of the
same standard as you can get easily when things are built-in.

The point of my calculator link was that, while it was a programmable
calculator that could conceivably be used to implement ABS (which is an
easy function actually), they still provided an ABS button anyway.

Now, it's possible that there is an OEM maker of calculators that takes
delivery of unbranded programmable calculators where many buttons are
blank. And they decide what 'built-in' functions should be provided on
those buttons. The devices are supplied with those already programmed.

If languages took the same approach, then that's fine. The user won't
know any different (except it might be slower). And they will be using
apparently the same named language as other people, so they can easily
share code.

Except that languages don't like to draw that line and make those OEM
facilities available to everyone; each person creates their own language
with its own unwieldy set of half-implemented features.

Charles Lindsey

Nov 22, 2021, 9:26:09 AM
On 21/11/2021 18:55, James Harris wrote:
> This is to follow up on comments from another thread. It's prompted by a picture
> of a calculator which Bart posted:
>
> https://upload.wikimedia.org/wikipedia/commons/9/9c/Hp35s_Calculator.jpg
>
> I think the issue being discussed was what operators should be provided by a
> language. I'd suggest almost none!
>
> Don't get me wrong! AISI there should be standard operator /symbols/ such as +,
> *, >>, etc (and possibly some reserved operator words such as "and" and "or")
> and they should have precedences and associativities, but they should be
> semantically meaningless. Instead, they would be given meaning by the data types
> to which they are applied.
>
> That's not as strange as it may sound. In
>
>   A + B
>
> the meaning of + will depend on whether A and B are ints or floats - or,
> possibly, strings or some user-defined type.
>
> If the meaning depends on the types of the operands then it doesn't have to be
> built in but can be defined by code /for the requisite type/. For example, if A
> and B are floats and there's a definition for the type 'float' which includes
> the following

You are re-inventing Algol 68 again.

--
Charles H. Lindsey ---------At my New Home, still doing my own thing------
Tel: +44 161 488 1845 Web: https://www.clerew.man.ac.uk
Email: c...@clerew.man.ac.uk Snail-mail: Apt 40, SK8 5BF, U.K.
PGP: 2C15F1A9 Fingerprint: 73 6D C2 51 93 A0 01 E7 65 E8 64 7E 14 A4 AB A5

Andy Walker

Nov 22, 2021, 3:34:16 PM
On 21/11/2021 19:45, Bart wrote:
> On 21/11/2021 18:55, James Harris wrote:
>> In sum, perhaps it would be better to build common operators into
>> type libraries and not make them part of the language itself.

I agree with Charles!

> You need a more complicated implementation, and one that implements
> more advanced optimisations, which are going to be slower, to get
> code of the same standard as you can get easily when things are
> built-in.

Doesn't follow. The library can have access to parts of the
language that are not available to ordinary users, inc [eg] escapes
into assembler, inlining, access to the compiler's internal structures,
..., and can be optimised when the compiler is built. The point is to
keep the language definition simple, not cluttered [as are, eg, Pascal
and C] by having to explain as part of the language what "add-ops" are
and how they relate to "mult-ops" or logical operators.

--
Andy Walker, Nottingham.
Andy's music pages: www.cuboid.me.uk/andy/Music
Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Grieg

Bart

Nov 22, 2021, 6:08:01 PM
On 22/11/2021 20:34, Andy Walker wrote:
> On 21/11/2021 19:45, Bart wrote:
>> On 21/11/2021 18:55, James Harris wrote:
>>> In sum, perhaps it would be better to build common operators into
>>> type libraries and not make them part of the language itself.
>
>     I agree with Charles!
>
>> You need a more complicated implementation, and one that implements
>> more advanced optimisations, which are going to be slower, to get
>> code of the same standard as you can get easily when things are
>> built-in.
>
>     Doesn't follow.  The library can have access to parts of the
> language that are not available to ordinary users, inc [eg] escapes
> into assembler, inlining, access to the compiler's internal structures,
> ..., and can be optimised when the compiler is built.

This is my point: you need:

* Inline assembly
* Inlining
* User-defined operator overloads
* The equivalent of C++'s constexpr
* A gcc-class optimiser
...

Too many advanced compiler features, just to keep the language 'small'.

Yet the user will see the small core features plus the ones implemented
via language features; they will still see a big language.

Where's the advantage, and who benefits?


>   The point is to
> keep the language definition simple,

What, like Algol68? They seemed to have gone out of their way to make
the definition incomprehensible!

(My a68g.exe is 2.8MB; my nearest language is 0.5MB. Just saying...)

James Harris

Nov 23, 2021, 10:50:58 AM
On 22/11/2021 14:26, Charles Lindsey wrote:

...

>
> You are re-inventing Algol 68 again.
>

I am not reinventing it ... but if I am going down some of the same
lines then that's probably a good thing.

I have yet to reply to another post where you said the same, and I think
my reply to that may be similar to what Algol means by returning an
object rather than its value so maybe the similarities continue!



--
James Harris

James Harris

Nov 23, 2021, 11:39:55 AM
On 22/11/2021 23:08, Bart wrote:
> On 22/11/2021 20:34, Andy Walker wrote:
>> On 21/11/2021 19:45, Bart wrote:
>>> On 21/11/2021 18:55, James Harris wrote:

>>>> In sum, perhaps it would be better to build common operators into
>>>> type libraries and not make them part of the language itself.

...

>>> You need a more complicated implementation, and one that implements
>>> more advanced optimisations, which are going to be slower, to get
>>> code of the same standard as you can get easily when things are
>>> built-in.
>>
>>      Doesn't follow.  The library can have access to parts of the
>> language that are not available to ordinary users, inc [eg] escapes
>> into assembler, inlining, access to the compiler's internal structures,
>> ..., and can be optimised when the compiler is built.
>
> This is my point: you need:
>
> * Inline assembly
> * Inlining
> * User-defined operator overloads
> * The equivalent of C++'s constexpr
> * A gcc-class optimiser

I don't see why Andy says there should be escapes to assembly and I'd
question your list, too.

I've got work to do on this, as yet, but here's how I see it starting.
AFAIK it's a novel approach so may need some synapse rewiring. :-)

A 'type library' would include code for each operator of that type. For
instance imagine that an expression such as

A + B

has A and B of the same type as each other. Let's say it's float. In the
type library for float there will be code for

<float> + <float> => <float>

The library would be responsible for generating the right IR. For
example, if we assume that 'wider' floats can hold all the values of
narrower ones and (to make this not too simple) we further assume that
the language specification says that float addition is performed to the
resolution of the widest of the two operands then the IR generation
might be along the lines of the following.

w = max(A._width, B._width)    # width of the wider operand
IR.widen(A1, A, w)             # widen A into A1 (a noop if A is already w bits)
IR.widen(B1, B, w)             # likewise for B
IR.declare(float, w, Result)   # declare Result as a float of width w
IR.float_add(Result, A1, B1)   # Result := A1 + B1

What that is intended to do is to work out the maximum width of A and B,
widen the narrower one (if any) to that width, declare a variable called
Result which has that width, then carry out the addition into Result.

Note where this sits in the compile process. The calls to IR routines
would populate the parse tree.

Note, also, that the tree would include noops such as those to widen a
64-bit float to another 64-bit float but such noops would disappear as
part of optimisation.

Above all, note that the code is at high level and not particularly
complicated. All it's really doing is adding operations to the parse tree.
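
A small runnable sketch of the same idea, in Python; the IR node shape
and the widths are my assumptions, not a real compiler:

    from dataclasses import dataclass

    @dataclass
    class Node:
        op: str
        args: tuple

    class IR:
        def __init__(self):
            self.nodes = []

        def emit(self, op, *args):
            node = Node(op, args)
            self.nodes.append(node)
            return node

    def float_add(ir, a, a_width, b, b_width):
        # What the float library's <float> + <float> entry might emit.
        w = max(a_width, b_width)
        a1 = ir.emit("widen", a, a_width, w)
        b1 = ir.emit("widen", b, b_width, w)
        result = ir.emit("declare", "float", w)
        ir.emit("fadd", result, a1, b1)
        return result

    def optimise(ir):
        # Drop no-op widens, as suggested above. (A fuller pass would
        # also redirect references to the dropped nodes.)
        ir.nodes = [n for n in ir.nodes
                    if not (n.op == "widen" and n.args[1] == n.args[2])]

    ir = IR()
    float_add(ir, "A", 32, "B", 64)      # only A needs real widening
    optimise(ir)
    print([n.op for n in ir.nodes])      # ['widen', 'declare', 'fadd']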

> ...
>
> Too many advanced compiler features, just to keep the language 'small'.

For sure, the compiler would have the above complication. But the
handling of all data types could be written in the same way.

>
> Yet the user will see the small core features plus the ones implemented
> via language features; they will still see a big language.

Not at all. I see it as a benefit for a programmer to be able to
distinguish between language and libraries, and for the language to be
kept small.

>
> Where's the advantage, and who benefits?

The model is different, I grant you, but conceptually simple. And it's
more consistent. All data types - those inbuilt, those provided in
standard libraries, and those the programmer writes - would be treated
in the same way.

Further, the IR-generating code is simple enough that anyone who wants
to see the details of how an operation is implemented can go and read
it! Maybe it would be enough to satisfy the language lawyers. :-)

I should say the above is work in progress. I am not committed to it
yet. One problem I still have is essentially overloading (that old
chestnut) and how to distinguish bound from unbound functions. (Bound
would be defined in a library for the type of their first argument.
Unbound would not. Yet calls to them might look remarkably similar....)

Flak jacket at the ready!


--
James Harris

Dmitry A. Kazakov

Nov 23, 2021, 12:44:42 PM
On 2021-11-23 17:39, James Harris wrote:

> One problem I still have is essentially overloading (that old
> chestnut) and how to distinguish bound from unbound functions. (Bound
> would be defined in a library for the type of their first argument.
> Unbound would not. Yet calls to them might look remarkably similar....)

The terms are "method" vs "free function."

A dedicated argument makes no sense, obviously. It is motivated by the
inability to implement multiple dispatch, and serves as an excuse.

One answer is per-type: a free function would be defined on the whole
class, which makes it distinguishable, as the type would be different.

Most languages allow free functions with the same type, arguably they
should not [unless inside the private implementation]. This again is
problematic with modules and multiple dispatch.

So, basically, the issue is wide open to debate and there is no workable
concept in sight.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

James Harris

Nov 23, 2021, 3:38:32 PM
On 23/11/2021 17:44, Dmitry A. Kazakov wrote:
> On 2021-11-23 17:39, James Harris wrote:
>
>> One problem I still have is essentially overloading (that old
>> chestnut) and how to distinguish bound from unbound functions. (Bound
>> would be defined in a library for the type of their first argument.
>> Unbound would not. Yet calls to them might look remarkably similar....)
>
> The terms are "method" vs "free function."

Maybe, although I think of methods as going along with classes, and
there are no classes (at least no inheritance) in what I have in mind.

>
> A dedicated argument makes no sense, obviously. It is motivated by the
> inability to implement multiple dispatch, and serves as an excuse.

I don't think I'd want multiple dispatch. As discussed elsewhere it
cannot scale well. At most, ISTM, one could select a routine based on
two types: that of one argument and that of the result, though in most
cases one would use only the type of one argument.

For sure, if something more complicated is required a routine could do
what I think you called double dispatch, switching between different
implementations based on the types of any further arguments it wanted
(i.e. any arguments after the first). You had concerns that it would be
slow but AFAICS it would involve two dispatches, each about as fast as
possible, and it would scale much better than multiple dispatch.
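
A hedged sketch of that two-step scheme in Python (the types and the
messages are invented): the first dispatch picks the routine from the
first operand's type, and the routine itself then switches on the
second:

    class Int:
        def add(self, other):
            # Second dispatch: switch on the other operand's type.
            if isinstance(other, Int):
                return "int add"
            if isinstance(other, Float):
                return "widen to float, then float add"
            raise TypeError("Int.add: unsupported operand")

    class Float:
        def add(self, other):
            if isinstance(other, (Int, Float)):
                return "float add"
            raise TypeError("Float.add: unsupported operand")

    def add(a, b):
        return a.add(b)           # first dispatch: on the type of a

    print(add(Int(), Float()))    # widen to float, then float add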

>
> One answer is per types. Free function would be defined on the whole
> class, that makes them distinguishable as the type would be different.

What about the syntax of invocation? In

a = F(b, c)

it might be that F is bound to the type of its first argument, b, or it
might be that the programmer has defined F as an unbound function with
two parameters. There's nothing in the source to indicate which is
meant, and that seems to be a problem.

>
> Most languages allow free functions with the same type, arguably they
> should not [unless inside the private implementation]. This again is
> problematic with modules and multiple dispatch.
>
> So, basically, the issue is wide open to debate and there is no workable
> concept in sight.

Understood. It's reassuring to hear you say so...!


--
James Harris

Dmitry A. Kazakov

Nov 23, 2021, 4:34:05 PM
On 2021-11-23 21:38, James Harris wrote:
> On 23/11/2021 17:44, Dmitry A. Kazakov wrote:
>> On 2021-11-23 17:39, James Harris wrote:
>>
>>> One problem I still have is essentially overloading (that old
>>> chestnut) and how to distinguish bound from unbound functions. (Bound
>>> would be defined in a library for the type of their first argument.
>>> Unbound would not. Yet calls to them might look remarkably similar....)
>>
>> The terms are "method" vs "free function."
>
> Maybe, although I think of methods as going along with classes, and
> there are no classes (at least no inheritance) in what I have in mind.

You have it in the form of implicit type conversions. Say you convert
float to double and have a function Foo defined on double and accepting
float. This Foo is a sort of method for the class {float, double}.

In C this is consistent because C does this *always*. Unless you do the
same, you would have a massive problem: whether a conversion applies
could depend on the actual visibility context.

>> One answer is per-type: a free function would be defined on the whole
>> class, which makes it distinguishable, as the type would be different.
>
> What about the syntax of invocation? In
>
>   a = F(b, c)
>
> it might be that F is bound to the type of its first argument, b, or it
> might be that the programmer has defined F as an unbound function with
> two parameters. There's nothing in the source to indicate which is
> meant, and that seems to be a problem.

There is no problem. Let us say we have three types:

T1 a
T2 b
T3 c

Now, let F not be a method of b. This is what you mean, right?

The first argument of F would be class T2 rather than simple T2. It
means that b will be "converted" to the class. [Under the model that b
keeps the type tag it is null conversion.] Then inside F it will
dispatch to the methods of T2.

Thus, regardless of any visibility, the semantics of F regarding b could
not change.

Methods persist and class-wide operations are always reduced to method
calls, which makes it safe.

Free functions of simple types in the presence of classes of any kind
are asking for trouble.
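
For what the "always go back to the methods" point looks like in running
code, a rough Python rendering (the names are invented), where the
class-wide F never does anything but re-dispatch:

    class T2:
        def foo(self):
            return "T2.foo"

    class T2Child(T2):
        def foo(self):
            return "T2Child.foo"

    def F(b):
        # Class-wide: defined once for the whole class rooted at T2;
        # the call below re-dispatches on b's actual type tag.
        return b.foo()

    print(F(T2()))         # T2.foo
    print(F(T2Child()))    # T2Child.foo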

Andy Walker

Nov 23, 2021, 6:04:21 PM

On 23/11/2021 16:39, James Harris wrote:
> On 22/11/2021 23:08, Bart wrote:
>> On 22/11/2021 20:34, Andy Walker wrote:
>>>      Doesn't follow.  The library can have access to parts of the
>>> language that are not available to ordinary users, inc [eg] escapes
>>> into assembler, inlining, access to the compiler's internal structures,
>>> ..., and can be optimised when the compiler is built.
>> This is my point: you need:
>> * Inline assembly
[...]
> I don't see why Andy says there should be escapes to assembly and I'd
> question your list, too.

Andy didn't say there should be, but that there can be. IOW,
a [system or standard] library can do things that ordinary user code
can't; in particular it can know more than ordinary users about the
compiler internals. Whatever Bart expects his compiler to do about
[eg] standard operators can equally be done by the library. It's a
matter of taste whether the library is tightly integrated with the
compiler proper, or is more loosely connected [cf whether a multi-
pass compiler consists of a chain of completely separate programs
or is a collection of modules linked together into one executable].
There are pros and cons either way.

[For "assembly" read your IR if you prefer. It's just that
I was brought up with languages where you could use assembler codes
directly in your program. (Eg, A68R, the Malvern compiler, had
"CODE ... EDOC" brackets within which you could use ICL 1900 op codes,
augmented by symbolic access to your program's variables.) C was the
first language I used extensively in which this was not possible, but
instead you had to write (PDP 11) assembler modules which then got
linked in to your C.]

--
Andy Walker, Nottingham.
Andy's music pages: www.cuboid.me.uk/andy/Music
Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Handel

Andy Walker

Nov 23, 2021, 7:04:55 PM
On 22/11/2021 23:08, Bart wrote:
>>> You need a more complicated implementation, and one that implements
>>> more advanced optimisations, which are going to be slower, to get
>>> code of the same standard as you can get easily when things are
>>> built-in.
>>      Doesn't follow.  The library can have access to parts of the
>> language that are not available to ordinary users, inc [eg] escapes
>> into assembler, inlining, access to the compiler's internal structures,
>> ..., and can be optimised when the compiler is built.
> This is my point: you need:
> * Inline assembly
> * Inlining
> * User-defined operator overloads

[You don't need user-defined overloads unless your language
includes them. In which case you need them because of the language,
not because you've implemented them in a library.]

> * The equivalent of C++'s constexpr
> * A gcc-class optimiser
> ...
> Too many advanced compiler features, just to keep the language 'small'.

If you're going to have these things, you need them anyway.
There is no clear-cut boundary between what compilers do and what
goes into a standard library. It doesn't matter whether it's the
compiler that decides that "A + B" turns into "load A, load B, add"
or the library that decides that. In either case, syntax analysis
produces a node in the parse tree consisting of an operator and its
two [in this case] operands, or some near equivalent to that. How
that gets processed further is a matter of taste.

> Yet the user will see the small core features plus the ones
> implemented via language features; they will still see a big
> language.

It isn't a "big language" if the syntax is small and simple.
The problem for users is not usually checking through the list of
library routines to see if there's one to multiply matrices or to
open a window, but is knowing precisely how to write loops or to
declare arrays of pointers to procedures.

> Where's the advantage, and who benefits?

Logical separation; careful users.

>> The point is to
>> keep the language definition simple,
> What, like Algol68? They seemed to have gone out of their way to make
> the definition incomprehensible!

That is because some people went out of their way to claim
that it is incomprehensible, and others more recently [no names, no
pack drill] still go out of their way to ask me what a semicolon
means instead of looking it up themselves. The Watt-Peck-Sintzoff
chart in Algol Bulletin gets the A68 syntax onto one side; the
A68R User Guide gets it, in larger type and with lots of white
space, into well under two sides; substantially less than usual
representations of C, Pascal, Ada, ....

> (My a68g.exe is 2.8MB; my nearest language is 0.5MB. Just saying...)
My copy of "a68g" [2.8.4] is 1183284 bytes; it has crept
up over the decades from the 504749 bytes of 1.4.0, the earliest
A68G version I have. Algol itself hasn't changed, and A68G "then"
was missing only a few minor features, so I expect the increase
comes primarily from (a) much more extensive checking, (b) extra
language features such as partial parametrisation and the Torrix
extensions, and (c) additional library features such as Plotutils,
PostgreSQL, linear algebra, and many others. I expect that if
your language included such things, it too would grow from 0.5MB
to around 1MB.

--
Andy Walker, Nottingham.
Andy's music pages: www.cuboid.me.uk/andy/Music
Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Handel

James Harris

Nov 24, 2021, 5:06:43 AM
On 23/11/2021 21:34, Dmitry A. Kazakov wrote:
> On 2021-11-23 21:38, James Harris wrote:
>> On 23/11/2021 17:44, Dmitry A. Kazakov wrote:
>>> On 2021-11-23 17:39, James Harris wrote:

...

>> What about the syntax of invocation? In
>>
>>    a = F(b, c)
>>
>> it might be that F is bound to the type of its first argument, b, or
>> it might be that the programmer has defined F as an unbound function
>> with two parameters. There's nothing in the source to indicate which
>> is meant, and that seems to be a problem.
>
> There is no problem. Let us say we have three types:
>
>    T1 a
>    T2 b
>    T3 c
>
> Now, let F not be a method of b. This is what you mean, right?

I can't be sure whether you meant to say "not" in that, but the idea is
that the definition of type T2 would include at least one function
called F as in

type T2
function F


>
> The first argument of F would be class T2 rather than simple T2.

I don't know about that. AFAICS the first argument of F would be of
/type/ T2. That's all.

The problem I have is that F could be defined within type T2 (i.e.
'bound' to T2) as in

type T2
function F(....)

or it could be a plain ('unbound') function as in

function F(T2 x, ....)

I am uncomfortable about having a function called F potentially findable
in two completely different places.

What to do? I don't know. AFAICS both forms of function definition have
their uses - and programmers are used to both forms.

Am open to suggestions!


> It
> means that b will be "converted" to the class. [Under the model that b
> keeps the type tag it is null conversion.] Then inside F it will
> dispatch to the methods of T2.
>
> Thus regardless on any visibility the semantics F regarding b could not
> change.
>
> Methods persist and class-wide operations are always reduced to method
> calls, which makes it safe.
>
> Free functions of simple types in presence of classes of any kind is
> asking for trouble.
>

The trouble is that some functions are naturally free/unbound, i.e. they
either don't have any parameters or they are not naturally bound to the
type of their first parameter so in a call such as

F(x)

there's no way to tell whether F is bound or free/unbound.


--
James Harris

Dmitry A. Kazakov

Nov 24, 2021, 6:18:57 AM
On 2021-11-24 11:06, James Harris wrote:
> On 23/11/2021 21:34, Dmitry A. Kazakov wrote:
>> On 2021-11-23 21:38, James Harris wrote:
>>> On 23/11/2021 17:44, Dmitry A. Kazakov wrote:
>>>> On 2021-11-23 17:39, James Harris wrote:
>
> ...
>
>>> What about the syntax of invocation? In
>>>
>>>    a = F(b, c)
>>>
>>> it might be that F is bound to the type of its first argument, b, or
>>> it might be that the programmer has defined F as an unbound function
>>> with two parameters. There's nothing in the source to indicate which
>>> is meant, and that seems to be a problem.
>>
>> There is no problem. Let us say we have three types:
>>
>>     T1 a
>>     T2 b
>>     T3 c
>>
>> Now, let F not be a method of b. This is what you mean, right?
>
> I can't be sure whether you meant to say "not" in that but the idea is
> that the definition of type T2 would include at least one function
> called F as in
>
>   type T2
>     function F

You are talking about [unfortunate] lexical elements of your language.
The question is the semantics of these. I guessed that this construct
would imply that F is a method in its T2 argument/result, and that an F
declared outside would not be, i.e. a free function.

>> The first argument of F would be class T2 rather than simple T2.
>
> I don't know about that. AFAICS the first argument of F would be of
> /type/ T2. That's all.

That means a plain free function, which has all problems expected. You
get what you asked for.

> What to do? I don't know. AFAICS both forms of function definition have
> their uses - and programmers are used to both forms.
>
> Am open to suggestions!

I already explained that. If F were class-wide in b, it would have no
such problems because it would always go back to the methods. Any
type-specific argument or result, or if you want alternative
terminology, any *contravariant* argument, is a trouble in presence of
classes of whatever origin. Drop classes and you will have no problems.
Keep classes (implicit conversions included) and you will have problems
with contravariance. OK?

> The trouble is that some functions are naturally free/unbound, i.e. they
> either don't have any parameters or they are not naturally bound to the
> type of their first parameter so in a call such as
>
>   F(x)
>
> there's no way to tell whether F is bound or free/unbound.

Why should anybody care? It says: do F on x. The effect is determined by
the type of x. Qualifiers like "bound" are implementation details at
best, otherwise meaningless.

James Harris

Nov 24, 2021, 10:35:54 AM
On 24/11/2021 11:18, Dmitry A. Kazakov wrote:

> You are talking about [unfortunate] lexical elements of your language.
> The question is the semantics of these.

I was illustrating semantics. With pseudocode. Nothing to do with
lexical elements.


> I guessed that this construct
> would imply that F is a method in its T2 argument/result, and that an F
> declared outside would not be, i.e. a free function.

Yes.

>
>>> The first argument of F would be class T2 rather than simple T2.
>>
>> I don't know about that. AFAICS the first argument of F would be of
>> /type/ T2. That's all.
>
> That means a plain free function, which has all problems expected. You
> get what you asked for.

It is not possible to make all functions bound. For example,

time_now()

or

emit_newline()

cannot be bound to the types of their first parameters because neither
has a first parameter.

>
>> What to do? I don't know. AFAICS both forms of function definition
>> have their uses - and programmers are used to both forms.
>>
>> Am open to suggestions!
>
> I already explained that. If F were class-wide in b, it would have no
> such problems because it would always go back to the methods. Any
> type-specific argument or result, or if you want alternative
> terminology, any *contravariant* argument, is a trouble in presence of
> classes of whatever origin. Drop classes and you will have no problems.
> Keep classes (implicit conversions included) and you will have problems
> with contravariance. OK?

Unfortunately, what you explained doesn't seem to match because you are
using terms (such as contravariance) associated with subtyping when
there is no subtyping in the problem! And it's left me mystified as to
how all those terms relate to the issue at hand - if at all. :-(

When you explain things it might help if you were to do so in terms of
the mechanics rather than in terms of buzzwords/jargon.

>
>> The trouble is that some functions are naturally free/unbound, i.e.
>> they either don't have any parameters or they are not naturally bound
>> to the type of their first parameter so in a call such as
>>
>>    F(x)
>>
>> there's no way to tell whether F is bound or free/unbound.
>
> Why should anybody care? It says: do F on x. The effect is determined by
> the type of x. Qualifiers like "bound" are implementation details at
> best, otherwise meaningless.
>

Well, if you saw

F(x)

where would you as a programmer look to find where F was defined?

Similarly, if F were to be defined in two places as in

type T2
function F....

and

function(T2 F, ....)

which one should the F(x) in

T2 x
F(x)

invoke?

As it happens, I am thinking that perhaps I should just accept that
programmers would want to use both forms but then what happens if the
same name is defined in both ways? Does one take precedence? Should the
compiler (or the linker if from separate modules) flag an error?

It may be worth bearing in mind that the bound form can always be
accessed via the type as in

T2.F

or

(typeof x).F

By contrast, I suppose that the unbound form could be accessed by means
of whatever contains it such as

namespace N
function F

N.F

so even if the same name were defined in both ways there would always be
a way to specify one or the other - with the bound form being easiest to
specify.
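
A rough Python sketch of those two access paths (T2, N and F are from
the example above; the bodies are invented):

    class T2:
        def F(self):
            return "bound F, from type T2"

    class N:                         # standing in for "namespace N"
        @staticmethod
        def F(x):
            return "unbound F, from namespace N"

    x = T2()
    print(T2.F(x))                   # explicitly the bound form
    print(type(x).F(x))              # the "(typeof x).F" spelling
    print(N.F(x))                    # explicitly the unbound form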


--
James Harris

Dmitry A. Kazakov

Nov 24, 2021, 11:08:38 AM
On 2021-11-24 16:35, James Harris wrote:
> On 24/11/2021 11:18, Dmitry A. Kazakov wrote:

>> That means a plain free function, which has all problems expected. You
>> get what you asked for.
>
> It is not possible to make all functions bound. For example,
>
>   time_now()
>
> or
>
>   emit_newline()
>
> cannot be bound to the types of their first parameters because neither
> has a first parameter.

emit_newline() does not have problems either as there is no class involved.

time_now() has a result, which has a type, and if this type is a member
of a class, you have all the problems back. The class could be like

{ Calendar_Zoned_Time, UTC_Time, Monotonic_Real_Time }

>> I already explained that. If F were class-wide in b, it would have no
>> such problems because it would always go back to the methods. Any
>> type-specific argument or result, or if you want alternative
>> terminology, any *contravariant* argument, is a trouble in presence of
>> classes of whatever origin. Drop classes and you will have no
>> problems. Keep classes (implicit conversions included) and you will
>> have problems with contravariance. OK?
>
> Unfortunately, what you explained doesn't see to match because you are
> using terms (such as contravariance) associated with subtyping when
> there is no subtyping in the problem! And it's left me mystified as to
> how all those terms relate to the issue at hand - if at all. :-(

As I said: no classes, no problems. Your "bondage" stuff loses its
sense without classes. Back in FORTRAN-IV it made no difference how and
where you declared subprograms. If you want FORTRAN-IV, be FORTRAN-IV!
If you want more, learn the meaning of the "buzz words"...

> When you explain things it might help if you were to do so in terms of
> the mechanics rather than in terms of buzzwords/jargon.

All the mechanics were explained 100 times over. You just skipped it
all 100 times.

>> Why should anybody care? It says: do F on x. The effect is determined
>> by the type of x. Qualifiers like "bound" are implementation details
>> at best, otherwise meaningless.
>
> Well, if you saw
>
>   F(x)
>
> where would you as a programmer look to find where F was defined?

Right-mouse click -> "go to definition". It was a stupid question, really.

> Similarly, if F were to be defined in two places as in
>
>   type T2
>     function F....
>
> and
>
>   function(T2 F, ....)

I have no idea what the second is supposed to mean, a cat walking the
keyboard?

> which one should the F(x) in
>
>   T2 x
>   F(x)
>
> invoke?

The one declared on the type T2, obviously. Sorry, this question looks
even more stupid to me.

[...]

> so even if the same name were defined in both ways there would always be
> a way to specify one or the other - with the bound form being easiest to
> specify.

Free functions are trouble in the presence of classes; that was established.

Bart

Nov 24, 2021, 1:19:14 PM
On 24/11/2021 00:04, Andy Walker wrote:
> On 22/11/2021 23:08, Bart wrote:
>>>> You need a more complicated implementation, and one that implements
>>>> more advanced optimisations, which are going to be slower, to get
>>>> code of the same standard as you can get easily when things are
>>>> built-in.
>>>      Doesn't follow.  The library can have access to parts of the
>>> language that are not available to ordinary users, inc [eg] escapes
>>> into assembler, inlining, access to the compiler's internal structures,
>>> ..., and can be optimised when the compiler is built.
>> This is my point: you need:
>> * Inline assembly
>> * Inlining
>> * User-defined operator overloads
>
>     [You don't need user-defined overloads unless your language
> includes them.  In which case you need them because of the language,
> not because you've implemented them in a library.]

If you want to have an overloaded 'abs' operator or function available
to the user of the language (so I'm not specifying how), then if it's
not built in, the language will need the ability to do user-defined
overloads, plus inlining if you want it to be as efficient.


>> * The equivalent of C++'s constexpr
>> * A gcc-class optimiser
>> ...
>> Too many advanced compiler features, just to keep the language 'small'.
>
>     If you're going to have these things, you need them anyway.
> There is no clear-cut boundary between what compilers do and what
> goes into a standard library.  It doesn't matter whether it's the
> compiler that decides that "A + B" turns into "load A, load B, add"
> or the library that decides that.

It matters very much. My static language is self-hosted, and can build
itself entirely from source fast enough that I could do so ten times
every second if I was so inclined.

To me that is important; not that I need to do that 600 times a minute,
but it means an effectively zero edit-build-run cycle for similar-sized
applications.

That would be harder to achieve using the wrong language choices; it
would also take longer, and the result would build apps more slowly.


>  In either case, syntax analysis
> produces a node in the parse tree consisting of an operator and its
> two [in this case] operands, or some near equivalent to that.  How
> that gets processed further is a matter of taste.


Fine. So you don't care exactly how the language makes standard language
features appear as though they are built-in. So there's no problem.

>> What, like Algol68? They seemed to have gone out of their way to make
>> the definition incomprehensible!
>
>     That is because some people went out of their way to claim
> that it is incomprehensible,

I hadn't looked at the report for years, and had forgotten what it was like.

But I did recently glimpse it again at the back of that PDF. The first
thing you see is it talking about Protonotion, Metanotion, and Hypernotion.

A few pages further on, it's on about Moid, Mood and Mu.

You can't help but think that this is a language that takes itself too
seriously! (I have also considered that it might all have been an
elaborate joke.)

An interesting exercise would be to take the language and create a new
definition without all those fancy terms.

The language, as far as I can see, doesn't offer abilities that are
particularly unusual or hard to understand, certainly amongst the
current crop of languages. It should therefore have a more down-to-earth
definition.


> and others more recently [no names, no
> pack drill] still go out of their way to ask me what a semicolon
> means instead of looking it up themselves.

Commas have the usual expected use in languages that employ them.

AFAICS in Algol68, ";" and "," can both be used to separate things, but
"A;B" means do A then B, while "A,B" means do A and B, in either order
or concurrently.

Whether that applies to A[i,j,k] or F(a,b,c), I don't know.

However, it doesn't like F(a;b;c) when three arguments are expected. So
there is something more to it.

(Don't bother to explain; I'm never going to get it. I'm just not
surprised that Wirth left the Algol68 (or 'new Algol') group and
produced his own simpler language.)


>   The Watt-Peck-Sintzoff
> chart in Algol Bulletin gets the A68 syntax onto one side;  the
> A68R User Guide gets it, in larger type and with lots of white
> space, into well under two sides;  substantially less than usual
> representations of C, Pascal, Ada, ....
>
>> (My a68g.exe is 2.8MB; my nearest language is 0.5MB. Just saying...)
>     My copy of "a68g" [2.8.4] is 1183284 bytes;  it has crept
> up over the decades from the 504749 bytes of 1.4.0, the earliest
> A68G version I have.  Algol itself hasn't changed, and A68G "then"
> was missing only a few minor features, so I expect the increase
> comes primarily from (a) much more extensive checking, (b) extra
> language features such as partial parametrisation and the Torrix
> extensions, and (c) additional library features such as Plotutils,
> PostgreSQL, linear algebra, and many others.  I expect that if
> your language included such things, it too would grow from 0.5MB
> to around 1MB.

Yeah I think this has come up before.

The reality is that the languages (either of mine, both 0.5MB) are too
different to do a meaningful comparison. A68 has lots of features I
don't have, and I have quite a few that are missing in A68.

One point of similarity however is the use of a single, self-contained
executable.


James Harris

Nov 26, 2021, 12:21:11 PM
On 24/11/2021 16:08, Dmitry A. Kazakov wrote:
> On 2021-11-24 16:35, James Harris wrote:
>> On 24/11/2021 11:18, Dmitry A. Kazakov wrote:

...

>>> I already explained that. If F were class-wide in b, it would have no
>>> such problems because it would always go back to the methods. Any
>>> type-specific argument or result, or if you want alternative
>>> terminology, any *contravariant* argument, is a trouble in presence
>>> of classes of whatever origin. Drop classes and you will have no
>>> problems. Keep classes (implicit conversions included) and you will
>>> have problems with contravariance. OK?
>>
>> Unfortunately, what you explained doesn't seem to match because you are
>> using terms (such as contravariance) associated with subtyping when
>> there is no subtyping in the problem! And it's left me mystified as to
>> how all those terms relate to the issue at hand - if at all. :-(
>
> As I said: no classes, no problems. Your "bondage" stuff loses its
> sense without classes. Back in FORTRAN-IV it made no difference how and
> where you declared subprograms. If you want FORTRAN-IV, be FORTRAN-IV!
> If you want more, learn the meaning of the "buzz words"...

I have tried to understand your use of buzzwords but quite apart from
you expecting others to know Ada lingo, in some cases I remember finding
that you use buzzwords differently from other people. So if you want to
obfuscate a meaning you are doing exactly the right thing.

The real world does not happen in jargon but in transformations of data
and data structures. It should always be possible to explain a concept
in terms of data structures or algorithms. Binding a name to a function
is a case in point and the process could be described by steps, not jargon.

>
>> When you explain things it might help if you were to do so in terms of
>> the mechanics rather than in terms of buzzwords/jargon.
>
> All the mechanics were explained 100 times over. You just skipped it
> all 100 times.

Rubbish. You may have used buzzwords 100 times but that does not
constitute an explanation.

...

>> Similarly, if F were to be defined in two places as in
>>
>>    type T2
>>      function F....
>>
>> and
>>
>>    function(T2 F, ....)
>
> I have no idea what the second is supposed to mean, a cat walking the
> keyboard?

Sorry, it was pseudo C syntax. The type name T2 preceding the identifier
F was meant to indicate that F would be of type T2. The point of the two
fragments of code was that in BOTH of them the first argument to F would
be of type T2. The problem, then, is to say which F would be invoked by

F(x)

where x was of type T2.


--
James Harris

Dmitry A. Kazakov

Nov 26, 2021, 3:05:07 PM
On 2021-11-26 18:21, James Harris wrote:

> The real world does not happen in jargon but in transformations of data
> and data structures.

... precisely described by that jargon.

>>> Similarly, if F were to be defined in two places as in
>>>
>>>    type T2
>>>      function F....
>>>
>>> and
>>>
>>>    function(T2 F, ....)
>>
>> I have no idea what the second is supposed to mean, a cat walking the
>> keyboard?
>
> Sorry, it was pseudo C syntax. The type name T2 preceding the identifier
> F was meant to indicate that F would be of type T2. The point of the two
> fragments of code was that in BOTH of them the first argument to F would
> be of type T2. The problem, then, is to say which F would be invoked by
>
>   F(x)
>
> where x was of type T2.

The call is ambiguous when F of both declarations is directly visible.

An important difference between a method and a free function is that a
method cannot be hidden. If you see T2, or any object of it, you also
see all the methods of T2.

James Harris

Nov 26, 2021, 4:56:27 PM
On 26/11/2021 20:05, Dmitry A. Kazakov wrote:
> On 2021-11-26 18:21, James Harris wrote:

...

>>>> Similarly, if F were to be defined in two places as in
>>>>
>>>>    type T2
>>>>      function F....
>>>>
>>>> and
>>>>
>>>>    function(T2 F, ....)
>>>
>>> I have no idea what the second is supposed to mean, a cat walking the
>>> keyboard?
>>
>> Sorry, it was pseudo C syntax. The type name T2 preceding the
>> identifier F was meant to indicate that F would be of type T2. The
>> point of the two fragments of code was that in BOTH of them the first
>> argument to F would be of type T2. The problem, then, is to say which
>> F would be invoked by
>>
>>    F(x)
>>
>> where x was of type T2.
>
> The call is ambiguous when F of both declarations is directly visible.

Indeed. The question is what to do about the ambiguity. Options:

1. Pick the F function found in the type.

2. Pick the F function which is 'unbound'.

3. If the two functions are defined in different scopes pick the one
which is 'closer in' to where F is invoked.

4. Reject the expression because of the ambiguity.

I do not know which approach would be best.
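
For illustration, a small Python sketch of option 4, with invented
lookup tables standing in for the compiler's symbol tables:

    def resolve(name, typ, bound, unbound):
        candidates = [f for f in (bound.get((typ, name)), unbound.get(name))
                      if f is not None]
        if len(candidates) > 1:
            raise NameError(f"ambiguous: bound {typ}.{name} and unbound {name}")
        if not candidates:
            raise NameError(f"undefined: {name}")
        return candidates[0]

    bound = {("T2", "F"): lambda x: "bound F"}
    unbound = {"F": lambda x: "unbound F"}

    try:
        resolve("F", "T2", bound, unbound)
    except NameError as err:
        print(err)                   # ambiguous: bound T2.F and unbound F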


>
> An important difference between a method and a free function is that a
> method cannot be hidden. If you see T2, or any object of it, you also
> see all the methods of T2.

OK.


--
James Harris

Andy Walker

Nov 26, 2021, 5:00:44 PM
On 24/11/2021 18:19, Bart wrote:
>>> * User-defined operator overloads
>>      [You don't need user-defined overloads unless your language
>> includes them.  In which case you need them because of the language,
>> not because you've implemented them in a library.]
> If you want to have an overloaded 'abs' operator or function
> available to the user of the language (so I'm not specifying how),
> then if it's not built-in, it will need the ability to do
> user-overloads, plus inlining if you want it as efficient.

"Available to the user" is not the same as "user-defined".
There are typically lots of things in a library that the user not
merely does not need to define but often cannot define within the
formal language. This typically includes basic types such as "int",
privileged calls to the OS, environment enquiries, generic types
[eg when "print" can take any type as parameter(s)], ....

[...]
>>> Too many advanced compilers features, just to keep the language 'small'.
>>      If you're going to have these things, you need them anyway.
>> There is no clear-cut boundary between what compilers do and what
>> goes into a standard library.  It doesn't matter whether it's the
>> compiler that decides that "A + B" turns into "load A, load B, add"
>> or the library that decides that.
> It matters very much. My static language is self-hosted, and can
> build itself entirely from source fast enough that I could do so ten
> times every second if I was so inclined.

So somewhere in your source there is code that says "A + B"
does not mean "pass A and B to a function called +" but instead notes
this as a special sort of node in the parse tree [or near equivalent]
and says "Aha! This compiles into ...". The effect is the same
whether this is built-in to the parser or is some "magic" in the
library. Which version is faster depends on how it is done; there
are pros and cons.

> To me that is important; not that I need to do that 600 times a
> minute, but it means an effectively zero edit-build-run cycle for
> similar-sized applications.

No, it means an effectively zero build-run part of that
cycle. If it was ten times slower, it would have negligible
effect on the whole cycle, unless it takes you less than a few
seconds to note the results of the run, work out what has gone
wrong [if anything], decide what you need to change, go to the
right place in the code, and type in the change. To which you
can [or should!] add the time needed to run regression tests,
and documentation. [I'm not saying that speed of compilation is
unimportant, of course.]

[A68:]
> But I did recently glimpse it again at the back of that PDF. The
> first thing you see is it talking about Protonotion, Metanotion, and
> Hypernotion.

Normal people largely skip that bit; it's just explaining
how two-level grammars work.

> A few pages further on, it's on about Moid, Mood and Mu.

Actually about "MOID", etc. That's the metalanguage, the
higher of the two levels. It's how you get an infinite set of
lower-level syntax rules, escape the tyranny of BNF, and check
as a matter of syntax that variables are declared, etc.

> You can't help but think that this is a language that takes itself too
> seriously! (I have also considered that it might all have been an
> elaborate joke.)

You can't please everyone. A "MOID" is either a "MODE"
[type] or "void"; what should it be called, given that you want
to use it several hundred times? You want something short that
conveys the right general impression. No matter what you choose,
it will seem too jokey to some, too up itself to others.

> An interesting exercise would be to take the language and create a
> new definition without all those fancy terms.

Feel free. You could start with the brief syntax in the
"A68R User's Guide" or in [eg] "Introductory Algol 68 Programming".
But beware that the C standard and other similar documents are
several times longer, and nowhere near as definitive as the RR.

[...]
> AFAICS in Algol68, ";" and "," can both be used to separate things,
> but "A;B" means do A then B, while "A,B" means do A and B, in either
> order or concurrently.
> Whether that applies to A[i,j,k] or F(a,b,c), I don't know.
> However, it doesn't like F(a;b;c) when three arguments are expected.
> So there is something more to it.

Of course there is more to it! You have supplied a serial
clause where a parameter pack is required. You can't expect just
to slap down random sequences of characters without understanding
what they are supposed to be -- in any language! -- and have it
work. AFAIK, no language I have ever used expects "A[i;j;k]" to
be an array element or "F(a;b;c)" to be the call of a function
with three parameters. I don't see any interesting distinction
between the way ";" and "," are used in Algol and the way they
are used in Pascal or C; ";" primarily to end a "statement" or
similar, and"," primarily to break up lists.

> (Don't bother to explain; I'm never going to get it. I'm just not
> surprised that Wirth left the Algol68 (or 'new Algol') group and
> produced his own simpler language.)

Well, simpler to compile, for sure. Simpler for new
programmers to learn or for experienced programmers to use?
Far from clear. Simpler to describe? People mocked the 236
pages of the RR [most of it comments, examples or the system
library] when it first came out -- until they found that proper
descriptions of other languages were at least as long, and
nowhere near as precise.

--
Andy Walker, Nottingham.
Andy's music pages: www.cuboid.me.uk/andy/Music
Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Marpurg

Dmitry A. Kazakov

Nov 26, 2021, 5:16:24 PM
On 2021-11-26 22:56, James Harris wrote:

> Indeed. The question is what to do about the ambiguity. Options:
>
> 1. Pick the F function found in the type.
>
> 2. Pick the F function which is 'unbound'.
>
> 3. If the two functions are defined in different scopes pick the one
> which is 'closer in' to where F is invoked.
>
> 4. Reject the expression because of the ambiguity.
>
> I do not know which approach would be best.

Only #4 is valid.

What I tried to explain to you with the "buzz" words is that the
solution is to prevent some ambiguities from arising by limiting free
functions.

There are also other language design requirements regarding the
consistency and unambiguity of programs, especially similar-looking code
in different contexts, or literally the same generic code instantiated
in different contexts.

But I do not want to go into it, because it will be another spat about
"buzz" words.

Bart

Nov 26, 2021, 8:21:05 PM
On 26/11/2021 22:00, Andy Walker wrote:
> On 24/11/2021 18:19, Bart wrote:
>>>> * User-defined operator overloads

>     So somewhere in your source there is code that says "A + B"
> does not mean "pass A and B to a function called +" but instead notes
> this as a special sort of node in the parse tree [or near equivalent]
> and says "Aha!  This compiles into ...".  The effect is the same
> whether this is built-in to the parser or is some "magic" in the
> library.  Which version is faster depends on how it is done;  there
> are pros and cons.

Imagine if you were creating the bytecode for an interpreter. Bytecode
execution is already slow enough; you will surely want a dedicated
instruction for such a common operation, rather than involve a
call/return sequence plus whatever other instructions are needed.

Well, the bytecode for my interpreter [and many others!] has such an ADD
instruction. So does the VM used by my compiler, and it knows what it is
early on. Very early in fact; this table entry links 'addsym', a lexer
token, with 'kadd', the VM op for ADD:

(addsym, "+", bin_op, kadd, kaddto, 4, 1),

Why not specify it straightaway instead of going around the houses?

(The '4' is a precedence level, and the '1' indicates it can start an
expression. The 'kaddto' is an auxiliary bytecode for when used as
'+:='; that is built-in too.)


>> To me that is important; not that I need to do that 600 times a
>> minute, but it means an effectively zero edit-build-run cycle for
>> similar-sized applications.
>
>     No, it means an effectively zero build-run part of that
> cycle.  If it was ten times slower, it would have negligible
> effect on the whole cycle,

For a few years, I worked with an interpreted compiler. That was usable;
most compiles /of a module/ took 0.5 seconds or less.

(That it was usable at all was also thanks to the simple compilation
techniques I favoured.)

But that was with separate compilation. I wanted the advantages of a
whole-program compiler, so it had to be faster if I didn't want the
annoyance of waiting several seconds for each build.

So I switched to native, and got 20 times faster.

Why would I want to make it slower? It means the approach is scalable to
programs 10 times the size (it might take 1-2 seconds) before build
delays become intolerable, when I'd need some more speedups.

That is more interesting than inventing ways for the compiler to get slower.

(Yes I know that many people with much bigger projects and much slower
compilers have to wait minutes. That is partly up to them, but I feel
sorry for the people working with Rust now. I would switch language to
increase productivity and a more lively environment.)

> unless it takes you less than a few
> seconds to note the results of the run, work out what has gone
> wrong [if anything], decide what you need to change, go to the
> right place in the code, and type in the change.

With near-instant compilation you develop different ways of working. I
might edit, build and re-run just to get the spacing right or fix some
annoying detail, which could take multiple builds a few seconds apart.

Also, doing nothing for several seconds interrupts your concentration.

> [A68:]
>> But I did recently glimpse it again at the back of that PDF. The
>> first thing you see is it talking about Protonotion, Metanotion, and
>> Hypernotion.
>
>     Normal people largely skip that bit;  it's just explaining
> how two-level grammars work.

(Did you mean Normally rather than Normal?)


>> A few pages further on, it's on about Moid, Mood and Mu.
>
>     Actually about "MOID", etc.  That's the metalanguage, the
> higher of the two levels.  It's how you get an infinite set of
> lower-level syntax rules, escape the tyranny of BNF, and check
> as a matter of syntax that variables are declared, etc.

Except that it isn't a matter of syntax; that's too constraining. (But
didn't you say its syntax could be described in 1-2 pages?)



>> You can't help but think that this is a language that takes itself too
>> seriously! (I have also considered that it might all have been an
>> elaborate joke.)
>
>     You can't please everyone.  A "MOID" is either a "MODE"
> [type] or "void";  what should it be called, given that you want
> to use it several hundred times?

A MODE. Especially if, as you say, a MODE/VOID is used in 100s of places.

If it's necessary to discriminate at some point between VOID and any
other MODE, then do so more locally, or with a less silly-sounding name.

Andy Walker

Nov 27, 2021, 5:19:48 PM
On 27/11/2021 01:21, Bart wrote:
[I wrote:]
>> So somewhere in your source there is code that says "A + B"
>> does not mean "pass A and B to a function called +" but instead notes
>> this as a special sort of node in the parse tree [or near equivalent]
>> and says "Aha! This compiles into ...". The effect is the same
>> whether this is built-in to the parser or is some "magic" in the
>> library. Which version is faster depends on how it is done; there
>> are pros and cons.
> Imagine if you were creating the bytecode for an interpreter.
> Bytecode execution is already slow enough, you will surely want a
> dedicated instruction for such a common operation, rather than
> involve a call/return sequence plus whatever other instructions are
> needed.

??? Of course it is common for back-end interpreters to
have "add" instructions. That makes it less, rather than more,
important for the front end to do anything special with it. You
seem to be hung up in thinking that a library cannot do anything
but turn "A + B" into two parameters and a function call. If it
is integrated into the system, it has privileged access to the
innards of the compiler, the operating system, ....

> Well, the bytecode for my interpreter [and many others!] has such an
> ADD instruction. So does the VM used by my compiler, [...].
> Why not specify it straightaway instead of going around the houses?

You can. But it's not "around the houses" if it's a
library that emits that bytecode rather than some "magic" in
the syntax analysis. It's just a matter of choice where in the
processing that emitting takes place. There is also merit in
having the syntax analysis simpler and more general. There is
also merit in leaving optimisations to a back end. There is room
for different models! Your requirement for a monolith is perhaps
different from [eg] Gnu's requirement for lots of front ends for
a variety of languages feeding into a common module feeding into
lots of back ends for different hardware.
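
To make the point concrete, here is a minimal sketch in C++ of a
switch-dispatch interpreter with a dedicated ADD instruction; the
opcode set and layout are invented purely for illustration, and
whether the parser or a privileged library emits that ADD is, as
above, a matter of choice:

#include <cstdio>

// Invented opcode set, purely for illustration.
enum Op { PUSH, ADD, PRINT, HALT };

int main() {
    int code[] = { PUSH, 2, PUSH, 3, ADD, PRINT, HALT };
    int stack[16];
    int sp = 0, pc = 0;
    for (;;) {
        switch (code[pc++]) {
        case PUSH:  stack[sp++] = code[pc++]; break;
        case ADD:   sp--; stack[sp-1] += stack[sp]; break;   // one case arm, no call/return
        case PRINT: std::printf("%d\n", stack[--sp]); break; // prints 5 for the code above
        case HALT:  return 0;
        }
    }
}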

[...]
> So I switched to native, and got 20 times faster.
> Why would I want to make it slower?

No reason why you should; I was merely pointing out that
whether a compilation and run takes 0.01s [as a large majority of
mine do using Algol 68G, with most of the rest < 1s] or 1s makes
no interesting difference to the complete edit-compile-run cycle.
But you seem to be assuming that putting things into libraries is
/ipso facto/ significantly slower than building them in to the
syntax. It may be; or it could be faster; or it could make
absolutely no measurable difference, depending on implementation
details and other factors.

[...]
> With near-instant compilation you develop different ways of working.
> I might edit, build and re-run just to the spacing right or fix some
> annoying detail, which could take multiple builds a few seconds
> apart.

Sure; I do that. It is still the case that my cycles
are almost entirely taken up by the edit phase.

[...]
>> [A68:]
>>> But I did recently glimpse it again at the back of that PDF. The
>>> first thing you see is it talking about Protonotion, Metanotion, and
>>> Hypernotion.
>> Normal people largely skip that bit; it's just explaining
>> how two-level grammars work.
> (Did you mean Normally rather than Normal?)

No, I meant "normal". Perhaps you're one of those strange
people who read encyclopaedias by starting at "Aachen" and going
through to "zygote"? Most people skip things they don't understand
and perhaps come back to them when better placed.

>>> A few pages further on, it's on about Moid, Mood and Mu.
>> Actually about "MOID", etc. That's the metalanguage, the
>> higher of the two levels. It's how you get an infinite set of
>> lower-level syntax rules, escape the tyranny of BNF, and check
>> as a matter of syntax that variables are declared, etc.
> Except that it isn't a matter of syntax; that's too constraining.

It /is/ a matter of syntax in the RR; in what sense is
it constraining? If you use undeclared variables, do you not
want the compiler to detect the error? Or do you prefer the
early versions of Fortran or C where the compiler assumes that
an undeclared "j" [for example] is an integer variable?

> (But didn't you say its syntax could be described in 1-2 pages?)

Yes, in the same way that most standard languages can
be described [eg] by a few pages of EBNF supplemented by lots
of pages of semantics. The RR chooses instead to have lots
of pages of syntax and rather few of semantics. Where to put
the boundary is a grey area. But, broadly speaking, it's the
semantics that lead to imprecision and debates about what
something means.

[...]
>> You can't please everyone. A "MOID" is either a "MODE"
>> [type] or "void"; what should it be called, given that you want
>> to use it several hundred times?
> A MODE. Especially if as you say, a MODE/VOID is used in 100s of
> places.

"MODE" [without "void" -- not "VOID"] is also used all over
the place; so, for that matter, is "void" without "MODE".

> If it's necessary to discriminate at some point between VOID and any
> other MODE, then do so more locally, or with a less silly-sounding
> name.

It's necessary throughout the RR. All part of being
careful with use of language. You find "MOID" silly; most
of us just find it a little quirky, and surely better than
[eg] "MODEORVOID".

--
Andy Walker, Nottingham.
Andy's music pages: www.cuboid.me.uk/andy/Music
Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Godfrey

Bart
Nov 28, 2021, 7:52:19 AM

On 27/11/2021 22:19, Andy Walker wrote:
> On 27/11/2021 01:21, Bart wrote:
> [I wrote:]
>>>      So somewhere in your source there is code that says "A + B"
>>> does not mean "pass A and B to a function called +" but instead notes
>>> this as a special sort of node in the parse tree [or near equivalent]
>>> and says "Aha!  This compiles into ...".  The effect is the same
>>> whether this is built-in to the parser or is some "magic" in the
>>> library.  Which version is faster depends on how it is done;  there
>>> are pros and cons.
>> Imagine if you were creating the bytecode for an interpreter.
>> Bytecode execution is already slow enough, you will surely want a
>> dedicated instruction for such a common operation, rather than
>> involve a call/return sequence plus whatever other instructions are
>> needed.
>
>         ???  Of course it is common for back-end interpreters to
> have "add" instructions.  That makes it less, rather than more,
> important for the front end to do anything special with it.  You
> seem to be hung up in thinking that a library cannot do anything
> but turn "A + B" into two parameters and a function call.  If it
> is integrated into the system, it has privileged access to the
> innards of the compiler, the operating system, ....

OK, you can understand that internally, a language+implementation can do
things as it pleases, just so long as the user can do A+B and it does
what is expected of it.

Which means /my/ implementation can choose to handle it as it likes too.

Yet not long ago you seemed so keen to implement things like:

    A +:= B

via operator-overload mechanisms expressed within the language itself.
And more recently that you wanted new comparison operators to be defined
the same way.

In that latter case, the benefits of the language allowing such new
features, without bothering the language definition or changing the
compiler, seemed to be outweighed by the poor quality and limitations
of the result (and its non-standardness!)

But it seems my approach is perfectly valid too.


>> Well, the bytecode for my interpreter [and many others!] has such an
>> ADD instruction. So does the VM used by my compiler, [...].
>> Why not specify it

>> So I switched to native, and got 20 times faster.
>> Why would I want to make it slower?
>
>         No reason why you should;  I was merely pointing out that
> whether a compilation and run takes 0.01s [as a large majority of
> mine do using Algol 68G, with most of the rest < 1s] or 1s makes
> no interesting difference to the complete edit-compile-run cycle.
> But you seem to be assuming that putting things into libraries is
> /ipso facto/ significantly slower than building them in to the
> syntax.  It may be;  or it could be faster;  or it could make
> absolutely no measurable difference, depending on implementation
> details and other factors.

I've just used gcc to build a 4-line hello.c program; it took over 5
seconds even on my somewhat faster new PC, less than 1 line per second!

That's from cold; subsequent runs take 0.25 seconds; still only 20 lines
per second. Such products are cumbersome because they are so large, and
overheads mean even tiny programs take an annoying amount of time.

On normal-sized programs, it might do 14Klps unoptimised, or 3Klps at -O3.

The A68G manages up to a respectable 20Klps, although not spectacular
for an /interpreter/.

The products I make can manage 500Klps for my static language, and 1Mlps
for my interpreter.

With the interpreter, it needs to compile an entire program ahead of
time into bytecode, before it can start execution. It uses a bundled
library
which is [currently] over 10K lines. At A68G speeds, every program would
take 0.5 seconds start-up time before it even started on the user's
application.

(That library largely provides graphics and image processing capabilities.)

A typical application of 25Kloc would take over a second start-up, an
annoying latency. Fortunately, at its actual speed, start-up delays only
start to be noticeable when programs reach 100K lines, and then the delay
is minimal (around 1/10th of a second).

Now compare products that use tracing JIT compilers, which introduce
enough overhead that, on the wrong kind of application (not enough
localised bottlenecks), they can be slower than regular interpreters,
with longer start-up times too.


>> Except that it isn't a matter of syntax; that's too constraining.
>
>         It /is/ a matter of syntax in the RR;  in what sense is
> it constraining?  If you use undeclared variables, do you not
> want the compiler to detect the error?

But it's nothing to do with syntax! Syntax is just the shape of the
code. Here is the kind of syntax we've been using:

    A + B

That's a perfectly valid expression in most normal languages. Except in
Algol68, because A and B haven't been declared?

It's far simpler to separate these different ideas: the shape of the
code; the resolving of the names (which may detect they are undefined);
the code generation, execution etc.

In my dynamic language, it doesn't care they might be undefined until
runtime.

Instead, Algol68 chooses the most complex approach of all, one that is
hard to understand and difficult to implement. You can tell it was
created by academics!

> Or do you prefer the
> early versions of Fortran or C where the compiler assumes that
> an undeclared "j" [for example] is an integer variable?

If that's what the language said, then that's what it will be. If I
execute this complete program of my 'Q' language:

    print j

it says:

    <Void>

(That's not an error, but most other uses are.)

It's not exactly the end of the world. In Algol68:

    INT j;
    print(j)

gives a runtime error. So j might be defined, but it hasn't done much
good! I guess the RR doesn't make correct /initialisation/ part of the
syntax?

>         It's necessary throughout the RR.  All part of being
> careful with use of language.  You find "MOID" silly;  most
> of us just find it a little quirky,

The whole thing is quirky. What I can confidently say is that, to me,
that whole document is almost completely useless for any purpose.

Andy Walker
Nov 28, 2021, 6:10:06 PM

On 28/11/2021 12:52, Bart wrote:
> OK, you can understand that internally, a language+implementation can
> do things as it pleases, just so long as the user can do A+B and it
> does what is expected of it.
> Which means /my/ implementation can choose to handle it as it likes
> too.

Of course it can.

> Yet not long ago you seemed so keen to implement things like:
>    A +:= B
> via operator-overload mechanisms expressed within the language
> itself.

??? You again fail to distinguish between how something
is defined, and how it is implemented. "A +:= B" is defined by
the RR, /together with/ [as in many language specifications] a
general rule [end of RR10.1] that the definition describes the
effect, and implementations are free to use more efficient
methods. In the instant case, the RR explains what "A +:= B"
is supposed to do; you can't really expect it to give the
machine code for your computer, that's up to the compiler.

> And more recently that you wanted new comparison operators to
> be defined the same way.

If you have a new operator, how would you propose it be
defined, in general, other than within the language? We can't
go around re-revising the language specification every time
someone thinks of a new useful operator, nor every time someone
invents new hardware with different machine code.

> In that latter case, the benefits of the language allowing such new
> features, without bothering the language definition or changing the
> compiler, seemed to be outweighed by the poor quality and limitations
> of the result (and its non-standardness!)

Two different things. As noted during the debate, I
personally have no requirement for "A < B < C"; I merely point
out that I can at least /define/ what that means within A68.
If I point out some useful operator that your language is
missing [eg, the ability to write "A + B" where "A" and "B"
are graphs or games or functions], you have to change your
compiler [and its associated documentation]; I just write
and annotate some more A68. "Poor quality" is a matter of
opinion; you're like the man who, seeing his neighbour's
dog playing chess, complained that it wasn't playing very
well. "Limitations" are your lack of imagination. "Non-
standardness" happens every time you write a new function
in your program; it is up to good programmers to give new
functions sufficiently descriptive names [or comments].
The same applies to operators and types.
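
To illustrate the point outside A68: in C++, for example, the same
kind of extension looks like this (a toy, with an invented type; the
compiler knows nothing special about Vec2 or its "+"):

#include <iostream>

// An invented toy type; nothing about it is built into the compiler.
struct Vec2 { double x, y; };

// "+" on Vec2 is defined entirely in library code.
Vec2 operator+(Vec2 a, Vec2 b) { return { a.x + b.x, a.y + b.y }; }

int main() {
    Vec2 a{1, 2}, b{3, 4};
    Vec2 c = a + b;                          // resolves to the library operator
    std::cout << c.x << " " << c.y << "\n";  // prints: 4 6
}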

> But it seems my approach is perfectly valid too.

Of course it is, if that's all you want. But it
doesn't work for standardised languages used by lots of
people, and perhaps across lots of hardware.

> I've just used gcc to build a 4-line hello.c program; it took over 5
> seconds even on my somewhat faster new PC, less than 1 line per
> second!

Perhaps you should switch to plain old "cc"? Oh,
wait, on my machine that is a symlink to "gcc". On my home
PC, thirteen years old and nothing fancy, it compiles and
runs a "hello.c" program in 0.04s, only 3x slower than A68G.

[...]
>>> Except that it isn't a matter of syntax; that's too constraining.
>>          It /is/ a matter of syntax in the RR;  in what sense is
>> it constraining?  If you use undeclared variables, do you not
>> want the compiler to detect the error?
> But it's nothing to do with syntax!

In the RR, it /is/ to do with syntax. ...

> Syntax is just the shape of the
> code. Here is the kind of syntax we've been using:
>    A + B
> That's a perfectly valid expression in most normal languages. Except
> in Algol68, because A and B haven't been declared?

... If there is no declaration for "A" and "B", then it
may be syntactically correct in your language, but it surely is
not valid? In A68, there is no route from the concept "program"
to "... letter a symbol, plus symbol. letter b symbol, ..." if
they are not declared. Whether the compiler reports that as a
syntax error or more helpfully as "identifier A not declared" is
a matter of QoI.

> It's far simpler to separate these different ideas: the shape of the
> code; the resolving of the names (which may detect they are
> undefined); the code generation, execution etc.

So much simpler that the C, C++, Ada, ... standards are
much bigger, much harder to navigate, and leave far too much
ill-defined. Of course, if your language doesn't /have/ a
standard, that is indeed much simpler. But as soon as C moved
away from the one implementation at Bell Labs, and it was no
longer possible for everyone to ask DMR what something meant,
it all became more complicated. /You/ grumble about the result
in practically every exchange here with DB.

Anyway, /you/ see them as "these different ideas". The
RR sees at least the first two as a unified whole; there is a
book, "Grammars for Programming Languages" by Cleaveland and
Uzgalis which explains how to integrate all the rest also into
one single grammar. You might think that's going too far?

> In my dynamic language, it doesn't care they might be undefined until
> runtime.

That's fine; it just means that your language is not
the same as C or Algol or ... in this respect. Leaving things
to runtime has its own pros and cons; you seem very unwilling
to accept that some issues are "on the one hand ..., on the
other ..." so that there is a balance to be struck -- which
sometimes comes down one side, sometimes the other.

> Instead, Algol68 chooses the most complex approach all, that is hard
> to understand and difficult to implement.

You aren't expected to learn A68 from the RR, but from
a suitable textbook. It's not hard to learn, unless you are
claiming that you are substantially more stupid than the many
thousands of students who learned A68 as their first language
in [mainly] the 1970s. It's certainly harder than Pascal to
implement; but you write compilers once and use them many
times, so the difficulty is amortised over thousands of uses.

>> Or do you prefer the
>> early versions of Fortran or C where the compiler assumes that
>> an undeclared "j" [for example] is an integer variable?
> If that's what the language said, then that's what it will be. If I
> execute this complete program of my 'Q' language:
>     print j
> it says:
>     <Void>
> (That's not an error, but most other uses are.)

In a compiled language, it's a disaster waiting to
happen, which is why it was dropped from Fortran and C.
Again, the balance of risk is different in different
environments. Serious [esp life-critical] code should not
depend on things that can easily happen by accident, eg by
a simple typo.

> It's not exactly the end of the world. In Algol68:
>     INT j;
>     print(j)
> gives a runtime error. So j might be defined, but it hasn't done much
> good! I guess the RR doesn't make correct /initialisation/ part of
> the syntax?

It's undefined by the RR, and in general is not
computable at compile time. Requiring initialisation along
with the declaration is inefficient [think eg of a large
array where the required values are derived, eg, from user
input, so that filling in zeroes, for example, would be
wasted]. A68G, as you have found, treats it as a run-time
error; yes, there's a speed penalty, but it also picks up
lots of programming errors.

> [...] What I can confidently say is that, to me,
> [the RR] is almost completely useless for any purpose.

That's fine. You're not writing an A68 compiler.
But it also means that you're equally almost completely
unqualified to comment on any matter of A68.

--
Andy Walker, Nottingham.
Andy's music pages: www.cuboid.me.uk/andy/Music
Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Ravel

Bart
Nov 28, 2021, 7:53:35 PM

On 28/11/2021 23:10, Andy Walker wrote:
> On 28/11/2021 12:52, Bart wrote:

>> I've just used gcc to build a 4-line hello.c program; it took over 5
>> seconds even on my somewhat faster new PC, less than 1 line per
>> second!
>
>     Perhaps you should switch to plain old "cc"?  Oh,
> wait, on my machine that is a symlink to "gcc".  On my home
> PC, thirteen years old and nothing fancy, it compiles and
> runs a "hello.c" program in 0.04s, only 3x slower than A68G.

On my WSL, 'gcc hello.c && ./a.out' takes 0.25 seconds from 'cold', and
then 0.1 seconds.

However, I use Windows. DB will say that Windows is slow at file
operations, so it is not gcc's fault.

I would say it /is/ the app's fault if it has a complex enough structure
that it struggles to start up each time; faults like that may be hidden
on forgiving OSes. That is why I sometimes like to test on both.

>> [...]  What I can confidently say is that, to me,
>> [the RR] is almost completely useless for any purpose.
>
>     That's fine.  You're not writing an A68 compiler.
> But it also means that you're equally almost completely
> unqualified to comment on any matter of A68.
>

So they wanted to be elitist; that's fine. They can keep their language!
But people can voice their opinions about it too.

I just selected some choice bits of syntax and adapted them for my own
purposes. However, thanks to the exchanges here, I have a mind to tweak
my syntax to break any remaining Algol68 connections.

I can already write 'end', 'endif', etc in place of 'fi', 'esac', 'od',
the keywords that tend to give it away.

My language, if I were to produce a reference to it, would definitely
not be gobbledygook; I'd see to that. But that's unlikely to happen
since, as certain people like to point out, it's a one-person language;
the benefits would be minimal.

It would be like producing a guide to my own house.

Here, however, are some benchmarks for 'fannkuch(10)':

                   Secs  Lang  Types    Exec

  A68G             50    A68   Static   Interp?   Win or WSL
  A68G -optimise   --    A68                     (WSL: Failed)
  qq -asmopt        2.2  Q     Dynamic  Interp   (mildly accelerated)
  gcc -O3           0.22 C     Static   Native
  mm -opt           0.22 M     Static   Native

My stuff doesn't do too badly speedwise, does it?

I couldn't get a68g -optimise to even run, even on WSL; a shame, as that
50-second timing to interpret /static/ code is poor.

Charles Lindsey
Nov 29, 2021, 7:30:09 AM

On 29/11/2021 00:53, Bart wrote:

>
> Here however are some benchmarks for 'fannkuch(10)':
>
>                    Secs  Lang  Types    Exec
>
>   A68G             50    A68   Static   Interp?   Win or WSL
>   A68G -optimise   --    A68                     (WSL: Failed)
>   qq -asmopt        2.2  Q     Dynamic  Interp   (mildly accelerated)
>   gcc -O3           0.22 C     Static   Native
>   mm -opt           0.22 M     Static   Native
>
> My stuff doesn't do too badly speedwise does it?
>
> I couldn't get a68g -optimise to even run, even on WSL; a shame, as that
> 50-second timing to interpret /static/ code is poor.

$ cat hello.c
#include <stdio.h>
int main() {printf("Hello World!");}
$ time gcc hello.c && ./a.out

real 0m0.038s
user 0m0.034s
sys 0m0.004s
Hello World!$

$ cat hello.a68
BEGIN print("Hello World!") END
$ time a68g hello.a68
Hello World!

real 0m0.016s
user 0m0.008s
sys 0m0.008s
$

That's Ubuntu on modern Intel 68-bit hardware.

Bart
Nov 29, 2021, 8:33:48 AM

I'm still stuck on 64 bits unfortunately.

The purpose of the 'hello world' test was to demonstrate the start-up
overheads of a big compiler like gcc.

So even on my SSD-based PC, that overhead will add 0.25 seconds to
building any program.

People like to demonstrate how starting gcc on Linux takes zero time; I
don't know why that is. Maybe large Windows PE/DLL files are slower to
process; maybe much of what is needed for gcc on Linux is already running.

However, I work on Windows, and that timing is real. My C compiler also
runs on Windows, and that doesn't have a problem:

C:\c>tm bcc hello
Compiling hello.c to hello.exe
TM: 0.03

(Note that running even an empty program on this OS will take 0.02
seconds anyway.)

Regarding the above benchmark, the A68 code for that is given below.

So that I can properly fill in my table, I'd be grateful if you could
run this on your machine both with and without -optimise, as that flag
doesn't work on my WSL. Then I can estimate the runtime on my machine.

(I assume it transpiles to C then runs that C. The errors are within the
A68 headers.)

-----------------------------------------------------------------------------------
INT res2;

PROC fannkuch=(INT n)INT: BEGIN
    [n]INT p,q,s;
    INT sign, maxflips, sum, q1, flips, qq, t, sx, i,j;

    sign:=1;
    maxflips:=0;
    sum:=0;

    FOR i TO n DO
        p[i]:=i;
        q[i]:=i;
        s[i]:=i
    OD;

    DO
        q1:=p[1];
        IF q1/=1 THEN
            FOR i FROM 2 TO n DO q[i]:=p[i] OD;
            flips:=1;
            DO
                qq:=q[q1];
                IF qq=1 THEN
                    sum+:=sign*flips;
                    IF flips>maxflips THEN
                        maxflips:=flips
                    FI;
                    exit1
                FI;

                q[q1]:=q1;
                IF q1>=4 THEN
                    i:=2; j:=q1-1;
                    WHILE
                        t:=q[i]; q[i]:=q[j]; q[j]:=t;
                        i+:=1;
                        j-:=1;
                        i<j
                    DO SKIP OD
                FI;
                q1:=qq;
                flips+:=1
            OD;
            exit1: SKIP
        FI;

        IF sign=1 THEN
            t:=p[2]; p[2]:=p[1]; p[1]:=t;
            sign:=-1
        ELSE
            t:=p[2]; p[2]:=p[3]; p[3]:=t;
            sign:=1;
            FOR i FROM 3 TO n DO
                sx:=s[i];
                IF sx/=1 THEN s[i]:=sx-1; exit2 FI;
                IF i=n THEN
                    res2:=maxflips;
                    return
                FI;
                s[i]:=i;
                t:=p[1];

                FOR j TO i DO
                    p[j]:=p[j+1]
                OD;

                p[i+1]:=t
            OD;
            exit2: SKIP
        FI
    OD;
    return:
    sum
END;

INT n=10;
INT res;
res:=fannkuch(n);

print(("Fannkuch(",n,") = ",res," ", res2,newline))
-----------------------------------------------------------------------------------


Output should be:

Fannkuch(        +10) =      +73196         +38

Charles Lindsey
Nov 30, 2021, 3:45:47 PM

$ time a68g --nooptimise fannkuch.a68
Fannkuch(        +10) =      +73196         +38

real 0m40.654s
user 0m40.644s
sys 0m0.005s
$
$
$ time a68g --optimise fannkuch.a68
Fannkuch(        +10) =      +73196         +38

real 0m13.150s
user 0m10.383s
sys 0m0.119s
$

Bart
Nov 30, 2021, 7:07:22 PM

On 30/11/2021 20:45, Charles Lindsey wrote:
> On 29/11/2021 13:33, Bart wrote:
>> On 29/11/2021 12:30, Charles Lindsey wrote:
>>> On 29/11/2021 00:53, Bart wrote:
>>>
>>>>
>>>> Here however are some benchmarks for 'fannkuch(10)':
>>>>
>>>>                     Secs  Lang  Types    Exec
>>>>
>>>>    A68G             50    A68   Static   Interp?   Win or WSL
>>>>    A68G -optimise   --    A68                     (WSL: Failed)
>>>>    qq -asmopt        2.2  Q     Dynamic  Interp   (mildly accelerated)
>>>>    gcc -O3           0.22 C     Static   Native
>>>>    mm -opt           0.22 M     Static   Native

>> Regarding the above benchmark, the A68 code for that is given below.
>>
>> So that I can properly fill in my table, I'd be grateful if you could
>> run this on your machine both with and without -optimise, as that flag
>> doesn't work on my WSL. Then I can estimate the runtime on my machine.
>>
>> (I assume it transpiles to C then runs that C. The errors are within
>> the A68 headers.)

> $ time a68g --nooptimise fannkuch.a68
> Fannkuch(        +10) =      +73196         +38
>
> real    0m40.654s
> user    0m40.644s
> sys    0m0.005s
> $
> $
> $ time a68g --optimise fannkuch.a68
> Fannkuch(        +10) =      +73196         +38
>
> real    0m13.150s
> user    0m10.383s
> sys    0m0.119s
> $
>

Thanks. My table is now:

                   Secs  Lang    Types    Code

  A68G             50    A68     Static   Interp?  (WSL; Win=52)
  A68G -optimise   16    A68     Static   ?        (Estimated)
  CPython          12.3  Python  Dynamic           (WSL; Win=19)
  qq -asmopt        2.2  Q       Dynamic  Interp
  PyPy              0.65 Python  Dynamic  JIT
  LuaJIT            0.45 Lua     Dynamic  JIT
  gcc -O3           0.22 C       Static   Native
  mm -opt           0.22 M       Static   Native

The A68G timing suggests it is not generating C code for directly
executing the program, but just generating C code that does the
interpretation more efficiently.

That's OK, it just means A68G is not a compiler, but it is still
somewhat sluggish given that the language is statically typed.

I've added timings for CPython plus two JIT programs. Although these do
very well, JIT timings tend to be misleading. They're good at small
benchmarks but not so good at real programs.

anti...@math.uni.wroc.pl
Dec 4, 2021, 4:32:55 AM

Bart <b...@freeuk.com> wrote:
> On 22/11/2021 20:34, Andy Walker wrote:
> > On 21/11/2021 19:45, Bart wrote:
> >> On 21/11/2021 18:55, James Harris wrote:
> >>> In sum, perhaps it would be better to build common operators into
> >>> type libraries and not make them part of the language itself.
> >
> >     I agree with Charles!
> >
> >> You need a more complicated implementation, and one that implements
> >> more advanced optimisations, which are going to be slower, to get
> >> code of the same standard as you can get easily when things are
> >> built-in.
> >
> >     Doesn't follow.  The library can have access to parts of the
> > language that are not available to ordinary users, inc [eg] escapes
> > into assembler, inlining, access to the compiler's internal structures,
> > ..., and can be optimised when the compiler is built.
>
> This is my point: you need:
>
> * Inline assemby
> * Inlining
> * User-defined operator overloads
> * The equivalent of C++'s constexpr
> * A gcc-class optimiser
> ...

I use a language like this. You do need inlining, and it is useful
only because of overloading (_not_ limited to operators). However,
the other items on your list above are not needed. More precisely,
there is a small core language which implements basic types and
corresponding operators. Basic library types simply offer a "view"
of basic types and operators from the core language. Due to inlining
there is no runtime loss of efficiency compared to using the core
language directly. But there are also fancy library types which
would be infeasible to have as part of the core language.
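
A minimal sketch of that "view" idea in C++ (the names are invented):
a library type that merely wraps the core int, whose inline operator
costs nothing once it has been inlined:

#include <cstdio>

// Invented library type: just a "view" of the core int type.
struct Int { int v; };

// Inline operator; after inlining this is the core "+" on int.
inline Int operator+(Int a, Int b) { return { a.v + b.v }; }

int main() {
    Int a{2}, b{3};
    std::printf("%d\n", (a + b).v);   // same generated code as 2 + 3 on plain int
}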

> Too many advanced compiler features, just to keep the language 'small'.

You did not get it: the purpose is to have a manageable implementation
of a _very large_ language. The low-level, error-prone parts are
small. Library types can be written by ordinary application
programmers.

Overloading can be implemented with quite a small amount of extra
code. Inlining is also easy to implement if you have an internal
representation of larger code fragments (as opposed to generating
machine code directly after recognizing a given syntactic
construct). Both put limits on the maximal possible compiler
speed: with overloading you have a more complex symbol table and
need more lookups, and an intermediate representation takes time
to build (on top of the time needed to generate code). Also,
inlining allows large object code from small sources, so if you
measure time per source line, compiler speed will likely be worse
than that of a compiler which forces you to write each source
line separately. Still, the compiler can be reasonably fast.

Anyway, the increase in compiler complexity is relatively small.
Even for small languages, on total balance, the approach using
overloading may save you effort spent on implementation. As a
simple example, let me mention Pascal. In Pascal there is a
bunch of "builtin" operations (mostly on files). Most of them
are too complex to do as inline code, so practical compilers
have a runtime library implementing them. But the compiler still
needs to parse those "builtins" and convert them to runtime
calls. In a sense this is a trivial task, but with a realistic
compiler structure you need several lines of compiler code to
generate each corresponding runtime call. A bunch of older
languages like Cobol, Fortran and PL/I had special "builtin"
operations, some of them many more than Pascal.

Let me add that there are more radical approaches: there are
languages in which "users" can redefine the scanner and/or
parser. So users can add new operators on the fly and define
their meaning. As a possible example, consider your
'stringimage' extension: in an extensible language the user
simply declares that 'stringimage' is a new keyword that should
invoke its handler. The handler looks at the following program
text, extracts the filename, reads the file and tells the
compiler to use the resulting string as a string constant. The
whole thing is probably 10 lines for the handler and a single
line to register the new keyword. In a language that allows
syntax extensions, instead of overloading you can just define
new operators, so overloading strictly speaking is not
necessary. Similarly, you have some way for syntax handlers to
generate code, which is similar to inlining, but strictly
speaking inlining is not needed.
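
As a toy sketch of that scheme in C++ (every name here is invented;
this is not any real compiler's API): a table maps keywords to
handlers, and the handler for the new keyword reads the file and
hands back a string constant:

#include <cstdio>
#include <string>
#include <map>
#include <fstream>
#include <sstream>

using Handler = void (*)(const std::string &filename);
std::map<std::string, Handler> keywords;   // keyword -> handler table

void register_keyword(const std::string &kw, Handler h) { keywords[kw] = h; }

// Handler: read the named file and "emit" it as a string constant.
void handle_strinclude(const std::string &fname) {
    std::ifstream f(fname, std::ios::binary);
    std::ostringstream ss;
    ss << f.rdbuf();
    std::printf("emit string constant of %zu bytes\n", ss.str().size());
}

int main() {
    register_keyword("strinclude", handle_strinclude);  // the single registration line
    keywords["strinclude"]("hello.txt");                // as if just parsed from source
}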

--
Waldek Hebisch

anti...@math.uni.wroc.pl
Dec 4, 2021, 6:16:23 AM

Andy Walker <a...@cuboid.co.uk> wrote:
> work. AFAIK, no language I have ever used expects "A[i;j;k]" to
> be an array element or "F(a;b;c)" to be the call of a function
> with three parameters.

In a language that I use:

(28) -> sin(1;2;3.0)

(28) 0.1411200080_598672221
Type: Float
(29) -> v := new(1, 1)$Vector(Integer)

(29) [1]
Type: Vector(Integer)
(30) -> v(3;2;1)

(30) 1
Type: PositiveInteger

In the first case 3 expressions are computed, but only the last one
provides the value of the resulting compound expression, which in turn
is used as the argument to 'sin'. Similarly, 'v(3;2;1)' uses the last
value as the array index.

> Far from clear. Simpler to describe? People mocked the 236
> pages of the RR [most of it comments, examples or the system
> library] when it first came out -- until they found that proper
> descriptions of other languages were at least as long, and
> nowhere near as precise.

I may have already mentioned it: the Extended Pascal standard
(ISO 10206) has 230 pages (so almost as long). It describes a
rather large language. I found the description to be rather
precise: it uses a _slightly_ less formal style than the Algol
report, but terms are given precise definitions. A little
sample:

: A module A shall be designated as supplying a module B if A
: supplies the module-heading or moduleblock of B. A module A
: shall be designated as supplying a main-program-block if the
: module supplies the block of the main-program-block. A module
: A shall be designated as supplying a module-heading, module-block,
: or block, B, either if B contains an applied occurrence of an
: interface-identifier having a defining occurrence contained by
: the module-heading of A, or if A supplies a module that supplies
: B.
:
: No module shall supply its module-heading.

This may not be very friendly to lay persons; in plainer
language it says that while cyclic imports of module bodies
are allowed, cyclic imports of interface parts ('module-heading'
in standardese) are not allowed. To understand it you need
to read the introductory part, which explains that the semantics
is defined on the parse tree and that grammar rules have labels
allowing unambiguous reference to parts of the parse tree. In
particular, 'module-heading' is a part of a module defined by the
syntax rules. Similarly, 'applied occurrence' and 'defining
occurrence' are precisely defined earlier.

This is 22 years after the Algol 68 report, but at least in
the presentation aspect they did _much_ better work than the
Algol group. In particular, the words and terms they use
actually make sense, so without a firm grasp of the whole
standard you can get a reasonable approximation to the meaning
just by taking the meanings of words from everyday programming
practice. This allows you to read the definitions and refine
your understanding of the meaning. After a few iterations
you converge and get the whole thing. Not so with
Algol 68...

--
Waldek Hebisch

Bart
Dec 4, 2021, 6:51:57 AM

On 04/12/2021 09:32, anti...@math.uni.wroc.pl wrote:
> Bart <b...@freeuk.com> wrote:

>> * Inline assemby
>> * Inlining
>> * User-defined operator overloads
>> * The equivalent of C++'s constexpr
>> * A gcc-class optimiser
>> ...
>
> I use a language like this. You do need inlining, and it is useful
> only because of overloading (_not_ limited to operators). However,
> the other items on your list above are not needed. More precisely,
> there is a small core language which implements basic types and
> corresponding operators. Basic library types simply offer a "view"
> of basic types and operators from the core language. Due to inlining
> there is no runtime loss of efficiency compared to using the core
> language directly. But there are also fancy library types which
> would be infeasible to have as part of the core language.
>
>> Too many advanced compiler features, just to keep the language 'small'.
>
> You did not get it: the purpose is to have a manageable implementation
> of a _very large_ language.

The source code for gcc was 45,000 source files last I looked. I don't
even know if that's just for C, or for C++ too.

That's a thousand times the scale of what I do. So I guess I'm in less
need of such methods.

> The low-level, error-prone parts are
> small. Library types can be written by ordinary application
> programmers.
>
> Overloading can be implemented with quite a small amount of extra
> code. Inlining is also easy to implement if you have an internal
> representation of larger code fragments (as opposed to generating
> machine code directly after recognizing a given syntactic
> construct). Both put limits on the maximal possible compiler
> speed: with overloading you have a more complex symbol table and
> need more lookups, and an intermediate representation takes time
> to build (on top of the time needed to generate code). Also,
> inlining allows large object code from small sources, so if you
> measure time per source line, compiler speed will likely be worse
> than that of a compiler which forces you to write each source
> line separately. Still, the compiler can be reasonably fast.
>
> Anyway, increase in compiler complexity is relatively
> small. Even for small languages in total balance
> approach using overloading may give you saving in
> effort spent on implementation.

I could do overloading. I have an approach I've developed; I just
haven't got round to it (there are a dozen more useful things ahead
of it, which I haven't done yet either!).

But this is purely user-defined overloading of existing operators,
with user-defined types, or applying extra operations to standard
types.

It is not for implementing the core language, which I like to handle
with dedicated code in the compiler, keeping it tight and fast, and
allowing such enhancements as the above to be optional (and on a
waiting list).

For example, there is already a backend IL instruction, MUL.I64, to do
signed multiply. The main part, MUL, is attached to the binary * op
during parsing. The .I64 part is assigned two passes further on.
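
(Schematically, with invented names, that two-step association might
look like this:)

// Parsing attaches the generic MUL; a later pass adds the .I64 part.
enum GenOp   { MUL, ADD };                           // from "*" and "+" at parse time
enum Type    { I64, F64 };                           // known two passes later
enum TypedOp { MUL_I64, MUL_F64, ADD_I64, ADD_F64 };

TypedOp refine(GenOp op, Type t) {
    if (op == MUL) return t == I64 ? MUL_I64 : MUL_F64;
    else           return t == I64 ? ADD_I64 : ADD_F64;
}

int main() { return refine(MUL, I64) == MUL_I64 ? 0 : 1; }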

With only user-defined overload mechanisms, it's a lot harder to
make those associations. For one thing, those IL instructions are not
exposed in the language.

> As a simple
> example, let me mention Pascal. In Pascal there is
> a bunch of "builtin" operations (mostly on files).
> Most of them are too complex to do as inline code,
> so practical compilers have a runtime library implementing
> them. But the compiler still needs to parse those
> "builtins" and convert them to runtime calls.
> In a sense this is a trivial task, but with a realistic
> compiler structure you need several lines of compiler
> code to generate each corresponding runtime call.

I have a similar set of operations for I/O. They are implemented as a
set of function calls, designed to be called implicitly from code, so
have funny names. So:

    print a,b

is equivalent to:

    m$print_startcon()
    m$print_i64_nf(a)        # when a,b are i64
    m$print_i64_nf(b)
    m$print_end

(The bracketing between start/end calls allows separator logic within
the whole print statement; and allows calls to functions within the
print list that might themselves use print, to files etc.)
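
(A hedged sketch of how that runtime might look, written here in C++;
the m$ names above are the real ones, these bodies are invented. A
small stack of "first item?" flags is what lets a function called from
inside a print list do its own printing:)

#include <cstdio>

static int  depth = 0;       // nesting level of active print statements
static bool first[8];        // "no item printed yet?" flag per level

void print_start()          { first[depth++] = true; }
void print_i64(long long x) {
    if (!first[depth - 1]) std::putchar(' ');   // separator between items
    first[depth - 1] = false;
    std::printf("%lld", x);
}
void print_end()            { --depth; std::putchar('\n'); }

int main() {                 // print a,b becomes:
    long long a = 10, b = 20;
    print_start();
    print_i64(a);
    print_i64(b);
    print_end();             // prints: 10 20
}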

So, print is already implemented in user code. I just do not see the
advantage of replacing 110 or 50 lines of code within the compiler's
code generator for print with a likely more complicated solution.

(110 lines of code for the static language, which needs to generate
different calls for various types; 50 lines for the dynamic language.)

> Bunch
> of older languages like Cobol, Fortran, PL/I had
> special "builtin" operations, some of them much
> more than Pascal.
>
> Let me add that there are more radical approaches:
> there are languages in which "users" can redefine
> the scanner and/or parser. So users can add new
> operators on the fly and define their meaning. As
> a possible example, consider your 'stringimage'
> extension:

Do you mean my 'strinclude', which incorporates a text file (or a
text/binary file in dynamic code) as string data within the program?

> in an extensible language the user simply
> declares that 'stringimage' is a new keyword that
> should invoke its handler.

That itself is not so easy. Keywords are built-in, scopeless, and known
program-wide. With a user-defined keyword, could two modules define that
keyword differently? Could another part of the program use that same
name as an ordinary variable?

If so, then it's not known as a new keyword until the next pass after
parsing. So it will need to have a suitable 'shape' to be parsed as
/something/. (My language also has out-of-order definitions.)


> The handler looks at the
> following program text, extracts the filename,
> reads the file and tells the compiler to use the
> resulting string as a string constant.

How? String literals are detected fairly early; they have a certain AST
node, length, etc. At what pass will this be done?

> The whole thing is
> probably 10 lines for the handler and a single
> line to register the new keyword.

> In a language that
> allows syntax extensions, instead of overloading
> you can just define new operators, so overloading
> strictly speaking is not necessary. Similarly
> you have some way for syntax handlers to generate
> code, which is similar to inlining, but strictly
> speaking inlining is not needed.

Do you have an example of such a language where implementing my
'strinclude' is 10 lines of code? For example, C++, or the one you
mentioned. If so, how does it look?

In my static compiler, 'strinclude' needs a handful of lines to define
enums and names, plus these two lines in the parser:

    when kstrincludesym then
        lex()
        p:=createunit1(j_strinclude,readterm2())

Followed by this code later on to create a suitable string literal:

proc tx_strinclude(unit p,a)=
    int fileno

    tpass(a)
    if a.tag<>j_const or not a.isastring then
        txerror("strincl/not string")
    fi

    fileno := moduletable[p.moduleno].fileno
    fileno := getfile(a.svalue, path:sourcefilepaths[fileno], issupport:1)

    a.svalue := sourcefiletext[fileno]
    a.slength := sourcefilesizes[fileno]

    deleteunit(p,a)     # replace strincl node with new string
end

The fiddly bit is deciding where to look for that file (unless an
absolute path is given, it will be relative to this module).

The file involved counts as a support file, which means that when an
application is combined into an amalgamated file, such files are
included. (But they need to be marked because, sometimes, a support file
will have the same name as an actual source module.)

Also, when reading from an amalgamated file, it will look inside that
for the support file.

I think that 20 lines of code for 'strinclude' inside a compiler is
reasonable.

Andy Walker
Dec 4, 2021, 4:38:58 PM

On 01/12/2021 00:07, Bart wrote:
[various benchmark timings:]
> The A68G timing suggests it is not generating C code for directly
> executing the program, but just generating C code that does the
> intepretation more efficently.

AIUI, A68G doesn't "generate C code" at all, unless you go
down the "--optimise" route [in which case you might do better with
an actual compiler, such as "algol68toc" or Charles's "a68s"], but
builds a tree and then "walks" it, doing a /lot/ of checking as it
goes.

Anyway, this got me looking at the supplied "fannkuch"
program, which looks as though it's imported from some other
language, as the Algol is very unidiomatic. Simply tidying
the code speeds it up by around 10%. I saved a further ~15%
by replacing "sign" by a Boolean, replacing explicit array
elements by declaring and using "ref int"s to point to them
and similar, copying arrays in toto rather than element by
element, and so on. That got the time down from just under
40s to just under 30s. I didn't touch the actual algorithm.
Testing also uncovered a buglet; the [original] code fails
if "n < 3"; easy enough to correct, but Algol makes it easier
to spot in the first place.

This got me testing different versions of A68G, as
some of them have the garbage-collection bug that causes
repeated array copying to fill up store, necessitating a
"sweep heap" from time to time. The results weren't quite
what I expected. Fastest was not the current A68G [2.8.4],
but 1.16 [the most recent Mk 1 on my computer, April 2009],
which clocked 21s. Other versions of Mk 1 are mostly a
little slower than 2.8.4, though a couple are a little faster.
Slowest of all was 2.0.3, which weighed in at 64s [Bart's
original at 88s]. The timings seem to bear little relation
to the changelog, so I have no explanation for either the
factor 3 in speed between 1.16 and 2.0.3, nor the factor 2
improvement since then. We'd have to ask Marcel!

I haven't noticed such changes in my own programs,
and suspect that the "instruction mix" in "fannkuch" is
completely different from the norm [eg, only a couple of
procedure calls, almost no actual calculation yet tens of
millions of array accesses, ...]. If I can work up the
enthusiasm [unlikely!], I might try running the Whetstone
benchmark through the different versions.

--
Andy Walker, Nottingham.
Andy's music pages: www.cuboid.me.uk/andy/Music
Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Heller

Bart
Dec 5, 2021, 7:22:07 AM

On 04/12/2021 21:38, Andy Walker wrote:
> On 01/12/2021 00:07, Bart wrote:
> [various benchmark timings:]
>> The A68G timing suggests it is not generating C code for directly
>> executing the program, but just generating C code that does the
>> intepretation more efficently.
>
>     AIUI, A68G doesn't "generate C code" at all, unless you go
> down the "--optimise" route

That's what --optimise does, but I assumed it was generating direct C code.


> [in which case you might do better with
> an actual compiler, such as "algol68toc"

I tried to run that but it's a 16-bit executable that no longer runs
under Win64.

> or Charles's "a68s"], but
> builds a tree and then "walks" it, doing a /lot/ of checking as it
> goes.
>
>     Anyway, this got me looking at the supplied "fannkuch"
> program, which looks as though it's imported from some other
> language, as the Algol is very unidiomatic.  Simply tidying
> the code speeds it up by around 10%.  I saved a further ~15%
> by replacing "sign" by a Boolean, replacing explicit array
> elements by declaring and using "ref int"s to point to them
> and similar, copying arrays in toto rather than element by
> element, and so on.  That got the time down from just under
> 40s to just under 30s.  I didn't touch the actual algorithm.

All my fannkuch versions are derived from an original in Lua.

I'm using it as a benchmark for testing compilers (with up to 10,000
copies in one file).

Partly for that reason, they need to comprise largely the same code.

The actual performance is less important, but it is useful for comparing
the results of trade-offs between compilers and compile options. So
again, they need to execute the same algorithm.

Andy Walker
Dec 7, 2021, 10:52:32 AM

On 05/12/2021 12:22, Bart wrote:
[I wrote:]
>> ... [in which case you might do better with
>> an actual compiler, such as "algol68toc"
> I tried to run that but it's a 16-bit executable that no longer runs
> under Win64.

Ah, unlucky. If you're that interested, perhaps you need
to download the source and build a 64-bit executable.

[...]
> All my fannkuch versions are derived from an original in Lua.
> I'm using it as a benchmark for testing compilers [...].
> Partly for that reason, they need to comprise largely the same code.

Well, that's a rather grey area. Is "a := b" largely
the same code as "for i to n do a[i] := b[i] od"? The former
is simply not possible in many languages, but it [or a standard
"copyarray" procedure] is likely to be implemented significantly
more efficiently than the loop [without sufficiently aggressive
optimisation]. Similarly, for the use of a built-in Boolean
type vs an integer to be tested for sign. Similarly, when there
are issues of "return;" vs "run off the end of the procedure"
or of "loop and a half" constructs. As noted in my PP, simply
tidying the "fannkuch" code into idiomatic Algol saved around
25% of the time [without changing the underlying algorithm].
No doubt if you started with some Algol and translated that
directly into C or Lua, you would find that you could save a
similar amount by switching some of the resulting ugly C/Lua
into more idiomatic code.

[Similarly, and relevant to a nearby thread, the
difference between 0-based and 1-based arrays is more than
is often supposed. It affects which comparison operator you
use, and whether you pre- or post-increment or decrement, and
then you find yourself pulling the code around to suit.]
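
A trivial illustration of that in C++: the same scan written with
0-based and with 1-based habits; the bound, the comparison operator
and the +/-1 adjustments all move around:

#include <cstdio>

int main() {
    int b[6] = { 0, 10, 20, 30, 40, 50 };   // b[0] is padding for the 1-based view
    int sum0 = 0, sum1 = 0;

    for (int i = 0; i < 5; i++)  sum0 += b[i + 1];   // 0-based habit: "<" and an offset
    for (int i = 1; i <= 5; i++) sum1 += b[i];       // 1-based habit: "<=" and no offset

    std::printf("%d %d\n", sum0, sum1);              // prints: 150 150
}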

--
Andy Walker, Nottingham.
Andy's music pages: www.cuboid.me.uk/andy/Music
Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Bull

David Brown
Dec 7, 2021, 12:41:18 PM

On 07/12/2021 16:52, Andy Walker wrote:
> On 05/12/2021 12:22, Bart wrote:
> [I wrote:]
>>> ... [in which case you might do better with
>>> an actual compiler, such as "algol68toc"
>> I tried to run that but it's a 16-bit executable that no longer runs
>> under Win64.
>
>     Ah, unlucky.  If you're that interested, perhaps you need
> to download the source and build a 64-bit executable.
>

You could also try Wine on Linux if it is a Windows 3.x program, or
DOSBox if it is a DOS program.

You could even use VirtualBox and start a FreeDOS virtual machine, and
enjoy the nostalgia :-)

(I'm not guaranteeing these will work, but they might be worth a shot.)


Dmitry A. Kazakov
Dec 7, 2021, 2:39:17 PM

On 2021-12-07 18:41, David Brown wrote:

> You could even use VirtualBox and start a FreeDOS virtual machine, and
> enjoy the nostalgia :-)

I managed to run Windows 98 in VirtualBox. It requires special drivers
for the display, though.

Firefox 2.0.0.20 is such fun. Compared to the latest versions it is
blindingly fast. OK, it does not show most of the pages, but should I
visit them anyway? (:-))

Bart
Dec 7, 2021, 7:01:26 PM

On 07/12/2021 15:52, Andy Walker wrote:
> On 05/12/2021 12:22, Bart wrote:
> [I wrote:]
>>> ... [in which case you might do better with
>>> an actual compiler, such as "algol68toc"
>> I tried to run that but it's a 16-bit executable that no long runs
>> under Win64.
>
>     Ah, unlucky.  If you're that interested, perhaps you need
> to download the source and build a 64-bit executable.
>
> [...]
>> All my fannkuch versions are derived from an original in Lua.
>> I'm using it as a benchmark for testing compilers [...].
>> Partly for that reason, they need to comprise largely the same code.
>
>     Well, that's a rather grey area.  Is "a := b" largely
> the same code as "for i to n do a[i] := b[i] od"?

I would allow that, if the copying is by value (making a distinct copy,
not sharing), which is what the element-by-element loop does; except
there is no whole-array assignment in the program, only of a slice,
which is a little more advanced.

The original benchmark site allowed any mix of algorithms provided they
still gave the same results, although blatant cheating (just printing a
string with the expected results) would be forbidden.

So if you wanted to compare implementations, you were out of luck.

(A reminder that my own tests were to do with compilation, which needed
a substantial source file.)

>  The former
> is simply not possible in many languages, but it [or a standard
> "copyarray" procedure] is likely to be implemented significantly
> more efficiently than the loop [without sufficiently aggressive
> optimisation].  Similarly, for the use of a built-in Boolean
> type vs an integer to be tested for sign.  Similarly, when there
> are issues of "return;" vs "run off the end of the procedure"
> or of "loop and a half" constructs.  As noted in my PP, simply
> tidying the "fannkuch" code into idiomatic Algol saved around
> 25% of the time [without changing the underlying algorithm].
> No doubt if you started with some Algol and translated that
> directly into C or Lua, you would find that you could save a
> similar amount by switching some of the resulting ugly C/Lua
> into more idiomatic code.

Yes, these are useful tweaks, but there's such a huge gap between A68G
and the next language, that the basic problem still exists: it is slow!

The only comparable language I have installed is Seed7, also statically
typed, also usually interpreted. That one takes 44 seconds for this
task, compared with 52 seconds for A68G (on Windows; it's a bit faster
on WSL).

However, Seed7 also has a transpiler to C, one that works, and works
transparently, to produce a .exe file. Then the runtime reduces to 0.7
seconds, which is more like it.

(Note that the fastest implementation is still 3 times faster than this.)

A68 really needs someone to produce an equivalent transpiler, an update
of that algol68toc program.

(If I liked it more, and understood it a LOT more, then I'd have a go
myself.)

>     [Similarly, and relevant to a nearby thread, the
> difference between 0-based and 1-based arrays is more than
> is often supposed.  It affects which comparison operator you
> use, and whether you pre- or post-increment or decrement, and
> then you find yourself pulling the code around to suit.]

Yes, the original algorithm was 1-based, and all versions I made are
1-based even when the language was 0-based (by allocating an extra
element and ignoring element 0).

I would expect an optimising compiler to iron out such minor differences.


Bart
Dec 8, 2021, 4:29:52 PM

On 24/11/2021 18:19, Bart wrote:
> On 24/11/2021 00:04, Andy Walker wrote:

>>      If you're going to have these things, you need them anyway.
>> There is no clear-cut boundary between what compilers do and what
>> goes into a standard library.  It doesn't matter whether it's the
>> compiler that decides that "A + B" turns into "load A, load B, add"
>> or the library that decides that.
>
> It matters very much. My static language is self-hosted, and can build
> itself entirely from source fast enough that I could do so ten times
> every second if I was so inclined.
>
> To me that is important; not that I need to do that 600 times a minute,
> but it means an effectively zero edit-build-run cycle for similar-sized
> applications.
>
> That would be harder to achieve using the wrong language choices; it
> would also take longer, and the result would build apps more slowly.

A discussion on Reddit happened to mention that someone's 20Kloc C++
language project shouldn't take more than '5 minutes' on a modern
machine using -O3.

Why so slow? Apparently it was all to do with templates.

Someone else mentioned a 25Kloc compiler taking 5 minutes, with 8
cores/16 threads.

And now someone had a 100Kloc project using a Sun C++ compiler that took
*16 hours* to build.

Effective compile speeds of 65 lps, 80 lps (with 8 cores!) and ... 1.7
lines per second.

When I said the wrong language design choices could slow things down, I
didn't even know it could be by that much.

I think I'm happy with my own choices at the minute (a 40Kloc project
takes 0.1 seconds on one core).

Andy Walker
Dec 8, 2021, 4:40:39 PM

On 08/12/2021 00:01, Bart wrote:
> Yes, these are useful tweaks, but there's such a huge gap between
> A68G and the next language, that the basic problem still exists: it
> is slow!

Yes, it's an interpreter. But not as slow as you claim,
even for this very artificial test, as long as the code is written
[or transcribed from Lua/C/whatever] by someone familiar with
Algol [without changing the underlying algorithm, before you ask].
In other tests, it benchmarks on my [ancient] PC at around 200x
faster than the ICL 1906A [comparable with the fastest mainframes
around in the mid-'70s] by the Whetstone benchmark -- a decent
approximation to useful scientific computing -- even without the
optimiser. That's plenty fast enough for all the real computing
I do, where development time is much more important than the few
extra seconds needed for live runs [esp interactive ones, where
the real bottleneck is my typing]. I'm not [these days -- I used
to do a lot of benchmarking in the last century!] as obsessed by
speed as some people.

Meanwhile, may I suggest a little test for you?  Instead
of "fannkuch(10)", please get your collection of programs and
compilers to produce "fannkuch(2)". Any compilers that return
"2, 2" [without comment and from code equivalent to what you
supplied] should be kept well away from any serious computation.

[...]
> A68 really needs someone to produce an equivalent transpiler, an
> update of that algol68toc program.
> (If I liked it more, and understood it a LOT more, then I'd have a go
> myself.)

Both A68G and "algol68toc" are available as source, so the
hard work is already done. But if you don't understand them, you're
not well placed either to edit that source or to complain about the
[perceived] deficiencies thereof.

[...]
> Yes, the original algorithm was 1-based, and all versions I made are
> 1-based even when the language was 0-based (by allocating an extra
> element and ignoring element 0).
> I would expect an optimising compiler to iron out such minor
> differences.

I suspect you're being optimistic if you expect even a good
optimiser to detect that [in a reasonably large and complicated
program] element 0 is never accessed, or systemically to reduce all
indexes by 1. But you never know. Small differences in code
accumulate as dozens of programming decisions are taken on the
basis of where arrays start and finish.

--
Andy Walker, Nottingham.
Andy's music pages: www.cuboid.me.uk/andy/Music
Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Handel

Bart

unread,
Dec 8, 2021, 5:49:29 PM12/8/21
to
On 08/12/2021 21:40, Andy Walker wrote:
> On 08/12/2021 00:01, Bart wrote:
>> Yes, these are useful tweaks, but there's such a huge gap between
>> A68G and the next language, that the basic problem still exists: it
>> is slow!

>     Meanwhile, may I suggest a little test for you?  Instead
> of "fannkuch(10)", please get your collection of programs and
> compilers to produce "fannkuch(2)".  Any compilers that return
> "2, 2" [without comment and from code equivalent to what you
> supplied] should be kept well away from any serious computation.

What should it produce for fannkuch(2)?

I downloaded versions in 4 languages from the original benchmarks site.

Three of them (Lua, Python, Julia) go wrong, each giving a program error.

The fourth, in C, completes, but fannkuch(2) returns -1. I don't know if
that is correct or not, but it sounds like it might be undefined.

The presence of these lines:

swap(p[2],p[3])
signx:=1
for i:=3 to n do

plus perm1[2] in the 0-based Python, and p[3] in Julia, suggests n needs
to be at least 3.

>> I would expect an optimising compiler to iron out such minor
>> differences.
>
>     I suspect you're being optimistic if you expect even a good
> optimiser to detect that [in a reasonably large and complicated
> program] element 0 is never accessed, or systemically to reduce all
> indexes by 1.  But you never know.

Element 0 can stay there; it's not doing any harm with just 3 arrays.

Being 1-based may simply mean an extra offset on some address modes; is
that what you think is causing a slow-down? With local arrays, that
offset should just be absorbed into the stack offset of the array.

Dmitry A. Kazakov

unread,
Dec 9, 2021, 2:46:50 AM12/9/21
to
On 2021-12-08 22:29, Bart wrote:

> Why so slow? Apparently it was all to do with templates.

Likely, GCC is very poor with them. But your language has nothing to
express polymorphism in either form. So you are not in a position to criticize.

> When I said the wrong language design choices could slow things down, I
> didn't even know it could be by that much.

Yes, a modern medium-sized project is extremely large. In a language
without polymorphism it would take years to design and the code would be
tenfold bigger and more complex.

> I think I'm happy with my own choices at the minute (a 40Kloc project
> takes 0.1 seconds on one core).

It is a false equivalence. You simply cannot write equivalent code of
same size.

David Brown

unread,
Dec 9, 2021, 3:28:20 AM12/9/21
to
On 09/12/2021 08:46, Dmitry A. Kazakov wrote:
> On 2021-12-08 22:29, Bart wrote:
>
>> Why so slow? Apparently it was all to do with templates.
>
> Likely, GCC is very poor with them. But your language has nothing to
> express polymorphism in either form. So you are not in a position to criticize.
>

In general, gcc is fast at compiling templates. But there is always
scope for improvement, and gcc has definitely been getting better there
in recent versions. (More importantly, it has also been getting better
at error messages from heavily templated code.)

But with templates, there is no limit to the complexity you can have.
Indeed, templates can be seen as a Turing-complete programming language
at compile-time. (Individual compilers, of course, have limits to
template recursion depth and the like, usually as tunable parameters for
people doing weird stuff.)
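
As a minimal illustration of that compile-time computation (plain
standard C++, nothing project-specific): each Fact<N> below is a
separate instantiation the compiler must expand, which is where the
compile-time cost creeps in.

template <unsigned N>
struct Fact {
    // each instantiation recursively forces another one
    static constexpr unsigned long long value = N * Fact<N - 1>::value;
};

template <>
struct Fact<0> {                      // base case stops the recursion
    static constexpr unsigned long long value = 1;
};

static_assert(Fact<10>::value == 3628800ULL,
              "evaluated entirely at compile time");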

Improvements and enhancements to C++ have been added to combat this,
including constexpr and consteval functions to get compile-time
evaluation without inefficient complicated recursive template
meta-programming, and concepts to make requirements clearer, error
messages more useful and template compilation faster. Then there are
modules which can hugely improve compilation of large template libraries.
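
And a sketch of the constexpr alternative just mentioned: an
ordinary-looking function the compiler evaluates at compile time,
without minting one class instantiation per step.

constexpr unsigned long long fact(unsigned n) {
    unsigned long long r = 1;
    for (unsigned i = 2; i <= n; ++i)   // plain loop, run by the compiler
        r *= i;
    return r;
}

static_assert(fact(10) == 3628800ULL);  // no template recursion involved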

None of the new features limits the use of templates, however - they
just offer alternatives. So it is quite easy to make code that might
/look/ like 20 Kloc, but is effectively compiling millions of lines.
(To get the same results, the alternative in statically compiled
languages that don't have templates would, of course, be macros,
pre-processors, generators, or manual coding also resulting in millions
of lines.)

Then there is the issue of optimisation. Asking for -O3 is telling the
compiler to try /really/ hard to look for optimisation opportunities,
and not to worry about the time or ram usage in doing so.
Inter-procedural optimisations scale approximately quadratically with
the number of functions in a compilation unit. Some others scale even
worse in the size of functions, and use of templates can allow a lot of
inlining and therefore very big functions. (There are legitimate
use-cases for huge functions, and work in progress in gcc for handling
them better.)

All these things can add up to faster results, but obviously there are
diminishing returns for the effort. It is rare that it makes sense to
use -O3 rather than -O2. Think of it like using liquid nitrogen to cool
your cpu rather than water - is the extra 10% overclocking really worth
it? Most people would usually only do a massively templated -O3 compile
for testing, benchmarking or bragging rights, not because they tried -O2
and measured that the resulting binaries are too slow for their needs.


>> When I said the wrong language design choices could slow things down,
>> I didn't even know it could be by that much.
>
> Yes, a modern medium-sized project is extremely large. In a language
> without polymorphism it would take years to design and the code would be
> tenfold bigger and more complex.

Exactly. Developer time is more important than computer time.

>
>> I think I'm happy with my own choices at the minute (a 40Kloc project
>> takes 0.1 seconds on one core).
>
> It is a false equivalence. You simply cannot write equivalent code of
> same size.
>

Indeed.

You could also look at these examples and be impressed with how much
functionality you can get from a mere 20 Kloc of code with C++, where
the same useful work in C or BOL (Bart's Own Language) might take 50
times as much.

Statistically speaking, based on real studies, error rates in software
are proportional to the number of lines of code, and relatively
independent of the language used. A language that lets you write fewer
lines of code because you can make more general code and libraries or
re-use more existing (tested) code for other sources, will let you write
code with fewer errors - all other things being equal.

(This is, of course, a big reason why languages like Python are popular.)

Bart

unread,
Dec 9, 2021, 5:50:11 AM12/9/21
to
On 09/12/2021 07:46, Dmitry A. Kazakov wrote:
> On 2021-12-08 22:29, Bart wrote:
>
>> Why so slow? Apparently it was all to do with templates.
>
> Likely, GCC is very poor with them. But your language has nothing to
> express polymorphism in either form. So you are not in a position to criticize.

I expect my dynamic language can do polymorphism. That would run
between 1 and 2 magnitudes slower, but it would still be 1-2 magnitudes
faster than that C++ compiler!

But is it really that necessary inside a compiler?


>> When I said the wrong language design choices could slow things down,
>> I didn't even know it could be by that much.
>
> Yes, a modern medium-sized project is extremely large. In a language
> without polymorphism it would take years to design and the code would be
> tenfold bigger and more complex.

If I did such a project (say 100-1000Kloc) I would probably use dynamic
scripting code for most of it.

Since most code is not speed-critical. Especially when you just want a
test run.

>> I think I'm happy with my own choices at the minute (a 40Kloc project
>> takes 0.1 seconds on one core).
>
> It is a false equivalence. You simply cannot write equivalent code of
> same size.

We don't know the capabilities of the language in question for
which that compiler, written in C++, took so long to build.

You need to ask a different question: do you really expect a compiler for
any kind of language, which is only 20-25,000 lines of user-code, to
take MINUTES to build on today's machines? (And on multiple cores!)
Somebody is doing something wrong.

Yes, the C++ compiler could also be more inefficient than need be, but
people can only use the C++ compilers that exist.

Bart

unread,
Dec 9, 2021, 6:01:19 AM12/9/21
to
On 09/12/2021 08:28, David Brown wrote:
> On 09/12/2021 08:46, Dmitry A. Kazakov wrote:

>> Yes, a modern medium-sized project is extremely large. In a language
>> without polymorphism it would take years to design and the code would be
>> tenfold bigger and more complex.
>
> Exactly. Developer time is more important than computer time.

Exactly. That is why having developers twiddle their thumbs for five
minutes every time they compile is a joke.

>> It is a false equivalence. You simply cannot write equivalent code of
>> same size.
>>
>
> Indeed.
>
> You could also look at these examples and be impressed with how much
> functionality you can get from a mere 20 Kloc of code with C++, where
> the same useful work in C or BOL (Bart's Own Language) might take 50
> times as much.

> (This is, of course, a big reason why languages like Python are popular.)

Scripting languages have all the benefits of C++ and then some. Except
for execution speed.

But the C++ in these examples is so slow to build, that a scripting
language is likely to be faster.




Dmitry A. Kazakov

unread,
Dec 9, 2021, 7:00:08 AM12/9/21
to
On 2021-12-09 11:50, Bart wrote:
> On 09/12/2021 07:46, Dmitry A. Kazakov wrote:
>> On 2021-12-08 22:29, Bart wrote:
>>
>>> Why so slow? Apparently it was all to do with templates.
>>
>> Likely, GCC is very poor with them. But your language has nothing to
>> express polymorphism in either form. So you are not in a position to criticize.
>
> I expect my dynamic language can do polymorphism.

I expect it does not, because...

> But is it really that necessary inside a compiler?

you obviously do not understand it.

> If I did such a project (say 100-1000Kloc) I would probably use dynamic
> scripting code for most of it.

Again a failure to see actual challenges of medium to large software
collaboration projects.

> Since most code is not speed-critical. Especially when you just want a
> test run.

Yep, a test run of a production-line automation system, a banking
system, or a luggage delivery system written in BOL; I would love to see
it, from a safe distance of course...

> You need to ask a different question: do you really expect a compiler for
> any kind of language, which is only 20-25,000 lines of user-code, to
> take MINUTES to build on today's machines? (And on multiple cores!)
> Somebody is doing something wrong.

Yes, lack of experience/education on modern language-assisted software
design.

> Yes, the C++ compiler could also be more inefficient than need be, but
> people can only use the C++ compilers that exist.

C++ has a lot of problems, but not a lack of understanding and
anticipation of software development issues. At least they tried and have
achieved far more than many.

On your part we see an active denial of the very existence of such problems.
Things you see as important are just silly from the point of view of a
practicing software developer. Things you reject are key tools for
managing the complexity of modern software.

David Brown

unread,
Dec 9, 2021, 8:16:10 AM12/9/21
to
On 09/12/2021 12:01, Bart wrote:
> On 09/12/2021 08:28, David Brown wrote:
>> On 09/12/2021 08:46, Dmitry A. Kazakov wrote:
>
>>> Yes, a modern medium-sized project is extremely large. In a language
>>> without polymorphism it would take years to design and the code would be
>>> tenfold bigger and more complex.
>>
>> Exactly.  Developer time is more important than computer time.
>
> Exactly. That is why having developers twiddle their thumbs for five
> minutes every time they compile is a joke.

When the alternative is spending weeks or months writing the code by
hand rather than using templates?

>
>>> It is a false equivalence. You simply cannot write equivalent code of
>>> same size.
>>>
>>
>> Indeed.
>>
>> You could also look at these examples and be impressed with how much
>> functionality you can get from a mere 20 Kloc of code with C++, where
>> the same useful work in C or BOL (Bart's Own Language) might take 50
>> times as much.
>
>> (This is, of course, a big reason why languages like Python are popular.)
>
> Scripting languages have all the benefits of C++ and then some. Except
> for execution speed.
>

Spoken like someone who does not know C++, or have any concept of
scripting languages. Or someone who /does/ know, but is trolling.

There are different languages with different balances of features and
emphases, and that's good.  But you'd have to be completely ignorant
to think that there are any scripting languages with /all/ the benefits
of C++ (or to think that C++ has /all/ the benefits of all scripting
languages).

> But the C++ in these examples is so slow to build, that a scripting
> language is likely to be faster.
>

That would seem highly unlikely. Even if such a compile time was
realistic (and it rarely is - these examples are rather special cases),
a scripting language is only going to be faster for short and simple
scripts, or scripts only used occasionally.

But since we know nothing about the C++ examples other than that someone
managed to make a compilation that took a long time, it is fairly
pointless to speculate what alternatives could have been better.

Bart

unread,
Dec 9, 2021, 8:39:55 AM12/9/21
to
On 09/12/2021 12:00, Dmitry A. Kazakov wrote:
> On 2021-12-09 11:50, Bart wrote:
>> On 09/12/2021 07:46, Dmitry A. Kazakov wrote:
>>> On 2021-12-08 22:29, Bart wrote:
>>>
>>>> Why so slow? Apparently it was all to do with templates.
>>>
>>> Likely, GCC is very poor with them. But your language has nothing to
>>> express polymorphism in either form. So you are not in a position to
>>> criticize.
>>
>> I expect my dynamic language can do polymorphism.
>
> I expect it does not, because...

Get me an example of polymorphism, as there seem to be various kinds,
and I'll see if I can express that in my language.


>> But is it really that necessary inside a compiler?
>
> you obviously do not understand it.

I understand /my/ compilers and they're obviously not suffering from any
lack of whatever you think they're lacking.

>> If I did such a project (say 100-1000Kloc) I would probably use
>> dynamic scripting code for most of it.
>
> Again a failure to see actual challenges of medium to large software
> collaboration projects.

I haven't done commercial apps for a long time. But when I did:

* I wrote the core application with 75% of modules written as script
code, about 150Kloc total
* OEM developers created add-on functionality via my scripting language
* End users also had the ability to customise and automate via macros,
command-line scripts and that same scripting language

Yes, this is a fairly small project with few collaborators.

But, on top of all the various issues involved in bigger projects, you
don't want build times to be unnecessarily slow for no good reason.

For that, you need someone whose job it is to oversee that part, to
identify such problems, and to do something to fix it if possible.

What you don't want is for someone like you to just shrug their
shoulders and say, Well, what do you expect? It is what it is.


>> Since most code is not speed-critical. Especially when you just want a
>> test run.
>
> Yep, a test run of a production line automation system, a banking
> system,

Python is used extensively in banking. Python is a scripting language.

> a luggage delivery system written in BOL, would love to see it,
> from a safe distance of course...

Actually some airport baggage handling installations were designed with
the help of my CAD application.

(Whether they used the script language to help, I don't know.)

>> You need to ask a different question: do you really expect a compiler for
>> any kind of language, which is only 20-25,000 lines of user-code, to
>> take MINUTES to build on today's machines? (And on multiple cores!)
>> Somebody is doing something wrong.
>
> Yes, lack of experience/education on modern language-assisted software
> design.

My first two examples were from people /implementing their own languages/.

> On your part we see an active denial of the very existence of such problems.
> Things you see important are just silly from the point of view of a
> practicing software developer. Things you reject are key tools for
> managing complexity of modern software.

OK. Carry on waiting 5 minutes between each run. Hopefully it will be
for a more worthwhile reason than over-eager, inappropriate use of
templates.

Or, for that 100Kloc Sun program, wait 16 hours to build it. Because it
is perfectly reasonable to compile at 1.7 lines per second.

It's really no skin off my nose.

Bart

unread,
Dec 9, 2021, 9:00:07 AM12/9/21
to
On 09/12/2021 13:16, David Brown wrote:
> On 09/12/2021 12:01, Bart wrote:
>> On 09/12/2021 08:28, David Brown wrote:
>>> On 09/12/2021 08:46, Dmitry A. Kazakov wrote:
>>
>>>> Yes, a modern medium-sized project is extremely large. In a language
>>>> without polymorphism it would take years to design and the code would be
>>>> tenfold bigger and more complex.
>>>
>>> Exactly.  Developer time is more important than computer time.
>>
>> Exactly. That is why having developers twiddle their thumbs for five
>> minutes every time they compile is a joke.
>
> When the alternative is spending weeks or months writing the code by
> hand rather than using templates?

I can think of half a dozen alternatives. Plenty of time to ponder
during the 7 hours out of 8 each day that I would be spending waiting!

One would be to develop first using a rapid-development environment (eg.
scripting). Once it works, then you port it to C++, but now it's a
working program; you will not need to build it so many times.

(But maintenance will still be a pig.)

>> Scripting languages have all the benefits of C++ and then some. Except
>> for execution speed.
>>

> Spoken like someone who does not know C++, or have any concept of
> scripting languages.

I created my first scripting languages in the late 80s (so of course I
have no concept of what they are).

When I first found out about C++ a decade later, and its 'OOP', I wanted
to find out what it could do.

The first examples I saw concerned first-class string handling; stuff
I'd already been doing for years in my script language!

> There are different languages with different balances of features and
> emphases, and that's good. But you'd have to be completely ignorant
> to think that there are any scripting languages with /all/ the benefits
> of C++

So, what does C++ specifically offer that would be impossible in
scripting code? I don't mean things that only work with native code.


>> But the C++ in these examples is so slow to build, that a scripting
>> language is likely to be faster.
>>
>
> That would seem highly unlikely.

If the reason for slowness is some degenerate cases of template
expansion, then it is a possibility.

Then a script language that provides generics anyway won't have that
problem. It will just be a fixed factor slower for the more conventional
parts of a compiler, not exponentially slower.


Dmitry A. Kazakov

unread,
Dec 9, 2021, 9:49:26 AM12/9/21
to
On 2021-12-09 14:39, Bart wrote:

> Get me an example of polymorphism, as there seem to be various kinds,
> and I'll see if I can express that in my language.

This again shows a lack of understanding. Types of polymorphism are
irrelevant as they all serve the same goal. The differences between
ad-hoc, parametric, and dynamic polymorphism are in their
capabilities/limitations to express certain types of classes.
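
For concreteness, a minimal C++ sketch of the three kinds just named
(illustrative code only, not from any project discussed here):

#include <cstdio>

// Ad-hoc polymorphism: overloading picks an implementation by static type.
int    twice(int x)    { return 2 * x; }
double twice(double x) { return 2.0 * x; }

// Parametric polymorphism: one definition for any T supporting operator+.
template <typename T>
T sum(T a, T b) { return a + b; }

// Dynamic polymorphism: the call is dispatched at run time via a vtable.
struct Shape {
    virtual double area() const = 0;
    virtual ~Shape() = default;
};
struct Square : Shape {
    double side;
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
};

int main() {
    std::printf("%d %g\n", twice(3), twice(1.5));   // ad-hoc
    std::printf("%d\n", sum(2, 3));                 // parametric
    Square sq(2.0);
    const Shape& s = sq;
    std::printf("%g\n", s.area());                  // dynamic
}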

> But, on top of all the various issues involved in bigger projects, you
> don't want build times to be unnecessarily slow for no good reason.

Nobody cares about build times, because this is not how the software
development process works. People even agree to suffer nightly builds,
while calling it agile... (:-))

> For that, you need someone whose job it is to oversee that part, to
> identify such problems, and to do something to fix it if possible.

Yes, you do not know how testing and software quality assurance work either.

>> On your part we see an active denial of the very existence of such
>> problems. Things you see important are just silly from the point of
>> view of a practicing software developer. Things you reject are key
>> tools for managing complexity of modern software.
>
> OK. Carry on waiting 5 minutes between each run.

Yep, like the James Webb telescope, 30 years before the first and only
run...

Again, properties of advanced programming languages such as static
verification, modularization, separation of interface and
implementation, reuse, and generic programming are focused on risk
analysis and the understanding that 50% of cases are fundamentally
untestable, 50% of the rest are economically, technically, or time-wise
infeasible to test, and 50% of the rest of the rest require in-depth
analysis of the requirements, the underlying physical processes and the
implementation algorithms in order to test properly.

> Hopefully it will be
> for a more worthwhile reason than over-eager, inappropriate use of
> templates.

Nobody is really fond of templates or generics. The problem is that no
practical alternative to completely supersede parametric polymorphism
was ever shown. Dynamic polymorphism is better than the parametric one
in all aspects except that it cannot handle many cases of the former and
is inconsistent with regard to multiple dispatch, etc.

My interest in new crazy languages lies in whether they address these
issues. Silly suggestions about non-existent unsigned arithmetic or
summing bitfields do not.

Andy Walker

unread,
Dec 9, 2021, 10:44:03 AM12/9/21
to
On 08/12/2021 22:49, Bart wrote:
[I wrote:]
>>      Meanwhile, may I suggest a little test for you?  Instead
>> of "fannkuch(10)", please get your collection of programs and
>> compilers to produce "fannkuch(2)".  Any compilers that return
>> "2, 2" [without comment and from code equivalent to what you
>> supplied] should be kept well away from any serious computation.
> What should it produce for fannkuch(2)?
> I downloaded versions in 4 languages from the original benchmarks
> site. Three of them (Lua, Python, Julia) go wrong, each giving a program
> error.
That's not "wrong"; with the code [as supplied to me in the
Algol version], there /is/ a program error. I deduce that the authors
either didn't test their code or didn't care that there was a bug.
Neither case is commendable. Given a function of "int n", it would
seem natural to me to test with at least something like 0, 1, 2, 3,
something bigger, and something negative. It costs zilch to add a
test "if n < 3 then ..." to the code.

> The fourth, in C, completes, but fannkuch(2) returns -1. I don't know
> if that is correct or not, but it sounds like it might be undefined.

The problem is perfectly well defined in the cases 1 and 2;
for 0 or -1, you get to decide what it means. But if the C is "the
same as" the Algol you supplied, then it has failed to detect the
errors that A68G detects. See below.

> The presence of these lines:
>     swap(p[2],p[3])
>     signx:=1
>     for i:=3 to n do
> plus perm1[2] in the 0-based Python, and p[3] in Julia, suggests n
> needs to be at least 3.

In my universe, it "suggests" that the code ought to check
that "n" is at least 3. If the arrays in fact have size 2, then
using "p[3]" is already an error. If that error is disguised, in
languages where "dynamic arrays" are impossible/difficult, by
declaring the arrays to have size [say] 100 [prompting another
sanity check in good code!], then there is a second error in that
"p[3]" now exists but is uninitialised. A68G detects both of
these; over to you to test other compilers.

I'm not claiming that these are the worst and most serious
bugs ever perpetrated. But they are examples of the sorts of bug
that Black Hats exploit in creating malware. They look at source
code with a keen eye on potential buffer over-runs, on dumping
naughty instructions/data in places where they may be stumbled
across by code using uninitialised storage, on using null pointers
and the like [inc, of course, race conditions, not relevant here].
Languages and compilers that don't help detect these, even in the
presence of good testing, are a menace. [Note that C (eg) /could/
detect these; but not if it's somehow essential to shave every
last microsecond off the run time.]

>>> I would expect an optimising compiler to iron out such minor
>>> differences.
>>      I suspect you're being optimistic if you expect even a good
>> optimiser to detect that [in a reasonably large and complicated
>> program] element 0 is never accessed, or systemically to reduce all
>> indexes by 1.  But you never know.
> Element 0 can stay there; it's not doing any harm with just 3 arrays.

Yes, but it was your claim that an optimising compiler should
"iron out such minor differences". My point was that what start out
as minor differences soon turn into differences that permeate the
entire program. For the sorts of toy program that we tend to give
as examples here, that barely happens. But over something bigger,
say a compiler, you make decisions early on about [eg] whether an
index should be pre- or post-incremented, that influences whether
you use the array element and /then/ change the index or vv, that
changes the value of "i" /after/ the loop, that influences the
next stage of the code and so on. Then you perhaps need to switch
from a 0-based to a 1-based language, and whereas you can perhaps
ignore an element 0 that you don't need, you can't ignore one that
has been used, so you have to change every array access.

> Being 1-based may simply mean an extra offset on some address modes;
> is that what you think is causing a slow-down? With local arrays,
> that offset should just be absorbed into the stack offset of the
> array.

I wasn't thinking about slow-downs at all; I just thought
you were being optimistic in thinking that "gcc -O3" might magically
turn some 1-based code into equivalent 0-based code or vv.

--
Andy Walker, Nottingham.
Andy's music pages: www.cuboid.me.uk/andy/Music
Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Widor

Bart

unread,
Dec 9, 2021, 12:22:55 PM12/9/21
to
On 09/12/2021 14:49, Dmitry A. Kazakov wrote:
> On 2021-12-09 14:39, Bart wrote:
>
>> Get me an example of polymorphism, as there seem to be various kinds,
>> and I'll see if I can express that in my language.
>
> This again shows lack of understanding. Types of polymorphism are
> irrelevant as they all serve the same goal. The differences between
> ad-hoc, parametric, and dynamic polymorphism are in their
> capabilities/limitations to express certain types of classes.
>
>> But, on top of all the various issues involved in bigger projects, you
>> don't want build times to be unnecessarily slow for no good reason.
>
> Nobody cares about build times,
Yeah, like nobody cares about stopping at a red light!


> Yes, you do not know how testing and software quality assurance work
> either.

I guess quality assurance doesn't extend to keeping a development
environment productive.

>>> On your part we see an active denial of the very existence of such
>>> problems. Things you see important are just silly from the point of
>>> view of a practicing software developer. Things you reject are key
>>> tools for managing complexity of modern software.
>>
>> OK. Carry on waiting 5 minutes between each run.
>
> Yep, like the James Webb telescope, 30 years before the first and only
> run...
>
> Again, properties of advanced programming languages such as static
> verification, modularization, separation of interface and
> implementation, reuse, generic programming are focused on risk analysis
> and understanding that 50% of cases are fundamentally untestable, 50% of
> the rest are economically, technically, or time-wise infeasible to test, 50% of
> the rest of the rest require in-depth analysis of the requirements, the
> underlying physical processes and the implementation algorithms in order
> to test properly.
Do you think it is reasonable to spend 5 minutes across 8 cores doing that
on a 25,000-line program on every routine build?

I don't. Most people would agree that that is over the top. Even with
those 'polymorphic' features (which, if the cost is to slow things down
by 100 times, are non-viable).

Perhaps you genuinely think that's how long such things should take.

Or you are just being scornful of products where it can be done instantly.

anti...@math.uni.wroc.pl

unread,
Dec 10, 2021, 6:28:53 PM12/10/21
to
gcc is in a different game: "how many % of runtime efficiency can we
gain if we throw 10 times more resources at compilation". But
you can have a much smaller implementation of a large language.
I do not think exposing the IL is a good idea. However, logically all you
have is a small number of "special forms", like 'if', loops, assignments
and function calls. The compiler needs to understand that '+' for
a machine integer type should be replaced by a single instruction.
In other words, each operation in the IL represents an optimization for
a specific type. This can be as simple as a table of tens or hundreds
of entries like:

["+", I64, I64, I64, ADD.I64]

where the first position is the name of the operation, the second is
the result type, the third and fourth are the argument types, and the
last one is the corresponding operation in the IL. This table is part
of the compiler; users only know that "+" will add integers.
OK, if you do not need more "builtin functions", then an ad-hoc approach
is fine. GNU Pascal supports hundreds of "builtin functions" (when
you count differences due to different types of arguments). This
forces a more systematic approach. AFAIK for Cobol you will need
more builtins...

> > Bunch
> > of older languages like Cobol, Fortran, PL/I had
> > special "builtin" operations, some of them much
> > more than Pascal.
> >
> > Let me add that there are more radical approaches:
> > there are languages in which "users" can redefine
> > scanner and/or parser. So users can add on the
> > fly new operators and define their meaning. As
> > a possible example consider your 'stringimage'
> > extension:
>
> Do you mean my 'strinclude' which incorporates a text file (or
text/binary file in dynamic code) as string data within the program?

Yes.

> > in an extensible language the user simply
> > declares that 'stringimage' is a new keyword that
> > should invoke its handler.
>
> That itself is not so easy. Keywords are built-in, scopeless, and known
> program-wide.

no, no, no.

> With a user-defined keyword, could two modules define that
> keyword differently?

Yes, keywords are scoped and recognized only within their scope.
More precisely, each compilation stream has its own keywords which
can be added or retracted at essentially any time.

> Could another part of the program use that same
> name as an ordinary variable?

Yes.

> If so then it's not known as a new keyword after the next pass after
> parsing. So it will need to have a suitable 'shape' to be parsed as
> /something/. (My language also has out-of-order definitions.)
>
>
> > Handler looks at
> > following program text, extracts filename,
> > reads the file and tells compiler to use resulting
> > string as a string constant.
>
> How? String literals are detected fairly early, they have a certain AST
> node, length etc. At what pass will this be done?

Apparently you have your compiler in mind and do not think about
alternatives. String literals produce string tokens. There
is no law of nature that forbids alternative ways of creating
string tokens. Once the handler has created a string token and passed it
to the normal compilation mechanism, there is no difference between
such a string token and string tokens coming from string literals
in the source file.
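
A toy C++ sketch of the idea (names invented for illustration; this is
not Pop11's, nor any other compiler's, actual mechanism): the keyword
handler slurps the file and hands the rest of the compiler an ordinary
string token.

#include <fstream>
#include <sstream>
#include <string>

struct Token {
    enum Kind { StringLit /* , ... other kinds ... */ } kind;
    std::string text;
};

// Invoked when the scanner sees the 'strinclude' keyword: read the named
// file and return a token indistinguishable from a string literal.
Token handle_strinclude(const std::string& filename) {
    std::ifstream in(filename, std::ios::binary);
    std::ostringstream buf;
    buf << in.rdbuf();                          // slurp the whole file
    return Token{Token::StringLit, buf.str()};
}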

> > The whole thing is
> > probably 10 lines for handler and a single
> > line to register new keyword.
>
> In language that
> > allows syntax extensions instead of overloading
> > you can just define new operators, so overloading
> > strictly speaking is not necessary. Similarly
> > you have some way for syntax handlers to generate
> > code, which is similar to inlining, but strictly
> > speaking inlining is not needed.
>
> Do you have an example of such a language where implementing my
> 'strimport' is 10 lines of code? For example, C++. Or the one you
> mentioned. If so, how does it look?

Your 'stringimage' can be done as a Lisp macro:

;;; ---<cut here>-------------
(defun slurp_file (fn)
  (with-open-file (f fn)
    (let ((seq (make-string (file-length f))))
      (read-sequence seq f)
      (coerce seq 'string))))

(defmacro string_include (f)
  (slurp_file f))
;;; ---<cut here>-------------

Note: in the source file, use of 'string_include' looks like a function
call (almost any Lisp construct looks like a function call).
However, since it is a Lisp macro, it is executed at compile
time and the result is handled like part of the source.
Sure. However, note the difference: you could add 'strinclude' because
you are the compiler author. With an extensible language any user can add
constructs that she/he needs.

--
Waldek Hebisch

anti...@math.uni.wroc.pl

unread,
Dec 10, 2021, 6:46:35 PM12/10/21
to
Bart <b...@freeuk.com> wrote:
> On 09/12/2021 14:49, Dmitry A. Kazakov wrote:
> > On 2021-12-09 14:39, Bart wrote:
<snip>
> >>> On your part we see an active denial of the very existence of such
> >>> problems. Things you see important are just silly from the point of
> >>> view of a practicing software developer. Things you reject are key
> >>> tools for managing complexity of modern software.
> >>
> >> OK. Carry on waiting 5 minutes between each run.
> >
> > Yep, like the James Webb telescope, 30 years before the first and only
> > run...
> >
> > Again, properties of advanced programming languages such as static
> > verification, modularization, separation of interface and
> > implementation, reuse, generic programming are focused on risk analysis
> > and understanding that 50% of cases are fundamentally untestable, 50% of
> > the rest are economically, technically, or time-wise infeasible to test, 50% of
> > the rest of the rest require in-depth analysis of the requirements, the
> > underlying physical processes and the implementation algorithms in order
> > to test properly.
> Do you think it is reasonable to spend 5 minutes across 8 cores doing that
> on a 25,000-line program on every routine build?
>
> I don't. Most people would agree that that is over the top. Even with
> those 'polymorphic' features (which, if the cost is to slow things down
> by 100 times, are non-viable).

Well, in my main project I spent more than 5 minutes on each full build
(it is not 25,000; wc counts it closer to 350,000). However, a full build
is a relatively rare thing; normally I do hundreds of compile-edit
cycles between full builds (recompiling a single file). Sure,
faster build time would be nice, and I spent some time decreasing
it (it went down from about 2 hours to a few minutes, partly due to a
faster machine, partly due to efficiency improvements). But I am
not going to sacrifice features for faster build time.

> Perhaps you genuinely think that's how long such things should take.

Well, health experts say that one should take regular breaks from
programming. 5 minutes is somewhat short, but better than nothing
(10 minutes would probably be optimal).

--
Waldek Hebisch

Bart

unread,
Dec 10, 2021, 8:21:35 PM12/10/21
to
On 10/12/2021 23:28, anti...@math.uni.wroc.pl wrote:
> Bart <b...@freeuk.com> wrote:

>> Source code for gcc was 45,000 source files last I looked. I don't even
>> know if that's just for C, or for C++ too.
>>
>> That's a thousand times a larger scale than what I do. So I guess I'm in
>> less need of such methods.
>
> gcc is in different game: "how many % of runtime efficiency can we
> gain if we throw 10 times more resources at compilation".

gcc-O3 manages some 50% better performance on the typical programs /I
write/. Example (the BB-opt column, from 2020, is roughly the same as my
current compiler):

https://github.com/sal55/langs/blob/master/benchsumm.md

But in general, getting code that is only 2-3 times slower than gcc-O3's
is easy. Tcc manages that, and it is a 0.18MB compiler. Its output will
still be faster than most interpreted languages!

This is for fairly sensible input files. When input is crazier, perhaps
machine-generated with lots of redundant code and variables [or nested
template expansions!], then you will need better optimisation methods to
clean up the mess.

>> (110 lines of code for static language which needs to generate different
>> calls for various types; 50 lines for dynamic language.)
>
> OK, if you do not need more "builtin functions", then ad-hoc approach
> is fine. GNU Pascal supports hundreds of "builtin functions" (when
> you count differences due to different types of arguments).

If they have function syntax (note my Print statements don't) without
too many special features, then a systematic approach is easy.

For example, for most binary ops with symmetric operand types, there is
just one AST node type, and translation to IL uses four lines of code:

proc do_bin(unit p, a, b) =
    evalunit(a)
    evalunit(b)
    pcl_gen(p.pclop)
    setmode_u(p)
end

You tend to do this when you can find patterns, to avoid duplication.


>> That itself is not so easy. Keywords are built-in, scopeless, and known
>> program-wide.
>
> no, no, no.
>
>> With a user-defined keyword, could two modules define that
>> keyword differently?
>
> Yes, keywords are scoped and recognized only within their scope.
> More precisely, each compilation stream has its own keywords which
> can be added or retracted at essentially any time.

Those aren't what I call keywords! More like user-defined syntax. But
with custom syntax, you don't want it changing in different parts of the
program.

Or are you thinking about Lisp?

>> Do you have an example of such a language where implementing my
>> 'strimport' is 10 lines of code? For example, C++. Or the one you
>> mentioned. If so, how does it look?
>
> Your 'stringimage' can be done as Lisp macro:
>
> ;;; ---<cut here>-------------
> (defun slurp_file (fn)
>   (with-open-file (f fn)
>     (let ((seq (make-string (file-length f))))
>       (read-sequence seq f)
>       (coerce seq 'string))))
>
> (defmacro string_include (f)
>   (slurp_file f))
> ;;; ---<cut here>-------------
>
> Note: in source file use of 'string_include' looks like function
> call (almost any Lisp construct looks like function call).
> However, since it is a Lisp macro, it is executed at compile
> time and result is handled like part of source.

So this approach can also be used by C++'s constexpr feature? (Assuming
there are ways to read files into strings.)

I notice you say 'compile-time'; does Lisp have a compilation stage with
some intermediate or binary output? I thought it was run from source.
Actually I thought that that was the big deal about it, that you can
manipulate Lisp code as though it was runtime data.

The thing about a discrete compilation stage is that the resulting
binary output contains embedded versions of those string includes.

(My current dynamic language is run-from-source only. However there is a
feature to package all source/support files into one text file; the
string include file goes there.)

>
>> I think that 20 lines of code for 'strinclude' inside a compiler is
>> reasonable.
>
> Sure. However, note the difference: you could add 'strinclude' because
> you are the compiler author. With an extensible language any user can add
> constructs that she/he needs.

No, my language isn't extendable that way. People can only add things in
the usual way: user-defined functions, records, types, etc.

I think it makes for a simpler, easier-to-implement and faster-to-run
language.

Bart

unread,
Dec 10, 2021, 8:23:53 PM12/10/21
to
On 10/12/2021 23:46, anti...@math.uni.wroc.pl wrote:
> Bart <b...@freeuk.com> wrote:


>
>> Perhaps you genuinely think that's how long such things should take.
>
> Well, health experts say that one should take regular breaks from
> programming. 5 minutes is somewhat short, but better than nothing
> (10 minutes probably would be optimal).

I take plenty of breaks, like going on forums like these!

But when I am working, then I don't want to hang about waiting until I
see the results of any change.

anti...@math.uni.wroc.pl

unread,
Dec 11, 2021, 8:09:43 AM12/11/21
to
Bart <b...@freeuk.com> wrote:
> On 10/12/2021 23:28, anti...@math.uni.wroc.pl wrote:
> > Bart <b...@freeuk.com> wrote:
<snip>
> >> That itself is not so easy. Keywords are built-in, scopeless, and known
> >> program-wide.
> >
> > no, no, no.
> >
> >> With a user-define keyword, could two modules define that
> >> keyword differently?
> >
> > Yes, keywords are scoped and recognized only within their scope.
> > More precisely, each compilation stream has it own keywords which
> > can be added or retracted at essentially any time.
>
> Those aren't what I call keywords! More like user-defined syntax. But
> with custom syntax, you don't want it changing in different parts of the
> program.
>
> Or are you thinking about Lisp?

Not only Lisp. The following is a slightly redacted part of Pop11 code:

external declare XEvents in c;

typedef struct {
    int type;              ;;; of event */
    unsigned long serial;  ;;; # of last request processed by server */
    Bool send_event;       ;;; true if this came from a SendEvent request */
    Display *display;      ;;; Display the event was read from */
    Window window;         ;;; "event" window it is reported relative to */
    Window root;           ;;; root window that the event occurred on */
    Window subwindow;      ;;; child window */
    Time time;             ;;; milliseconds */
    int x, y;              ;;; pointer x, y coordinates in event window */
    int x_root, y_root;    ;;; coordinates relative to root */
    unsigned int state;    ;;; key or button mask */
    unsigned int button;   ;;; detail */
    Bool same_screen;      ;;; same screen flag */
} XButtonEvent;

...
...
...

int *XSynchronize(display, onoff)
    Display *display;
    int onoff;
{}

endexternal;

define global QLength(dpy);
    lvars dpy;
    dpy #-> qlen
enddefine;

The part between 'external' and 'endexternal' essentially uses C syntax.
It declares C routines callable from Pop11. To properly declare C
routines, the 'external' part also declares a bunch of C types (needed for
proper handling of arguments). After 'endexternal' you have more
normal Pop11 code. The 'external' part is quite different from the other
parts, so it is natural to use distinct syntax. Note that Pop11
has a convention that each construct has a clearly visible terminator,
so there is a well-defined moment to switch off a syntax extension.

'external' is provided as part of Pop11, but could be defined
by a user. ATM there are constructs that are considered part of
"standard" Pop11, but which were originally created as user
extensions.

Concerning "you don't want it changing in different parts of the
program", most extentions are intended to blend nicely into
whole language. In particular you want ability to combine
extentions. But some extentions naturally work on delimited
fragments of code and withing fragment (like 'external' above)
you may wich to use syntax which is quite different than rest
of language. Another situation is "legacy code". There may be
extention which is better than older way of doing things and
you may intend for new way to replace the old one. In particular
new way may be incompatible with old way. If you have old source
files that were not updated to new way, it is natural to switch
on old syntax in old files (or in old fragments of new files
if you want to cram a lot of things into single file), while
using new (incompatible) syntax in new code.

Anyway, AFAICS the Lisp tradition is to stick to "standard" Lisp
syntax and define extensions mostly as macros (in principle
you could redefine the Lisp syntax so that it accepts your
language as input, but such things seem to be relatively
infrequent). OTOH in Pop11 each extension usually has its
own syntax. Note that the normal use case is to create extensions,
that is, leave the core language unchanged. But if you wish
you can redefine or eliminate part of the core language
(in Lisp, macros cannot change the core language; you need to
redefine the syntax to effect such a change).

> >> Do you have an example of such a language where implementing my
> >> 'strimport' is 10 lines of code? For example, C++. Or the one you
> >> mentioned. If so, how does it look?
> >
> > Your 'stringimage' can be done as Lisp macro:
> >
> > ;;; ---<cut here>-------------
> > (defun slurp_file (fn)
> >   (with-open-file (f fn)
> >     (let ((seq (make-string (file-length f))))
> >       (read-sequence seq f)
> >       (coerce seq 'string))))
> >
> > (defmacro string_include (f)
> >   (slurp_file f))
> > ;;; ---<cut here>-------------
> >
> > Note: in source file use of 'string_include' looks like function
> > call (almost any Lisp construct looks like function call).
> > However, since it is a Lisp macro, it is executed at compile
> > time and result is handled like part of source.
>
> So this approach can also be used by C++'s constexpr feature? (Assuming
> there are ways to read files into strings.)

IIUC currently C++ constexpr has rather strong restrictions on what
can be computed, and I think it cannot read from files. But
assuming that such restrictions are lifted, the approach
would work.
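
A sketch of that restriction as it stands (C++20; illustrative only):

constexpr int answer() { return 42; }   // fine: pure computation
static_assert(answer() == 42);

// Not allowed: no file I/O during constant evaluation, so a constexpr
// equivalent of slurp_file cannot be written today.
// constexpr std::string slurp(const char* fn) { std::ifstream in(fn); ... }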

Note that Lisp macros can do more: they are not limited to
producing new constants; they can produce arbitrary code.

Similarly, a syntax handler in Pop11 can produce arbitrary code.

> I notice you say 'compile-time'; does Lisp have a compilation stage with
> some intermediate or binary output? I thought it was run from source.

I know one "run from source" Lisp implementation (it actually compiles
to memory and you can dump memory and later restart program from
dumped memory image) but this is non-standard. Most Lisp implmentation
have some intermediate format for compiled files. Also, all
Lisp implmentations that I know can run from dumped memory image.
Most can create stanalone executables, but some insits that you
run loader program which loads previously dumped memory image
and runs from it.

> Actually I thought that that was the big deal about it, that you can
> manipulate Lisp code as though it was runtime data.

You can, as long as it is source code. There are limitations on what
you can do with compiled code. If you want, most (all???) Lisp
implementations will keep the source together with the binary code, so
you can examine, transform and maybe recompile any routine.
Due to the binary format in use, normally at any time you can replace a
given top-level routine by a different one and the rest of the code will
use the new routine. I wrote "normally" because you can _not_ replace
inlined routines in this way (you need to replace all users).

--
Waldek Hebisch