
New language and first compiler meet a Fibonacci benchmark


James Harris

Mar 10, 2019, 11:32:15 AM
In the 0 = true ? thread Bart posted this:

>> On 08/03/2019 12:22, Bart wrote:

>>>         seconds
>>>     1    0.2         C, gcc-O3
>>>     2    0.3         My static compiler (unoptimised)
>>>     3    1.7         My interpreter, unoptimised but accelerated
>>>     4   10.1         My interpreter, unoptimised, non-accelerated
>>>     5   26.7         a68g.exe
>>> My new language (which combines the current two) might also be 0.3
>>> or slightly slower.

...

>
> PROC fib = (INT n)INT:
> IF n<3 THEN
> 1
> ELSE
> fib(n-1)+fib(n-2)
> FI;
>
> FOR i TO 36 DO
> print((i,fib(i),new line))
> OD

Wow! That's a big difference in performance. And it prompted me to try
the following test.

Over the winter I wrote the first compiler - Yay! - for the language I
have been designing, and I thought this would be a good piece of code to
use as a fun test.

Here's what the function looks like in the initial dialect of my language.

  function fib
    parm n
    if
    on n$ < 3
      return 1
    on 1
      return fib(n$ - 1) + fib(n$ - 2)
    endif
  endfunction

There's an additional main function which I won't bore you with, but when
the program is run the output ends with

30 832040
31 1346269
32 2178309
33 3524578
34 5702887
35 9227465
36 14930352

I've checked a number of the outputs against a Fibonacci calculator I
found online, and they seem to be correct - Yay, again!

As for timings, I get the following

To compile and link: 0.7s. That is very slow but I think that's because
this compiler reads characters from the OS one at a time and is thus
heavily hit with context-switch times.

I've been really surprised by how long a compile like this takes. For
example, for the compiler to compile a version of itself takes something
of the order of 2 seconds. Again, I presume that's because of the way it
reads the source.

I'm not too bothered about the compile speed in this case because this
compiler (which I call C0) is 'sacrificial'. It is written in assembly
and its only purpose is to compile C1 which is basically the same
compiler written in the new language (so the compiler can bootstrap
itself).

Because C0 is for only short-term use it can take its time. The main
issue was to keep it simple and correct. C1 can be a bit more complex,
including things like reading the input in chunks.

Anyway, back to timings.

The run time of the Fibonacci function going up to n=36 was 2.5 seconds.

Again, the run time is not too important as yet but it makes an
interesting comparison with Bart's timings, above. This compiler
produces code which is completely naive. Its 2.5 seconds, therefore, is
about as slow as such a function should be able to get.

That makes the a68g timing, above, very surprising, if it is the
run-time for the function execution.

Surely the a68g version must be doing something odd. Floating point
numbers, perhaps? Or carrying out operations via function calls?

At any rate, it makes me feel pleased about the 2.5s figure. :-)

>
> BTW here's the benchmark above in my language, the one originally based
> on Algol68 syntax long ago:
>
>
> proc start =
> for i to 36 do
> println i,fib(i)
> od
> end
>
> function fib(int n)int =
> if n<3 then
> return 1
> else
> return fib(n-1)+fib(n-2)
> fi
> end
>
> (At some point, statements and expressions were segregated. The overhaul
> I'm doing now combines them again as they are in A68. The effect here is
> that the 'return' keyword becomes optional.

Oddly, I didn't think I needed return statements ... but found I did in
the fib code, above. Off the top of my head I can't remember changing that.

>
> If nothing else, just the fact that I can use lower case keywords makes
> this superior.)
>


--
James Harris

Bart

Mar 10, 2019, 2:54:12 PM
On 10/03/2019 15:32, James Harris wrote:
> In the 0 = true ? thread Bart posted this:

> >>>     2   0.3          My static compiler (unoptimised)
> >>>     3   1.7          My interpreter, unoptimised but accelerated
> >>>     4  10.1          My interpreter, unoptimised, non-accelerated
> >>>     5  26.7          a68g.exe

> Wow! That's a big difference in performance. And it prompted me to try
> the following test.
>
> Over the winter I wrote the first compiler - Yay! - for the language I
> have been designing, and I thought this would be a good piece of code to
> use as a fun test.
>
> Here's what the function looks like in the initial dialect of my language.
>
>   function fib
>     parm n
>     if
>     on n$ < 3
>       return 1
>     on 1
>       return fib(n$ - 1) + fib(n$ - 2)
>     endif
>   endfunction

(Inspired by Perl?)


> To compile and link: 0.7s. That is very slow but I think that's because
> this compiler reads characters from the OS one at a time and is thus
> heavily hit with context-switch times.

That doesn't sound right. The C version below is about 200 bytes. At 0.7
seconds to read it, it comes out at about 2400 baud, and 20 lines per
second. (I don't suppose your OS is at the other end of a dial-up modem.)

Compiler start-up overheads?

>
> The run time of the Fibonacci function going up to n=36 was 2.5 seconds.

Machines are different of course. If you have a C compiler, you can
compare the timings against this version:

#include <stdio.h>

int fib(int n) {
    if (n<3)
        return 1;
    else
        return fib(n-1)+fib(n-2);
}

int main(void){
    for (int i=1; i<=36; ++i) {
        printf("%d %d\n",i,fib(i));
    }
}

On my machine this took 0.36 seconds with gcc-O0 and 0.19 seconds with
gcc-O3.

(If -O3 is considerably faster, then disregard it as it might be
cheating by applying too aggressive optimisations. This benchmark should
call fib() 78,176,300 times with exactly that number of function entry
and exit sequences; I've seen gcc by-pass a lot of that.)

> Again, the run time is not too important as yet but it makes an
> interesting comparison with Bart's timings, above. This compiler
> produces code which is completely naive.

Native or naive?

> Its 2.5 seconds, therefore, is
> about a slow as such a function should be able to get.
>
> That makes the a68g timing, above, very surprising, if it is the
> run-time for the function execution.

The 0.2 and 0.3 timings in my table are for statically compiled native
code. The next two are for interpreted byte-code.

> Surely the a68g version must be doing something odd. Floating point
> numbers, perhaps? Or carrying out operations via function calls?

A68G's options mention a -O3 optimisation via a C backend, but that
didn't seem to work on my system. So I'm guessing that it's interpreting
(and it's not a startup delay as output starts immediately).

Here are some other timings:

  Tiny C       0.3
  bcc          0.3     (My own C compiler)
  LuaJIT       0.9
  PyPy         1.5
 *Lua          6       Fastest
 *Ruby         9
 *Lua         15?      Slowest (depends on C compiler & flags)
 *Python 2    19       seconds
 *Python 3    23
 *CLisp       39

The ones marked * will probably be interpreters. The rest will run
native code either directly or via JIT.

James Harris

Mar 10, 2019, 3:42:08 PM
On 10/03/2019 18:54, Bart wrote:
> On 10/03/2019 15:32, James Harris wrote:
>> In the 0 = true ? thread Bart posted this:
>
>> >>>     2   0.3          My static compiler (unoptimised)
>> >>>     3   1.7          My interpreter, unoptimised but accelerated
>> >>>     4  10.1          My interpreter, unoptimised, non-accelerated
>> >>>     5  26.7          a68g.exe
>
>> Wow! That's a big difference in performance. And it prompted me to try
>> the following test.
>>
>> Over the winter I wrote the first compiler - Yay! - for the language I
>> have been designing, and I thought this would be a good piece of code to
>> use as a fun test.
>>
>> Here's what the function looks like in the initial dialect of my language.
>>
>>   function fib
>>     parm n
>>     if
>>     on n$ < 3
>>       return 1
>>     on 1
>>       return fib(n$ - 1) + fib(n$ - 2)
>>     endif
>>   endfunction
>
> (Inspired by Perl?)

What is it that reminds you of Perl?

>
>
>> To compile and link: 0.7s. That is very slow but I think that's because
>> this compiler reads characters from the OS one at a time and is thus
>> heavily hit with context-switch times.
>
> That doesn't sound right. The C version below is about 200 bytes. At 0.7
> seconds to read it, it comes out at about 2400 baud, and 20 lines per
> second. (I don't suppose your OS is at the other of dial-up modem.)
>
> Compiler start-up overheads?

More likely it's because I am running this in a VM and am literally making
an IO syscall into the kernel for every input and every output character,
so it's possible that as well as the context switch the VM is having to
go via the host OS each time. For example, for each character needed, the
read function calls the following code (which is in an OS-interface
library) with count=1.

mov edx, [count]      ; number of bytes to read (here always 1)
mov ecx, [bufp]       ; buffer to read into
mov ebx, [fd]         ; file descriptor
mov eax, SYSCALL_READ
int 0x80              ; trap into the kernel

I am /assuming/ that this character-by-character approach is what makes
it slow. It's hard to think of another likely cause; the compiler does
no backtracking or anything else that would be time consuming. (That's
speaking about compiler C0. If the slowness is still there in C1 once
I'm doing buffered reads I'll look into it but, for now, keeping the
assembly code simple was the goal.)
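
To give an idea of what I mean by buffered reads, C1 would do something
roughly like the following (this is only a C sketch for illustration - the
name next_char and the 4 KiB buffer size are placeholders, not what C1 will
actually use). It turns one syscall per character into one per 4 KiB:

    #include <unistd.h>

    static char buf[4096];
    static ssize_t buflen = 0, bufpos = 0;

    /* Return the next character from fd, or -1 at end of file/error. */
    int next_char(int fd) {
        if (bufpos >= buflen) {
            buflen = read(fd, buf, sizeof buf);
            bufpos = 0;
            if (buflen <= 0)
                return -1;
        }
        return (unsigned char)buf[bufpos++];
    }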

>
>>
>> The run time of the Fibonacci function going up to n=36 was 2.5 seconds.
>
> Machines are different of course. If you have a C compiler, you can
> compare the timings against this version:
>
> #include <stdio.h>
>
> int fib(int n) {
> if (n<3)
> return 1;
> else
> return fib(n-1)+fib(n-2);
> }
>
> int main(void){
> for (int i=1; i<=36; ++i) {
> printf("%d %d\n",i,fib(i));
> }
> }
>
> On my machine this took 0.36 seconds with gcc-O0 and 0.19 seconds with
> gcc-O3.
>
> (If -O3 is considerably faster, then disregard it as it might be
> cheating by applying too aggressive optimisations. This benchmark should
> call fib() 78,176,300 times with exactly that number of function entry
> and exit sequences; I've seen gcc by-pass a lot of that.)

I am not sure if you are talking about compile time or execution time.
Here are both.

gcc -O0: compile 0.15s, run 1.2s
gcc -O3: compile 0.28s, run 0.4s

>
>> Again, the run time is not too important as yet but it makes an
>> interesting comparison with Bart's timings, above. This compiler
>> produces code which is completely naive.
>
> Native or naive?

Naive. It produces very naive assembly, i.e. each element of the source
language produces its own output code. For example, here's the generated
code for what would be, in C, if(n<3)

The value of n is loaded by

lea eax, [n]
mov eax, [eax] ;Deref

3 is loaded by

push eax ;Save operand
mov eax, 3

The compare operation sets a boolean value in EAX by

pop ebx ;Retrieve operand
cmp ebx, eax
setl al
and eax, 1 ;Compare l

And the branch is

test eax, eax
jz _$2_onend


>
>> Its 2.5 seconds, therefore, is
>> about a slow as such a function should be able to get.
>>
>> That makes the a68g timing, above, very surprising, if it is the
>> run-time for the function execution.
>
> The 0.2 and 0.3 timings in my table are for statically compiled native
> code. The next two are for interpreted byte-code.
>
>> Surely the a68g version must be doing something odd. Floating point
>> numbers, perhaps? Or carrying out operations via function calls?
>
> A68G's options mention a -O3 optimisation via a C backend, but that
> didn't seem to work on my system. So I'm guessing that it's interpreting
> (and it's not a startup delay as output starts immediately).
>
> Here are some other timings:
>
> Tiny C 0.3
> bcc 0.3 (My own C compiler)
> LuaJIT 0.9
> PyPy 1.5
> *Lua 6 Fastest
> *Ruby 9
> *Lua 15? Slowest (depends on C compiler & flags)
> *Python 2 19 seconds
> *Python 3 23
> *CLisp 39
>
> The ones marked * will probably be interpreters. The rest will run
> native code either directly or via JIT.
>

That sounds likely. FWIW a Python version in this environment takes 31
seconds.


--
James Harris

Bart

Mar 10, 2019, 4:42:14 PM
On 10/03/2019 19:42, James Harris wrote:
> On 10/03/2019 18:54, Bart wrote:

>> (Inspired by Perl?)
>
> What is it that reminds you of Perl?

Those trailing dollar symbols. (Unless you deliberately chose n$ as an
identifier.)


>> Compiler start-up overheads?
>
> More likely is because I am running this in a VM and am literally making
> an IO syscall into the kernel for every input and every output character
> so it's possible that as well as the context switch the VM is having to
> go via the host OS each time. For example, for each character needed the
> read function calls the following code (which in an OS-interface
> library) with count=1.
>
>   mov edx, [count]
>   mov ecx, [bufp]
>   mov ebx, [fd]
>   mov eax, SYSCALL_READ
>   int 0x80
>
> I am /assuming/ that this character-by-character approach is what makes
> it slow.

Do nothing except load the whole file a character at a time, and see if
it's still slow.

But even from ASM, you should be able to make use of C file functions
(from shared libraries msvcrt.dll or libc.so.6), which will also let you
load the file all at once. This should be the least time consuming bit!
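
For example, something along these lines (a rough sketch using only
standard C calls; load_file is just an illustrative name) pulls the whole
source file in with a single read:

    #include <stdio.h>
    #include <stdlib.h>

    /* Read an entire file into a malloc'd, NUL-terminated buffer. */
    char *load_file(const char *path, long *len) {
        FILE *f = fopen(path, "rb");
        if (!f) return NULL;
        fseek(f, 0, SEEK_END);
        *len = ftell(f);
        rewind(f);
        char *buf = malloc(*len + 1);
        if (buf && fread(buf, 1, *len, f) != (size_t)*len) {
            free(buf);
            buf = NULL;
        }
        if (buf) buf[*len] = '\0';
        fclose(f);
        return buf;
    }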

(Also check out your AV; on my laptop Windows Defender can slow down my
compiles by a factor of 6.)

>>     int fib(int n) {

> I am not sure if you are talking about compile time or execution time.

(Runtime. Compile-time shouldn't be an issue with such a small program,
although there might be something funny going on with your compiler.)

> Here are both.
>
> gcc -O0: compile 0.15s, run 1.2s
> gcc -O3: compile 0.28s, run 0.4s


This benchmark is quite tricky to optimise and on most C compilers the
difference made by optimising is narrow. That you get a 3x speed-up here
is suspicious.

So I would go with the 1.2 seconds, which means that your naive code
being within a factor of about 2 is pretty good.

>
> Naive. It produces very naive assembly, i.e. each element of the source
> language produces its own output code. For example, here's the generated
> code for what would be, in C, if(n<3)
>
> The value of n is loaded by
>
>   lea eax, [n]
>   mov eax, [eax] ;Deref
>
> 3 is loaded by
>
>   push eax ;Save operand
>   mov eax, 3
>
> The compare operation sets a boolean value in EAX by
>
>   pop ebx ;Retrieve operand
>   cmp ebx, eax
>   setl al
>   and eax, 1 ;Compare l
>
> And the branch is
>
>   test eax, eax
>   jz _$2_onend

That's not so bad. My current project uses a stack-based model for
intermediate code; very easy to generate. Each instruction can be
individually translated to a self-contained ASM sequence, so that a line
such as 'a=b+c' would end up as this x64 code:

  x64:                      intermediate:

  push qword [b]            push [b]
  push qword [c]            push [c]
  pop D1                    add
  pop D0                    (D0, D1 are 64-bit regs)
  add D0,D1
  push D0
  pop qword [a]             pop [a]

I did a mock-up of this with that fib benchmark, and it was 150% slower
(2.5 times longer execution) than the normal register-based code.

So I've had to take a more complex approach to translating to native
code (which is still on-going).
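
To make that concrete, here's a throwaway sketch of the translation (not my
actual compiler; it uses rax/rdx where the real thing uses the D0/D1
pseudo-registers). Each stack instruction becomes a self-contained ASM
sequence, emitted with no knowledge of its neighbours - the redundant
push/pop pairs around every operation are exactly where that 150% slowdown
comes from:

    #include <stdio.h>

    typedef enum { PUSH_VAR, ADD, POP_VAR } Op;
    typedef struct { Op op; const char *name; } Ins;

    static void emit(const Ins *code, int n) {
        for (int i = 0; i < n; i++) {
            switch (code[i].op) {
            case PUSH_VAR:                          /* push [x] */
                printf("    push qword [%s]\n", code[i].name);
                break;
            case ADD:                               /* add */
                printf("    pop rdx\n    pop rax\n"
                       "    add rax, rdx\n    push rax\n");
                break;
            case POP_VAR:                           /* pop [x] */
                printf("    pop qword [%s]\n", code[i].name);
                break;
            }
        }
    }

    int main(void) {            /* a = b + c */
        Ins prog[] = { {PUSH_VAR,"b"}, {PUSH_VAR,"c"},
                       {ADD,NULL}, {POP_VAR,"a"} };
        emit(prog, 4);
        return 0;
    }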

Rod Pemberton

Mar 11, 2019, 4:33:34 AM
On Sun, 10 Mar 2019 15:32:11 +0000
James Harris <james.h...@gmail.com> wrote:

> [...]
> Over the winter I wrote the first compiler - Yay! - for the language
> I have been designing, and I thought this would be a good piece of
> code to use as a fun test.

Well, it sure took you long enough ... So, what's the time frame now
on your OS? Three decades? :-) Sorry, some guy did that to me once
on CLAX, when I said it took me a few years to do something. It was both
annoying and funny. I'm not sure how you'll take it.

(FYI James, I have no comments below on timing issues, just language
issues.)

> Here's what the function looks like in the initial dialect of my
> language.
>
> function fib
> parm n
> if
> on n$ < 3
> return 1
> on 1
> return fib(n$ - 1) + fib(n$ - 2)
> endif
> endfunction

It appears that the 'on' does the testing, not the 'if' of the if-endif.
If so, then the if-endif are just curly braces.

Alternately, why is the "on n$ < 3" inside the if-endif conditional,
assuming 'on' tests the condition? Or, if the 'if' is actually testing
the conditions of an 'on', how does the if test two 'on'
conditions ... What is the 'if' testing?

Does this code convert directly into C by your parser/lexer? It
appears that it could do so rather easily, except for the initial
brace of the procedure and ending paren of the parameter list...

> To compile and link: 0.7s. That is very slow but I think that's
> because this compiler reads characters from the OS one at a time and
> is thus heavily hit with context-switch times.

With a sliding window of just 3 characters, you can do LALR(1) and
probably others.

> I've been really surprised by how long a compile like this takes. For
> example, for the compiler to compile a version of itself takes
> something of the order of 2 seconds. Again, I presume that's because
> of the way it reads the source.

So, this is not your language converted into C? (No.)

How many stages (and/or standalone programs) do you use to compile?

> I'm not too bothered about the compile speed in this case because
> this compiler (which I call C0) is 'sacrificial'. It is written in
> assembly and its only purpose is to compile C1 which is basically the
> same compiler written in the new language (so the compiler can
> bootstrap itself).

Isn't that somewhat reversed? Why wouldn't you produce the assembly
for C0 from C1? Bootstrapping?

Why would you avoid using C as an intermediate stage since it and Forth
are the most ubiquitous languages and since it can generate assembly
output usually? No access to C?


Rod Pemberton
--
Apple opposes "glorifying violence" and "dehumanizing language". Yet,
it manufactures products in China which commits crimes against humanity.

Rod Pemberton

Mar 11, 2019, 4:33:43 AM
On Sun, 10 Mar 2019 19:42:03 +0000
James Harris <james.h...@gmail.com> wrote:

> On 10/03/2019 18:54, Bart wrote:
> > On 10/03/2019 15:32, James Harris wrote:
> >> In the 0 = true ? thread Bart posted this:

> >> [...]
> >>
> >> Here's what the function looks like in the initial dialect of my
> >> language.
> >>
> >>   function fib
> >>     parm n
> >>     if
> >>     on n$ < 3
> >>       return 1
> >>     on 1
> >>       return fib(n$ - 1) + fib(n$ - 2)
> >>     endif
> >>   endfunction
> >
> > (Inspired by Perl?)
>
> What is it that reminds you of Perl?
>

It reminds me of C, BASIC (due to $), and either Pascal or block
scripting languages.

It also looks like block-structured languages such as Ada, Fortran 90,
PL/1, Python and Ruby, as well as the other half-dozen curly-brace
languages on Wikipedia's page below, assuming your if-endif represents
curly braces:

https://en.wikipedia.org/wiki/Do_while_loop

> > Compiler start-up overheads?
>
> More likely is because I am running this in a VM [...]

Head-spin.

Why? Is your language so complicated you need a mainframe at school
or work to run it? What is wrong with your home computer? Seriously.
Surely, you don't need a VM to protect yourself from your own code
going "off into the weeds".

Rod Pemberton

Mar 11, 2019, 4:33:48 AM
On Sun, 10 Mar 2019 18:54:11 +0000
Bart <b...@freeuk.com> wrote:

> [...]
>
> Tiny C 0.3

Why does Tiny C do so well?

> bcc 0.3 (My own C compiler)

Oh, I thought that was BCC, as in Bruce's C Compiler.

https://linux.die.net/man/1/bcc
https://www.ibiblio.org/pub/micro/pc-stuff/freedos/files/repositories/1.2/pkg-html/bcc.html

Is it time for a name change? ...

Andy Walker

Mar 11, 2019, 7:07:04 AM
On 10/03/2019 15:32, James Harris wrote:
> That makes the a68g timing, above, very surprising, if it is the
> run-time for the function execution.
> Surely the a68g version must be doing something odd. Floating point
> numbers, perhaps? Or carrying out operations via function calls?

Not "odd"; "a68g" is an *interpreter* -- you can expect a huge
factor between that and compiled code. About 100 is said to be typical,
but of course it depends on the mix of calculations, I/O and standard
functions/procedures. All you [should] care about with an interpreter
is whether it is fast *enough* for your purposes.

Bart's "fib" test shows that "a68g" manages around 10^7 procedure
calls per second. That's around 100x faster than the mainframes I was
using in the '60s and '70s could manage in raw machine code, and on which
I did useful and complicated computing. That's fast enough for me; YMMV.

--
Andy Walker,
Nottingham.

James Harris

Mar 11, 2019, 7:12:34 AM
On 10/03/2019 20:42, Bart wrote:
> On 10/03/2019 19:42, James Harris wrote:
>> On 10/03/2019 18:54, Bart wrote:
>
>>> (Inspired by Perl?)
>>
>> What is it that reminds you of Perl?
>
> Those trailing dollar symbols. (Unless you deliberately chose n$ as an
> identifier.)

Ah, OK. The dollar signs are not meant to be permanent but there's an
interesting story behind them in the way that they helped make the
initial compiler smaller and easier to write.

An integer, say 3, can be loaded very simply with

mov eax, 3

A name, however, can be used in a language in two different contexts.
Sometimes it means the value. That might be implemented with

mov eax, [name]

Other times it means the address, as in

lea eax, [name]

As an example of both, there's this array reference (in normal syntax)

A[B]

Code for that would use the value of B but the address of A. I realised,
however, that I could avoid having to deal with two cases by treating
both as addresses. That's because addresses can be 'converted' to values
but not vice versa. The 'conversion' is done simply by adding a
dereference operation. That's what the dollar sign does. So in the
initial dialect

A means the address of A
A$ results in the value of A

In both cases, the A in the source results in

lea eax, [A]

but in the second case the $ dereference appends

mov eax, [eax]

thus resulting in the value of A.

That made the compiler quite a lot simpler. It can process source code
element by element and whenever it comes across a name it generates the
LEA code to load its address. No need to distinguish between whether a
given name should be translated into an address or a value. Nor,
usefully, is there a need for a separate parser for the LHS of
assignments because /all/ names are Rvals. So the code above would, at
the moment, need to be written

A[B$]

meaning to take the value of B and use it to index into array A.

In assembly,

lea eax, [A]
push eax           ;Save the address of A
lea eax, [B]
mov eax, [eax]     ;Deref <===== the effect of the $
shl eax, 2         ;Scale the index by the element size
pop ebx            ;Retrieve the address of A
add eax, ebx       ;eax now holds the address of A[B$]

Basically, a name, N, always results in

lea eax, [N]

and a dollar sign always means

mov eax, [eax]

The arrangement works pretty well because in most contexts a name refers
to an address rather than a value. For example, in normal syntax

A[...]
F(...)
U := ...

All of those names refer to addresses.

Also, in my initial dialect, pointer dereferences are explicit and are
accomplished by as many trailing dollar signs as required.

name$$$

would result in

lea eax, [name]
mov eax, [eax] ; Deref
mov eax, [eax] ; Deref
mov eax, [eax] ; Deref


In summary, the dollar signs are not meant to be part of the language
but they are there at the moment to make the initial assembly language
compiler easier to write.
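
As a toy illustration of how little machinery the rule needs (a sketch in C
purely for exposition - C0 itself is in assembly and doesn't have these
function names), the whole 'load' side of code generation reduces to two
emitters:

    #include <stdio.h>

    /* every name loads its own address ... */
    void emit_name(const char *name) {
        printf("lea eax, [%s]\n", name);
    }

    /* ... and every trailing $ appends one dereference */
    void emit_deref(void) {
        printf("mov eax, [eax]    ; Deref\n");
    }

    int main(void) {
        emit_name("name");      /* name$$$ -> one lea ... */
        emit_deref();           /* ... plus three derefs */
        emit_deref();
        emit_deref();
        return 0;
    }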

>
>
>>> Compiler start-up overheads?
>>
>> More likely is because I am running this in a VM and am literally making
>> an IO syscall into the kernel for every input and every output character
>> so it's possible that as well as the context switch the VM is having to
>> go via the host OS each time. For example, for each character needed the
>> read function calls the following code (which in an OS-interface
>> library) with count=1.
>>
>>   mov edx, [count]
>>   mov ecx, [bufp]
>>   mov ebx, [fd]
>>   mov eax, SYSCALL_READ
>>   int 0x80
>>
>> I am /assuming/ that this character-by-character approach is what makes
>> it slow.
>
> Do nothing except load the whole file a character at a time, and see if
> it's still slow.
>
> But even from ASM, you should be able to make use of C file functions
> (from shared libraries msvcrt.dll or libc.so.6), which will also let you
> load the file all at once. This should be the least time consuming bit!

I am deliberately taking a fairly minimalist approach to this to try to
avoid influencing the language design too much with things which already
exist. That's why I wanted to write the initial compiler in assembly,
for example, rather than using C or Python.

I've taken a similar approach to run-time support in writing my own
(very small) set of run-time routines. Hair shirts are us!

>
> (Also check out your AV; on my laptop Windows Defender can slow down my
> compiles by a factor of 6.)

Something like that happened on this computer the other day. I was
alarmed because when I ran a certain exe it took a while to start. And
something popped up on the screen but went away again before I saw what
it was. I thought maybe the program had abended or generated a dump.
Checked the logs. Nothing. Was concerned that maybe I had some sort of
malware until I realised what had happened. Turned out the virus scanner
had blocked execution until it had checked the code. Subsequent runs of
the same exe were normal.
> That's not so bad. My current project uses a stack-based model for
> intermediate code; very easy to generate. [...]

Understood. I remember from a compiler course I did some years ago that
it's important to have an invariant for expression evaluation. I chose
to have EAX hold the most recent result (with the rest on the stack) and
you chose to have them all on the stack.

>
> I did a mock-up of this with that fib benchmark, and it was 150% slower
> (2.5 times longer execution) that the normal register-based code.
>
> So I've had to take a more complex approach to translating to native
> code (which is still on-going).
>

I have a long way to go to get to where you are but I am looking forward
to moving this along further.

Have to say that even at the level I am at getting a program to produce
asm code is very satisfying!


--
James Harris

Bart

Mar 11, 2019, 8:44:03 AM
On 11/03/2019 08:36, Rod Pemberton wrote:
> On Sun, 10 Mar 2019 18:54:11 +0000
> Bart <b...@freeuk.com> wrote:
>
>> [...]
>>
>> Tiny C 0.3 [tcc 0.9.27]
>
> Why does Tiny C do so well?

Some kinds of benchmark are hard to optimise so that poor compilers
(like Tiny C and mine) can still do comparatively well.

Real applications can be the same. If I take my C compiler (which is not
written in C), translate it to a single cc.c file, build that with
various C compilers and use the resulting bcc.exe to build the sqlite3
app mentioned in another post:

  cc.c compiled with:     Time to build sqlite3.exe:

  gcc-O3                  0.48 seconds
  Tiny C                  0.74
  bcc                     0.61

There's little in it: 1/4 second between gcc and Tiny C. Consider that
most inputs will be smaller.


>> bcc 0.3 (My own C compiler)
>
> Oh, I thought that was BCC, as in Bruce's C Compiler.
>
> https://linux.die.net/man/1/bcc
> https://www.ibiblio.org/pub/micro/pc-stuff/freedos/files/repositories/1.2/pkg-html/bcc.html
>
> Is it time for a name change? ...

For mine or for Bruce's?

Mine is not public at the moment (there are a lot of issues that I don't
want to spend years fixing), so I don't think it matters.

David Brown

Mar 11, 2019, 9:50:17 AM
On 11/03/2019 13:44, Bart wrote:
> On 11/03/2019 08:36, Rod Pemberton wrote:
>> On Sun, 10 Mar 2019 18:54:11 +0000
>> Bart <b...@freeuk.com> wrote:
>>
>>> [...]
>>>
>>>     Tiny C       0.3   [tcc 0.9.27]
>>
>> Why does Tiny C do so well?
>
> Some kinds of benchmark are hard to optimise so that poor compilers
> (like Tiny C and mine) can still do comparatively well.
>

There is also the fact that modern x86 processors are designed to handle
poor code very efficiently. It doesn't matter much if the generated
code does three times as many register moves as necessary - these run in
a fraction of a cpu cycle due to superscaling. It doesn't matter much
if local data is put on the stack rather than in registers - the store
buffers, prefetches and L0 cache hide most of the effect. It doesn't
matter if the generated code wastes 50 clock cycles when you need to
wait 200 cycles for a memory load.

And of course on real code, a fair amount of the time is usually spent
in system calls, waiting for IO, networking, etc. Compiler optimisation
makes no difference there.

So the relevance of optimisation varies significantly with the type of
code, and the type of processor.

Bart

Mar 11, 2019, 10:13:11 AM
On 11/03/2019 11:12, James Harris wrote:
> On 10/03/2019 20:42, Bart wrote:

>> Those trailing dollar symbols. [In Perl they go at the beginning]

>   A means the address of A
>   A$ results in the value of A
>
> In both cases, the A in the source results in
>
>   lea eax, [A]
>
> but in the second case the $ dereference appends
>
>   mov eax, [eax]
>
> thus resulting in the value of A.
>
> That made the compiler quite a lot simpler.

(I wrote some stuff but it's better summarised below about dereferencing.)

> The arrangement works pretty well because in most contexts a name refers
> to an address rather than a value. For example, in normal syntax
>
>   A[...]
>   F(...)
>   U := ...
>
> All of those names refer to addresses.

I disagree. I think most names are used as rvalues.

And in your last example:

U := V

this might compile to:

MOV R,[V]
MOV [U],R

Or (in some cpus):

MOV [U],[V] # assume right to left

The address modes for both sides are exactly the same.

It might be argued that a name ALWAYS refers to an address, but whatever
the underlying mechanism, in a HLL names usually refer to their values,
even names of arrays.

The same goes for lvalues, although this one is counter-intuitive: it's
the same rank (counting the number of indirections to get at the value),
but you are writing the value rather than reading it.

> Also, in my initial dialect, pointer dereferences are explicit and are
> accomplished by a as many trailing dollar signs as required.
>
>   name$$$

Now this starts to make more sense, if $ is an operator. I use "^" for
this purpose, so in a language where 'A' always means the equivalent of
'&A' elsewhere, that means:

                       Other HLL    Your HLL
   int A                 A            A^
   pointer to int P      P^           P^^

To get at (or to write to) the int value in each case.

> I am deliberately taking a fairly minimalist approach to this to try to
> avoid influencing the language design too much with things which already
> exist. That's why I wanted to write the initial compiler in assembly,
> for example, rather than using C or Python.

That's an extraordinary decision, to use assembly. (I haven't written a
whole program in assembly since the mid-80s, and that was because I had to.)

Much as I dislike writing in C, if the choice was C or ASM, I would
choose C.

I'm not surprised you have to keep your compiler simple!

> I have a long way to go to get to where you are but I am looking forward
> to moving this along further.

(Well, I've been doing this a long time; my first 'compiler' was running
in 1980.)

> Have to say that even at the level I am at getting a program to produce
> asm code is very satisfying!

Best bit is as soon as you have loops, or some way of repeating (eg.
recursive fib) and can let it loose on a real benchmark so that you know
where you are. Even if the first result is disappointing.

(One beautiful intermediate language, based on three-address-code,
started off at half the speed of a compiler which I'd just thrown
together with no thought. It would have taken months just to get it near
that other compiler. Sadly that approach had to be dropped.)

James Harris

Mar 11, 2019, 11:41:30 AM
On 11/03/2019 08:36, Rod Pemberton wrote:
> On Sun, 10 Mar 2019 15:32:11 +0000
> James Harris <james.h...@gmail.com> wrote:
>
>> [...]
>> Over the winter I wrote the first compiler - Yay! - for the language
>> I have been designing, and I thought this would be a good piece of
>> code to use as a fun test.
>
> Well, it sure took you long enough ... So, what's the time frame now
> on your OS? Three decades? :-) Sorry, some guy did that to me once
> on CLAX, when I said I took me a few years to do something. It was both
> annoying and funny. I'm not sure how you'll take it.

All of >:-( :-o| and :-)) !

You and I are kindred spirits in this stuff. It does take a long time to
make progress on the sorts of things we are working on.

Re my language, yes it has taken many years but it turned out that to do
what I wanted to do I had to design a few things in parallel:

1. The basic language (which is all I intended to do at the start)
2. An intermediate form (I.F.)
3. A configurable scheme for converting the language to the I.F.
4. A plan for a network-based code-distribution system
5. And finally, a way to adapt the I.F. to different CPUs

In reality, I couldn't begin developing until I had all those things
sorted out.

>
> (FYI James, I have no comments below on timing issues, just language
> issues.)

Cool.

>
>> Here's what the function looks like in the initial dialect of my
>> language.
>>
>> function fib
>> parm n
>> if
>> on n$ < 3
>> return 1
>> on 1
>> return fib(n$ - 1) + fib(n$ - 2)
>> endif
>> endfunction
>
> It appears that the 'on' does the testing, not the 'if' of the if-endif.
> If so, then the if-endif are just curly braces.
>
> Alternately, why is the "on n$ < 3" inside the if-endif conditional,
> assuming 'on' tests the condition? Or, if the 'if' is actually testing
> the conditions of an 'on', how does the if test two 'on'
> conditions ... What is the 'if' testing?

OK, my if construct is, currently, a series of condition-action pairs as in

  if C1
    A1
  on C2
    A2
  on C3
    A3
  on C4
    A4
  endif

That's conventional if you think of "on" as elseif. The word "on" was
chosen deliberately to be the same length as "if" as I have a preference
to aid code readability by letting all the conditions line up.

Note that each "on" is a full statement by itself. End of line means end
of statement.

I also support a slightly different form where "if" is not followed by a
condition but is also a statement by itself, as in

  if
  on C1
    A1
  on C2
    A2
  on C3
    A3
  on C4
    A4
  endif

In that case, if and endif are no more than brackets (as you surmised)
but they are what you might call 'identified brackets'. In C one can
find oneself scanning up and down a page working out which close brace
corresponds with which open brace. The idea of the keyword endif is that
it can only end an if - which IMO can help make code a little bit easier
to navigate. (For where there are many ifs near each other I also intend
to allow 'named blocks' to further help improve code navigability but
they are not implemented yet.)


>
> Does this code convert directly into C by your parser/lexer? It
> appears that it could do so rather easily, except for the initial
> brace of the procedure and ending paren of the parameter list...

C is not involved. The source code is all compiled to Nasm assembly code.

>
>> To compile and link: 0.7s. That is very slow but I think that's
>> because this compiler reads characters from the OS one at a time and
>> is thus heavily hit with context-switch times.
>
> With a sliding window of just 3 characters, you can do LALR(1) and
> probably others.
>
>> I've been really surprised by how long a compile like this takes. For
>> example, for the compiler to compile a version of itself takes
>> something of the order of 2 seconds. Again, I presume that's because
>> of the way it reads the source.
>
> So, this is not your language converted into C? (No.)
>
> How many stages (and/or standalone programs) do you use to compile?

I am not sure what you mean by stages. What happens at the moment is the
compiler, which is called C0, converts source code written in my new, as
yet unnamed, language into Nasm assembly code. Nasm is used to convert
that to an object file. And multiple object files can be linked together
with a small run-time library to form an executable image.

When the image is run the first part to get control is the run-time
library (RTL). I guess that you could call it a replacement for crt0, if
that's the right name.

As well as invoking the compiled code the RTL provides OS interfacing.
The idea is that there be one RTL per environment - e.g. one for
Linux32, one for DOS, one for Windows64, etc. And, and this is fun, my
RTL also supports an exception-handling mechanism. That's really cool in
that if I throw an exception anywhere in the application the stack will
be unwound all the way until the RTL regains control. The RTL will then
report that an exception occurred, write a 32-bit exception word, and
return to the OS.

Having exceptions is really handy in that it removes the need for
checking if something went wrong after each call. You know in C where
you might write

  if ((do x) went wrong)
    handle the error
  if ((do y) went wrong)
    handle the error
  if ((do z) went wrong)
    handle the error

Thanks to support in my compilers (C0 and C1) and the RTL I can replace
the above with

  do x
  do y
  do z

which I find massively more readable.
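
In C terms the effect is roughly what setjmp/longjmp give you, with the RTL
holding the jump buffer. (This is only an analogy to show the shape of it,
not how my RTL is actually implemented.)

    #include <setjmp.h>
    #include <stdio.h>
    #include <stdlib.h>

    static jmp_buf rtl_catch;

    static void throw_exception(int code) {
        longjmp(rtl_catch, code);       /* unwind straight back to the RTL */
    }

    static void do_x(void) { }
    static void do_y(void) { throw_exception(42); } /* something went wrong */
    static void do_z(void) { }                      /* never reached */

    int main(void) {                    /* stands in for the RTL entry point */
        int code = setjmp(rtl_catch);
        if (code != 0) {
            fprintf(stderr, "exception %d\n", code);
            return EXIT_FAILURE;        /* report and return to the OS */
        }
        do_x();
        do_y();
        do_z();
        return EXIT_SUCCESS;
    }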

>
>> I'm not too bothered about the compile speed in this case because
>> this compiler (which I call C0) is 'sacrificial'. It is written in
>> assembly and its only purpose is to compile C1 which is basically the
>> same compiler written in the new language (so the compiler can
>> bootstrap itself).
>
> Isn't that somewhat reversed? Why wouldn't you produce the assembly
> for C0 from C1? Bootstrapping?

The goal was, indeed, to bootstrap a compiler. I actually wrote the
compiler in a prototype version of my new language first, then hand
converted (and corrected and debugged) it to assembly to make compiler C0.

The next step will be to convert C0 (which is in Nasm) into compiler C1
(which will be written in my new language) and use C0 to compile C1.
Although I say it's the next step, much of C1 is already written but
there will be plenty of debugging to do.

>
> Why would you avoid using C as an intermediate stage since it and Forth
> are the most ubiquitous languages and since it can generate assembly
> output usually? No access to C?

The aim of bootstrapping via assembly was to help keep the language as
free of conventional influences as possible. I've gone rogue!


--
James Harris

Andy Walker

Mar 11, 2019, 11:46:50 AM
On 11/03/2019 14:13, Bart wrote:
> On 11/03/2019 11:12, James Harris wrote:
>> The arrangement works pretty well because in most contexts a name
>> refers to an address rather than a value. For example, in normal
>> syntax
>>    A[...]
>>    F(...)
>>    U := ...
>> All of those names refer to addresses.

Welcome to the world of Algol!

> I disagree. I think most names are used as rvalues.

That depends on the application. If you have a lot of
"a := b+c*d;", then that's three R to one L, or in the 20K Bart bench
mark 60001 R to 20004 L. But "a := 1" is just one L. You get lots of
L's in some sorts of procedure call, not in others, it just depends
too much to make categorical assertions.

[...]
> It might be argued that a name ALWAYS refers to an address, but
> whatever the underlying mechanism, in a HLL names usually refer to
> their values, even names of arrays.

That, again, depends on the application. An array or struct
rvalue *ought* to mean that writing to it is forbidden; whether it
really does or not depends on the language, but we always used to be
warned against using arrays as parameters "by value" because it
implied taking a copy. OTOH, this can be Really Useful; I have
quite often used the idea

[,] INT saveboard = board; # identity, not assignment #
... play move, recurse, blah, lots of things that change board ...;
board := saveboard # assignment, restored board #
# the alternative is to stack and unstack the changes, but this
is often much less code #

[...]
> Now this starts to make more sense, if $ is an operator. I use "^"
> for this purpose, so in a language where 'A' always means the
> equivalent of '&A' elsewhere, that means:
>                        Other HLL    Your HLL
>    int A                 A            A^
>    pointer to int P      P^           P^^
> To get at (or to write to) the int value in each case .

It's easier [FSVO ...] to omit "&", "^" etc everywhere they occur
and design a language in which dereferencing is almost always implicit.
You never need an exception in "introductory" programming, and only
occasionally in "advanced" programming [where you want explicitly to
follow a pointer in an lvalue -- rvalues always work implicitly].

[...]
> There's an extraordinary decision to use assembly. (I haven't written
> a whole program in assembly since the mid-80s, and that was because I
> had to.)

I agree. I haven't done it since 1971; I've tinkered quite a lot
since, but not a whole program, just tweaking Unix and other things that
others have already written.

--
Andy Walker,
Nottingham.

James Harris

Mar 11, 2019, 11:57:53 AM
On 11/03/2019 08:36, Rod Pemberton wrote:
> On Sun, 10 Mar 2019 19:42:03 +0000
> James Harris <james.h...@gmail.com> wrote:
>
>> On 10/03/2019 18:54, Bart wrote:
>>> On 10/03/2019 15:32, James Harris wrote:
>>>> In the 0 = true ? thread Bart posted this:
>
>>>> [...]
>>>>
>>>> Here's what the function looks like in the initial dialect of my
>>>> language.
>>>>
>>>>   function fib
>>>>     parm n
>>>>     if
>>>>     on n$ < 3
>>>>       return 1
>>>>     on 1
>>>>       return fib(n$ - 1) + fib(n$ - 2)
>>>>     endif
>>>>   endfunction
>>>
>>> (Inspired by Perl?)
>>
>> What is it that reminds you of Perl?
>>
>
> It reminds me of C, BASIC (due to $), and either Pascal or block
> scripting languages.

I know the feeling. I keep reading n$ as a string variable! But it's
not. The dollar sign is not intended to be a permanent feature. It's
only there in the initial language to make the initial compiler easier
to write. In this initial dialect

n means the address of n
$ dereferences whatever is to its left

See the fuller explanation I wrote to Bart.

>
> It seems that it looks like block structured languages like Ada, Fortran
> 90, PL/1, Python, Ruby, as well as the other half-dozen languages that
> use curly braces on Wikipedia's page below, assuming your if-endif
> represents curly braces:
>
> https://en.wikipedia.org/wiki/Do_while_loop
>
>>> Compiler start-up overheads?
>>
>> More likely is because I am running this in a VM [...]
>
> Head-spin.
>
> Why? Is your language so complicated you need a mainframe at school
> or work to run it? What is wrong with your home computer? Seriously.
> Surely, you don't need a VM to protect yourself from your own code
> going "off into the weeds".

That's purely because my laptop runs Windows but I prefer to develop
under Unix. So I run Oracle VirtualBox with a Ubuntu guest, and I
develop under Ubuntu.

Some of the Ubuntu filesystem is shared with the Windows host. And I can
telnet and ssh from the host to the guest. So it's as though the Ubuntu
guest is part of the Windows environment - Windows Explorer can navigate
the Unix directories and see the files, I edit them in Notepad, and at
command prompts I can run commands on the Unix guest.

And, of course, the Unix side has all the good development tools - make,
cc, bcc, objdump, nasm, python, ld etc.

When I was developing C0 the code did go off into the weeds quite
often(!) ... but it seems quite well behaved now. Lots of macros helped.


--
James Harris

James Harris

Mar 11, 2019, 12:17:38 PM
On 11/03/2019 14:13, Bart wrote:
> On 11/03/2019 11:12, James Harris wrote:
>> On 10/03/2019 20:42, Bart wrote:
>
>>> Those trailing dollar symbols. [In Perl they go at the beginning]
>
>>   A means the address of A
>>   A$ results in the value of A
>>
>> In both cases, the A in the source results in
>>
>>   lea eax, [A]
>>
>> but in the second case the $ dereference appends
>>
>>   mov eax, [eax]
>>
>> thus resulting in the value of A.
>>
>> That made the compiler quite a lot simpler.
>
> (I wrote some stuff but it's better summarised below about dereferencing.)
>
>> The arrangement works pretty well because in most contexts a name refers
>> to an address rather than a value. For example, in normal syntax
>>
>>   A[...]
>>   F(...)
>>   U := ...
>>
>> All of those names refer to addresses.
>
> I disagree. I think most names are used as rvalues.

Most occurrences of a name in source are rvals but most of the different
cases that the compiler must handle are lvals. Arrays, function calls,
assignment targets and address-ofs are all lvals. And by treating them
as such I can handle all of them in exactly the same way. A name "name"
in source results in just one instruction

lea eax, [name]

Every occurrence of a name is handled with exactly that form of
instruction. IMO that's pretty cool! It certainly simplified the compiler.

...

>> Also, in my initial dialect, pointer dereferences are explicit and are
>> accomplished by a as many trailing dollar signs as required.
>>
>>   name$$$
>
> Now this starts to make more sense, if $ is an operator.

Yes, $ is an operator. It dereferences the argument to its left.

>
> I use "^" for
> this purpose, so in a language where 'A' always means the equivalent of
> '&A' elsewhere, that means:
>
>                       Other HLL    Your HLL
>   int A                 A            A^
>   pointer to int P      P^           P^^
>
> To get at (or to write to) the int value in each case .

Once I've developed the compiler a bit the $ signs will no longer be
needed and my source code will look more conventional.

>
>> I am deliberately taking a fairly minimalist approach to this to try to
>> avoid influencing the language design too much with things which already
>> exist. That's why I wanted to write the initial compiler in assembly,
>> for example, rather than using C or Python.
>
> There's an extraordinary decision to use assembly. (I haven't written a
> whole program in assembly since the mid-80s, and that was because I had to.)

The hard part was writing the compiler in the first place.
Hand-converting it to assembly was easier than I expected.

By the way, it's quite a personal achievement for me to write a
compiler. I struggled for decades to understand how such clever pieces
of code were produced. I read up on parsers, grammars, sentential forms,
lexers, state machines, code generators and the like. All that lot ended
up being more misleading than helpful. In the end it was easier to just
write it myself...!

>
> Much as I dislike writing in C, if the choice was C or ASM, I would
> choose C.
>
> I'm not surprised you have to keep your compiler simple!

:-)

>
>> I have a long way to go to get to where you are but I am looking forward
>> to moving this along further.
>
> (Well, I've been doing this a long time (my first 'compiler' was running
> in 1980.)
>
>> Have to say that even at the level I am at getting a program to produce
>> asm code is very satisfying!
>
> Best bit is as soon as you have loops, or some way of repeating (eg.
> recursive fib) and can let it loose on a real benchmark so that you know
> where you are. Even if the first result is disappointing.
>
> (One beautiful intermediate language, based on three-address-code,
> started off at half the speed of a compiler which I'd just thrown
> together with no thought. It would have take months just to get it near
> that other compiler. Sadly that approach had to be dropped.)
>


--
James Harris

Bart

Mar 11, 2019, 12:21:29 PM
On 11/03/2019 15:57, James Harris wrote:

> That's purely because my laptop runs Windows but I prefer to develop
> under Unix. So I run Oracle VirtualBox with a Ubuntu guest, and I
> develop under Ubuntu.
>
> Some of the Ubuntu filesystem is shared with the Windows host. And I can
> telnet and ssh from the host to the guest. So it's as though the Ubuntu
> guest is part of the Windows environment - Windows Explorer can navigate
> the Unix directories and see the files, I edit them in Notepad, and at
> command prompts I can run commands on the Unix guest.
>
> And, of course, the Unix side has all the good development tools - make,
> cc, bcc, objdump, nasm, python, ld etc.

I sometimes run VirtualBox with Ubuntu. If I create a C rendering of my
C compiler, cc.c, and compile it with gcc-O3 under Windows, then using
that to compile sqlite3.c to .asm takes 0.31 seconds.

The same program compiled with gcc-O3 under Ubuntu, used to compile
sqlite3.c to .asm in Ubuntu, takes:

osboxes@osboxes:~$ time ./cc -c sqlite3
Compiling sqlite3.c to sqlite3.asm

real 0m0.773s
user 0m0.408s
sys 0m0.064s

I don't actually know what that means. I think it either took 0.4
seconds or 0.8. If I get 'cc' to also time itself (calls to clock() at
entry and exit):

osboxes@osboxes:~$ time ./cc -c -time sqlite3
Compiling sqlite3.c to sqlite3.asm
222065 Lines
Program: 460 ms 482K Lines per second

real 0m0.745s
user 0m0.400s
sys 0m0.072s

then it's about half a second.

Anyway it's not greatly slower than Windows, about half the speed
(482Klps compared with 880Klps). (I can't compile to .exe under Linux
because it targets Win64 and requires things like msvcrt.dll accessed
via a couple of WinAPI functions.)

But remember your machine was taking 0.7 seconds to compile what, 20
lines? That's why I said something seemed wrong.

(But you also said it took 2 seconds to compile your compiler, which is
confusing if that is supposedly written in assembly.)

James Harris

Mar 11, 2019, 12:33:28 PM
On 11/03/2019 16:21, Bart wrote:

...

> (But you also said it took 2 seconds to compile your compiler, which is
> confusing if that is supposedly written in assembly.)

C0 is written in assembly. C1 is written in my new language. AIRI it
takes about 2 seconds for C0 to compile C1. (And once C1 is working C0
will no longer be needed.)

--
James Harris

Bart

Mar 11, 2019, 1:34:03 PM
On 11/03/2019 15:46, Andy Walker wrote:
> On 11/03/2019 14:13, Bart wrote:
>> On 11/03/2019 11:12, James Harris wrote:
>>> The arrangement works pretty well because in most contexts a name
>>> refers to an address rather than a value. For example, in normal
>>> syntax
>>>    A[...]
>>>    F(...)
>>>    U := ...
>>> All of those names refer to addresses.
>
>     Welcome to the world of Algol!
>
>> I disagree. I think most names are used as rvalues.
>
>     That depends on the application.  If you have a lot of
> "a := b+c*d;", then that's three R to one L, or in the 20K Bart bench
> mark 60001 R to 20004 L.  But "a := 1" is just one L.

But how many times will 'a' be used subsequently? It's got to be at
least once, so that you have at least the same number of rvalue 'a's as
lvalue.

Some names I would anyway not include because they are not variable
names, and a compiler will know what they are and how to deal with them.

Those include function names, labels, module names, named constants and
enumerations. So needing the address of F in F() can't be used to argue
that addresses of variables are needed more often than their contents.

>> Now this starts to make more sense, if $ is an operator. I use "^"
>> for this purpose, so in a language where 'A' always means the
>> equivalent of '&A' elsewhere, that means:
>>                         Other HLL    Your HLL
>>     int A                 A            A^
>>     pointer to int P      P^           P^^
>> To get at (or to write to) the int value in each case .
>
>     It's easier [FSVO ...] to omit "&", "^" etc everywhere they occur
> and design a language in which dereferencing is almost always implicit.

I'm working towards that in my hybrid language which is more informal,
and that is about allowing the ^ in p^.m, a^[i] and f^(x) to be omitted.

In a lower level language, you want it to be more transparent, and more
explicit, so that you know exactly what's happening.

And suppose you had this:

int a := 87
ref int p := &a

print p

Will that display 87, or the address of a? Currently I have to write:

print p^, p

to get both.
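
For comparison, C keeps both spellings explicit, so the equivalent of that
print line is unambiguous:

    #include <stdio.h>

    int main(void) {
        int a = 87;
        int *p = &a;
        printf("%d %p\n", *p, (void *)p);   /* the value behind p, then p */
        return 0;
    }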

> You never need an exception in "introductory" programming, and only
> occasionally in "advanced" programming [where you want explicitly to
> follow a pointer in an lvalue -- rvalues always work implicitly].

In my higher level language which is dynamically typed, explicit pointer
ops are rare. You can pass structs, arrays, strings etc around by
reference without ever having to worry about inserting the right number
of derefs.

Look, for example, at Python.

Andy Walker

Mar 11, 2019, 3:06:41 PM
On 11/03/2019 17:34, Bart wrote:
>>> I disagree. I think most names are used as rvalues.
>>      That depends on the application.  If you have a lot of
>> "a := b+c*d;", then that's three R to one L, or in the 20K Bart bench
>> mark 60001 R to 20004 L.  But "a := 1" is just one L.
> But how many times will 'a' be used subsequently? It's got to be at
> least once, so that you have at least the same number of rvalue 'a's
> as lvalue.

It doesn't *have* to be. I have referred elsewhere to the friend
whose compiler insisted on "a" being used at least once between assignments
to it, and found that all sorts of people, inc library routines, had uses
that didn't conform to that pattern. Perhaps the simplest example is
setting up a unit matrix -- clear the matrix, then set the diagonal to 1,
faster than "a[i,j] := if i = j then 1 else 0 fi", which replaces N
unwanted assignments by N^2 tests, or the obscure "a[i,j] := (i%j)*(j%i)"
which does 3N^2 arithmetic operations --, but there are others. Note too
that if you pass an integer variable to a deeply recursive procedure, you
may get lots of lvalues before the result comes back to be used. But I
agree that in most normal code you are likely to find variables used more
often for their value than for their address; not sure why it should
matter, though. You need both!
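
    In case the "(i%j)*(j%i)" trick looks too obscure: "%" there is Algol's
integer division ["over"], so both factors are 1 only when i = j. A quick
check in C, where plain "/" on ints plays the same role:

    #include <stdio.h>

    int main(void) {
        enum { N = 4 };
        for (int i = 1; i <= N; i++) {
            for (int j = 1; j <= N; j++)
                printf("%d ", (i / j) * (j / i)); /* 1 on diagonal, else 0 */
            putchar('\n');
        }
        return 0;
    }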

> Some names I would anyway not include because they are not variable
> names, and a compile will know what they are and how to deal with
> them.
> Those include function names, labels, module names, named constants
> and enumerations.

Algol 68 would agree with you; but such things don't [formally]
have addresses anyway. [That's not the official A68 language.] In
principle, a "named constant" such as "real pi = 4*arctan(1);" or
"ref real aij = a[i,j];" does not imply the reservation of any storage;
how an implementation deals with them may well be another matter.

[...]
> And suppose you had this:
>     int a := 87
>     ref int p := &a
>     print p
> Will that display 87, or the address of a?

The HLL view would be that you have no business getting the
address [as an integer] of "a". In Algol, you have no way of doing so,
and "print(p)" displays 87. You might have a case in a debugging mode,
but then someone has to provide a special interface to the compiler to
permit this.

--
Andy Walker,
Nottingham.

David Brown

Mar 11, 2019, 5:30:38 PM
On 11/03/2019 17:21, Bart wrote:
> On 11/03/2019 15:57, James Harris wrote:
>
>> That's purely because my laptop runs Windows but I prefer to develop
>> under Unix. So I run Oracle VirtualBox with a Ubuntu guest, and I
>> develop under Ubuntu.
>>
>> Some of the Ubuntu filesystem is shared with the Windows host. And I
>> can telnet and ssh from the host to the guest. So it's as though the
>> Ubuntu guest is part of the Windows environment - Windows Explorer can
>> navigate the Unix directories and see the files, I edit them in
>> Notepad, and at command prompts I can run commands on the Unix guest.
>>
>> And, of course, the Unix side has all the good development tools -
>> make, cc, bcc, objdump, nasm, python, ld etc.
>
> I sometimes run VirtualBox with Ubuntu. If I create a C rendering of my
> C compiler, cc.c, and compile it with gcc-O3 under Windows, then using
> that to compile sqlite3.c to .asm takes 0.31 seconds.
>
> The same program compiled with gcc-O3 under Ubuntu, used to compile
> sqlite3.c to .asm in Ubuntu, takes:
>
>   osboxes@osboxes:~$ time ./cc -c sqlite3
>   Compiling sqlite3.c to sqlite3.asm
>
>   real    0m0.773s
>   user    0m0.408s
>   sys    0m0.064s
>
> I don't actually know what that means. I think it either took 0.4
> seconds or 0.8.

It means the wall-clock time was 0.773 seconds, but the processor was
only busy for 0.408 seconds running user-level code and 0.064 seconds
running OS code (system calls). The difference will be time spent
waiting for IO, usually, or because the processor was busy doing other
things. If your code is multi-threaded, you can also see "user" times
that are higher than the "real" time.

The speed of pure processor operations should be the same in the VM as
the host. Memory can be a little slower. But IO can often be slower,
especially if you have the filesystems mixed (as James has), and
depending on the type of virtual disk you have and the type of emulation
(the VM can have raw access to part of the disk, or it can be cached via
the host OS).

I have had a Linux VM inside a Windows desktop with two disks, and put a
VirtualBox virtual disk on each of the Windows physical disks. I then
had Linux software raid for these two virtual disks. The result was
that filesystem operations were more than twice as fast inside the VM
as on the host.

Usually for tests like this, you just run them twice and everything is
in the RAM cache anyway, so the IO speeds don't matter.

Bart

unread,
Mar 11, 2019, 9:24:35 PM3/11/19
to
On 11/03/2019 11:06, Andy Walker wrote:
> On 10/03/2019 15:32, James Harris wrote:
>> That makes the a68g timing, above, very surprising, if it is the
>> run-time for the function execution.
>> Surely the a68g version must be doing something odd. Floating point
>> numbers, perhaps? Or carrying out operations via function calls?
>
>     Not "odd";  "a68g" is an *interpreter* -- you can expect a huge
> factor between that and compiled code.  About 100 is said to be typical,
> but of course it depends on the mix of calculations, I/O and standard
> functions/procedures.  All you [should] care about with an interpreter
> is whether it is fast *enough* for your purposes.

OK, 10-100 times slowdown for an interpreter may be the necessary price
to pay for a language that offers other benefits: dynamic typing, easy
to write code, rapid development.

But Algol68 is statically typed. There's supposed to be a C back-end
(for A68G) but I don't understand how that works, and it didn't appear
to function on my version.

So you get the disadvantages of static typing, with the speed of an
interpreter!


Andy Walker

unread,
Mar 12, 2019, 8:07:47 AM3/12/19
to
On 12/03/2019 01:24, Bart wrote:
[I wrote, to James:]
>>      Not "odd";  "a68g" is an *interpreter* -- you can expect a huge
>> factor between that and compiled code.  About 100 is said to be typical,
>> but of course it depends on the mix of calculations, I/O and standard
>> functions/procedures.  All you [should] care about with an interpreter
>> is whether it is fast *enough* for your purposes.
> OK, 10-100 times slowdown for an interpreter may be the necessary
> price to pay for a language that offers other benefits: dynamic
> typing, easy to write code, rapid development.

It's only "necessary" because Marcel has not [yet?] written an
Algol compiler. But the benefits of a much nicer language than C, C++,
Pascal, Java, ... -- including easy to write code and rapid development,
but also extremely thorough checking [arithmetic overflow, index out of
bounds, memory leaks, dangling pointers ...], lots of built-in facilities
[multi-length arithmetic, array slicing, identities, parallel processing,
large library (inc curses, plotlib and scientific functions), user-defined
operators, automatic coercions, automatic garbage collection, ...] -- far
outweigh the fact that my program takes 0.2s, inc compilation, and yours
takes 0.02s or even 0.002s. That's real programs, not toy benchmarks.

> But Algol68 is statically typed. There's supposed to be a C back-end
> (for A68G) but I don't understand how that works, and it didn't
> appear to function on my version.

There is not "supposed to be a C back-end"; there *is* a "unit
compiler", which, *for Linux and compatible systems*, emits C code for
many units [expressions] which is compiled by "gcc" and linked in. You
are advised not to use it while a program is in development as it omits
many checks. There is more about it in Marcel's book on A68 Genie.
The source of "a68g" is available; I can't imagine that it would be
that hard to compile and link the emitted C on your system, perhaps just
needing the right flags to your C compiler [in which case it's easily
set up in the "Makefile"].

> So you get the disadvantages of static typing, with the speed of an
> interpreter!

If I ever need dynamic typing, I know where to get it; but it's
irrelevant for the sorts of program I write.

Re speed: As above, I'm not a big fan of benchmarks, though I've
done a fair amount of it for one reason or another [mostly to compare one
computer with another when purchasing]. But in the mid-70s, I did a lot
of testing on the University's new ICL 1906A, then allegedly one of the
fastest machines in the UK. The Malvern A68-R compiler came out at 628
KWips on the Whetstone benchmark [there is a full report, inc comparisons
with other languages, in a paper I wrote for an "Applications of Algol
68" conference]. Out of interest, today I tried A68G on the identical
benchmark; it clocks 549500 KWips, 875x faster, on my home PC. IOW, one
second on my PC is worth nearly a quarter of an hour on a fast mainframe
of the 1970s. I don't think I ever had a program that ran for that long;
time was rationed, and IIRC 5 minutes was the default limit after which
your program was zapped by the operating system. So A68G is plenty fast
enough for me; YMMV. [The Whetstone benchmark comes with the A68G
sources; or a listing is given in Marcel's book.]

--
Andy Walker,
Nottingham.

Bart

unread,
Mar 12, 2019, 9:43:41 AM3/12/19
to


On 12/03/2019 12:07, Andy Walker wrote:
> On 12/03/2019 01:24, Bart wrote:
> [I wrote, to James:]
>>>      Not "odd";  "a68g" is an *interpreter* -- you can expect a huge
>>> factor between that and compiled code.  About 100 is said to be typical,
>>> but of course it depends on the mix of calculations, I/O and standard
>>> functions/procedures.  All you [should] care about with an interpreter
>>> is whether it is fast *enough* for your purposes.
>> OK, 10-100 times slowdown for an interpreter may be the necessary
>> price to pay for a language that offers other benefits: dynamic
>> typing, easy to write code, rapid development.
>
>     It's only "necessary" because Marcel has not [yet?] written an
> Algol compiler.

I understand that Algol68 is mainly a research tool or, for people like
me, an interesting curiosity.

But I still consider it statically typed.

> But the benefits of a much nicer language than C, C++,
> Pascal, Java, ... -- including easy to write code and rapid development,
> but also extremely thorough checking [arithmetic overflow, index out of
> bounds, memory leaks, dangling pointers ...], lots of built-in facilities
> [multi-length arithmetic, array slicing, identities,

Those are all advantages of any scripting language, but such a language
has the extra benefits of not being constrained by a formal type system:

a := (10, 20.0, "thirty", 40..49, [50,51,55], (60,61,62))

println a[3..4]
println reverse(a)
println "thirty" in a

forall x in a do
println tostr(x):"12",":",x.type
od

Output:

(thirty,40..49)
((60,61,62),[50..51,55],40..49,thirty,20.000000,10)
3
10 : <tint>
20.000000 : <treal>
thirty : <tstring>
40..49 : <trange>
[50..51,55] : <tset>
(60,61,62) : <tlist>

> parallel processing,

For speed? In that case that is out of place in a list of reasons why
lower performance might be acceptable! Although I suppose it might help
mitigate it.

>> But Algol68 is statically typed. There's supposed to be a C back-end
>> (for A68G) but I don't understand how that works, and it didn't
>> appear to function on my version.
>
>     There is not "supposed to be a C back-end";

a68g --help includes this output:

-O0, -O1, -O2, -O3: switch compilation on and pass option to back-end
C compiler.

Although using it tells me:

a68: warning: 1: optimisation has no effect on this platform.

(This is Windows.)

>     Re speed:  As above, I'm not a big fan of benchmarks, though I've
> done a fair amount of it for one reason or another [mostly to compare one
> computer with another when purchasing].  But in the mid-70s, I did a lot
> of testing on the University's new ICL 1906A, then allegedly one of the
> fastest machines in the UK.  The Malvern A68-R compiler came out at 628
> KWips on the Whetstone benchmark [there is a full report, inc comparisons
> with other languages, in a paper I wrote for an "Applications of Algol
> 68" conference].  Out of interest, today I tried A68G on the identical
> benchmark;  it clocks 549500 KWips, 875x faster, on my home PC.  IOW, one
> second on my PC is worth nearly a quarter of an hour on a fast mainframe
> of the 1970s.

Maybe. But in 1981 I was working with, for example, images of 16K 4-bit
pixels; today they might be 10M 24-bit pixels. I like using my
interpreted language, but it is sluggish for such tasks.

BTW those benchmarks appear to be for floating point.

I actually wrote an arbitrary-precision integer/floating point library a
few months back. It worked well, but for higher precision, its speed was
very slow compared with some libraries that are out there (not just gmp,
there are some even faster than that.)

And by slow I mean not just 1-2 orders of magnitude, but perhaps half a
dozen. Yet, for the sort of stuff I played with, it seemed adequate. It's
just that the threshold at which it became impractical was lower than with
other libraries.

This is similar to your point about a slow interpreted language still
being useful.

Still, sometimes I write programs to do brute-force puzzle solving, then
every bit of speed helps.

Andy Walker

unread,
Mar 12, 2019, 9:35:16 PM3/12/19
to
On 12/03/2019 13:43, Bart wrote:
> I understand that Algol68 is mainly a research tool or, for people
> like me, an interesting curiosity.
> But I still consider it statically typed.

Yes; that has never been a problem for me in real life.

[...]
> Those are all advantages of any scripting language, but such a
> language has the extra benefits of not being constrained by a formal
> type system: [...]

When you say "not constrained by [...]", what you mean is that
type errors are not detected by the compiler. There are applications
for which that does not matter, and applications for which it is
absolutely crucial. Horses for courses. Only a fool would ever claim
that Algol, of any flavour, is always the right programming tool.

>> parallel processing,
> For speed?

No; for, um, parallel processing. Splitting a calculation into
two or more concurrent threads. Why? Because, in some cases, it's the
most logical way to describe an algorithm.

> In that case that is out of place in a list of reasons why
> lower performance might be acceptable! Although I suppose it might
> help mitigate.

I wasn't talking about "why lower performance might be acceptable"
but about why Algol is a nice language in which to write programs. It has
lots of facilities that help programmer productivity, yet all within a
simple and orthogonal language. Algol 68 is not, in itself, a "lower
performance" language, as the speed of the Malvern compiler showed;
"a68g" *is* such a language, not because it's Algol but because it's
primarily an interpreter which does much more careful checking than most
interpreters [and than virtually any true compiler]. If you don't like
the speed of "a68g", use a compiler instead; it's a bit mean to complain
about a product that Marcel has made available, both as source and as
pre-compiled binaries, to everyone free of charge.

[...]
>> There is not "supposed to be a C back-end";
> a68g --help includes this output:
> -O0, -O1, -O2, -O3: switch compilation on and pass option to
> back-end C compiler.

Yes; as I have explained several times now, there is not a "C
back end", there is what Marcel calls a "unit compiler" that emits C
code for *some* units. There is a subtle distinction.

> Although using it tells me:
> a68: warning: 1: optimisation has no effect on this platform.
> (This is Windows.)

So the OS you're using is not compatible with Linux. See the
A68G documentation:

" As of version 2 and on Linux or compatible operating systems, Algol
" 68 Genie can run in optimising mode, in which it employs a unit
" compiler that emits C code for many units involving operations on
" primitive modes INT, REAL ,BOOL, CHAR and BITS and simple structures
" thereof such as COMPLEX. "

[...]
> Still, sometimes I write programs to do brute-force puzzle solving,
> then every bit of speed helps.
Sure; so do I. But, much more often than not, algorithmic
improvements are far more important than speed of execution.

--
Andy Walker,
Nottingham.

Rod Pemberton

unread,
Mar 13, 2019, 7:37:42 AM3/13/19
to
On Mon, 11 Mar 2019 15:41:27 +0000
James Harris <james.h...@gmail.com> wrote:

> On 11/03/2019 08:36, Rod Pemberton wrote:
> > On Sun, 10 Mar 2019 15:32:11 +0000
> > James Harris <james.h...@gmail.com> wrote:

> >> [...]
> >> Over the winter I wrote the first compiler - Yay! - for the
> >> language I have been designing, and I thought this would be a good
> >> piece of code to use as a fun test.
> >
> > Well, it sure took you long enough ...
>
> Re my language, yes it has taken many years but it turned out that to
> do what I wanted to do I had to design a few things in parallel:
>
> 1. The basic language (which is all I intended to do at the start)
> 2. An intermediate form (I.F.)
> 3. A configurable scheme for converting the language to the I.F.
> 4. A plan for a network-based code-distribution system
> 5. And finally, a way to adapt the I.F. to different CPUs
>
> In reality, I couldn't begin developing until I had all those things
> sorted out.
>

1, 2, 3, and 5 make sense. I'm not sure why 4 needed to be in the
mix. That seems to be a huge project interjected into the middle.
Or, elif.

> >> I've been really surprised by how long a compile like this takes.
> >> For example, for the compiler to compile a version of itself takes
> >> something of the order of 2 seconds. Again, I presume that's
> >> because of the way it reads the source.
> >
> > So, this is not your language converted into C? (No.)
> >
> > How many stages (and/or standalone programs) do you use to
> > compile?
>
> I am not sure what you mean by stages.

stages ... passes ... number of times (re)parsing the code ...

Single-pass? Multiple pass?

I'm beginning to think it's set up as single-pass based on things in
other posts.

> Having exceptions is really handy in that it removes the need for
> checking if something went wrong after each call. You know in C where
> you might write
>
> if ((do x) went wrong)
> handle the error
> if ((do y) went wrong)
> handle the error
> if ((do z) went wrong)
> handle the error
>
> Thanks to support in my compilers (C0 and C1) and the RTL I can
> replace the above with
>
> do x
> do y
> do z
>
> which I find massively more readable.
>

Where do you specify the three error routines? Or, do you just
implement a universal routine? ... That question sounds familiar. Did
we go over this in assembly a while back?

> I've gone rogue!
>

Other than you voting for Brexit, I doubt that.

Rod Pemberton

unread,
Mar 13, 2019, 7:39:48 AM3/13/19
to
Well, there is no array there in the code itself, as A and B are both
addresses and B$ is a value. This is just like how C works, which also
has no arrays, only array declarations to allocate space. It works just
like C's [] subscript operator, which takes an address and an offset (in
either order in C), scaling the index by the size of the type the address
points to.

So, I'm glad to see that you finally came around to understanding how C
actually works, i.e., addressed based and array-less, as you've
recreated it correctly here. Congratulations!

[P.S. For David Brown, please do not respond to that. I've been
involved in more arguments on that topic regarding C that I care to
count.]
It's clean.

It's by reference, i.e., address based like C and PL/1. No
backtracking to re-parse code, nor any need to save information to do
the dereferences. Of course, the '$' is not used much in languages, so
using it avoids overloading a symbol the way '*' serves as both
dereference and multiplication in C.

Rod Pemberton

unread,
Mar 13, 2019, 7:50:13 AM3/13/19
to
Well, I may have jumped the gun slightly on that congratulations for
correctly recreating it. I don't see any code for scaling the index.

Do you do scaling of the index by the size of the type pointed to by
the pointer? i.e., size of the data type at the address of A. I don't
see that, but I assumed so:

A[B$] = A + B$ * sizeof(A$)
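
In C terms, that is just the usual pointer-arithmetic rule: the index is
scaled by the element size. A small, purely illustrative sketch (the names
are made up):

    #include <stdio.h>

    int main(void)
    {
        int a[4] = {10, 20, 30, 40};
        int b = 2;

        /* &a[b] is the base address plus b scaled by the element size,
           so both addresses below print the same */
        printf("%p\n", (void *)&a[b]);
        printf("%p\n", (void *)((char *)a + b * sizeof a[0]));
        printf("%d\n", a[b]);                  /* prints 30 */
        return 0;
    }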

Bart

unread,
Mar 13, 2019, 8:29:41 AM3/13/19
to
On 13/03/2019 01:35, Andy Walker wrote:
> On 12/03/2019 13:43, Bart wrote:
>> I understand that Algol68 is mainly a research tool or, for people
>> like me, an interesting curiosity.
>> But I still consider it statically typed.
>
>     Yes;  that has never been a problem for me in real life.
>
> [...]
>> Those are all advantages of any scripting language, but such a
>> language has the extra benefits of not being constrained by a formal
>> type system: [...]
>
>     When you say "not constrained by [...]", what you mean is that
> type errors are not detected by the compiler.  There are applications
> for which that does not matter, and applications for which it is
> absolutely crucial.  Horses for courses.  Only a fool would ever claim
> that Algol, of any flavour, is always the right programming tool.

Static type declarations have advantages, as does dynamic typing. That's
why my hybrid language will allow both (although uses of variant types
will be greatly reduced). With dynamic typing:

record point = (int x,y)

p := point(10,20)

With static typing:

p := (10,20)

It knows what type p needs to be. Without the 'point()', (10,20) is just
a list of two elements.

>>> parallel processing,
>> For speed?
>
>     No;  for, um, parallel processing.  Splitting a calculation into
> two or more concurrent threads.  Why?  Because, in some cases, it's the
> most logical way to describe an algorithm.
>
>>          In that case that is out of place in a list of reasons why
>> lower performance might be acceptable! Although I suppose it might
>> help mitigate.
>
>     I wasn't talking about "why lower performance might be acceptable"
> but about why Algol is a nice language in which to write programs.  It has
> lots of facilities that help programmer productivity, yet all within a
> simple and orthogonal language.

That will have to be your opinion then as someone very familiar with
that language.

I found it exceedingly painful to write the simplest of programs, but
then I've never used it. (Which is a little surprising as I've been
borrowing what I'd thought of as Algol68 syntax since the early 80s.
Obviously it must have diverged considerably.)

It also struck me as overly strict:

INT a = 16rFFFFFFFF;

a68: syntax error: 1: tag "FFFFFFFF" has not been declared properly.

INT a = 16rffffffff;

a68: error: 1: BITS cannot be coerced to INT in a strong-unit

INT a=4294967295;

a68: runtime error: 1: error in INT denotation (result too large)

INT a=NOT 0;

a68: error: 1: monadic operator "NOT" INT has not been declared

Even this (should be the 0x80000000 bit pattern):

INT a=-2147483648;

a68: runtime error: 1: error in INT denotation (result too large)

This is extremely frustrating. My language (the static one, which has
64-bit ints):

int a := 0xFFFF'FFFF'FFFF'FFFF
int b := 0xffff'ffff'ffff'ffff
int c := 18446744073709551615
int d := inot 0
int e := -9223372036854775808

println a:"hw" # (h=hex, w=as 'word' which is unsigned)
println b:"hw"
println c:"hw"
println d:"hw"
println e:"hw"

Output:

FFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFF
FFFFFFFFFFFFFFFF
8000000000000000

It Just Works. One more of my measures of practicality. (Yet another is
that my syntax is case-insensitive.)

As I mentioned an unsigned type above, does Algol68 even have such a
type? I can't see it.

(I've tried to do away with an unsigned integer type (for fixed width
ints), but found it too useful.)

James Harris

unread,
Mar 13, 2019, 10:19:56 AM3/13/19
to
On 13/03/2019 11:42, Rod Pemberton wrote:
> On Mon, 11 Mar 2019 11:12:31 +0000
> James Harris <james.h...@gmail.com> wrote:
>
>> Ah, OK. The dollar signs are not meant to be permanent but there's an
>> interesting story behind them in the way that they helped make the
>> initial compiler smaller and easier to write.

...
Yes, that's exactly how it works.

>
> So, I'm glad to see that you finally came around to understanding how C
> actually works, i.e., addressed based and array-less, as you've
> recreated it correctly here. Congratulations!

LOL! I thought /I/ taught /you/ that.... ;-)

In fact, C got that from BCPL. Martin Richards deserves the credit for
that 'breakthrough'! (Or, in fact, maybe he got it from CPL. Don't know.)
Yes, the whole compiler works one symbol at a time, generating asm
output as appropriate for that symbol. And you have hit on exactly why I
chose to use the dollar sign. (Am not sure what I'll use for dereference
in the long term but it probably won't be a dollar sign, although it
will be a postfix operator.)

Speaking of saving information, I did initially read the source bit by
bit into a large buffer (with no deletions so I didn't have to do any
storage reclamation). But I found that the only thing I actually had to
save throughout the entire compiler was the name of a function. And
since functions cannot be nested there can only be one function current
at a time. I therefore have just one small buffer; it holds the current
function name if we are in a function. And that saved function name is
used to generate asm labels. That removed even the need for buffer
management. There's something satisfying about being able to strip out
code that's not needed after all.

There is some stack use. Precedences and associativity in expressions
are evaluated by code which, naturally, uses the machine's stack for
saved registers and return addresses. And I did have to implement a
small context stack to keep track of nested ifs, loops and similar. But,
overall, there's very little complexity in the code.
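
For illustration only - this is a generic sketch, not code from my compiler -
here's how recursion alone can take care of precedence, in the shape of a
tiny precedence-climbing calculator in C (single-digit operands, made up for
the purpose):

    #include <stdio.h>

    static const char *p;                      /* cursor into the expression */

    static int prec(char op)
    {
        return (op == '+' || op == '-') ? 1 :
               (op == '*' || op == '/') ? 2 : 0;
    }

    static int parse_expr(int min_prec)
    {
        int lhs = *p++ - '0';                  /* primary: a single digit */
        while (prec(*p) >= min_prec) {
            char op = *p++;
            int rhs = parse_expr(prec(op) + 1);    /* recursion is the operator stack */
            switch (op) {
                case '+': lhs += rhs; break;
                case '-': lhs -= rhs; break;
                case '*': lhs *= rhs; break;
                case '/': lhs /= rhs; break;
            }
        }
        return lhs;
    }

    int main(void)
    {
        p = "1+2*3-4";
        printf("%d\n", parse_expr(1));         /* prints 3 */
        return 0;
    }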


--
James Harris

James Harris

unread,
Mar 13, 2019, 10:54:15 AM3/13/19
to
On 13/03/2019 11:52, Rod Pemberton wrote:
> On Wed, 13 Mar 2019 07:42:24 -0400
> Rod Pemberton <inv...@lkntrgzxc.com> wrote:
>
>> On Mon, 11 Mar 2019 11:12:31 +0000
>> James Harris <james.h...@gmail.com> wrote:

...

>>> A[B$]
>>>
>>> meaning to take the value of B and use it to index into array A.

...

> Well, I may have jumped the gun slightly on that congratulations for
> correctly recreating it. I don't see any code for scaling the index.
>
> Do you do scaling of the index by the size of the type pointed to by
> the pointer? i.e., size of the data type at the address of A. I don't
> see that, but I assumed so:
>
> A[B$] = A + B$ * sizeof(A$)

Yes, as said, I 'use' the value of B to index into array A. It /is/
scaled. In fact, as I still get a buzz from seeing my own code generated
(I am still at that stage!) here's a complete example.

Say the input program is as follows.


dialect 001
function F
A[B$]
endfunction


That gets compiled to the following - which is the complete output of
the compiler. You can see the scaling below in the SHL instruction.


cpu 386
bits 32
extern ___exception



;****************************************
;
; Function F
;
;****************************************

global F
F:
%push
%stacksize flat
%assign %$localsize 0
push edx
push ecx
push ebx
push ebp
%arg .dummyarg0:dword
%arg .dummyarg1:dword
%arg .dummyarg2:dword
mov ebp, esp
sub esp, _$_F_localsize + _$_F_tempsize


lea eax, [A]
push eax ;Save operand
lea eax, [B]
mov eax, [eax] ;Deref
pop ebx ;Retrieve operand
shl eax, 2 ;Convert index to offset
add eax, ebx ;Element address


mov eax, 0

_$_F_return:
mov esp, ebp
pop ebp
pop ebx
pop ecx
pop edx
ret
_$_F_localsize equ %$localsize
_$_F_tempsize equ 0 ;No temporaries used
%pop




;Compile ended. 0 errors



A couple of other points. The array reference as coded results in the
/address/ of the element. To get the value would need a dollar sign to
be appended.

And you can see the function name, F, has to appear many times - no
fewer than 8, if I counted correctly - and some of them come after the
label's first use. That's why, as I mentioned in a prior post, it was
the one piece of the source which had to be saved.


--
James Harris

David Brown

unread,
Mar 13, 2019, 11:28:37 AM3/13/19
to
That is not an example of static and dynamic typing.

"Dynamic typing" means the type of an object or identifier is only known
at run time, while "static typing" means it is fixed at compile (or
interpretation) time.

In python, if you write:

def foo(a) :
    return len(a)

then you have dynamic typing. When the function "foo" is created, the
compiler (byte compiler) does not know the type of "a". The function
will use late binding to figure out at run-time what "len" function
should be applied to "a", based on the type it has at the time.
Sometimes the function "foo" will be called with a string, sometimes it
will be called with a list, sometimes with an integer (in which case,
the late binding for "len" will fail and an exception will be thrown).

In your first example, you have static typing - the compiler (or
interpreter) knows what a "point" type is, and it knows that "p" is an
instance of that type. In the second example, you also have static
typing - the compiler knows that "p" is an instance of type "pair of two
integers", or "list of integers", or whatever type that is in your language.

James Harris

unread,
Mar 13, 2019, 11:30:37 AM3/13/19
to
On 13/03/2019 11:40, Rod Pemberton wrote:
> On Mon, 11 Mar 2019 15:41:27 +0000
> James Harris <james.h...@gmail.com> wrote:
>
>> On 11/03/2019 08:36, Rod Pemberton wrote:
>>> On Sun, 10 Mar 2019 15:32:11 +0000
>>> James Harris <james.h...@gmail.com> wrote:
>
>>>> [...]
>>>> Over the winter I wrote the first compiler - Yay! - for the
>>>> language I have been designing, and I thought this would be a good
>>>> piece of code to use as a fun test.
>>>
>>> Well, it sure took you long enough ...
>>
>> Re my language, yes it has taken many years but it turned out that to
>> do what I wanted to do I had to design a few things in parallel:
>>
>> 1. The basic language (which is all I intended to do at the start)
>> 2. An intermediate form (I.F.)
>> 3. A configurable scheme for converting the language to the I.F.
>> 4. A plan for a network-based code-distribution system
>> 5. And finally, a way to adapt the I.F. to different CPUs
>>
>> In reality, I couldn't begin developing until I had all those things
>> sorted out.
>>
>
> 1, 2, 3, and 5 make sense. I'm not sure why 4 needed to be in the
> mix. That seems to be a huge project interjected into the middle.

A huge project? It is a bit. I say I've got it worked out but that's
just in outline. I still need to bottom out some of the details and it's
something I'm working on at the moment. I may post about it at some
point to see what others think.

Suffice to say just now that it's 'needed' because I want to be able to
avoid things like explicit code distribution and installation steps etc
and have modules simply refer to external objects by their names in a
universal naming system. With your memory you may remember me talking
about different universal naming schemes a while back.
Or, elsif. :-)

In fact, I quite expected you to see things you didn't like with my if
construct so in some ways I'm quite pleased that you had no objections
to it. Or, at least, you didn't mention any. I thought you would.
(That's a compliment, by the way. I think of you as someone who raises
some first-rate challenges.)

>
>>>> I've been really surprised by how long a compile like this takes.
>>>> For example, for the compiler to compile a version of itself takes
>>>> something of the order of 2 seconds. Again, I presume that's
>>>> because of the way it reads the source.
>>>
>>> So, this is not your language converted into C? (No.)
>>>
>>> How many stages (and/or standalone programs) do you use to
>>> compile?
>>
>> I am not sure what you mean by stages.
>
> stages ... passes ... number of times (re)parsing the code ...
>
> Single-pass? Multiple pass?
>
> I'm beginning to think it's set up as single-pass based on things in
> other posts.

The compiler itself is definitely single pass. But that's mainly because
it needed to be simple. Much of the initial arrangement is meant just
for bootstrap purposes. But all other things being equal I'd rather keep
simplicity where possible.

>
>> Having exceptions is really handy in that it removes the need for
>> checking if something went wrong after each call. You know in C where
>> you might write
>>
>> if ((do x) went wrong)
>> handle the error
>> if ((do y) went wrong)
>> handle the error
>> if ((do z) went wrong)
>> handle the error
>>
>> Thanks to support in my compilers (C0 and C1) and the RTL I can
>> replace the above with
>>
>> do x
>> do y
>> do z
>>
>> which I find massively more readable.
>>
>
> Where do you specify the three error routines? Or, do you just
> implement a universal routine? ... That question sounds familiar. Did
> we go over this in assembly a while back?

The idea is that an exception would be propagated down the call stack to
a caller which was prepared to handle that type of exception. Yes, they
could be handled in the module which calls x, y and z if that's what the
programmer wanted to do but the point is that they don't have to.

Specifically, if there's no exception mechanism then errors are ignored
by default; but if there is an exception mechanism then they are flagged
up by default.

They are also often more informative.

For example, in a C-like form (i.e. this is not meant to be C) one has
to explicitly test for exceptions or they won't be noticed.


    if ((fd = open(...)) != 0) {
        fprintf(stderr, "Open error %i on filename %s\n", ...);
        break;
    }
    if (!(m = malloc(...))) {
        fprintf(stderr, "Out of memory requesting %i bytes\n", ...);
        break;
    }
    if ((bytes = read(...)) == 0) {
        fprintf(stderr, "Error: file %s is empty\n", ...);
        close(...);
        break;
    }

As well as having to check return codes, there's a natural burden on the
programmer to try to include useful information in an error message. But
with exception handling the exception can hold all sorts of relevant
info which will be reported by default.
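
Purely as an illustration of the propagation idea - this is not how my RTL
does it - the same shape can be faked even in C with setjmp/longjmp, with
the 'exception' carrying its own details (all names below are made up):

    #include <setjmp.h>
    #include <stdio.h>

    static jmp_buf handler;
    static struct { const char *what; const char *name; int code; } exc;

    static void raise_exc(const char *what, const char *name, int code)
    {
        exc.what = what; exc.name = name; exc.code = code;
        longjmp(handler, 1);                   /* unwind to the nearest handler */
    }

    static void do_x(void) { /* ... */ }
    static void do_y(void) { raise_exc("open failed", "data.txt", 2); }
    static void do_z(void) { /* ... */ }

    int main(void)
    {
        if (setjmp(handler) == 0) {
            do_x();                            /* no per-call error checks needed */
            do_y();
            do_z();
        } else {
            fprintf(stderr, "%s: %s (code %d)\n", exc.what, exc.name, exc.code);
        }
        return 0;
    }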


--
James Harris

Andy Walker

unread,
Mar 13, 2019, 11:40:37 AM3/13/19
to
On 13/03/2019 12:29, Bart wrote:
[...]
> [Algol 68] also struck me as overly strict: [...]

That's because, no doubt based on experience with C, you don't want
to distinguish integer, bits and Boolean types. It is, of course, easier
to conflate those types, but it leads to type errors further down the line
when you've forgotten what type "a" [in your examples] is really supposed
to be, and whether you proposed to use it for arithmetic, as an array of
flags, or as a true/false flag. If you use "INT"/"BITS"/"BOOL" as nature
intended, you won't get into the messes you describe.

Aside: You really can't expect to write Algol fluently based on
seeing it 40-odd years ago and subsequently relying on memory and C.
You would no doubt have the same problems with [eg] Pascal, Basic, Ada,
... if you tried to write code in them on the same basis. Marcel's book
on A68G gives lots of examples, and is faster than me trying to explain
everything to you.

> As I mentioned an unsigned type above, does Algol68 even have such a
> type? I can't see it.

Not within the official language, but it's not hard to roll your
own for the occasional application where "no overflow" is necessary, for
modular arithmetic [I did so a couple of weeks back for one of the Euler
problems]. For most well-written C code, it works virtually one-for-one
to replace unsigned ints by "BITS" and signed ints by "INT". [Yes, I
know there are exceptions, usually where you really, really want to
use integers between "maxint" and "2*maxint" and really, really don't
want to use longs.]

--
Andy Walker,
Nottingham.

Bart

unread,
Mar 13, 2019, 1:02:48 PM3/13/19
to
The examples could have been clearer. The dynamic one has distinct
'record' and 'struct' types (one variant, one packed, defined
differently), which I will fix here:

Dynamic (with 'struct'):

     type point = struct    # packed record type
         int64 x,y
     end
     ...
     var p                  # optional when p is local
     p := point(x,y)        # p has type 'point'; x,y checked at runtime
     p := (x,y)             # p has type 'list'; x,y can be anything

Static:

     record point = (int x,y)
     ...
     point p
     p := (x,y)                 # p /always/ has type 'point'

----------------------

The purpose of the examples was to show that the dynamic language needs
a 'point' conversion to ensure the desired types:

(point(u,v), point(x,y))

which is not needed in the static one:

((u,v), (x,y))

So, unusually, static means less clutter in this case.

What might have confused you is being able to define a record or struct
type in a dynamic language, which is unusual (certainly Python can't do
that, having only a bunch of incompatible bolt-ons to achieve something
similar). But it is necessary for interfacing to external libraries.

David Brown

unread,
Mar 13, 2019, 2:13:31 PM3/13/19
to
That's a good step clearer, but not there yet. I don't see the
distinction between "record" and "struct" here - nor do I see why you
would need it. In the "dynamic" case, "var p" just declares "p" to be a
variable without a specified type - it gets its type at run-time. In
the "static" case, "point p" declares "p" to be of type "point". The
details of what "point" is don't really matter for the example or for
the dynamic/static distinction.

> ----------------------
>
> The purpose of the examples was to show that the dynamic language needs
> a 'point' conversion to ensure the desired types:
>
>     (point(u,v), point(x,y))
>
> which is not needed in the static one:
>
>     ((u,v), (x,y))
>
> So, unusually, static means less clutter in this case.

Static typing often means more code when a variable is declared, but
less when it is used (unless its use is entirely generic).

>
> What might have confused you is being able to define a record or struct
> type in a dynamic language, which is unusual (certainly Python can't do
> that, as a bunch of incompatible bolt-ons to achieve similar). But it is
> necessary for interfacing to external libraries.

No, what confused me was that both your examples were static typing.
But you've fixed that now.

Bart

unread,
Mar 13, 2019, 3:28:27 PM3/13/19
to
On 13/03/2019 18:13, David Brown wrote:
> On 13/03/2019 18:02, Bart wrote:

>> Dynamic (with 'struct):
>>
>>      type point = struct    # packed record type
>>          int64 x,y
>>      end
>>      ...
>>      var p                  # optional when p is local
>>      p := point(x,y)        # p has type 'point'; x,y checked at runtime
>>      p := (x,y)             # p has type 'list'; x,y can be anything
>>
>> Static:
>>
>>      record point = (int x,y)
>>      ...
>>      point p
>>      p := (x,y)                 # p /always/ has type 'point'
>>
>
> That's a good step clearer, but not there yet.  I don't see the
> distinction between "record" and "struct" here - nor do I see why you
> would need it.  In the "dynamic" case, "var p" just declares "p" to be a
> variable without a specified type - it gets its type at run-time.  In
> the "static" case, "point p" declares "p" to be of type "point".  The
> details of what "point" is don't really matter for the example or for
> the dynamic/static distinction.

The distinction between struct and record (in the dynamic language) is this:

     type ws_systemtime = struct
         word16 year            # word16 = uint16
         word16 month
         word16 dayofweek
         word16 day
         word16 hour
         word16 minute
         word16 second
         word16 milliseconds
     end

This above is a specific layout, 16 bytes in total, and is necessary to
communicate with external, usually C-based APIs.

     record huffnode =
         var child0, child1
         var symbol
         var suppbits
     end

In the above, each field is nominally a fixed, 16-byte tagged variant.
But each can contain any data type of any size with any amount of nesting.

Note that in this language, it is only variables and objects that have
dynamic type. Unlike in Python, everything else is static. Python is
/too/ dynamic and that generates some problems.

David Brown

unread,
Mar 13, 2019, 4:27:24 PM3/13/19
to
You are still not telling me the distinction between "struct" and
"record". It is also not clear if "huffnode" here is a type or a
variable - your syntax is not obvious.

I get the impression that "var" is a variant type, with a tag to track
the actual type stored:

<https://en.wikipedia.org/wiki/Tagged_union>

If it can hold any type, then presumably you have some kind of late
binding for functions using it - that makes it a true dynamic type. (A
tagged union is actually a static type - the types it can contain are
statically determined.)

I don't see from your code above whether it is possible to have a
standalone variable of type "var", a struct containing "var" members, a
record containing members that are not "var", whether "huffnode" is
declared as a type here, or what the distinction is between "record" and
"struct".

Bart

unread,
Mar 13, 2019, 5:37:31 PM3/13/19
to
On 13/03/2019 20:27, David Brown wrote:
> On 13/03/2019 20:28, Bart wrote:

>>      type ws_systemtime = struct

>>      record huffnode =


> You are still not telling me the distinction between "struct" and
> "record".

I use "struct" to mean a low level, packed record of flat, fixed size
elements, corresponding to the structs used in C APIs.

I use "record" to mean a more general aggregate type, which can contain
anything.

> It is also not clear if "huffnode" here is a type or a
> variable - your syntax is not obvious.

OK, I forgot that you are used to C where struct definitions can be all
mixed up with variable declarations:

struct {int x,y;}; // defines nothing
struct T {int x,y;}; // defines struct tag T
struct {int x,y;} A; // defines variable A
struct U {int x,y;} B; // struct tag U and variable B
typedef struct {int x,y;}; // nothing
typedef struct V {int x,y;}; // struct tag V
typedef struct W {int x,y;} C; // struct tag W, usertype C

In the above language, type definitions are /always/ separate from
variable definitions. The examples define two user types:
"ws_systemtime" and "huffnode", and 0 variables. Individual instances
can be created like this:

a := new(ws_systemtime)

or:

b := huffnode(0,0,0,0)

Neither a nor b need declaring, but that is done as follows:

var a, b


> I get the impression that "var" is a variant type, with a tag to track
> the actual type stored:
>
> <https://en.wikipedia.org/wiki/Tagged_union>

Well it's more of a (tag, value/pointer). The actual structure is rather
elaborate (see sig).

> If it can hold any type, then presumably you have some kind of late
> binding for functions using it - that makes it a true dynamic type.

Thanks. I've been using variations on this since approx the mid-80s;
good to know it has had a dynamic type all along!

Since a variant uses an actual type tag field which is initialised to
'void' when created, then it is necessarily a dynamic type as it can't
assume any other until assigned to. (The same with unnamed objects.)

> (A
> tagged union is actually a static type - the types it can contain are
> statically determined.)

Only inside the host interpreter.

--

Variant type record. (In this static language, 'record' means the same
thing as 'struct'. But 'struct' and 'union' blocks inside the record are
for organising fields, not naming new types.)

global record varrec = ! 16 bytes
union
struct
word16 tag
byte hasref
union
byte stackadj
byte opdims
byte ittype
end
end
word32 tagx
end
union
struct
word16 refelemtag
byte refbitoffset
byte spare2
end
struct
int16 frameoffset
u8 exceptiontype
u8 nexceptions
end
int32 frameptr_low
int32 itcount
word32 bndigits
end
union
ref objrec objptr
variant varptr
ref byte packptr
ref byte refptr
ref int64 dptr
int64 value
word64 uvalue
real64 xvalue
ref intpc retaddr
struct
int32 range_lower
int32 range_upper
end
end
end

global record objrec = !32 bytes
union
struct
word32 refcount
word16 elemtag
union
byte mutable
byte bitoffset
end
byte objtype
end
word64 refcountx
end
union
ref byte ptr
variant vptr
ichar strptr
ref void bnptr
ref[0:]int32 oldbnptr
end
union
struct
union
struct
word32 length
int32 lower
end
end
union
ref objrec objptr2
struct
word32 allocated
word32 dictitems
end
word64 allocated64
end
end
[16]byte data128
end
end

David Brown

unread,
Mar 14, 2019, 5:11:17 AM3/14/19
to
On 13/03/2019 22:37, Bart wrote:
> On 13/03/2019 20:27, David Brown wrote:
>> On 13/03/2019 20:28, Bart wrote:
>
>>>      type ws_systemtime = struct
>
>>>      record huffnode =
>
>
>> You are still not telling me the distinction between "struct" and
>> "record".
>
> I use "struct" to mean a low level, packed record of flat, fixed size
> elements, corresponding to the structs used in C APIs.
>
> I use "record" to mean a more general aggregate type, which can contain
> anything.
>

This is /slow/ progress.

You use "struct" to mean structures with identical layout to C structs,
for compatibility with external functions. Fair enough.

(C structs don't need to be fixed size, as they can end in a flexible
array member. And they don't need to be flat - pointers held within the
struct can be used to give more advanced structures. Other higher-level
languages may hide such implementation details - and that hiding is
usually a good thing - but it's the same when you dig down far enough.)
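
A minimal illustration of the flexible-array-member case, purely for
reference:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* A struct whose total size is not fixed: the flexible array member
       at the end is sized at allocation time (C99). */
    struct message {
        size_t len;
        char   text[];
    };

    static struct message *make_message(const char *s)
    {
        size_t n = strlen(s) + 1;
        struct message *m = malloc(sizeof *m + n);   /* header plus payload */
        if (m) {
            m->len = n - 1;
            memcpy(m->text, s, n);
        }
        return m;
    }

    int main(void)
    {
        struct message *m = make_message("hello");
        if (m) {
            printf("%zu: %s\n", m->len, m->text);    /* prints "5: hello" */
            free(m);
        }
        return 0;
    }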

I am still far less clear what "record" is doing. is "record" purely a
dynamic construct, so that the elements in it can take different sizes
and types at run-time, and are found by name lookup (or indexed lookup)
rather than at specific offsets?


>   It is also not clear if "huffnode" here is a type or a
>> variable - your syntax is not obvious.
>
> OK, I forgot that you are used to C where struct definitions can be all
> mixed up with variable declarations:

Ah, you mean you forgot I am used to C where the language is specified,
and I understand it - rather than being used to your personal language
which is known only to you, and where I have to guess given tiny
snippets of code and vague descriptions? Surely that difference is more
important than whether you can declare a type and variable in a single
statement? (In C, they are not "mixed up" - the parts are quite clear
and distinct. But there is plenty of scope for writing unclear C code,
of that there is no doubt.)

>
>     struct {int x,y;};                // defines nothing
>     struct T {int x,y;};              // defines struct tag T
>     struct {int x,y;} A;              // defines variable A
>     struct U {int x,y;} B;            // struct tag U and variable B
>     typedef struct {int x,y;};        // nothing
>     typedef struct V {int x,y;};      // struct tag V
>     typedef struct W {int x,y;} C;    // struct tag W, usertype C

Are we talking C here, or your language? The "defines nothing" cases
/do/ actually define a user-defined type in C. But as they don't define
a name that can be used to refer to it, the definition is useless. I
think perhaps that form is a leftover from the way structs were defined
in the early days of C (the struct member names were globally visible,
not just within the context of the struct). Alternatively, it can be
viewed as an artefact of the syntax used for typedef on a struct,
allowing you to name the type without giving a struct tag. The code
lines you described as "defines nothing" are harmless and have no
noticeable effect (at least with non-ancient C), and thus no one writes
them or cares about them.

But in all your efforts to try to show that C is complicated (it isn't)
or mixed up (it isn't), you managed to miss one form:

typedef struct { int x, y; } D;

That is the form I use for defining types. I don't bother giving struct
tags unless the type will be recursive.
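
For completeness, the recursive case is the one where the tag earns its
keep, because the typedef name is not in scope until after the closing
brace:

    /* Illustrative: the self-reference must go via the tag. */
    typedef struct Node {
        int value;
        struct Node *next;     /* 'Node *next;' would not compile at this point */
    } Node;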


What you continually fail to understand about C is that many of its
grammar rules are designed to be /simple/. The idea is to make them
simple and flexible, for the convenience of people writing compilers (or
parsers), for the convenience of people writing the language
specifications and rules, and for the convenience of people learning and
using the language. This does mean that you get more possible
combinations of how you can use the syntax - and can write things that
become hard to follow. But it does not mean that you /have/ to do that,
or that programmers actually /do/ write such code. (There are always a
few programmers who take pleasure in writing incomprehensible code.
They will do so regardless of the language and rules - if they used your
languages, they would do so in a way that was illegible to you.)

C has rules to say "this is how you declare a struct", "this is how
you give a type alias", "this is how you declare a variable" - and you
can combine them as you find practical in your code.

Another language can have different rules, and different ways of
combining them. You might /prefer/ those rules to the rules of C, but
that does not make C inherently bad. (And because C is flexible, there
is nothing stopping you writing your own code as though the combinations
you dislike were not allowed.)

>
> In the above language, type definitions are /always/ separate from
> variable definitions. The examples define two user types:
> "ws_systemtime" and "huffnode", and 0 variables.

It's fair enough that you require the declaration of user types and the
definition of variables to be separate. It's different from the
requirements of C, but not different from the practice of most C
programming.

But why do you have wildly different syntax for the two ways to declare
user types?

type ws_systemtime = struct ...

record huffnode = ...

Why not

type huffnode = record ...

or

struct ws_systemtime = ...

Surely that would be more consistent?


> Individual instances
> can be created like this:
>
>    a := new(ws_systemtime)
>
> or:
>
>    b := huffnode(0,0,0,0)

Again, why the different syntaxes? If they have different meanings
(such as one being heap allocated, and one being static or stack-local),
then please do not mix up such differences while you are trying to
explain the difference between records and structs.

>
> Neither a nor b need declaring, but that is done as follows:
>
>    var a, b
>
>
>> I get the impression that "var" is a variant type, with a tag to track
>> the actual type stored:
>>
>> <https://en.wikipedia.org/wiki/Tagged_union>
>
> Well it's more of a (tag, value/pointer). The actual structure is rather
> elaborate (see sig).

OK, it's basically a tag and a generic pointer. The rest is just
implementation detail, such as for more efficient handling of types that
will fit into the fixed size struct without extra indirection. (There
is no need to give the whole declaration here - and if there were need,
please don't abuse signatures for the purpose.)

>
>> If it can hold any type, then presumably you have some kind of late
>> binding for functions using it - that makes it a true dynamic type.
>
> Thanks. I've been using variations on this since approx the mid-80s;
> good to know it has had a dynamic type all along!
>
> Since a variant uses an actual type tag field which is initialised to
> 'void' when created, then it is necessarily a dynamic type as it can't
> assume any other until assigned to. (The same with unnamed objects.)

That does not follow. You can happily define a tagged union that
includes void types. In C, you can do this:

#include <stdint.h>

typedef enum { typeVoid, typeInt8, typeInt16, ...,
               typeCharP, typeIntP, ...,
               typeVariant
} typeTags;

typedef struct {
    typeTags typeTag;
    union {
        uint8_t  u8;
        uint16_t u16;
        uint32_t u32;
        uint64_t u64;
        void    *p;
    } data;
} Variant;             /* 'Variant' is just an illustrative name for the typedef */

There is a statically defined tagged union, where the contained type
will be "void" with zero initialisation.

(If you want to stretch your brain a little, C++17 has not only static
type-safe union types std::variant, but the /static/ type-safe generic
std::any type. Don't ask me how these are implemented, especially
std::any - it is C++ magic beyond the understanding of most C++ experts.
That's okay, of course, as very few C++ programmers need to implement
such types - they just need to use them.)

Rod Pemberton

unread,
Mar 14, 2019, 6:36:59 AM3/14/19
to
On Wed, 13 Mar 2019 15:30:32 +0000
James Harris <james.h...@gmail.com> wrote:

> On 13/03/2019 11:40, Rod Pemberton wrote:
> > On Mon, 11 Mar 2019 15:41:27 +0000
> > James Harris <james.h...@gmail.com> wrote:

> >> OK, my if construct is, currently, a series of condition-action
> >> pairs as in
> >>
> >> if C1
> >> A1
> >> on C2
> >> A2
> >> on C3
> >> A3
> >> on C4
> >> A4
> >> endif
> >>
> >> That's conventional if you think of "on" as elseif.
> >
> > Or, elif.
>
> Or, elsif. :-)
>
> In fact, I quite expected you to see things you didn't like with my
> if construct so in some ways I'm quite pleased that you had no
> objections to it. Or, at least, you didn't mention any. I thought you
> would. (That's a compliment, by the way. I think of you as someone
> who raises some first-rate challenges.)

"you didn't mention any"

I mentioned the (second) construct effectively being braces for blocks.
Without the (first) construct, I would definitely have had some
comments on the (second) construct's logic ...

But, if you so wish for some objections, how do you implement blocks of
code without using 'identified brackets' or curly braces etc?
Unnecessary? E.g., due to block structured language. Do you allow
this form (below)? I.e., without an endif. If you do, I think you need
something similar to curly braces to allow A1 to be a block instead of
a single line.

if C1
A1

As for 'on' instead of elif etc, as long as you (or others) understand
what it means, it doesn't matter what the word is ... E.g., BASIC (or
certain variants) had ON GOTO which people understood. Think of
someone who doesn't speak English programming in C. They don't need to
actually understand the word's meaning in English, just what the
equivalent action or nearest effect of it is in their own language.
I have some really mixed thoughts on this ...

E.g., I don't like the fact that the error handling is moved away from
the location for the source of the error. In my experience, doing this
is usually a code maintenance issue, i.e., harder to locate a problem
and harder to code initially. At the same time, I don't know if a
generic exception handler can be designed to handle every error
satisfactorily or to always emit an appropriate error message, i.e.,
limitations. And, I'm still not sure if an error routine or
exception mechanism must be set up by the programmer for each error or
not. However, I do like the fact that the code is more clean or
simpler, which makes me wonder if a different local construct would be
more effective. I also don't know what effects turning an error into an
exception will create, e.g. exception conflicts, or perhaps a failure to
exit, since the code would still be executing in the exception handler
instead of exiting as it would for an error.

In time, I guess you'll decide if there were any unforeseen issues.

Rod Pemberton

unread,
Mar 14, 2019, 7:19:21 AM3/14/19
to
NASM's local labels wouldn't work? E.g., for _$_F_return.

NASM local labels
https://www.nasm.us/doc/nasmdoc3.html#section-3.9


Why are you defining EQU equates at the end? ...

It appears that _$_F_localsize is always "%$localsize". So, why are
you even using an EQU for it? ... Does it change?

Also, _$_F_return is not used by the posted code. Assuming elsewhere.

I'm not sure about how you compute a value for _$_F_tempsize, so you
might be stuck with that or not depending ...

The EQU section of the NASM doc says the EQU starts with a label. So,
I don't see anything stating that a local label can't be used for the
label of an EQU. NASM's preprocessor tends to be very generic. Did
you try this? I.e., local labels for _$_F_localsize and _$_F_tempsize

NASM EQU
https://www.nasm.us/doc/nasmdoc3.html#section-3.2.4

I'm guessing that you used EQU's for _$_F_localsize and _$_F_tempsize
in order to have NASM perform an addition instead of maybe being
required to use multiple assembly instructions by NASM, i.e., two sub's
against esp. Yes? This may be part of the issue you're having, i.e.,
needing EQUs and labels created from the procedure's name.

If local labels work for both labels and EQUs, then I think you may
only need F twice, e.g., for global F and F:

As for "push eax ... pop ebx", you could do a "mov ebx, eax" or "xchg
ebx, eax". Or, you could "lea ebx, [B] ...", i.e., B against ebx,
which would allow you to eliminate the esp/ebp prologue and epilogue
code, as you'd be working against registers without stack parameters.

Bart

unread,
Mar 14, 2019, 7:50:30 AM3/14/19
to
On 14/03/2019 09:11, David Brown wrote:
> On 13/03/2019 22:37, Bart wrote:

>> OK, I forgot that you are used to C where struct definitions can be all
>> mixed up with variable declarations:
>
> Ah, you mean you forgot I am used to C where the language is specified,

Where the language is more chaotic, and so you might expect others to be
as well.

> and I understand it - rather than being used to your personal language
> which is known only to you, and where I have to guess given tiny
> snippets of code and vague descriptions?

The original point of my post was to show that sometimes, static typing
can lead to tighter and less cluttered code. Usually the opposite is
true. It didn't need an extensive tutorial on both languages used; they
could even have been made up pseudo-code.

(If interested, look at the sections titled "Records" and "Structs"
here: https://github.com/sal55/qx/blob/master/qdocs.md, for the dynamic
language, and look for "Defining Records" in
https://github.com/sal55/qx/blob/master/mdocs.md for the static one. I'm
in the process of combining those languages into one, rather an
ambitious project.)


>> (1)  struct {int x,y;};                // defines nothing
>> (2)  struct T {int x,y;};              // defines struct tag T
>> (3)  struct {int x,y;} A;              // defines variable A
>> (4)  struct U {int x,y;} B;            // struct tag U and variable B
>> (5)  typedef struct {int x,y;};        // nothing
>> (6)  typedef struct V {int x,y;};      // struct tag V
>> (7)  typedef struct W {int x,y;} C;    // struct tag W, usertype C

> But in all your efforts to try to show that C is complicated (it isn't)
> or mixed up (it isn't), you managed to miss one form:
>
> (8) typedef struct { int x, y; } D;
>
> That is the form I use for defining types. I don't bother giving struct
> tags unless the type will be recursive.

I've numbered the examples (1) to (8) including yours. Here's how I
define those examples in my static language (that one doesn't have the
confusion between 'struct' and 'record'); I use the more compact (...)
delimiters:

(1) Not valid

(2) record T = (int x,y) # defines type not a tag

(3) record T = (int x,y) # *must* have a name for A's type
T A # now declare the variable A

(4) record U = (int x,y) # define type name not tag
U B # define variable B of type U

(5) Not valid

(6) record V = (int x,y) # define type not tag

(7) record C = (int x,y) # define type name; discard 'tag'

(8) record D = (int x,y)

Where the C versions did something useful, notice that the ones here are all
completely consistent: define a type (all exactly the same way) and
define a variable.

> (And because C is flexible, there
> is nothing stopping you writing your own code as though the combinations
> you dislike were not allowed.)

That would be /my/ code. Everyone else /could/ be using a bad style and
that has to be considered. Even you considered that my examples may have
been declaring variables not types, because /C/ allows variables to be
declared of anonymous struct types.


> But why do you have wildly different syntax for the two ways to declare
> user types?
>
> type ws_systemtime = struct ...
>
> record huffnode = ...
>
> Why not
>
> type huffnode = record ...
>
> or
>
> struct ws_systemtime = ...
>
> Surely that would be more consistent?

<Shrug> I used to have only 'type'. Then I experimented with 'record'
having a dedicated syntax. That seemed to work, so I might roll it out
to 'struct'. (My new language will probably use 'record' for a 'flat'
struct-like aggregate type, and 'class' for a higher level, managed
version.)

>
>> Individual instances
>> can be created like this:
>>
>>    a := new(ws_systemtime)
>>
>> or:
>>
>>    b := huffnode(0,0,0,0)
>
> Again, why the different syntaxes?

Why does C? In C you do:

static systemtime A; # create A initialised to zeros
static huffnode B = {0,0,0,0}; # create B initialised to zeros

In a dynamic language, you need way to assign a value of a certain type,
and the generic way of doing that is with new(), but there need to be
alternatives:

a := new(list,6,77) # 6 elements all initialised to 77
a := (77,77,77,77,77,77) # same thing

b := new(list,10 million) # 10M elements all set to 'void'
b := (void,void,void,... # this could take some time to type
b := (void,)*10 million # alternative

c := ???
c := (10,20,30,40,50,60)

Here this can't be done with 'new', as new doesn't allow the contents to
be enumerated. (Although it is trivial to write a user-function that
does exactly that.)

d:= new(array,byte,'A'..'Z') # a 26-element byte-array indexed
# from 65 to 90 ('A' to 'Z')
d:= ???

Here there is no tidy constructor syntax to specify the same. However,
this is possible:

type T = ['A'..'Z']byte
d := new(T)
d := T(1,2,3,... 26) # there must be 26 elements

As well as doing it via a user-function.

It's all useful diversity, but still based on only two main features:

x := new(T,...) # use new (just a function really)
y := T(....) # use a constructor for aggregate types
z := T(a) # That syntax is also type conversion


> (If you want to stretch your brain a little, C++17 has not only static
> type-safe union types std::variant, but the /static/ type-safe generic
> std::any type. Don't ask me how these are implemented, especially
> std::any - it is C++ magic beyond the understanding of most C++ experts.

My new hybrid language will attempt something similar:

variant a,b,c # can assume any type
println a,b,c # display void void void

Which allow some odd-looking things:

a := a[i] # works when a is a list
a := a^ # (deref) works when a is a pointer
a := (a,a,a,a,a) # etc

Because normal static typing rules don't apply.

I'm fairly confident I can make it work without getting into C++
complexities.
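
For reference, this is roughly what the two C++17 library types mentioned
above look like in use. It is only a minimal sketch of the standard-library
behaviour, not a model of the hybrid language:

    #include <any>
    #include <iostream>
    #include <string>
    #include <variant>

    int main() {
        // std::variant: one value from a closed set of alternatives
        std::variant<int, double, std::string> v = 42;
        v = std::string("hello");             // may switch alternative
        std::cout << std::get<std::string>(v) << "\n";

        // std::any: holds a value of any copyable type
        std::any a = 42;
        a = std::string("hello");
        std::cout << std::any_cast<std::string>(a) << "\n";
    }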

David Brown
Mar 14, 2019, 9:04:04 AM
On 14/03/2019 12:50, Bart wrote:
> On 14/03/2019 09:11, David Brown wrote:
>> On 13/03/2019 22:37, Bart wrote:
>
>>> OK, I forgot that you are used to C where struct definitions can be all
>>> mixed up with variable declarations:
>>
>> Ah, you mean you forgot I am used to C where the language is specified,
>
> Where the language is more chaotic, and so you might expect others to be
> as well.
>

If you didn't start with the assumption that anything C does must be the
worst possible choice and viewed in the most negative light, then I
expect your programming life would be easier, your languages would be
better, and your Usenet discussions would be more friendly.

>> and I understand it - rather than being used to your personal language
>> which is known only to you, and where I have to guess given tiny
>> snippets of code and vague descriptions?
>
> The original point of my post was to show that sometimes, static typing
> can lead to tighter and less cluttered code. Usually the opposite is
> true.

That strikes me as an odd point to try to make. I don't see static or
dynamic typing as particularly relevant there. What usually makes more
of a difference is when you have explicit typing compared to implicit or
inferred typing.

> It didn't need an extensive tutorial on both languages used; they
> could even have been made up pseudo-code.

I didn't want a tutorial. You introduced the two concepts of "structs"
and "records" in your language, and then over the course of 3 or 4 posts
you have failed to explain the distinction. I think that is mainly
because you are so desperate to show that they are not C, rather than
trying to explain what they actually /are/ in your language. Perhaps if
you didn't include "look how evil C is" interjections at all
opportunities, you'd get time to talk about your own language.
(Comparisons with C, or other common programming languages, are fine.)

>
> (If interested, look at the sections titled "Records" and "Structs"
> here: https://github.com/sal55/qx/blob/master/qdocs.md, for the dynamic
> language, and look for "Defining Records" in
> https://github.com/sal55/qx/blob/master/mdocs.md for the static one. I'm
> in the process of combining those languages into one, rather an
> ambitious project.)
>

I am not interested in reading your documentation (not at the moment,
anyway). If you can't explain the difference in a couple of paragraphs
in a Usenet post, then maybe you don't have a clear idea yourself.


>
>>> (1)  struct {int x,y;};                // defines nothing
>>> (2)  struct T {int x,y;};              // defines struct tag T
>>> (3)  struct {int x,y;} A;              // defines variable A
>>> (4)  struct U {int x,y;} B;            // struct tag U and variable B
>>> (5)  typedef struct {int x,y;};        // nothing
>>> (6)  typedef struct V {int x,y;};      // struct tag V
>>> (7)  typedef struct W {int x,y;} C;    // struct tag W, usertype C
>
>> But in all your efforts to try to show that C is complicated (it isn't)
>> or mixed up (it isn't), you managed to miss one form:
>>
>> (8)  typedef struct { int x, y; } D;
>>
>> That is the form I use for defining types.  I don't bother giving struct
>> tags unless the type will be recursive.
>
> I've numbered the examples (1) to (8) including yours. Here's how I
> define those examples in my static language (that one doesn't have the
> confusion between 'struct' and 'record'); I use the more compact (...)
> delimiters:

It would again help if you stick to /one/ of your languages. If you
have reason to distinguish them or compare them, then at least give them
names.

>
> (1) Not valid
>
> (2) record T = (int x,y)         # defines type not a tag
>
> (3) record T = (int x,y)         # *must* have a name for A's type
>     T A                          # now declare the variable A
>
> (4) record U = (int x,y)         # define type name not tag
>     U B                          # define variable B of type U
>
> (5) Not valid
>
> (6) record V = (int x,y)         # define type not tag
>
> (7) record C = (int x,y)         # define type name; discard 'tag'
>
> (8) record D = (int x,y)
>

Why not just say something like :


A "record" is a user-defined structure type holding data member fields.
You define a record type like this:

record T = (int x, y)

In this language, you must define and name types before using them - you
can't use unnamed types or define them on the fly, and you can't use the
type to declare variables until the record definition is complete.


To me, that is clear and simple. (It still doesn't distinguish between
"struct" and "record", but maybe that is not part of this language.)




> Where the C did something useful, notice that the versions here are all
> completely consistent: define a type (all exactly the same way) and
> define a variable.
>
>> (And because C is flexible, there
>> is nothing stopping you writing your own code as though the combinations
>> you dislike were not allowed.)
>
> That would be /my/ code. Everyone else /could/ be using a bad style and
> that has to be considered.

That's because the whole programming world is against you, and everyone
writes their C code with the specific aim of irritating you personally.

And since we are talking mainly about your own private languages that no
one else will ever see or use, surely the C comparison would be to C
code that you write yourself, not to C code written by others?


> Even you considered that my examples may have
> been declaring variables not types, because /C/ allows variables to be
> declared of anonymous struct types.
>

You gave two completely different looking syntaxes:

type ws_systemtime = struct ...
record huffnode =

It is entirely reasonable to assume they do two completely different
things - like declaring a type, and declaring a variable. And with
dynamic typing, it is entirely reasonable to suppose that fields can be
added to an instance of the type, rather than just to the type
definition (you can do that with Python, for example).

My question was not based on what C allows, because we were talking
about /your/ language, not C. My question was based on the assumption
that your language was reasonable and logical, and that you might have
been flexible regarding dynamic types. Was that an unwarranted assumption?

>
>> But why do you have wildly different syntax for the two ways to declare
>> user types?
>>
>>     type ws_systemtime = struct ...
>>
>>     record huffnode = ...
>>
>> Why not
>>
>>     type huffnode = record ...
>>
>> or
>>
>>     struct ws_systemtime = ...
>>
>> Surely that would be more consistent?
>
> <Shrug> I used to have only 'type'. Then I experimented with 'record'
> having a dedicated syntax. That seemed to work, so I might roll it out
> to 'struct'. (My new language will probably use 'record' for a 'flat'
> struct-like aggregate type, and 'class' for a higher level, managed
> version.)

So really, your language here - that is /so/ much clearer than C - is
very much in flux, with a jumble of syntaxes made up as you go along?

>
>>
>>> Individual instances
>>> can be created like this:
>>>
>>>     a := new(ws_systemtime)
>>>
>>> or:
>>>
>>>     b := huffnode(0,0,0,0)
>>
>> Again, why the different syntaxes?
>
> Why does C? In C you do:
>
> static systemtime A;              // create A initialised to zeros
> static huffnode B = {0,0,0,0};    // create B initialised to zeros

I know the differences between these, and why they are there - as does
anyone who knows C. So why are you asking about C? I am asking about
/your/ language, because /you/ are the only one who knows it.

How about you start trying to answer questions about your language with
answers about your language, rather than more desperate attacks on C?

>
> In a dynamic language, you need a way to assign a value of a certain type,
> and the generic way of doing that is with new(), but there need to be
> alternatives:
>
>    a := new(list,6,77)            # 6 elements all initialised to 77
>    a := (77,77,77,77,77,77)       # same thing
>
>    b := new(list,10 million)      # 10M elements all set to 'void'
>    b := (void,void,void,...       # this could take some time to type
>    b := (void,)*10 million        # alternative
>
>    c := ???
>    c := (10,20,30,40,50,60)
>
> Here this can't be done with 'new', as new doesn't allow the contents to
> be enumerated. (Although it is trivial to write a user-function that
> does exactly that.)
>
>    d:= new(array,byte,'A'..'Z')   # a 26-element byte-array indexed
>                                   # from 65 to 90 ('A'  to 'Z')
>    d:= ???

None of that answers my question.

I asked you why you gave different syntaxes here. I asked you if they
had different meanings - in particular, if the space for the object
itself is allocated in a different way. You haven't given any
explanation for the difference, or any reason why you picked one syntax
for one type of variable and the other syntax for the other type of
variable. You haven't said how they are allocated. You haven't said
what "new" means in your language - you just seem to assume it obvious.


>
> Here there is no tidy constructor syntax to specify the same. However,
> this is possible:
>
>    type T = ['A'..'Z']byte
>    d := new(T)
>    d := T(1,2,3,... 26)           # there must be 26 elements
>
> As well as doing it via a user-function.
>
> It's all useful diversity, but still based on only two main features:
>
>    x := new(T,...)           # use new (just a function really)
>    y := T(....)              # use a constructor for aggregate types
>    z := T(a)                 # That syntax is also type conversion
>
>
>> (If you want to stretch your brain a little, C++17 has not only static
>> type-safe union types std::variant, but the /static/ type-safe generic
>> std::any type.  Don't ask me how these are implemented, especially
>> std::any - it is C++ magic beyond the understanding of most C++ experts.
>
> My new hybrid language will attempt something similar:
>
>    variant a,b,c           # can assume any type
>    println a,b,c           # display void void void
>
> Which allow some odd-looking things:
>
>    a := a[i]               # works when a is a list
>    a := a^                 # (deref) works when a is a pointer
>    a := (a,a,a,a,a)        # etc
>
> Because normal static typing rules don't apply.
>

This can be handled fine with static typing, using two points:

1. Allow variable definition to use type deduction or type inference
from the initialiser to determine the type. That is like "auto" in
C++11, or "let" in Ocaml. Here, you are allowing it without needing a
keyword at all - perhaps using := for declaring and initialising data,
distinct from = for assignment.

2. Allow variables with the same name to be re-declared, replacing the
old variable within the same scope.

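A minimal C++ sketch of point 1, where each type is deduced from the
initialiser. Standard C++ does not allow point 2 (re-declaring a name in the
same scope), so the sketch uses a fresh name at each step; a language with
that rule could simply reuse "a" throughout:

    #include <array>

    int main() {
        std::array<int, 5> xs{3, 1, 4, 1, 5};
        auto a  = &xs;                      // deduced: pointer to the array
        auto a1 = (*a)[2];                  // deduced: int   ("a := a^[i]")
        auto a2 = std::array{a1, a1, a1};   // deduced: std::array<int, 3>
        return a2[0];
    }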

> I'm fairly confident I can make it work without getting into C++
> complexities.

It's possible.

All the complexities in C++ are there for a reason. Sometimes that
reason is backwards compatibility with C and with earlier C++, resulting in a
language that is more awkward than one that is designed from scratch to
have the same features. That kind of complexity you should be able to
avoid.

Bart
Mar 14, 2019, 11:29:30 AM
On 14/03/2019 13:04, David Brown wrote:
> On 14/03/2019 12:50, Bart wrote:

>> The original point of my post was to show that sometimes, static typing
>> can lead to tighter and less cluttered code. Usually the opposite is
>> true.
>
> That strikes me as an odd point to try to make. I don't see static or
> dynamic typing as particularly relevant there. What usually makes more
> of a difference is when you have explicit typing compared to implicit or
> inferred typing.
>
>> It didn't need an extensive tutorial on both languages used; they
>> could even have been made up pseudo-code.
>
> I didn't want a tutorial. You introduced the two concepts of "structs"
> and "records" in your language,

Actually the original example used only 'record'. If I can make the
point again, and assuming a user type in either language called Point,
an aggregate of two values, then:

Point p := (10,20) # p has a static type, (10,20) is converted
# to a point type by the compiler

Variant q := (10,20) # q has variant type, (10,20) is a regular
# list type

Variant r := Point(10,20) # A variant type needs the equivalent
# of a cast for the desired type

(I say either language, but in my new language these can all co-exist.
Technically, p and q have a static type that happens to be 'variant'. It
is the special qualities of a variant type that will make type
processing and type checking challenging.)

> I am not interested in reading your documentation

Oh, I thought you wanted more than:

>>> which is known only to you, and where I have to guess given tiny
>>> snippets of code and vague descriptions?

> So really, your language here - that is /so/ much clearer than C - is
> very much in flux, with a jumble of syntaxes made up as you go along?

Look, C is one huge mess (C++ is worse). I think many people can agree
with that:

* Mixing up struct definitions with variable definitions
* Mixing up variable definitions with function declarations and function
definitions
* Mixing up variable definitions with usertypes
* Mixing up function declarations with usertypes
* Mixing up struct definitions with usertypes
* And I hardly need to mention type declarations syntax where the type
of variable can be split into three separate parts.

Anyone designing such a language now, if lining them up against a wall and
shooting them is not allowed, ought at least to be fired.

None of that applies to my syntax. No matter that no formal specs exist,
or that you can't be bothered to read what docs I do have, nor do I want
to go into that much detail when illustrating a point.

The subject line is still Algol68, and although it has diverged
considerably and is more informal, my syntax is largely still
recognisably based on Algol68.

>>    variant a,b,c           # can assume any type

>> Which allow some odd-looking things:
>>
>>    a := a[i]               # works when a is a list
>>    a := a^                 # (deref) works when a is a pointer
>>    a := (a,a,a,a,a)        # etc
>>
>> Because normal static typing rules don't apply.
>>
>
> This can be handled fine with static typing, using two points:
>
> 1. Allow variable definition to use type deduction or type inference
> from the initialiser to determine the type. That is like "auto" in
> C++11, or "let" in Ocaml. Here, you are allowing it without needing a
> keyword at all - perhaps using := for declaring and initialising data,
> distinct from = for assignment.

What type does that 'a' have to be, for a:=a[i] to make sense? (This is
for the usual case where the a in a[i] has a type of []T and you are
selecting one of those elements, not where you overload "[]" to mean
whatever you want it to mean.)

> 2. Allow variables with the same name to be re-declared, replacing the
> old variable within the same scope.

That is not an acceptable solution, and for many reasons will not work,
one being because they are separate variables.

James Harris
Mar 14, 2019, 12:35:25 PM
Such a test with an arbitrary number of actions could be written

if C1
A1
endif

if the programmer wanted. That form may be convenient, for example, if
there's a chance of later editing the body to something such as

if C1
A1a
A1b
endif

But if the programmer prefers, the idea is to let a single line follow
the keyword "so" as in

if C1 so
A1

or

if C1 so A1

Personally, I really like that. It's easy to write code that way and,
IMO, easy to read it back. And I love that it looks clean, not needing
parens or other punctuation.
Some answers to that paragraph.

1. The programmer can choose where to handle an exception. If he wants
to handle if near where it occurs he can. But if he does not include
code for it there then it will be propagated up to the caller.

2. Exceptions can contain plenty of information about the details of
what occurred, where it occurred, perhaps when it occurred, etc. And it
may even be able to include a bit of context - the values of related
variables at the time the exception was thrown, parameters passed to the
erroring routine etc.

3. Exceptions can be usefully specific or vague. For example, there are
lots of different types of IO read errors: timeout, seek error, invalid
CRC, and even door not closed. :-) They can be constructed in plenty of
detail at the point where they are detected. But a programmer might
prefer to handle them all as an IO error. Even if he does, the error
message can still say what specifically went wrong.

Maybe the simplest way to think of exceptions of this type is that
whereas error returns are /ignored/ by default, exceptions are /flagged
up/ by default. If you want to trap them you can. But if you don't
include code to detect them then they will propagate up the call chain
until they are handled or suppressed.
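
A rough C++ sketch of those three points, using a made-up IOError hierarchy
(the names are illustrative, not from any real library): the error is raised
deep down with full detail, propagates up by default, and can be caught
generically while its message keeps the specifics.

    #include <iostream>
    #include <stdexcept>

    struct IOError : std::runtime_error {
        using std::runtime_error::runtime_error;
    };
    struct TimeoutError : IOError {
        TimeoutError() : IOError("I/O error: read timed out") {}
    };

    void read_block() {
        throw TimeoutError();           // detected where it occurs, with detail
    }

    int main() {
        try {
            read_block();               // no handler here, so it propagates up
        } catch (const IOError& e) {    // handled generically by the caller
            std::cout << "failed: " << e.what() << "\n";   // detail survives
        }
    }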

--
James Harris

James Harris
Mar 14, 2019, 1:36:03 PM
Nasm's local labels extend only to the next non-local label. And there
could be normal labels between the beginning and the end of the function.

>
>
> Why are you defining EQU equates at the end? ...
>
> It appears that _$_F_localsize is always "%$localsize". So, why are
> you even using an EQU for it? ... Does it change?

That's because %$localsize can change /after/ the line which uses it
(something Nasm doesn't support). The extra label avoids the problem.

>
> Also, _$_F_return is not used by the posted code. Assuming elsewhere.

That label is there as a target for a return statement. For example,
if in the middle of the function there is

return 9

the emitted code will say

mov eax, 9
jmp _$_F_return

It's because the function could contain one or more return statements
that the label is included in the function trailer.

>
> I'm not sure about how you compute a value for _$_F_tempsize, so you
> might be stuck with that or not depending ...
>
> The EQU section of the NASM doc says the EQU starts with a label. So,
> I don't see anything stating that a local label can't be used for the
> label of an EQU. NASM's preprocessor tends to be very generic. Did
> you try this? I.e., local labels for _$_F_localsize and _$_F_tempsize
>
> NASM EQU
> https://www.nasm.us/doc/nasmdoc3.html#section-3.2.4
>
> I'm guessing that you used EQU's for _$_F_localsize and _$_F_tempsize
> in order to have NASM perform an addition instead of maybe being
> required to use multiple assembly instructions by NASM, i.e., two sub's
> against esp. Yes? This may be part of the issue you're having, i.e.,
> needing EQU's and label's created from the procedure's name.
>
> If local labels work for both labels and EQUs, then I think you may
> only need F twice, e.g., for global F and F:

The function name could still be needed - e.g. in the return statement,
above.

I should say that storing the function name is not a problem. It's a lot
less to store than I began with. :-)

>
> As for "push eax ... pop ebx", you could do a "mov ebx, eax" or "xchg
> ebx, eax". Or, you could "lea ebx, [B] ...", i.e., B against ebx,
> which would allow you to eliminate the esp/ebp prologue and epilogue
> code, as you'd be working against registers without stack parameters.

Remember that this is the output from an initial code generator. There's
no optimisation being done. Each fragment of code is output in response
to an input token.

To try to make an example to walk you through the code gen consider the
expression we were talking about,

A[B$]

In parsing that a token at a time the compiler first sees A and emits

lea eax, [A]

It then sees '[' which is regarded as a binop telling it it has to
evaluate what follows before it can complete the current value it's
working on, so it saves the current value by emitting

push eax ;Save operand

Then it starts to parse the bracketed subexpression. It sees B and emits

lea eax, [B]

Then it sees the dereference operator $ and emits

mov eax, [eax] ;Deref

Having reached the end of the expression it needs the value saved on the
stack to be retrieved so it emits

pop ebx ;Retrieve operand

That gives it the two operands of the '[' operator, and it combines them
by emitting

shl eax, 2 ;Convert index to offset
add eax, ebx ;Element address

That's the array indexing completed.
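
As a toy sketch of that emit-as-you-parse scheme, here is the same idea in
C++; the token stream and emitted fragments are simplified from the
walkthrough above and this is not the real compiler:

    #include <iostream>
    #include <string>
    #include <vector>

    static void emit(const std::string& line) {
        std::cout << "    " << line << "\n";
    }

    int main() {
        std::vector<std::string> tokens = {"A", "[", "B", "$", "]"};
        for (const auto& t : tokens) {
            if (t == "[") {                       // binop: save current value
                emit("push eax            ;Save operand");
            } else if (t == "$") {                // dereference
                emit("mov eax, [eax]      ;Deref");
            } else if (t == "]") {                // combine the two operands
                emit("pop ebx             ;Retrieve operand");
                emit("shl eax, 2          ;Convert index to offset");
                emit("add eax, ebx        ;Element address");
            } else {                              // a plain name
                emit("lea eax, [" + t + "]");
            }
        }
    }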

I don't know about you but as someone who has written asm programs over
the years it's fun to see the output of a code generator. Or maybe
that's just me!


--
James Harris

Andy Walker
Mar 14, 2019, 1:40:29 PM
On 14/03/2019 16:35, James Harris wrote:
> Such a test with an arbitrary number of actions could be written
[...]
>   if C1
>     A1a
>     A1b
>   endif
> But if the programmer prefers, the idea is to let a single line
> follow the keyword "so" as in
>   if C1 so
>     A1
> or
>   if C1 so A1
> Personally, I really like that. It's easy to write code that way and,
> IMO, easy to read it back. And I love that it looks clean, not
> needing parens or other punctuation.

Um. Keywords like "if", "so", "endif" [and presumably in your
first example newline] *are*, for all practical purposes, "parens or
other punctuation". The advantage of "endif" is that you get much
better error checking for the [all too common, IME] case where some
mismatch has occurred. It doesn't really matter whether bracketing is
"begin ... end" or "(...)" or "if ... endif" or "case ... esac" or
"[...]" or "while ... do ... done", but it does matter, IMO, that the
bracketing be present. It's not so that the program looks pretty, it's
so that when [not if!] you make a mistake the compiler can point it out
to you.

From that point of view, lack of punctuation is a disaster
waiting to happen.

In similar vein, I'm surprised that people [not necessarily you,
I don't know enough about your language] still invent languages with
"empty" statements. It means that misplaced semicolons, for example,
easily done while editing, can completely alter the structure of a
program:

if x > 0 so ;
y := sqrt(x)

--
Andy Walker,
Nottingham.

James Harris
Mar 14, 2019, 2:37:51 PM
Yes, they are punctuation in that sense. And I agree about endif. The
point I was making, though, was that syntax which omits punctuation
/characters/ can be easier to read. For example, instead of

if (c) {
d
} else if (e) {
f
}

I've gone for

if c
d
on e
f
endif

I find that better.

Perhaps surprisingly, it took me ages to work out ways to omit many of
the punctuation characters. I know it looks obvious now that they are
not needed but for various reasons such punctuation existed in the
syntax for years!

>
> From that point of view, lack of punctuation is a disaster
> waiting to happen.
>
> In similar vein, I'm surprised that people [not necessarily you,
> I don't know enough about your language] still invent languages with
> "empty" statements. It means that misplaced semicolons, for example,
> easily done while editing, can completely alter the structure of a
> program:
>
> if x > 0 so ;
> y := sqrt(x)
>

That's a good point. Here's a test case.

if x > 0

;

so

;

y := sqrt(x)

The blank lines and semicolons are just to emphasise that an arbitrary
number of blank lines or empty statements could occur.

In that, "so" would continue until it 'catches' a non-empty statement.
So the assignment to y would only execute if x > 0.

How does that look to you?


--
James Harris

Bart
Mar 14, 2019, 3:24:04 PM
On 14/03/2019 16:35, James Harris wrote:
It looks a little unbalanced to me if, in the interests of a short form
version without 'endif', you have to /add/ a new keyword.

But I'm interested in what happens when A1 is also an IF statement. Is
the following allowed:

if C1 so if C2 so A1

What about:

if C1 so if C2
A1
A2
endif # (where does this go?)

if C1
if C2 so
if C3 so
A1
endif

    if C1
        if C2
            if C3 so
                A1
                A2        # what does this belong to?
            A3
        endif
        A4
    endif

(See the problem with some if's ending with 'endif' and some not, and
the likelihood of indentation errors, unless your compiler detects those.)

I rather think that for a single statement, you either have this form:

if C1 so A1

or:

if C1
A1
endif

As there isn't great deal of difference between the latter and:

if C1 so
A1

There are the same number of tokens. Also this is the approach used by
Python, although there is always a ':':

if C1: A1

if C1:
A1

It also allows:

if C1: A1; A2; A3

James Harris
Mar 14, 2019, 4:21:18 PM
It is, of course, a very short keyword.... :-)

>
> But I'm interested in what happens when A1 is also an IF statement. Is
> the following allowed:
>
> if C1 so if C2 so A1

That would be allowed and would behave as you might expect.

>
> What about:
>
> if C1 so if C2
> A1
> A2
> endif # (where does this go?)

The endif would go with the inner if (because the inner if had not been
closed; the idea is that "so" refers to one command AND closes the
enclosing block).

>
> if C1
> if C2 so
> if C3 so
> A1
> endif

That would be as you've indented it.

>
>     if C1
>         if C2
>             if C3 so
>                 A1
>                 A2        # what does this belong to?
>             A3
>         endif
>         A4
>     endif

That would be different from your indentation. The "so" on C3 would only
capture one statement, the A1.

It looks like both A2 and A3 would be under C2.

>
> (See the problem with some if's ending with 'endif' and some not, and
> the likelihood of indentation errors, unless your compiler detects those.)

I don't see a big problem. Programmers can choose whether to write
obfuscated code if they want to ... but they wouldn't have to!

As for compiler detection don't forget that my compiler at present is
very basic! But I did suggest here some years ago that a compiler could
check source code against local coding standards.

>
> I rather think that for a single statement, you either have this form:
>
> if C1 so A1
>
> or:
>
> if C1
> A1
> endif
>
> As there isn't great deal of difference between the latter and:
>
> if C1 so
> A1

Noted. I quite like shorter statements at times.

There's a syntactic-sugary thing going on which I ought to explain. The
block would 'officially' be

if C1
so
A1

That's because "if C1" is a statement in its own right. So is "so". And
so is "A1". The same could be expressed as

if C1; so; A1

It's purely because I think that that is a bit too full of semicolons
that I have in mind allowing them to be omitted.

>
> There are the same number of tokens. Also this is the approach used by
> Python, although there is always a ':':
>
> if C1: A1
>
> if C1:
> A1

Agreed. I prefer "so". :-)

>
> It also allows:
>
> if C1: A1; A2; A3
>

You mean that if C1 is true Python would go on to A1, A2 and A3? I
didn't know that.


--
James Harris

David Brown
Mar 14, 2019, 4:24:59 PM
On 14/03/2019 16:29, Bart wrote:
> On 14/03/2019 13:04, David Brown wrote:
>> On 14/03/2019 12:50, Bart wrote:
>
>>> The original point of my post was to show that sometimes, static typing
>>> can lead to tighter and less cluttered code. Usually the opposite is
>>> true.
>>
>> That strikes me as an odd point to try to make.  I don't see static or
>> dynamic typing as particularly relevant there.  What usually makes more
>> of a difference is when you have explicit typing compared to implicit or
>> inferred typing.
>>
>>> It didn't need an extensive tutorial on both languages used; they
>>> could even have been made up pseudo-code.
>>
>> I didn't want a tutorial.  You introduced the two concepts of "structs"
>> and "records" in your language,
>
> Actually the original example used only 'record'. If I can make the
> point again, and assuming a user type in either language called Point,
> an aggregate of two values, then:
>
>     Point p := (10,20)    # p has a static type, (10,20) is converted
>                           # to a point type by the compiler
>
>     Variant q := (10,20)  # q has variant type, (10,20) is a regular
>                           # list type
>
>     Variant r := Point(10,20)   # A variant type needs the equivalent
>                                 # of a cast for the desired type
>

OK, I get that. I still can't get you to tell me why you need "record"
and "struct" in the language, and what the difference is, but maybe I'll
just have to accept that you either don't know the difference, or can't
explain it.

> (I say either language, but in my new language these can all co-exist.
> Technically, p and q have a static type that happens to be 'variant'. It
> is the special qualities of a variant type that will make type
> processing and type checking challenging.)
>
>> I am not interested in reading your documentation
>
> Oh, I thought you wanted more than:
>
> >>> which is known only to you, and where I have to guess given tiny
> >>> snippets of code and vague descriptions?
>

I had really hoped there would be a clear and simple difference between
"struct" and "record" that you could explain in a post, but that does
not seem to be the case.

>> So really, your language here - that is /so/ much clearer than C - is
>> very much in flux, with a jumble of syntaxes made up as you go along?
>
> Look, C is one huge mess (C++ is worse). I think many people can agree
> with that:
>
> * Mixing up struct definitions with variable definitions
> * Mixing up variable definitions with function declarations and function
> definitions
> * Mixing up variable definitions with usertypes
> * Mixing up function declarations with usertypes
> * Mixing up struct definitions with usertypes
> * And I hardly need to mention type declarations syntax where the type
> of variable can be split into three separate parts.
>
> Anyone designing such a language now, if lining them up against a wall and
> shooting them is not allowed, ought at least to be fired.
>
> None of that applies to my syntax. No matter that no formal specs exist,
> or that you can't be bothered to read what docs I do have, nor do I want
> to go into that much detail when illustrating a point.
>

You didn't get my point about moaning about C, did you? Perhaps you
have misunderstood, but I have been trying desperately for a half dozen
posts to get you to explain about /your/ language. But most of what I
get is "C is horrible", "C lets you do things I don't like", "I don't
understand C".

> The subject line is still Algol68, and although it has diverged
> considerably and is more informal, my syntax is largely still
> recognisably based on Algol68.
>
>>>     variant a,b,c           # can assume any type
>
>>> Which allow some odd-looking things:
>>>
>>>     a := a[i]               # works when a is a list
>>>     a := a^                 # (deref) works when a is a pointer
>>>     a := (a,a,a,a,a)        # etc
>>>
>>> Because normal static typing rules don't apply.
>>>
>>
>> This can be handled fine with static typing, using two points:
>>
>> 1. Allow variable definition to use type deduction or type inference
>> from the initialiser to determine the type.  That is like "auto" in
>> C++11, or "let" in Ocaml.  Here, you are allowing it without needing a
>> keyword at all - perhaps using := for declaring and initialising data,
>> distinct from = for assignment.
>
> What type does that 'a' have to be, for a:=a[i] to make sense? (This is
> for the usual case where the a in a[i] has a type of []T and you are
> selecting one of those elements, not where you overload "[]" to mean
> whatever you want it to mean.)

Read point 2. It is a different "a" each time. I know that will cause
you to throw a tantrum, because you can't cope with such new ideas.

>
>> 2. Allow variables with the same name to be re-declared, replacing the
>> old variable within the same scope.
>
> That is not an acceptable solution, and for many reasons will not work,
> one being beacause they are separate variables.
>

Of course it will work, precisely because they are separate variables.
Who cares about the old variable, once you have the new one? It does
not matter that the old one is now inaccessible.

Bart
Mar 14, 2019, 6:19:45 PM
On 14/03/2019 20:21, James Harris wrote:
> On 14/03/2019 19:23, Bart wrote:

>>
>>       if C1
>>           if C2
>>               if C3 so
>>                   A1
>>                   A2        # what does this belong to?
>>               A3
>>           endif
>>           A4
>>       endif
>
> That would be different from your indentation. The "so" on C3 would only
> capture one statement, the A1.
>
> It looks like both A2 and A3 would be under C2.
>
>>
>> (See the problem with some if's ending with 'endif' and some not, and
>> the likelihood of indentation errors, unless your compiler detects
>> those.)
>
> I don't see a big problem. Programmers can choose whether to write
> obfuscated code if they want to ... but they wouldn't have to!

The problem is in inadvertently writing the wrong code. You have two
flavours of IF: 'if' and 'if-so', the former of which takes N statements and
ends with 'endif', the latter takes 1 statement and doesn't end with
'endif'.

It sounds rather error prone, and reminds me of the optional {,} braces
in C.

I forgot to ask what happens with 'else-if' ('on'?) and 'else'. From
what I saw, you seem to use 'on 1' for else.

For 'on' is it possible to selectively use 'so' with each? Ie.

if C1
A1
A2
on C2 so
A3
on C3
A4
A5
on 1 so
A6

Will this form need an 'endif'?

(Note I recently looked at allowing single statement blocks for my new
language. There I was considering using:

if C1: S1

in place of the normal:

if C1 then S1 fi

Until I saw how messy it would make the description of the grammar. So
decided to leave it. However, that syntax also allows conditional
versions of certain control flow statements:

return if C1
goto L when C2
exit unless C3 # (exit from a loop)

which covers a few of the use-cases for short-IF. And the same syntax
also has a compact if-then:

(C1 | S1) # if C1 then S1 fi
(C1 | S1 | S2) # if C1 then S1 else S2 fi

)

Bart
Mar 14, 2019, 6:52:04 PM
On 14/03/2019 20:24, David Brown wrote:
> On 14/03/2019 16:29, Bart wrote:

> OK, I get that.  I still can't get you to tell me why you need "record"
> and "struct" in the language, and what the difference is, but maybe I'll
> just have to accept that you either don't know the difference, or can't
> explain it.

Perhaps you can first explain to me why Python needs attributes, named
tuples AND, when emulating C-structs, the 'struct' module.

I think I've explained the difference in my language enough times, so I
can only surmise that it so far outside your own experience that you
can't get it.

I won't try any further except to illustrate exactly what they do (using
consistent syntax as that seemed to throw you):

type date1 = struct (byte day,month; u16 year)
type date2 = record (var day, month, year)

date1 layout (values in bytes):

         Size   Offset  Type
day      1      0       u8
month    1      1       u8
year     2      2       u16
Total    4      -       -

date2 layout:

         Size   Offset  Type
day      16     0       variant (size excludes any extra heap data)
month    16     16      variant
year     16     32      variant
Total    48     -

This language's execution model is stack-based and works primarily with
variant types and data structures like date2. Flat, non-managed packed
data like date1 is used for precise storage control and for interfacing
with libraries.
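
Rough C++ analogues of the two layouts, just to make the distinction
concrete; the 16-byte variant slot is approximated with a std::variant, whose
real size and set of alternatives will differ:

    #include <cstdint>
    #include <string>
    #include <variant>

    struct date1 {                  // "struct": flat fields, 4 bytes total
        std::uint8_t  day, month;
        std::uint16_t year;
    };

    // stand-in for the language's tagged variant slot
    using Var = std::variant<std::monostate, std::int64_t, double, std::string>;

    struct date2 {                  // "record": every field is a variant
        Var day, month, year;
    };

    int main() {
        date1 d1{1, 3, 2019};       // packed into 4 bytes
        date2 d2{std::int64_t{1}, std::int64_t{3}, std::int64_t{2019}};
        (void)d1; (void)d2;
    }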

> I had really hoped there would be a clear and simple difference between
> "struct" and "record" that you could explain in a post, but that does
> not seem to be the case.

They are just terms. I will probably switch to 'record' and 'class'
respectively.

> You didn't get my point about moaning about C, did you?  Perhaps you
> have misunderstood, but I have been trying desperately for a half dozen
> posts to get you to explain about /your/ language.  But most of what I
> get is "C is horrible", "C lets you do things I don't like", "I don't
> understand C".

And my point was, because C is messed up, you think other languages are
the same so expect the worst.

> Of course it will work, precisely because they are separate variables.
> Who cares about the old variable, once you have the new one?  It does
> not matter that the old one is now inaccessible.

Try this:

a := (10,(3,1,4,1,5,9),30) # (1-based indexing)
a[2] := a[2,4] # ie. select (3,1,4,1,5,9)[4]
println a # display (10,1,30)

How do you redeclare the middle element of a? It doesn't have a name!
Here's a Python example if you are so scathing of my language:

a = [10,[3,1,4,1,5,9],30]
a[1] = a[1][3]
print (a)

The problem: how to allow this same behaviour in a strictly statically
typed language, where none of the types of variant objects are known at
compile time, so that the normal type-checking can't be done.

In your mention of C++ allowing something like this, you called it
magic. When I attempt it, you dismiss it as something you can easily
achieve by simply re-declaring variables every time you assign to them.
What's going on is at a much deeper level.

James Harris
Mar 15, 2019, 2:37:28 AM
Yes, I used "on 1" in the original compiler as a substitute for "on
true" as it kept the compiler simpler, though it was not meant as a
long-term option. There are various real options for an else clause:

on else
on other
on true
on
else
otherwise

Which do you think are best? I think I could get used to the plain "on"
statement as in

if a < 3
return 1
on
return fib(a - 1) + fib(a - 2)
endif


>
> For 'on' is it possible to selectively use 'so' with each? Ie.
>
> if C1
> A1
> A2
> on C2 so
> A3
> on C3
> A4
> A5
> on 1 so
> A6

I don't know if there would be any point in allowing "so" to end a long
chain. Its only real purpose is as a convenience to allow the programmer
to shorten a common case such as

if a
b
endif

That's a bit bulky, IMO, and it looks better as

if a so
b


>
> Will this form need an 'endif'?

Probably.

>
> (Note I recently looked at allowing single statement blocks for my new
> language. There I was considering using:
>
> if C1: S1
>
> in place of the normal:
>
> if C1 then S1 fi
>
> Until I saw how messy it would make the description of the grammar. So
> decided to leave it. However, that syntax also allows conditional
> versions of certain control flow statements:
>
> return if C1
> goto L when C2
> exit unless C3 # (exit from a loop)

I first came across trailing condition clauses in a form of Basic when I
was at college. I can't say I've ever really taken to them, mainly
because they read in the wrong order. But don't your examples show the
value of having a short if? For example, your first and last of those three:

if C1 so return
if not C3 so exit

Isn't it easier reading such statements left to right?

>
> which covers a few of the use-case for short-IF. And the same syntax
> allows has a compact if-then:
>
> (C1 | S1) # if C1 then S1 fi
> (C1 | S1 | S2) # if C1 then S1 else S2 fi
>
> )
>


--
James Harris

Andy Walker
Mar 15, 2019, 7:09:17 AM
On 15/03/2019 06:37, James Harris wrote:
> On 14/03/2019 22:19, Bart wrote:
[...]
>>      return if C1
>>      goto L when C2
>>      exit unless C3      # (exit from a loop)

Atlas Autocode had the "S if C", "S unless C" forms, though I
have a vague memory that they weren't in the original language of 1963
or thereabouts but were added a year or two later. [Incidentally, a
lot of the stuff about AA on the web is utter rubbish -- don't trust
anything not written by Tony Brooker!]

> I first came across trailing condition clauses in a form of Basic
> when I was at college. I can't say I've ever really taken to them,
> mainly because they read in the wrong order.

Is that because you're reading them as C rather than as English?
We say things like "let's go for a walk tomorrow if it's sunny" rather
than "if it's sunny then let's go for a walk tomorrow", "don't come
unless you can afford it" rather than "unless you can afford it then
don't come", and even less do we say "if not you can afford it so don't
come" [unless you've been watching too many episodes of "Friends"].

> But don't your examples
> show the value of having a short if? For example, your first and last
> of those three:
> if C1 so return
> if not C3 so exit
> Isn't it easier reading such statements left to right?

It's what you're used to. In short cases, where everything is
on one line, I find "return if C1" and "exit unless C3" at least as
readable as your versions. What you don't want is

begin S1;
...;
# many lines of statements #
...
end unless C1

where the whole structure of what you're reading comes as a surprise.
OTOH, it's difficult to design a clean syntax that allows the short
version but not the long, so probably it should be left up to the
programmer's discretion.

>> which covers a few of the use-case for short-IF. And the same syntax
>> allows has a compact if-then:
>>
>>      (C1 | S1)                 # if C1 then S1 fi
>>      (C1 | S1 | S2)            # if C1 then S1 else S2 fi

Also "( C1 | S1 |: C | S2 | S3 )" ["elif"]?

--
Andy Walker,
Nottingham.

Andy Walker
Mar 15, 2019, 7:42:41 AM
On 14/03/2019 18:37, James Harris wrote:
[me:]
>>     In similar vein, I'm surprised that people [not necessarily you,
>> I don't know enough about your language] still invent languages with
>> "empty" statements.  It means that misplaced semicolons, for example,
>> easily done while editing, can completely alter the structure of a
>> program:
>>     if x > 0 so ;
>>       y := sqrt(x)
> That's a good point. Here's a test case.
>
>   if x > 0
>
>     ;
>
>     so
>
>       ;
>
>       y := sqrt(x)
> The blank lines and semicolons are just to emphasise that an
> arbitrary number of blank lines or empty statements could occur.

Perhaps you missed the point? IMO, "empty statements"
should not be allowed in any modern language. If you want to
"do nothing", then say so. You can add a keyword, such as "skip",
"empty", "nothing", or overload "nil" or "null", or allow a
"do nothing" expression such as "0", but the programmer should be
required to put something there to acknowledge the existence of
a statement. Yes, it's sometimes a faff; the worst case, IME,
is when commenting out code:

S1;
# S2 #
end

"error, statement expected". Grr! Try again:

S1;
# S2 # skip
end

> In that, "so" would continue until it 'catches' a non-empty
> statement. So the assignment to y would only execute if x > 0.
> How does that look to you?

ISTM that in solving one problem, you've created another.
If you edited/commented out what was intended to be the controlled
statement of "x > 0", then you've accidentally and silently turned
the assignment to "y" into the controlled statement when it should
be a compilation error [however annoying those may be].

Incidentally, many years ago I was playing with several large
Pascal programs, inc two compilers, and with tweaking Pascal to be a
better [IMHO] language by including things like "+:=", "endif", ...,
and one of the things was explicit "skip" for empty statements. The
tweaking was done by a transduction grammar and a pretty-printer, and
once the "skip"s were put in and pretty-printed, several blunders of
the sort above were uncovered and obvious, inc in the compiler.

[Even more incidentally, if empty statements in Pascal are
written as "skip", then, IIRC, statements don't need terminators, and
the semicolons can all go. Or you could spell "skip" as ";", and then
most semicolons can go. This is not necessarily an improvement, so I
don't entirely recommend it.]

--
Andy Walker,
Nottingham.

Bart
Mar 15, 2019, 7:53:14 AM
Here that would have to be personal choice as I see you have a certain
style in mind for your language. (For example, I associate "on" with
event processing.) Some ideas here as to what a range of languages do:

http://rosettacode.org/wiki/Conditional_structures

(But probably one of the worst is C's:

default:

as used in its switch statement. Make a slight typo:

defualt:

and it's an undetectable error with different behaviour.)

Myself, I've gotten used to 'else' for everything (if, if/elsif, switch,
case, even loops ala Python) unless I'm using "|" in the compact form.

>>      return if C1
>>      goto L when C2
>>      exit unless C3      # (exit from a loop)
>
> I first came across trailing condition clauses in a form of Basic when I
> was at college. I can't say I've ever really taken to them, mainly
> because they read in the wrong order. But don't your examples show the
> value of having a short if? For example, your first and last of those
> three:
>
>   if C1 so return
>   if not C3 so exit
>
> Isn't it easier reading such statements left to right?

Probably, but as I said it makes the syntax awkward to have one form for
both long and short versions.

But using the trailing condition makes the action more prominent,
instead of being buried inside an if branch, and lines it up with other
non-conditional statements. Real example:

do
c := fgetc(f)
exit when c = c_eof
fputc(c,logdest)
od

David Brown
Mar 15, 2019, 8:00:19 AM
On 14/03/2019 23:52, Bart wrote:
> On 14/03/2019 20:24, David Brown wrote:
>> On 14/03/2019 16:29, Bart wrote:
>
>> OK, I get that.  I still can't get you to tell me why you need
>> "record" and "struct" in the language, and what the difference is, but
>> maybe I'll just have to accept that you either don't know the
>> difference, or can't explain it.
>
> Perhaps you can first explain to me why Python needs attributes, named
> tuples AND, when emulating C-structs, the 'struct' module.

No - I am asking /you/ to explain about /your/ language. Can't you
understand that?

>
> I think I've explained the difference in my language enough times, so I
> can only surmise that it so far outside your own experience that you
> can't get it.

Read back through your posts. You failed consistently, because you
invariably mixed in all sorts of other differences without making any
clear distinctions. And of course, anything that might have been
relevant was squeezed in between rants about C.

>
> I won't try any further except to illustrate exactly what they do (using
> consistent syntax as that seemed to throw you):
>
>     type date1 = struct (byte day,month; u16 year)
>     type date2 = record (var day, month, year)
>
>     date1 layout (values in bytes):
>
>              Size   Offset  Type
>     day      1      0       u8
>     month    1      1       u8
>     year     2      2       u16
>     Total    4      -       -
>
>     date2 layout:
>
>              Size   Offset  Type
>     day      16     0       variant (size excludes any extra heap data)
>     month    16     16      variant
>     year     16     32      variant
>     Total    48     -
>

This is beginning to get somewhere. "struct" is a structure where each
data member can be any type except variant, while "record" is a
structure where all the data members are of type "variant". Is that
correct? ("variant" being the underlying static type, with the actual
content type being dynamically chosen.)

That does, of course, lead right back to the question of why you want to
have both - rather than just allowing variants along with other types in
your structures (which you might call either "struct" or "record").

> This language's execution model is stack-based and works primarily with
> variant types and data structures like date2. Flat, non-managed packed
> data like date1 is used for precise storage control and for interfacing
> with libraries.
>
>> I had really hoped there would be a clear and simple difference
>> between "struct" and "record" that you could explain in a post, but
>> that does not seem to be the case.
>
> They are just terms. I will probably switch to 'record' and 'class'
> respectively.
>
>> You didn't get my point about moaning about C, did you?  Perhaps you
>> have misunderstood, but I have been trying desperately for a half
>> dozen posts to get you to explain about /your/ language.  But most of
>> what I get is "C is horrible", "C lets you do things I don't like", "I
>> don't understand C".
>
> And my point was, because C is messed up, you think other languages are
> the same so expect the worst.
>

I have worked in over a dozen different programming languages. C is
/not/ messed up - your misunderstanding of it and your knee-jerk
prejudice against it is messed up. No one, certainly not me, thinks C
is in any way a "perfect" language - but I have never seen a language
that could be considered "perfect". I think /your/ language is confused
and inconsistent because your examples are confused and inconsistent,
and your repeated failure to explain simple aspects shows that even
/you/ are not clear on it. It is not surprising, of course, since it is
an exercise in navel gazing, and no one can be expected to design a good
language without working with other people.

>> Of course it will work, precisely because they are separate variables.
>> Who cares about the old variable, once you have the new one?  It does
>> not matter that the old one is now inaccessible.
>
> Try this:
>
>     a := (10,(3,1,4,1,5,9),30)    # (1-based indexing)
>     a[2] := a[2,4]         # ie. select (3,1,4,1,5,9)[4]
>     println a              # display (10,1,30)
>
> How do you redeclare the middle element of a? It doesn't have a name!
> Here's a Python example if you are so scathing of my language:
>
>     a = [10,[3,1,4,1,5,9],30]
>     a[1] = a[1][3]
>     print (a)

That is doing a different thing from what you first talked about. You
gave an example, I showed how it could be handled simply and clearly
with static typing. (I wouldn't have done it like that myself in a
programming language - I prefer a specific way to explicitly declare
variables.) You can't then tell me I was wrong because I didn't read
your mind and the future and consider the very different example you
were going to post next!

>
> The problem: how to allow this same behaviour in a strictly statically
> typed language, where none of the types of variant objects are known at
> compile time, so that the normal type-checking can't be done.

Sure, this could be done with a statically typed language. What you
have here is a type T that is logically either an int, or a list of T's.
You can't statically make such a type directly (what size would it
be?), but if your list contains pointers to T then there is no problem.
Implementing it might be a little messy, especially if you are trying
to get an efficient implementation and retain a neat appearance of the
user code (without caring about the appearance of the implementation
part). Clearly it is vastly simpler if you have a language with
high-level support for variant types and lists built in.
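
A rough C++17 sketch of that kind of type, mirroring the nested-list example;
the std::vector supplies the indirection, so the type can hold a list of
itself without having infinite size. This is only an illustration, not a full
equivalent of the dynamic behaviour:

    #include <iostream>
    #include <utility>
    #include <variant>
    #include <vector>

    struct Value;
    using List = std::vector<Value>;      // a vector of an incomplete element
                                          // type is allowed since C++17

    struct Value {
        std::variant<int, List> v;        // either an int or a list of Values
        Value(int i) : v(i) {}
        Value(List l) : v(std::move(l)) {}
    };

    int main() {
        // a := (10, (3,1,4,1,5,9), 30)
        Value a{List{Value{10},
                     Value{List{Value{3}, Value{1}, Value{4},
                                Value{1}, Value{5}, Value{9}}},
                     Value{30}}};

        // a[2] := a[2,4]   (1-based above; 0-based here)
        auto& outer = std::get<List>(a.v);
        Value picked = std::get<List>(outer[1].v)[3];   // copy out first
        outer[1] = picked;                              // a is now (10, 1, 30)

        std::cout << std::get<int>(outer[1].v) << "\n"; // prints 1
    }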

But here's the thing - if you used C++ (or whatever statically typed
language you want with a bit more power than C, in order to get the
neater syntax - maybe Jacob's extended C would do) to implement roughly
this (probably with curly brackets rather than square ones), and wrote
"a[2] = "x";", the compiler could tell you it's a fault in the typing.
You get the checks from static strong typing that you don't get from
freer dynamic typing.

>
> In your mention of C++ allowing something like this, you called it
> magic. When I attempt it, you dismiss it as something you can easily
> achieve by simply re-declaring variables every time you assign to them.
> What's going on is at a much deeper level.

"Any technology sufficiently advanced is indistinguishable from magic."
The C++ involved in defining types like std::any is, currently, magic
to me. To the people writing those libraries, it is just difficult coding.

And I didn't dismiss what you were doing - I said that the example you
gave could be implemented with static typing with relatively little
effort in the language design. I didn't make any suggestions about how
to handle the new example you had yet to post - that would be harder to
handle in static typing, but by no means impossible.

Bart
Mar 15, 2019, 8:19:52 AM
On 15/03/2019 11:42, Andy Walker wrote:
> On 14/03/2019 18:37, James Harris wrote:

>>    if x > 0
>>
>>      ;
>>
>>      so
>>
>>        ;
>>
>>        y := sqrt(x)
>> The blank lines and semicolons are just to emphasise that an
>> arbitrary number of blank lines or empty statements could occur.
>
>     Perhaps you missed the point?  IMO, "empty statements"
> should not be allowed in any modern language.  If you want to
> "do nothing", then say so.  You can add a keyword, such as "skip",
> "empty", "nothing", or overload "nil" or "null", or allow a
> "do nothing" expression such as "0", but the programmer should be
> required to put something there to acknowledge the existence of
> a statement.  Yes, it's sometimes a faff;  the worst case, IME,
> is when commenting out code:
>
>     S1;
>     # S2 #
>     end
>
>     "error, statement expected".  Grr!  Try again:

This only applies when ";" is strictly used only as a separator, and
can't end the last statement of a sequence. Be tolerant of extra ";",
and it doesn't matter.


>     S1;
>     # S2 # skip
>     end

I want to be able to do:

if cond then
<commented out code>
fi

without it being an error, as I comment out code in that way all the
time. Or I haven't yet written it. Or I've deleted it and haven't yet
replaced it.

Commenting out one line is a single key command in my editor, and
multiple lines can be rapidly done by repeating the same key. Having to
keep inserting and then removing (or commenting out) "skip" would be
onerous.

>     Incidentally, many years ago I was playing with several large
> Pascal programs, inc two compilers, and with tweaking Pascal to be a
> better [IMHO] language by including things like "+:=", "endif", ...,
> and one of the things was explicit "skip" for empty statements.  The
> tweaking was done by a transduction grammar and a pretty-printer, and
> once the "skip"s were put in and pretty-printed, several blunders of
> the sort above were uncovered and obvious, inc in the compiler.
>
>     [Even more incidentally, if empty statements in Pascal are
> written as "skip", then, IIRC, statements don't need terminators, and
> the semicolons can all go.  Or you could spell "skip" as ";", and then
> most semicolons can go.  This is not necessarily an improvement, so I
> don't entirely recommend it.]

My syntaxes generally treat newline as ";" (with some exceptions such as
after "(" or ","). Since statements are nominally separated by ";", that
means actual semicolons very rarely need to be written.

The downside is that sometimes a line continuation character is needed.

Also, it is necessary for the grammar to be tolerant of extra
semicolons, as this piece of code:

if a
then
#comment
s1

s2;
s3
fi


is actually seen as this:

if a;
then;
;
s1;
;
s2;;
s3;
fi;
;

Effectively, ";" is just a form of white-space, which is necessary to
separate tokens. Extra white-space is harmless.

Bart
Mar 15, 2019, 8:52:58 AM
On 15/03/2019 12:00, David Brown wrote:
> On 14/03/2019 23:52, Bart wrote:
>> On 14/03/2019 20:24, David Brown wrote:
>>> On 14/03/2019 16:29, Bart wrote:
>>
>>> OK, I get that.  I still can't get you to tell me why you need
>>> "record" and "struct" in the language, and what the difference is, but
>>> maybe I'll just have to accept that you either don't know the
>>> difference, or can't explain it.
>>
>> Perhaps you can first explain to me why Python needs attributes, named
>> tuples AND, when emulating C-structs, the 'struct' module.
>
> No - I am asking /you/ to explain about /your/ language. Can't you
> understand that?

My reasons are the same as Python's; named tuples can't do the job of
creating data structures that can be passed to a C API that expects
C-style structs.

> This is beginning to get somewhere. "struct" is a structure where each
> data member can be any type except variant, while "record" is a
> structure where all the data members are of type "variant". Is that
> correct?

Yes.

> That does, of course, lead right back to the question of why you want to
> have both - rather than just allowing variants along with other types in
> your structures (which you might call either "struct" or "record").

Because it would be massively more complicated to implement. One
(struct) has elements of mixed types and sizes; the other (record) has
elements of one type (variant) and one size, and actually shares
implementation code with lists of variants.

They are totally different types, other than both are compound types
with elements accessed by name, so they share the same 'interface'.

Note that my treatment of structs doesn't exist at all in most other
scripting languages, you have to use third party libraries and various
ad hoc approaches, eg:

import struct
v = struct.pack('hhl',1,2,3)

Compare with my:

type S = struct # built-in to the language
int16 a, b
int32 c
end

v := S(1,2,3)
v.a = v.b+v.c # (Can this be done in Python? Perhaps
# with another couple of libraries)

Since, to you, NOTHING is ever too much trouble, perhaps you'd care to
explain to the Python guys how easy it would be to get rid of the struct
module, and tweak one of the named-tuple/record-handling add-ons, to
also cope with packed struct data like this.

Because that is exactly what you are suggesting to me.
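
For reference, a rough sketch of how the packed-struct part can be
approximated in Python with the standard ctypes module (an illustration
only; it says nothing about how convenient that is compared with a
built-in struct type):

import ctypes

# Illustrative sketch only: a C-compatible struct with named fields,
# roughly matching the int16/int16/int32 layout above.
class S(ctypes.Structure):
    _pack_ = 1                      # no padding between fields
    _fields_ = [("a", ctypes.c_int16),
                ("b", ctypes.c_int16),
                ("c", ctypes.c_int32)]

v = S(1, 2, 3)
v.a = v.b + v.c                     # named field access and assignment
print(bytes(v))                     # the packed bytes, passable to a C API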

> That is doing a different thing from what you first talked about.

It's the same thing. You made the assumption that the 'a' in 'a:=a[i]'
is always going to be a simple variable name, rather than be the same
object.

My point is simply that variant types introduce something that is alien
to traditional type analysers of statically-typed languages.

It's not about introducing some workaround alternative for variant types
in some simple cases. Variant types /are/ being introduced; it's about
how they can be handled side-by-side with normal static types.

Andy Walker

unread,
Mar 15, 2019, 9:06:55 AM3/15/19
to
On 15/03/2019 12:19, Bart wrote:
> [...] Be tolerant of extra ";", and it doesn't matter.

It's being 'tolerant of extra ";"' that causes lots of empty
statements to appear in the parse tree, and causes the problems noted
elsewhere in the thread. Tolerance is Good in five-finger exercises,
where you want to write a few lines of code in two minutes and find
something out:

$ a68g -p "PR precision=1000 PR long long sqrt(2)"

It's Bad in serious code, where you want the compiler to find as many
as possible of the errors in your code.

[...]
> I want to be able to do:
>     if cond then
>         <commented out code>
>     fi
> without it being an error, as I comment out code in that way all the
> time. Or I haven't yet written it. Or I've deleted it and haven't yet
> replaced it.

Yes, likewise. Tho' unwritten/deleted code can easily be
called "skip", or "skip # not yet written !!!!! #" or whatever.
For the cases where I often switch between code and comment and the
code can't simply be deleted, I usually have "statement # skip #"
switching with "# statement # skip" which can be achieved by a very
simple [one-key] edit to switch "#" between start and end of line.
The same "trick" works with code that can simply be deleted, exc
that you can leave out the "skip".

[...]
> My syntaxes generally treat newline as ";" (with some exceptions such
> as after "(" or ","). Since statements are nominally separated by
> ";", that means actual semicolons very rarely need to be written.
> The downside is that sometimes a line continuation character is needed.

How very old-fashioned of you!

--
Andy Walker,
Nottingham.

James Harris

unread,
Mar 15, 2019, 9:07:04 AM3/15/19
to
On 15/03/2019 11:09, Andy Walker wrote:
> On 15/03/2019 06:37, James Harris wrote:
>> On 14/03/2019 22:19, Bart wrote:
> [...]
>>>      return if C1
>>>      goto L when C2
>>>      exit unless C3      # (exit from a loop)
>
> Atlas Autocode had the "S if C", "S unless C" forms, though I
> have a vague memory that they weren't in the original language of 1963
> or thereabouts but were added a year or two later. [Incidentally, a
> lot of the stuff about AA on the web is utter rubbish -- don't trust
> anything not written by Tony Brooker!]
>
>> I first came across trailing condition clauses in a form of Basic
>> when I was at college. I can't say I've ever really taken to them,
>> mainly because they read in the wrong order.
>
> Is that because you're reading them as C rather than as English?

No, it's because the parts are written in the opposite order in which
they will be processed.

> We say things like "let's go for a walk tomorrow if it's sunny" rather
> than "if it's sunny then let's go for a walk tomorrow", "don't come
> unless you can afford it" rather than "unless you can afford it then
> don't come", and even less do we say "if not you can afford it so don't
> come" [unless you've been watching too many episodes of "Friends"].

It's OK for English.


--
James Harris

James Harris

unread,
Mar 15, 2019, 9:27:49 AM3/15/19
to
On 15/03/2019 11:42, Andy Walker wrote:
> On 14/03/2019 18:37, James Harris wrote:
> [me:]
>>>     In similar vein, I'm surprised that people [not necessarily you,
>>> I don't know enough about your language] still invent languages with
>>> "empty" statements.  It means that misplaced semicolons, for example,
>>> easily done while editing, can completely alter the structure of a
>>> program:
>>>     if x > 0 so ;
>>>       y := sqrt(x)
>> That's a good point. Here's a test case.
>>
>>   if x > 0
>>
>>     ;
>>
>>     so
>>
>>       ;
>>
>>       y := sqrt(x)
>> The blank lines and semicolons are just to emphasise that an
>> arbitrary number of blank lines or empty statements could occur.
>
> Perhaps you missed the point? IMO, "empty statements"
> should not be allowed in any modern language. If you want to
> "do nothing", then say so.

Perhaps you miss the point. The 'empty statements' in the above are just
empty space - like blank lines. I can't believe you would object to
those so I am not clear as to what your objection is. All blank lines
and semicolons in the above would be skipped/ignored.

>
> You can add a keyword, such as "skip",
> "empty", "nothing", or overload "nil" or "null", or allow a
> "do nothing" expression such as "0", but the programmer should be
> required to put something there to acknowledge the existence of
> a statement.

For a "no operation" statement I use

nop

That choice is largely because both I and the language are coming from
an assembly background. :-)


>
> Yes, it's sometimes a faff; the worst case, IME,
> is when commenting out code:
>
> S1;
> # S2 #
> end
>
> "error, statement expected". Grr! Try again:
>
> S1;
> # S2 # skip
> end
>
>> In that, "so" would continue until it 'catches' a non-empty
>> statement. So the assignment to y would only execute if x > 0.
>> How does that look to you?
>
> ISTM that in solving one problem, you've created another.
> If you edited/commented out what was intended to be the controlled
> statement of "x > 0", then you've accidentally and silently turned
> the assignment to "y" into the controlled statement when it should
> be a compilation error [however annoying those may be].

A programmer might write

if x > 0 so
# x := 0
y := -1

intending to edit the file to switch between the two lines. But it would
be better to write

if x > 0
# x := 0
y := -1
endif

Maybe this is an area in which a language designer can help but not
mandate sensible programming.

>
> Incidentally, many years ago I was playing with several large
> Pascal programs, inc two compilers, and with tweaking Pascal to be a
> better [IMHO] language by including things like "+:=", "endif", ...,
> and one of the things was explicit "skip" for empty statements. The
> tweaking was done by a transduction grammar and a pretty-printer, and
> once the "skip"s were put in and pretty-printed, several blunders of
> the sort above were uncovered and obvious, inc in the compiler.

Understood. That's the kind of reason I prefer a fully bracketed syntax.

>
> [Even more incidentally, if empty statements in Pascal are
> written as "skip", then, IIRC, statements don't need terminators, and
> the semicolons can all go. Or you could spell "skip" as ";", and then
> most semicolons can go. This is not necessarily an improvement, so I
> don't entirely recommend it.]
>

The semicolons in the example I gave earlier were not necessary. I only
put them in to make the point that an arbitrary number of semicolons and
empty lines could occur after the so, and that all would be ignored.


--
James Harris

David Brown

unread,
Mar 15, 2019, 9:29:11 AM3/15/19
to
On 15/03/2019 13:52, Bart wrote:
> On 15/03/2019 12:00, David Brown wrote:
>> On 14/03/2019 23:52, Bart wrote:
>>> On 14/03/2019 20:24, David Brown wrote:
>>>> On 14/03/2019 16:29, Bart wrote:
>>>
>>>> OK, I get that.  I still can't get you to tell me why you need
>>>> "record" and "struct" in the language, and what the difference is, but
>>>> maybe I'll just have to accept that you either don't know the
>>>> difference, or can't explain it.
>>>
>>> Perhaps you can first explain to me why Python needs attributes, named
>>> tuples AND, when emulating C-structs, the 'struct' module.
>>
>> No - I am asking /you/ to explain about /your/ language.  Can't you
>> understand that?
>
> My reasons are the same as Python's; named tuples can't do the job of
> creating data structures that can be passed to a C API that expects
> C-style structs.
>

In Python, these do very different jobs, and are used in very different
ways. Named tuples form types - the "struct" module is not for types at
all.

>> This is beginning to get somewhere.  "struct" is a structure where each
>> data member can be any type except variant, while "record" is a
>> structure where all the data members are of type "variant".  Is that
>> correct?
>
> Yes.

OK, progress!

>
>> That does, of course, lead right back to the question of why you want to
>> have both - rather than just allowing variants along with other types in
>> your structures (which you might call either "struct" or "record").
>
> Because it would be massively more complicated to implement. One
> (struct) has elements of mixed types and sizes; the other (record) has
> elements of one type (variant) and one size, and actually shares
> implementation code with lists of variants.

/Now/ we are getting somewhere. The distinction is to aid
implementation, not usage.

There is no fundamental reason why a struct member could not be a
variant. There is therefore no fundamental reason why you could not
have a "struct" whose members are all "variant" - thus appearing
identical to a record. Is that not correct?

>
> They are totally different types, other than both are compound types
> with elements accessed by name, so they share the same 'interface'.
>
> Note that my treatment of structs doesn't exist at all in most other
> scripting languages, you have to use third party libraries and various
> ad hoc approaches, eg:
>
>     import struct
>     v = struct.pack('hhl',1,2,3)
>
> Compare with my:
>
>     type S = struct     # built-in to the language
>         int16 a, b
>         int32 c
>    end
>
>    v := S(1,2,3)
>    v.a = v.b+v.c        # (Can this be done in Python? Perhaps
>                         # with another couple of libraries)

You may or may not be able to do something similar in Python - I haven't
thought about it.

To me, it sounds like your language lies somewhere in between Python and
C. That might well be a reasonable compromise. (I know you always
assume I am criticising you and your language, no matter what I write -
but if you have read these posts carefully you may see that I have not
said there is anything wrong with having "struct" and "record". I have
been asking why you have the distinction and what they mean, not saying
they are bad ideas.)

>
> Since, to you, NOTHING is ever too much trouble, perhaps you'd care to
> explain to the Python guys how easy it would be to get rid of the struct
> module, and tweak one of the named-tuple/record-handling add-ons, to
> also cope with packed struct data like this.
>
> Because that is exactly what you are suggesting to me.

Your language is not Python.

And I have not been saying you should get rid of "struct", or "record" -
I have been asking why you have them. Surely I can ask you /why/ you
have both?

For Python, one reason they have different things is that their types
contain a gazillion other bits and pieces, not just data slots for named
fields. Another reason is that everything in Python is somewhat
indirect - objects and names are distinct. Neither of these appears to
apply to your language, but maybe that's because I don't know enough
about it.

>
>> That is doing a different thing from what you first talked about.
>
> It's the same thing. You made the assumption that the 'a' in 'a:=a[i]'
> is always going to be a simple variable name, rather than be the same
> object.

I based my post on what you wrote, not what you didn't write.

>
> My point is simply that variant types introduce something that is alien
> to traditional type analysers of statically-typed languages.
>

Variant types have been part of statically typed languages since
forever. In basic programming language type theory, you have "product
types" which correspond approximately to structs, and "sum types" which
correspond approximately to unions with some method of distinguishing
which type is in use. Different languages make this sort of thing
neater or less neat - an implementation in C is possible but probably
really horrible, while the implementation in your language is neat and
simple.

For a different example, in Haskell (a statically typed functional
programming language), defining a type T that can either be an integer
or a list of type T elements is done by:

data T = Single Int | List [T]

(My Haskell experience is small, so I apologise if that is not quite
right. But it will be very close.)

Andy Walker

unread,
Mar 15, 2019, 10:25:49 AM3/15/19
to
On 15/03/2019 13:07, James Harris wrote:
>>> I can't say I've ever really taken to [trailing condition clauses]
>>> mainly because they read in the wrong order.
>>     Is that because you're reading them as C rather than as English?
> No, it's because the parts are written in the opposite order in which
> they will be processed.

But I expect your language still has "myproc (a[i+1], f(j))",
which will be processed typically as

i, 1, add, a, subscript, j, f, call, myproc, call

or near equivalent. Building trees and walking them in a different
order was a solved problem in CS more than half a century ago!

But, of course, whether or not your own language includes
"S1 if C1", "S2 unless C2", "S3 while C3" is entirely up to you, and
largely a matter of taste.

--
Andy Walker,
Nottingham.

Bart

unread,
Mar 15, 2019, 10:30:22 AM3/15/19
to
On 15/03/2019 13:29, David Brown wrote:
> On 15/03/2019 13:52, Bart wrote:

>> My reasons are the same as Python's; named tuples can't do the job of
>> creating data structures that can be passed to a C API that expects
>> C-style structs.
>>
>
> In Python, these do very different jobs, and are used in very different
> ways. Named tuples form types - the "struct" module is not for types at
> all.

It's the method used to create C-compatible structs. In C, a struct is a
type.

>> Because it would be massively more complicated to implement. One
>> (struct) has elements of mixed types and sizes; the other (record) has
>> elements of one type (variant) and one size, and actually shares
>> implementation code with lists of variants.
>
> /Now/ we are getting somewhere. The distinction is to aid
> implementation, not usage.

What's the distinction for in Python?

Of course you're welcome to create a language where you have one type
doing very different jobs, but you'd have to implement it.

You'd also however have the job of listing all the constraints needed on
that type when it is used for the things that C-like structs are used for.

> There is no fundamental reason why a struct member could not be a
> variant. There is therefore no fundamental reason why you could not
> have a "struct" whose members are all "variant" - thus appearing
> identical to a record. Is that not correct?

In /my/ language, I found it much easier to segregate variant data types
from flat/packed data types (remember I said most scripting languages
don't bother with the latter at all; this is all a great convenience,
not a hindrance as you seem to think):

Variant Packed

List Array
Record Struct
Pointer Ref

Also (some terminal types):

Int Int8, int16, etc
Real Real32, Real64
etc

List: Sequence of variant elements accessed by index
Array: Homogeneous sequence of one packed type (including fixed-length
arrays and structs)

Record: Fixed sequence of variant elements accessed by name
Struct: Heterogeneous sequence of mixed packed types

Pointer: Pointer to a variant
Ref: Pointer to one packed type

There is some mixing because the execution model only manipulates
variants, so you can have - need to have - a variant that contains an
array, struct or ref.

Records and lists can also contain circular references (causing issues
when destroying or duplicating those types); structs and arrays don't.

But when you design and implement this stuff, you can make your own choices.

Other languages don't have such parallel sets of types, because they
only do one or the other (C does 'packed'; Python does 'variant). So
they superficially appear simpler, until you need to do the equivalent
of 'variant' in C, or 'packed' in Python, and then the fun and games start.

(May reply to other points later.)

James Harris

unread,
Mar 15, 2019, 11:05:00 AM3/15/19
to
On 15/03/2019 14:25, Andy Walker wrote:
> On 15/03/2019 13:07, James Harris wrote:
>>>> I can't say I've ever really taken to [trailing condition clauses]
>>>> mainly because they read in the wrong order.
>>>     Is that because you're reading them as C rather than as English?
>> No, it's because the parts are written in the opposite order in which
>> they will be processed.
>
> But I expect your language still has "myproc (a[i+1], f(j))",
> which will be processed typically as
>
> i, 1, add, a, subscript, j, f, call, myproc, call

Sure, but when one sees a function call one knows what to expect. It's the

go and do something at this point in the program if x

that I'm not so keen on because one may be partway through reading what
is to be done before it becomes apparent that it may not be done at all!

>
> or near equivalent. Building trees and walking them in a different
> order was a solved problem in CS more than half a century ago!
>
> But, of course, whether or not your own language includes
> "S1 if C1", "S2 unless C2", "S3 while C3" is entirely up to you, and
> largely a matter of taste.
>

Indeed - it's just a preference. Here's another

transition_rate[i], volume[i], density[i], area[i] := f(a)

I have a similar dislike of that because it's not apparent when one
begins reading that it's an assignment statement. Just a preference.
Cannot be avoided, in some cases.

--
James Harris

James Harris

unread,
Mar 15, 2019, 11:24:30 AM3/15/19
to
On Wednesday, 13 March 2019 14:19:56 UTC, James Harris wrote:

...

[Re. letting programmers work with pointers]

>
> In fact, C got that from BCPL. Martin Richards deserves the credit for
> that 'breakthrough'! (Or, in fact, maybe he got it from CPL. Don't know.)

For the record, looking at the CPL documentation I can't see any similar pointer support or equivalence between pointers and array handling

https://academic.oup.com/comjnl/article/6/2/134/364746

BCPL, by contrast, unashamedly based its data model on a store with numbered storage cells. It seems, therefore, that the breakthrough was Martin Richards'.

https://www.cl.cam.ac.uk/teaching/0910/ConceptsPL/BCPL-C.pdf

Well done sir!


James


Andy Walker

unread,
Mar 15, 2019, 12:55:57 PM3/15/19
to
On 15/03/2019 13:27, James Harris wrote:
> Perhaps you miss the point. The 'empty statements' in the above are
> just empty space - like blank lines. I can't believe you would object
> to those so I am not clear as to what your objection is. All blank
> lines and semicolons in the above would be skipped/ignored.

I don't object to blank lines. See below about semicolons.
My point is about the danger of

if C so S1
S2

where deleting S1 promotes S2 silently to be the controlled statement
of the "if". Same, of course, with "while C do S1" and other similar
constructs which are [or should be] logically brackets. It's easy
to do that sort of thing by mistake, esp when "S2" is separated from
"if" by many lines of comment. If you have "if C so S1 endif", then
an overenthusiastic deletion gives you a bracket mismatch.

More generally, putting in "skip" or similar as a placeholder
where many languages accept blanks serves as confirmation that you
really did intend to do nothing here, you haven't just left something
out [or put something in]. In such languages, I personally often
use "# skip #" or equivalent simply as a reminder to myself that there
is an empty statement present; but of course the compiler can't check
for that.

[...]
> The semicolons in the example I gave earlier were not necessary. I
> only put them in to make the point that an arbitrary number of
> semicolons and empty lines could occur after the so, and that all
> would be ignored.

I think that's an historically unfortunate use of semicolons.
Any normal programmer sees ";" as a statement terminator/separator.
Thus "begin ;;;;; end" contains five or six empty statements, not
one. If ";" is just white space, then it's very misleading; even
more so if it's sometimes a terminator and sometimes ignored.

--
Andy Walker,
Nottingham.

Bart

unread,
Mar 15, 2019, 1:00:56 PM3/15/19
to
On 15/03/2019 13:29, David Brown wrote:
> On 15/03/2019 13:52, Bart wrote:

> To me, it sounds like your language lies somewhere in between Python and
> C. That might well be a reasonable compromise.

Well, between C and Python...

(I thought of it as highly dynamic until I saw how hyper-dynamic Python
is, probably too much for its own good.)

> Variant types have been part of statically typed languages since
> forever. In basic programming language type theory, you have "product
> types" which correspond approximately to structs, and "sum types" which
> correspond approximately to unions with some method of distinguishing
> which type is in use. Different languages make this sort of thing
> neater or less neat - an implementation in C is possible but probably
> really horrible, while the implementation in your language is neat and
> simple.
>
> For a different example, in Haskell (a statically typed functional
> programming language), defining a type T that can either be an integer
> or a list of type T elements is done by:
>
> data T = Single Int | List [T]
>
> (My Haskell experience is small, so I apologise if that is not quite
> right. But it will be very close.)

My knowledge of Haskell is even less. But I would say that when a
variable has a 'variant' type, then the number of possibilities needs to
be known before execution starts. (Excluding certain recursive types, if
that is possible.)

My variant types, which I started using 30 years ago for scripting code
attached to an application, are never known at compile time (although I
suppose some could be inferred). Example:

types := (list, array, bits, int, real, string, set)
hassubtypes := (array, bits)
hasdims := (list, array, bits, string, set)

to 10 do
t := types[random(types.bounds)]

if t in hassubtypes and t in hasdims then
x := new(t,(t=bits|bit|byte),random(1..100))
elsif t in hasdims then
x := new(t,random(1..100))
else
x := new(t)
fi

println "X is",x.type, (t in hasdims|x.bounds|"")
od

Output:

X is <tbits> 1..28
X is <tstring> 1..3
X is <tint>
X is <treal>
X is <tstring> 1..24
X is <tarray> 1..67
X is <treal>
X is <tbits> 1..59
X is <tbits> 1..14
X is <treal>

(Notice also the last item on that print statement; it will have type
'range' or 'string'. The print routine will have to deal with any
possibility.

This kind of variant type can also give you lightweight generics.)

Rod Pemberton

unread,
Mar 16, 2019, 4:25:42 AM3/16/19
to
On Thu, 14 Mar 2019 17:35:58 +0000
James Harris <james.h...@gmail.com> wrote:

> On 14/03/2019 11:21, Rod Pemberton wrote:

> > As for "push eax ... pop ebx", you could do a "mov ebx, eax" or
> > "xchg ebx, eax". Or, you could "lea ebx, [B] ...", i.e., B against
> > ebx, which would allow you to eliminate the esp/ebp prologue and
> > epilogue code, as you'd be working against registers without stack
> > parameters.
>
> Remember that this is the output from an initial code generator.

Yes, I know. It reminded me of code emitted by Ron Cain's SmallC.
Did you ever see the code emitted by Ron Cain's SmallC?

https://groups.google.com/d/msg/comp.lang.forth/19SuApMcw30/p8WaWx8jlMsJ
Usenet msg-id iq9v6t$dkk$1...@speranza.aioe.org

Of course, if one assumes that every binary and logical operation has
two arguments, then they can both be kept in registers, first in eax,
second in ebx.


Rod Pemberton
--
Apple opposes "glorifying violence" and "dehumanizing language". Yet,
it manufactures products in China which commits crimes against humanity.

Rod Pemberton

unread,
Mar 16, 2019, 4:26:00 AM3/16/19
to
On Fri, 15 Mar 2019 13:27:46 +0000
James Harris <james.h...@gmail.com> wrote:

> On 15/03/2019 11:42, Andy Walker wrote:
> > On 14/03/2019 18:37, James Harris wrote:

> > ISTM that in solving one problem, you've created another.
> > If you edited/commented out what was intended to be the controlled
> > statement of "x > 0", then you've accidentally and silently turned
> > the assignment to "y" into the controlled statement when it should
> > be a compilation error [however annoying those may be].
>
> A programmer might write
>
> if x > 0 so
> # x := 0
> y := -1
>
> intending to edit the file to switch between the two lines. But it
> would be better to write
>
> if x > 0
> # x := 0
> y := -1
> endif
>
> Maybe this is an area in which a language designer can help but not
> mandate sensible programming.
>

Was removing the 'so' for the second example intentional? I.e., the
presence of the 'so' can simplify parsing, for when long lines can be
continued to the next. In other words, you can tell where the starting
curly brace would go in C, if 'so' where present.

Rod Pemberton

unread,
Mar 16, 2019, 4:26:18 AM3/16/19
to
On Fri, 15 Mar 2019 14:25:37 +0000
Andy Walker <a...@cuboid.co.uk> wrote:

> On 15/03/2019 13:07, James Harris wrote:

> > No, it's because the parts are written in the opposite order in
> > which they will be processed.
>
> But I expect your language still has "myproc (a[i+1], f(j))",
> which will be processed typically as
>
> i, 1, add, a, subscript, j, f, call, myproc, call
>
> or near equivalent. Building trees and walking them in a different
> order was a solved problem in CS more than half a century ago!

I'm not sure James is using an AST. Elsewhere, he said he was directly
converting to assembly. So, he would emit the assembly for that
sequentially, if possible.

Rod Pemberton

unread,
Mar 16, 2019, 4:26:33 AM3/16/19
to
On Fri, 15 Mar 2019 11:09:06 +0000
Andy Walker <a...@cuboid.co.uk> wrote:


> What you don't want is
>
> begin S1;
> ...;
> # many lines of statements #
> ...
> end unless C1
>
> where the whole structure of what you're reading comes as a surprise.

I run into that with C. On more than one occasion, I've considered
moving the block into its own procedure to compact the structure for
readability, although I don't usually do so. Usually, I've found that
if I restructure the code, the mess goes away.

Rod Pemberton

unread,
Mar 16, 2019, 4:27:14 AM3/16/19
to
On Fri, 15 Mar 2019 11:53:10 +0000
Bart <b...@freeuk.com> wrote:

> Myself, I've gotten used to 'else' for everything (if, if/elsif,
> switch, case, even loops ala Python) unless I'm using "|" in the
> compact form.

'else' can be a code maintenance issue. When the logic for the 'if'
portion is to be changed, sometimes the logic for the 'else' portion
must remain the same as the original if-else conditions. This means
breaking apart the if-else into two if's. The question then becomes
for a programmer who didn't write the code: "Do I preserve the logic
for the 'else' or does it change when the 'if' changes?" ...

> (But probably one of the worst is C's:
>
> default:

I'm surprised you didn't say C's 'break', since it falls through if you
forget it, which is easy to do if you have a large block. And, most
programmers don't seem to know how to properly code fall-through logic
anymore, i.e., commonly learned with unstructured languages like
BASIC. I usually add comments to make sure that if anyone ever looks
at my code, that they'll know the fall-through was intentional.

Rod Pemberton

unread,
Mar 16, 2019, 4:56:54 AM3/16/19
to
On Sat, 16 Mar 2019 04:28:37 -0400
Rod Pemberton <inv...@lkntrgzxc.com> wrote:

> On Fri, 15 Mar 2019 13:27:46 +0000
> James Harris <james.h...@gmail.com> wrote:

> starting curly brace would go in C, if 'so' where present.

s/where/were/

RP

James Harris

unread,
Mar 16, 2019, 6:26:26 AM3/16/19
to
On 16/03/2019 08:28, Rod Pemberton wrote:
> On Thu, 14 Mar 2019 17:35:58 +0000
> James Harris <james.h...@gmail.com> wrote:

>> Remember that this is the output from an initial code generator.
>
> Yes, I know. It reminded me of code emitted by Ron Cain's SmallC.

...

> Did you ever see the code emitted by Ron Cain's SmallC?
>
> https://groups.google.com/d/msg/comp.lang.forth/19SuApMcw30/p8WaWx8jlMsJ
> Usenet msg-id iq9v6t$dkk$1...@speranza.aioe.org

You are right. In fact, it is essentially identical! The only difference
is that I do a deref by

mov eax, [eax]

Addressing off AX is not possible on the 8086 so Cain's code transfers
the address to BX first as in

MOV BX,AX
MOV AX,[BX]

Other than that the code is the same, even down to the register names used!

>
> Of course, if one assumes that every binary and logical operation has
> two arguments, then they can both be kept in registers, first in eax,
> second in ebx.

There is a caveat to that. Consider any binary operation between two
arbitrary subexpressions

e1 binop e2

It will parse e1 and put the result in wherever it puts results - say
EAX. Then, because parsing of e2 will overwrite EAX it naturally saves
the old EAX on the stack. Once e2 has been evaluated, its value will be
in EAX. It makes sense, therefore, to pop the stack into another
register such as EBX before combining the two. So at that point it's
natural for the first value to be in EBX and the second in EAX. (The
result of applying the binop to them still has to leave the result in EAX.)
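
A minimal sketch of that scheme (in Python, illustrative only, not the
actual code generator), emitting the naive push/pop pattern for
"e1 binop e2" with the result left in EAX:

# Illustrative sketch only: naive one-pass code generation for e1 op e2.
# Expressions are nested tuples, e.g. ("+", ("num", 2), ("num", 3)).
OPS = {"+": "add", "-": "sub"}

def emit(expr, out):
    if expr[0] == "num":
        out.append(f"mov eax, {expr[1]}")   # a leaf leaves its value in EAX
    else:
        op, e1, e2 = expr
        emit(e1, out)                       # e1 -> EAX
        out.append("push eax")              # save it; e2 will overwrite EAX
        emit(e2, out)                       # e2 -> EAX
        out.append("pop ebx")               # first operand back into EBX
        out.append(f"{OPS[op]} ebx, eax")   # combine EBX (e1) with EAX (e2)
        out.append("mov eax, ebx")          # result must end up in EAX

code = []
emit(("+", ("num", 2), ("-", ("num", 7), ("num", 3))), code)
print("\n".join(code))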


--
James Harris

James Harris

unread,
Mar 16, 2019, 6:42:39 AM3/16/19
to
On 16/03/2019 08:28, Rod Pemberton wrote:
Yes, it was intentional. The idea of 'so' is to give the programmer the
option to avoid the use of endif if the action is a single statement. In
a given 'if' statement a programmer could use one or the other, but
never both. In other words, one could not have

if cond so action ... endif

That would not make sense; the 'so' keyword would effectively insert an
endif immediately after the action so the later endif would not be
relevant.

Put another way, the following three fragments would be the same as each
other.

if cond
action
endif

if cond so
action

if cond so action

It would be the programmer's choice as to which to use. The nearest
equivalents in C would be

if (cond) {
action
}

if (cond)
action

if (cond) action

I use all three such arrangements in C, according to what I think makes
the code the most readable. Maybe you do likewise?


--
James Harris

James Harris

unread,
Mar 16, 2019, 6:44:34 AM3/16/19
to
On 16/03/2019 08:29, Rod Pemberton wrote:
> On Fri, 15 Mar 2019 11:53:10 +0000
> Bart <b...@freeuk.com> wrote:
>
>> Myself, I've gotten used to 'else' for everything (if, if/elsif,
>> switch, case, even loops ala Python) unless I'm using "|" in the
>> compact form.
>
> 'else' can be a code maintenance issue. When the logic for the 'if'
> portion is to be changed, sometimes the logic for the 'else' portion
> must remain the same as the original if-else conditions. This means
> breaking apart the if-else into two if's. The question then becomes
> for a programmer who didn't write the code: "Do I preserve the logic
> for the 'else' or does it change when the 'if' changes?" ...

That sounds interesting but I am not sure what you mean. Could you give
an example?


--
James Harris

Bart

unread,
Mar 16, 2019, 7:01:08 AM3/16/19
to
On 16/03/2019 08:29, Rod Pemberton wrote:
> On Fri, 15 Mar 2019 11:53:10 +0000
> Bart <b...@freeuk.com> wrote:
>
>> Myself, I've gotten used to 'else' for everything (if, if/elsif,
>> switch, case, even loops ala Python) unless I'm using "|" in the
>> compact form.
>
> 'else' can be a code maintenance issue.

No more than anything else.

>
> I'm surprised you didn't say C's 'break', since it falls through if you
> forget it, which is easy to do if you have a large block.

That's one of many, many issues I have with C, but this was about
handling 'else'.

(C might have been OK-ish in the 1970s, but unfortunately it's never
been replaced by one that doesn't look like it was thrown together by a
couple of students over a weekend, and for a lark. The joke seems to
have worn rather thin.)

Andy Walker

unread,
Mar 16, 2019, 7:16:02 AM3/16/19
to
On 16/03/2019 08:28, Rod Pemberton wrote:
[James, re "S unless C"]
>>> No, it's because the parts are written in the opposite order in
>>> which they will be processed.
[me:]
>> But I expect your language still has "myproc (a[i+1], f(j))",
>> which will be processed typically as
>> i, 1, add, a, subscript, j, f, call, myproc, call
>> or near equivalent. Building trees and walking them in a different
>> order was a solved problem in CS more than half a century ago!
> I'm not sure James is using an AST. Elsewhere, he said he was directly
> converting to assembly. So, he would emit a sequence of assembly
> sequentially for that, if possible.

OK, but in that case he has to do a lot of stacking in the
assembly so as to process procedure parameters before calling the
procedure. Similarly even with simple arithmetic: "(a+b)*(c+d)"
typically has to be processed into RPN "a b + c d + *". There's
no real difficulty in parsing "S unless C" into something like

J a, b:, S, J c, a:, C, JF b, c:

[where "J x" means "jump to label x" and "JF x" means "jump on false
to x"]. Voila, a sequential process, and you can leave everything to
the assembler, with perhaps an optimisation phase to "improve" jumps.
[Some assemblers can do that for you automatically.]

There certainly can be readability issues if constructs are,
in some sense, out of natural order, but even with a very simple-
minded parser and assembler it makes no sense to me to say that the
construct is OK in English but not in a computer language.

--
Andy Walker,
Nottingham.

James Harris

unread,
Mar 16, 2019, 7:43:49 AM3/16/19
to
On 16/03/2019 11:15, Andy Walker wrote:
> On 16/03/2019 08:28, Rod Pemberton wrote:
> [James, re "S unless C"]
>>>> No, it's because the parts are written in the opposite order in
>>>> which they will be processed.
> [me:]
>>> But I expect your language still has "myproc (a[i+1], f(j))",
>>> which will be processed typically as
>>> i, 1, add, a, subscript, j, f, call, myproc, call
>>> or near equivalent. Building trees and walking them in a different
>>> order was a solved problem in CS more than half a century ago!
>> I'm not sure James is using an AST. Elsewhere, he said he was directly
>> converting to assembly. So, he would emit a sequence of assembly
>> sequentially for that, if possible.
>
> OK, but in that case he has to do a lot of stacking in the
> assembly so as to process procedure parameters before calling the
> procedure.

I had to work out a way to evaluate arguments left to right and then
push them right to left (so that they are in the correct left-to-right
order for the callee). That was fun! I wrote up the scheme I came up
with in the 'Function calls' section of

https://groups.google.com/d/msg/comp.lang.misc/gvGWiumivqM/9JRf4hseCAAJ

>
> Similarly even with simple arithmetic: "(a+b)*(c+d)"
> typically has to be processed into RPN "a b + c d + *".

I am not sure of the point you guys are discussing but, to be clear, I
didn't have to traverse a tree data structure (or any data structure) to
emit code for expressions. What I did was use an expression parser I
came up with in 2013. (It's scary how long ago that was!)

https://groups.google.com/d/msg/comp.lang.misc/whv09M1V80U/5nClwWrj4iAJ

It doesn't have to build a data structure or convert to any intermediate
stage. It simply converts an expression on the fly and uses only the
normal call stack. That was important as it required no memory
management and made it easy to implement in assembly.
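
To illustrate the general idea (a sketch only, not necessarily the scheme
in the linked post), a precedence-climbing routine can emit code as it
parses, with the host language's call stack doing all the bookkeeping:

# Illustrative sketch only: compile an infix expression straight to
# stack-machine code while parsing, with no tree and no explicit
# operand stack - recursion does the bookkeeping.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2}

def parse_expr(toks, pos, min_prec, out):
    out.append(f"push {toks[pos]}")         # operand: emit immediately
    pos += 1
    while pos < len(toks) and toks[pos] in PREC and PREC[toks[pos]] >= min_prec:
        op = toks[pos]
        pos = parse_expr(toks, pos + 1, PREC[op] + 1, out)   # right-hand side
        out.append(op)                      # operator follows its operands
    return pos

out = []
parse_expr("a + b * c - d".split(), 0, 1, out)
print("\n".join(out))   # push a / push b / push c / * / + / push d / -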


--
James Harris

Bart

unread,
Mar 16, 2019, 9:26:17 AM3/16/19
to
On 16/03/2019 11:15, Andy Walker wrote:
> On 16/03/2019 08:28, Rod Pemberton wrote:
> [James, re "S unless C"]
>>>> No, it's because the parts are written in the opposite order in
>>>> which they will be processed.
> [me:]
>>>     But I expect your language still has "myproc (a[i+1], f(j))",
>>> which will be processed typically as
>>>      i, 1, add, a, subscript, j, f, call, myproc, call
>>> or near equivalent.  Building trees and walking them in a different
>>> order was a solved problem in CS more than half a century ago!
>> I'm not sure James is using an AST.  Elsewhere, he said he was directly
>> converting to assembly.  So, he would emit a sequence of assembly
>> sequentially for that, if possible.
>
>     OK, but in that case he has to do a lot of stacking in the
> assembly so as to process procedure parameters before calling the
> procedure.

It sounds like a one-pass compiler; any stacking will be done by the
recursive nature of the parsing functions.

>   Similarly even with simple arithmetic:  "(a+b)*(c+d)"
> typically has to be processed into RPN "a b + c d + *".

With one-pass, the output has to be done immediately.

>   There's
> no real difficulty in parsing "S unless C" into something like
>
>   J a, b:, S, J c, a:, C, JF b, c:

I don't understand what this does. But I can see there can be
difficulties in processing 'S unless C' with a one-pass compiler.

That will want to generate S immediately, but if that turns out to be
conditional, it would have needed to be preceded by code that checks C.

There are various workarounds, especially if the output code is in
memory and can be modified. But with a compiler directly generating ASM
source to file, for example, there are obvious difficulties.


> [where "J x" means "jump to label x" and "JF x" means "jump on false
> to x"].  Voila, a sequential process, and you can leave everything to
> the assembler, with perhaps an optimisation phase to "improve" jumps.
> [Some assemblers can do that for you automatically.]
>
>     There certainly can be readability issues if constructs are,
> in some sense, out of natural order, but even with a very simple-
> minded parser and assembler it makes no sense to me to say that the
> construct is OK in English but not in a computer language.

In English someone will wait until the end of a sentence before acting
on what it means.

James Harris

unread,
Mar 16, 2019, 9:28:28 AM3/16/19
to
On 15/03/2019 14:25, Andy Walker wrote:
> On 15/03/2019 13:07, James Harris wrote:
>>>> I can't say I've ever really taken to [trailing condition clauses]
>>>> mainly because they read in the wrong order.
>>>     Is that because you're reading them as C rather than as English?
>> No, it's because the parts are written in the opposite order in which
>> they will be processed.
>
> But I expect your language still has "myproc (a[i+1], f(j))",
> which will be processed typically as
>
> i, 1, add, a, subscript, j, f, call, myproc, call
>
> or near equivalent. Building trees and walking them in a different
> order was a solved problem in CS more than half a century ago!

Yes, the language allows such expressions. They are eventually to be
processed via a data structure. At the moment, though, the initial
compiler processes the whole thing left to right, i.e.

myproc; a; i + 1; index; parm; f; j; parm; call; parm; call;


--
James Harris

Bart

unread,
Mar 16, 2019, 5:25:51 PM3/16/19
to
On 16/03/2019 13:26, Bart wrote:
> On 16/03/2019 11:15, Andy Walker wrote:

>   There's
>> no real difficulty in parsing "S unless C" into something like
>>
>>    J a, b:, S, J c, a:, C, JF b, c:
>
> I don't understand what this does.

I've decoded this to mean:

jump L2
L1:
evaluate(S)....
jump L3
L2:
evaluate(C)...
jumpfalse L1
L3:

(I think such intermediate codes need to be readable; there's no
advantage in having them cryptic.)

>> [where "J x" means "jump to label x" and "JF x" means "jump on false
>> to x"].  Voila, a sequential process, and you can leave everything to
>> the assembler, with perhaps an optimisation phase to "improve" jumps.
>> [Some assemblers can do that for you automatically.]

The problem is still having to assume that EVERY such statement
(including every individual statement making up S, and possibly even
every such term depending on design) could have a trailing condition.
This means writing every statement sequence S as:

jump L2
L1:
evaluate(S) # may include multiple such jumps
jump L3
L2:
jump L1 # not conditional this time

just because of the off-chance of it being conditional. Maybe this can
be fixed up later into just:

evaluate(S)

but that is effectively relying on an extra pass.

The solution is really to just do an extra pass or two anyway.
Efficiency is not going to be an issue until you need to compile 100K or
200K lines in one go. Done properly, then you can directly generate:

evaluate(C)
jumptrue L1
evaluate(S)
L1:

Andy Walker

unread,
Mar 17, 2019, 10:59:36 AM3/17/19
to
On 16/03/2019 13:28, James Harris wrote:
>>     But I expect your language still has "myproc (a[i+1], f(j))",
>> which will be processed typically as
>>      i, 1, add, a, subscript, j, f, call, myproc, call
>> or near equivalent.  Building trees and walking them in a different
>> order was a solved problem in CS more than half a century ago!
> Yes, the language allows such expressions. They are eventually to be
> processed via a data structure. At the moment, though, the initial
> compiler processes the whole thing left to right, i.e.
>   myproc; a; i + 1; index; parm; f; j; parm; call; parm; call;

OK, but that means "call" has to retrieve "f" or "myproc" from
several elements buried in the stack, or equivalent; and in an arithmetic
expression you typically have to stack operands and then call operators.
How much of this you have to do depends on what operators and address
modes are available, but you can't normally work textually strictly left
to right. These are all easily solved [non-]problems, of which the
solution has been known since the 1950s [eg, conversion to RPN using a
stack].  All I'm getting at is that it seems illogical to me to complain
that "S unless C" involves processing "C" before "S", but not that
"E1 + E2" involves processing "E1" and "E2" before you can add them.

Human readability *is* an issue, and is no doubt the reason why
"S unless C" is relatively uncommon [and mostly occurs in statemen-per-
line languages, where you don't have to read far ahead to find that "S"
is not a stand-alone statement]. Otherwise, we would all have switched
to RPN for ordinary arithmetic.

--
Andy Walker,
Nottingham.

Bart

unread,
Mar 17, 2019, 12:48:36 PM3/17/19
to
On 17/03/2019 14:59, Andy Walker wrote:

> All I'm getting at is that it seems illogical to me to complain
> that "S unless C" involves processing "C" before "S", but not that
> "E1 + E2" involves processing "E1" and "E2" before you can add them.

That's not the same. E1 will always need to be evaluated in E1+E2. When
you get to the "+", then you will know what needs to be done next.

Andy Walker

unread,
Mar 17, 2019, 1:18:30 PM3/17/19
to
On 16/03/2019 13:26, Bart wrote:
>>      OK, but in that case he has to do a lot of stacking in the
>> assembly so as to process procedure parameters before calling the
>> procedure.
> It sounds like a one-pass compiler; any stacking will be done by the
> recursive nature of the parsing functions.

There's a distinction between compile-time stacking [which can
use the recursive nature of parsing] and run-time stacks. Emitted code
has to do things at run-time.


>>   Similarly even with simple arithmetic:  "(a+b)*(c+d)"
>> typically has to be processed into RPN "a b + c d + *".
> With one-pass, the output has to be done immediately.

FSVO "immediately". Typically, operands are "done" at the
time, operators are stacked [by the compiler] and "done" when they
are unstacked [in accordance with priorities].

> [...] I can see there can be difficulties in processing 'S unless C'
> with a one-pass compiler.
> That will want to generate S immediately, but if that turns out to be
> conditional, it would have needed to be preceded by code that checks
> C.
> There are various workarounds, especially if the output code is in
> memory and can be modified. But with a compiler directly generating
> ASM source to file, for example, there are obvious difficulties.

Some assemblers can do optimisations, inc of jumps, that allow
ASM to be re-ordered or re-worked in various ways. Eg, IIRC, the
Amsterdam Compiler Kit could move blocks of assembler around, so that
"S unless C" could be compiled essentially as

L1: assembler for S

and now if the next symbol was ";", or some other plain-statement end,
then the label could be forgotten, but if it was "unless", then "C"
would be compiled to get

L2: assembler for C, jump_on_true L3, L3:

followed by an assembler directive to swap the blocks from L1 to L2 and
from L2 to L3.

I did something similar in a compiler once; the tokeniser put
out a label followed by the token. The labels were mostly sequential,
but had gaps so that later tokens could have alphabetically-earlier
labels; then the output went [quite literally] through the Unix "sort"
command and a simple "sed" script that stripped the labels.

--
Andy Walker,
Nottingham.

Andy Walker

unread,
Mar 17, 2019, 1:28:45 PM3/17/19
to
Not in general; you don't know the type of "E2" yet, so you
don't know which version of "+" to emit. Esp, but not only, if your
language has user-defined operators, so that "E2" could be anything
from an "int" to a "long long int", to an array of unions of pointers
to procedures and structures containing an int and two strings, to
whatever other weird and wonderful type an object may have.

--
Andy Walker,
Nottingham.

James Harris

unread,
Mar 17, 2019, 3:21:37 PM3/17/19
to
On 17/03/2019 14:59, Andy Walker wrote:
> On 16/03/2019 13:28, James Harris wrote:
>>>     But I expect your language still has "myproc (a[i+1], f(j))",
>>> which will be processed typically as
>>>      i, 1, add, a, subscript, j, f, call, myproc, call
>>> or near equivalent.  Building trees and walking them in a different
>>> order was a solved problem in CS more than half a century ago!
>> Yes, the language allows such expressions. They are eventually to be
>> processed via a data structure. At the moment, though, the initial
>> compiler processes the whole thing left to right, i.e.
>>   myproc; a; i + 1; index; parm; f; j; parm; call; parm; call;
>
> OK, but that means "call" has to retrieve "f" or "myproc" from
> several elements buried in the stack, or equivalent; and in an arithmetic
> expression you typically have to stack operands and then call operators.
> How much of this you have to do depends on what operators and address
> modes are available, but you can't normally work textually strictly left
> to right.

Indeed. The expression parser I linked to before was designed to help
build tree nodes. Then, when walking the tree, the function-name node
can be visited at the ideal point.

But a different approach was called for here as no tree was to be built.
Fortunately, it turned out that the function address could be pushed as
it was encountered and the stack-distance to that value kept track of.

In this case, when the compiler gets to myproc it emits

lea eax, [myproc]
push eax ;Function address

then, later, when it has parsed any parameters in parens it retrieves
the function address and calls the function with

mov eax, [esp + 16] ;Function address
call eax

where the 16 is the distance the compiler has kept track of.

Incidentally, this approach doesn't need to know that a certain value is
a function address until it sees the left paren. As such, the parsing is
perfectly regular. Any legitimate value can be a function address. For
example,

myproc(x)
a[b + 1](y)
p$$(z)

A left paren as a trailing operator will treat as a function address
whatever value the preceding expression evaluates to. (The $ is a
pointer dereference so the last example basically follows a pointer to
get a function address.)
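
A minimal sketch of that bookkeeping (in Python, illustrative only, not
the actual compiler): the emitter records how many bytes have been pushed
since the function address was saved, so the call site knows how far down
the stack to reach for it. Argument ordering for the callee is ignored
here for simplicity:

# Illustrative sketch only: call through an address parked on the stack.
class Emitter:
    def __init__(self):
        self.code = []
        self.depth = 0                          # bytes pushed so far

    def push(self, operand):
        self.code.append(f"push {operand}")
        self.depth += 4

    def compile_call(self, func, args):
        self.code.append(f"lea eax, [{func}]")
        self.push("eax")                        # function address saved on the stack
        base = self.depth
        for a in args:
            self.push(a)                        # arguments pushed as encountered
        offset = self.depth - base              # distance back to the saved address
        self.code.append(f"mov eax, [esp + {offset}]")
        self.code.append("call eax")

e = Emitter()
e.compile_call("myproc", ["1", "2", "3", "4"])
print("\n".join(e.code))      # ends with: mov eax, [esp + 16] / call eax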

> These are all easily solved [non-]problems, of which the
> solution has been known since the 1950s [eg, conversion to RPN using a
> stack]. All I'm getting at is that it seems illogical to me to complain
> that "S unless C" involves processing "C" before "S", but not that
> "E1 + E2" involves processing "E1" and "E2" before you can add them.
>
> Human readability *is* an issue, and is no doubt the reason why
> "S unless C" is relatively uncommon [and mostly occurs in statemen-per-
> line languages, where you don't have to read far ahead to find that "S"
> is not a stand-alone statement]. Otherwise, we would all have switched
> to RPN for ordinary arithmetic.
>

Yes, my point was purely about a human reading the code, not a compiler
parsing it.


--
James Harris

Bart

unread,
Mar 17, 2019, 3:38:43 PM3/17/19
to
On 17/03/2019 17:28, Andy Walker wrote:
> On 17/03/2019 16:48, Bart wrote:
>>> All I'm getting at is that it seems illogical to me to complain
>>> that "S unless C" involves processing "C" before "S", but not that
>>> "E1 + E2" involves processing "E1" and "E2" before you can add them.
>> That's not the same. E1 will always need to be evaluated in E1+E2.
>> When you get to the "+", then you will know what needs to be done
>> next.
>
>     Not in general;  you don't know the type of "E2" yet, so you
> don't know which version of "+" to emit.

The code for "+" usually follows that for E2. By then you will know
which kind it is, or whether it is further deferred because you are
compiling "E1+E2*E3".

The question here is whether a statement in the source is conditional;
if it is, the evaluation has to be guarded by code that will check at
runtime whether it is to be executed.

With a one-pass compiler, the language's design can make it easy: 'if C
then S fi' (you know in advance that S is conditional) or hard: 'S if C'
(when you don't).

With arbitrarily complex expressions and statements, you can be looking
at 100 lines of code for S before you know it is unconditional.

With a multi-pass compiler, that is no problem at all. In mine, if P
represents what has been read so far in S (that 100 lines of code), and
you are positioned at the following token, then code such as the
following can implement 'S if C':

if symbol=kifsym then
lex()
P := createunit(j_if, readexpression(), P) # guard P
fi

Four lines. The resulting AST is identical to 'if C then S fi'.

>   Esp, but not only, if your
> language has user-defined operators, so that "E2" could be anything
> from an "int" to a "long long int", to an array of unions of pointers
> to procedures and structures containing an int and two strings, to
> whatever other weird and wonderful type an object may have.

That's probably unlikely to be a one-pass compiler then.

(The last compiler of that kind I did was for byte-code. Then there was
only one "add" opcode, no matter what types were involved.)


Andy Walker

unread,
Mar 17, 2019, 6:59:32 PM3/17/19
to
On 17/03/2019 19:38, Bart wrote:
[E1+E2:]
>>> When you get to the "+", then you will know what needs to be done
>>> next.
>>      Not in general;  you don't know the type of "E2" yet, so you
>> don't know which version of "+" to emit. [...]
>> Esp, but not only, if your
>> language has user-defined operators, so that "E2" could be anything
>> from an "int" to a "long long int", to an array of unions of pointers
>> to procedures and structures containing an int and two strings, to
>> whatever other weird and wonderful type an object may have.
> That's probably unlikely to be a one-pass compiler then.

The Malvern A68-R compiler was single-pass, and had only local
optimisations; yet outperformed almost all other compilers for ICL 1900s.
Algorithms to convert expressions into RPN have been around "forever", and
need little more than an operator stack -- you don't need to "identify"
the operator until you're about to emit code for it, after that for "E1"
and "E2".

--
Andy Walker,
Nottingham.

Bart

unread,
Mar 17, 2019, 9:14:48 PM3/17/19
to
On 17/03/2019 22:59, Andy Walker wrote:
> On 17/03/2019 19:38, Bart wrote:
> [E1+E2:]
>>>> When you get to the "+", then you will know what needs to be done
>>>> next.
>>>      Not in general;  you don't know the type of "E2" yet, so you
>>> don't know which version of "+" to emit. [...]
>>> Esp, but not only, if your
>>> language has user-defined operators, so that "E2" could be anything
>>> from an "int" to a "long long int", to an array of unions of pointers
>>> to procedures and structures containing an int and two strings, to
>>> whatever other weird and wonderful type an object may have.
>> That's probably unlikely to be a one-pass compiler then.
>
>     The Malvern A68-R compiler was single-pass, and had only local
> optimisations;

I don't know enough about that ancient compiler to comment.

(My own languages can no longer be practically implemented as
single-pass because they allow out-of-order definitions (unless I make
some concessions such as banning circular imports, and only allowing
out-of-order for functions). I notice that with A68G, a call to a
function defined later in the program doesn't work.)

> yet outperformed almost all other compilers for ICL 1900s.
> Algorithms to convert expressions into RPN

Unless RPN is the actual target, converting RPN to the final code
would constitute an extra pass.

> have been around "forever", and
> need little more than an operator stack -- you don't need to "identify"
> the operator until you're about to emit code for it, after that for "E1"
> and "E2".

Whether a program can be compiled with a single pass of its source code,
and no further passes of any subsequent representations such as ASTs and
IRs, is mildly interesting, but there is no reason for any compiler
these days to be so restricted.

The one described in the OP is a temporary bootstrapping compiler (and
written in ASM), so needs to be kept simple.

And the Malvern one had to be run on some very limited equipment.

One-pass introduces some limitations in the design of a language, and
the quality of the implementation, IMV, even if in theory they could
compile any language.

David Brown

unread,
Mar 18, 2019, 4:16:57 AM3/18/19
to
On 15/03/2019 15:30, Bart wrote:
> On 15/03/2019 13:29, David Brown wrote:
>> On 15/03/2019 13:52, Bart wrote:
>
>>> My reasons are the same as Python's; named tuples can't do the job of
>>> creating data structures that can be passed to a C API that expects
>>> C-style structs.
>>>
>>
>> In Python, these do very different jobs, and are used in very different
>> ways.  Named tuples form types - the "struct" module is not for types at
>> all.
>
> It's the method used to create C-compatible structs. In C, a struct is a
> type.
>
>>> Because it would be massively more complicated to implement. One
>>> (struct) has elements of mixed types and sizes; the other (record) has
>>> elements of one type (variant) and one size, and actually shares
>>> implementation code with lists of variants.
>>
>> /Now/ we are getting somewhere.  The distinction is to aid
>> implementation, not usage.
>
> What's the distinction for in Python?

They are completely different things in Python - that's the distinction.
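
For anyone wanting to see the difference concretely, here is a minimal
Python sketch using nothing but the standard library -- the Point type
and the "ii" layout are just illustrative. A namedtuple really is a new
type whose fields are ordinary Python objects; the struct module only
packs values into a flat byte layout of the kind a C API expects.

    from collections import namedtuple
    import struct

    Point = namedtuple("Point", ["x", "y"])   # a genuine new type
    p = Point(3, 4)
    print(p.x + p.y)                          # 7 -- fields are Python objects

    # struct only describes a flat layout, here two C ints ("ii"):
    raw = struct.pack("ii", p.x, p.y)
    print(struct.unpack("ii", raw))           # (3, 4) -- bytes in, tuple out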

>
> Of course you're welcome to create a language where you have one type
> doing very different jobs, but you'd have to implement it.

Don't misunderstand me here. I am not saying there is anything wrong
with making the distinction for the benefit of the implementation. All
programming languages are a balance between what the programmer might
want, and what the implementation provides. If you think it is
relatively rare that the programmer would want to mix concrete fixed
types with variants in the same structure, and that keeping them
separate means a simpler and faster implementation, then that's a
perfectly reasonable design decision for the language. I am merely
trying to understand the purpose behind the record/struct split, and now
you have given me it.

>
> You'd also however have the job of listing all the constraints needed on
> that type when it is used for the things that C-like structs are used for.
>
>> There is no fundamental reason why a struct member could not be a
>> variant.  There is therefore no fundamental reason why you could not
>> have a "struct" whose members are all "variant" - thus appearing
>> identical to a record.  Is that not correct?
>
> In /my/ language, I found it much easier to segregate variant data
> types from flat/packed data types (remember I said most scripting
> languages don't bother with the latter at all; this is all a great
> convenience, not a hindrance as you seem to think):
>
>    Variant   Packed
>
>    List      Array
>    Record    Struct
>    Pointer   Ref
>
> Also (some terminal types):
>
>    Int       Int8, int16, etc
>    Real      Real32, Real64
>    etc
>
> List:    Sequence of variant elements accessed by index
> Array:   Homogeneous sequence of one packed type (including fixed-length
>          arrays and structs)
>
> Record:  Fixed sequence of variant elements accessed by name
> Struct:  Heterogeneous sequence of mixed packed types
>
> Pointer: Pointer to a variant
> Ref:     Pointer to one packed type
>
> There is some mixing because the execution model only manipulates
> variants, so you can have - need to have - a variant that contains an
> array, struct or ref.
>
> Records and lists can also contain circular references (causing issues
> when destroying or duplicating those types); structs and arrays don't.
>
> But when you design and implement this stuff, you can make your own
> choices.

It sounds like you are attempting a rather careful balancing act, with
one foot in dynamic type land, and one foot in static type land. I said
previously that you are sitting between C and Python - perhaps C++ and
Python is more accurate. In C++, the language designers have found a
way to handle dynamic run-time determined types within a statically
typed framework (through structural class inheritance and polymorphism).
In Python, everything is handled as dynamic types (with duck typing),
therefore requiring wrappers for handling interfacing to foreign
functions. Your language supports both dynamic types native to the
language, and static types for interfacing. Only you can tell us how
successful that is in practice.
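
To make the contrast concrete, here is a rough Python sketch of the
variant/packed split quoted above and of the wrapper needed before data
can cross to a C API -- purely illustrative, standard library only, and
the Pair type is made up for the example:

    import array, ctypes

    xs = [1, "two", 3.0]                  # variant: elements of any type
    ys = array.array("i", [1, 2, 3])      # packed: homogeneous machine ints
    print(ys.itemsize, ys.tobytes())      # a flat, contiguous layout

    # A fixed, C-compatible layout that could be handed to a foreign
    # function via ctypes; a plain dict or class has no such layout.
    class Pair(ctypes.Structure):
        _fields_ = [("x", ctypes.c_int), ("y", ctypes.c_int)]

    p = Pair(3, 4)
    print(ctypes.sizeof(p), bytes(p))     # 8 (typically) and the raw bytes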

>
> Other languages don't have such parallel sets of types, because they
> only do one or the other (C does 'packed'; Python does 'variant'). So
> they superficially appear simpler, until you need to do the equivalent
> of 'variant' in C, or 'packed' in Python, and then the fun and games
> start.
>
> (May reply to other points later.)
>

Andy Walker

unread,
Mar 18, 2019, 8:53:49 AM3/18/19
to
On 18/03/2019 01:14, Bart wrote:
> [...] I notice that with A68G, a call
> to a function defined later in the program doesn't work.

Yes it does, otherwise [eg] mutually-recursive functions would
be impossible. For [a very simple] example:

$ a68g -p "INT p = f; PROC f = INT: 2+3; p"
+5

Did you fall foul of some version of

$ a68g -p "INT p = f; PROC INT f := INT: 2+3; p"
1 (print (((INT p = f; PROC INT f := INT: 2+3; p), new line)))
1
a68g: runtime error: 1: attempt to use an uninitialised REF PROC INT value

[spot the difference]?

>> Algorithms to convert expressions into RPN
> Unless RPN is the actual target, then converting RPN to the final
> code would constitute an extra pass.

Not in any conventional sense of "extra pass". You don't re-parse
the RPN, you emit code as the algorithm progresses; all that is required
is that the compiler maintains an operator stack onto which operators are
pushed and from which they are popped when the time comes to emit the code
for them. By then, you have compiled the operands, so you know their types
and can emit code for the right version of "+" [or whatever].
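
A rough Python sketch of that scheme -- toy precedences, a made-up
two-type symbol table and a pretend "emit", nothing to do with A68-R
itself: operands are compiled as they are met, operators wait on the
stack, and the code for an operator is emitted only when it is popped,
by which point the types of both its operands are known.

    PREC = {"+": 1, "*": 2}
    SYMTAB = {"a": "int", "b": "real", "c": "int"}   # hypothetical declarations

    def compile_expr(tokens):
        types, ops = [], []               # operand-type stack, operator stack

        def emit_op():
            op = ops.pop()
            rhs, lhs = types.pop(), types.pop()
            result = "real" if "real" in (lhs, rhs) else "int"
            print("emit", result + op, " ; operands were", lhs, "and", rhs)
            types.append(result)

        for tok in tokens:
            if tok in PREC:
                while ops and PREC[ops[-1]] >= PREC[tok]:
                    emit_op()             # its operands are already compiled
                ops.append(tok)
            else:
                print("emit load", tok)   # compile the operand straight away
                types.append(SYMTAB[tok])
        while ops:
            emit_op()

    compile_expr(["a", "+", "b", "*", "c"])

Nothing here needs a second look at the source: which version of "+" to
emit is settled at the moment its code is generated.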

[...]
> One-pass compilation introduces some limitations in the design of a
> language, and in the quality of the implementation, IMV, even if in
> theory a one-pass compiler could compile any language.

Yes, but your point was that a compiler for a language that has
user-defined operators is unlikely to be single-pass. Not so, or at least
no more so than the modern-day unlikelihood of a compiler being single-
pass anyway. User-defined operators are not a limitation on single-pass
compilation. The design limitation, in general, is use before definition;
but even that is easy to circumvent, as you don't need to know everything
about a type or function before it is used textually in the program, only
before it is applied when the program is run.

As for quality of implementation, well, that's in the eye of the
beholder. A single-pass compiler is unlikely to do much more than peep-hole
optimisation; how far that is from "optimal" depends substantially on the
quality of the person doing the implementation, but also on qualities of
the language.

But the distinction between single- and multi-pass is blurring.
You don't nowadays need to write out intermediate representations between
passes; you can use threads to run phases of compilation concurrently.
OTOH, if your system provides lexers, parsers, linkers and loaders as
stand-alone facilities, then as a compiler-writer you need a good reason
not to use them. Phases such as optimisation are often optional; but
if you're not careful, you finish up with monstrosities where there are
so many flags and options that invoking the compiler uses a complicated
language [and shell script!] in its own right. Gcc has a manual entry
of nearly 18000 lines;  no normal person understands all that;  it's a
decently large book just to explain how to compile a C program.

--
Andy Walker,
Nottingham.

James Harris

unread,
Mar 18, 2019, 9:30:18 AM3/18/19
to
On 18/03/2019 12:53, Andy Walker wrote:
> On 18/03/2019 01:14, Bart wrote:

...

>> Unless RPN is the actual target, then converting RPN to the final
>> code would constitute an extra pass.
>
> Not in any conventional sense of "extra pass".

Just to jump in, the idea of a translation "pass" is surely archaic. It
applied when memories were so small that a translator had to read the
source a certain number of times, e.g. once to work out offsets and a
second time to emit code.
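
As a toy illustration of that, in Python, with a made-up mini-assembly
(nothing to do with any real assembler): pass 1 only measures and
records label offsets; pass 2 emits, with every forward reference
already resolved.

    program = [
        "jump end",     # forward reference: target unknown on first sight
        "nop",
        "end:",
        "halt",
    ]

    # Pass 1: work out offsets (pretend each instruction occupies one word).
    offsets, pc = {}, 0
    for line in program:
        if line.endswith(":"):
            offsets[line[:-1]] = pc
        else:
            pc += 1

    # Pass 2: emit, with labels looked up in the table built by pass 1.
    for line in program:
        if line.endswith(":"):
            continue
        op, *args = line.split()
        print(op, *(str(offsets.get(a, a)) for a in args))

A one-pass scheme handles the same problem by backpatching: emit a
placeholder where "end" is referenced and fill it in once the label's
offset becomes known.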

It applies today really only as a concept. A language might be described
as single pass if a compiler /could/ emit code on the fly.

Most compilers, today, read the source /once/ and build internal data
structures from which they emit code; they never have to make another
pass over the source.

My current compiler reads the source once but rather than converting to
machine code it converts to assembly. I don't know whether that would be
called a single-pass compiler or not. On its own it makes just a single
pass. But it exploits the assembler's multi-pass facilities to generate
machine code.


--
James Harris

Pascal J. Bourguignon

unread,
Mar 18, 2019, 9:59:18 AM3/18/19
to
The first compiler had 63 passes.
http://ibm-1401.info/1401-FORTRAN-Illustrated.html
But then, only 4000 characters of core memory…


--
__Pascal J. Bourguignon__
http://www.informatimago.com

Bart

unread,
Mar 18, 2019, 10:59:31 AM3/18/19
to
On 18/03/2019 12:53, Andy Walker wrote:
> On 18/03/2019 01:14, Bart wrote:
>> [...]  I notice that with A68G, a call
>> to a function defined later in the program doesn't work.
>
>     Yes it does, otherwise [eg] mutually-recursive functions would
> be impossible.  For [a very simple] example:
>
>   $ a68g -p "INT p = f; PROC f = INT: 2+3; p"
>            +5
>
> Did you fall foul of some version of
>
>   $ a68g -p "INT p = f; PROC INT f := INT: 2+3; p"
>   1     (print (((INT p = f; PROC INT f := INT: 2+3; p), new line)))
>                           1
>   a68g: runtime error: 1: attempt to use an uninitialised REF PROC INT
> value

The test I did was this:

fred(10);
PROC fred=(INT a)VOID: print(("FRED",a))

which generates:

a68: syntax error: 1: clause does not yield a value.

But I see now it doesn't like that the last line doesn't yield a value;
putting a dummy value there fixes it.

(The syntax does seem rather fragile, for example you can't just swap
those two lines -- which would otherwise work -- without faffing around
with semicolons. My own take on Algol68 syntax requires all executable
code to be inside a function. The example would need to be written as:

proc start =
fred(10)
end proc

proc fred(int a) =
print "FRED",a
end proc

Function definitions don't count as statements that have to return
some value.)


> [spot the difference]?
>
>>> Algorithms to convert expressions into RPN
>> Unless RPN is the actual target, then converting RPN to the final
>> code would constitute an extra pass.
>
>     Not in any conventional sense of "extra pass".  You don't re-parse
> the RPN, you emit code as the algorithm progresses;  all that is required
> is that the compiler maintains an operator stack onto which operators are
> pushed and from which they are popped when the time comes to emit the code
> for them.  By then, you have compiled the operands, so you know their types
> and can emit code for the right version of "+" [or whatever].

An expression (or function or whatever sub-division you like) can span
an entire module. Then the RPN (which sounds like a form of my linear,
stack-based intermediate language) could amount to a representation of
the whole module before it's converted to the next stage.

That would count as an extra pass (not 'parse', as no text is involved).
Whether it doesn't count as a pass if you convert portions of the RPN as
soon as is practicable is debatable.

As is whether creating an explicit data structure for the RPN instead of
being a side-effect of the recursive call-structure makes a difference.
But this is starting to split hairs.

>     Yes, but your point was that a compiler for a language that has
> user-defined operators is unlikely to be single-pass.

I was thinking that because such a language would be more sophisticated,
so would its implementation as a modern compiler. I didn't have in mind
what was going on in the 1970s or earlier.

>     But the distinction between single- and multi-pass is blurring.
> You don't nowadays need to write out intermediate representations between
> passes;  you can use threads to run phases of compilation concurrently.
> OTOH, if your system provides lexers, parsers, linkers and loaders as
> stand-alone facilities, then as a compiler-writer you need a good reason
> not to use them.  Phases such as optimisation are often optional;  but
> if you're not careful, you finish up with monstrosities where there are
> so many flags and options that invoking the compiler uses a complicated
> language [and shell script!] in its own right.  Gcc has a manual entry
> of nearly 18000 lines;  no normal person understands all that, it's a
> decently large book just to explain how to compile a C program.

Single pass (used in the sense of generating the code as you go along,
so 50% through the source, you've got roughly 50% of the output) seems
to be a way of getting ultra-fast compilers, like Tiny C.

Tiny C, I believe, is a single-pass C compiler, and it's roughly double
the speed of my own C compiler, which is multi-pass.

Andy Walker

unread,
Mar 18, 2019, 2:56:23 PM3/18/19
to
On 18/03/2019 13:59, Pascal J. Bourguignon wrote:
> The first compiler had 63 passes.
> http://ibm-1401.info/1401-FORTRAN-Illustrated.html
> But then, only 4000 characters of core memory…

I'm not sure you can get away with that one! The 1401 dates from
1959. By then, several versions of Fortran were already available, starting
in ~1957, and they were pre-dated by a number of Autocodes. Compilers for
very small computers often had several passes, but I'd guess that 63 is some
sort of record! Seven to ten might have been more normal in ~1960.

--
Andy Walker,
Nottingham.