
Type qualifiers, declaration aliases and namespaces


James Harris

Aug 20, 2021, 11:04:58 AM
In another thread Bart posed some great questions to which I only have a
partial answer. As the answers end up getting into a separate topic or
two I'll start this new thread.

On 20/08/2021 14:19, Bart wrote:
> On 20/08/2021 12:55, James Harris wrote:
>> On 20/08/2021 11:47, Bart wrote:
>>> On 20/08/2021 08:29, James Harris wrote:


...


>> Naming integers iN was tempting but I felt that it either took away
>> too much of the namespace or, as illustrated, would be irregular and
>> fiddly.
>
>
>
> I don't use i1 i2 i4, only i8/i16/i32/i64/i128.

You do have similar, though, don't you? In an earlier reply you said "In
my case however I also have bittypes which I call u1, u2 and u4 (which
then continue as u8, u16 etc).".

If you have u2 etc then

u1 is a reserved word
u2 is a reserved word
u3 is not reserved
u4 is a reserved word
u5 is not reserved
etc

...

> Many languages now which allow size-specific types will have them as
> one of:
>
> i32
> int32
> Int32
> int32_t etc
>
> You could say that all these are irregular since int31/int33 are legal
> user identifiers, but int32 isn't (well apart from Rust).

True. AIUI, in C int31_t would not be a type name, but it does use a
reserved form of name: typedef names beginning with 'int' and ending in
'_t' are set aside for future additions to <stdint.h> - though compilers
don't generally enforce that reservation.
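That distinction can be seen with a compiler. In practice int31_t behaves as an ordinary identifier even though its shape falls inside the range the C standard reserves; a minimal C++ sketch (the function name is invented for illustration):

```cpp
#include <cstdint>

// int32_t is the standard width-specific typedef from <cstdint>;
// int31_t is, in practice, just an ordinary identifier.
int reserved_form_demo() {
    std::int32_t a = 42;
    int int31_t = 31;   // compiles, though the int..._t name pattern is
                        // reserved for future <stdint.h> additions
    return a + int31_t;
}
```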


>
> This applies to 'int' too:
>
> hnt int jnt ...
> ins int inu ...

I am not sure I understand what that is pointing out.


>
> And actually to most keywords unless the language has a
> peculiar enough syntax to allow keywords as identifiers (I
> think PL/I allowed if if=if ...)

I heard something like that before.

if if = if then then = then else else = else
if then = else then if = then else then = if

:-o


>>>> IOW the iN and uN forms are tempting but they seem to be rather
>>>> limiting.
>>>
>>> Why, what are you planning?
>>
>> If possible (and I haven't implemented it yet) I'd rather have the
>> number of bits as a qualifier which goes after the type name as
>> follows
>>
>> int 8 a
>> int 16 b
>> int 32 c
>
> This is more flexible

Yes.


> (I'd prefer some punctuation or other way of
> connecting the number with the type)

I'm surprised to hear that you would want additional punctuation. I tend
to put a lot of effort into trying to make it unnecessary! For example,
instead of C's

if (e)

I have just

if e

Isn't code easier to read without unnecessary punctuation?


> but as I said, you then have to deal with extra possibilities:
>
> * Could the number be an expression?

Yes, as long as it was resolvable at compile time. If the width were to
be specified by an expression, E, then the syntax would be

int (E) d

>
> * Could it be the name of a macro or constant that expands to a
> number?

It could be the name of a constant, yes. I don't have any plans for
macros. The constant name would have to be enclosed in parens as in the
expression form, above. As for the full syntax a constant could be
defined as

ro uint Bits = 32

where ro means "read only". Then later that constant could be used as in

int (Bits) counter


>
> * If the number is a name, then int a ... becomes ambiguous; are you
> defining an int called 'a',or is 'a' a name that expands to '32', and
> the actual variable name follows?

ATM,

int a would declare an integer a of default width
int (w) b would declare an integer b of width w
int 16 c would declare an integer c of width 16

Depending on how other decisions pan out I might end up changing the
parens to square brackets for consistency. Then the middle one of those
declarations would become

int [w] b


>
> * What to do about invalid sizes?

In the expressions above, the size would be determined at compile time
so any invalid size could be rejected.


>
> * Could such a number appear also after a user-defined type; for
> example if an alias 'T' for 'int' was created, would 'T 8 a'
> be allowed?

Good question. I hadn't thought of doing that but it might be possible.
To explain, I have been thinking to declare type names with a syntax like

typedef T1 = int 8

then

T1 g

would declare g as of type int 8.

However, as a separate matter I am also toying with the idea of allowing
short names for other namespaces such as

namespace S = ns.dns.invalid.scl.personnel

Then

S.X

would really refer to

ns.dns.invalid.scl.personnel.X

Why is that relevant? Because a typename such as int is also a name.
Therefore I would be able to define

namespace T2 = int

and subsequently do as you originally suggested by writing

T2 8 h

If T2 had been declared to be int then that would do what you asked
about, above, and declare h to be of type "int 8".

Whether a programmer would want to do that or not is another matter!



Either way, what I've not bottomed out, yet, is whether there's a need
for both typedefs and namespace definitions. They are very similar:

typedef T1 = int 8
namespace T2 = int

They may be better replaced with

alias T1 = int 8
alias T2 = int

Would that be a good idea? I don't know. It would be flexible, for sure,
but possibly confusing. A programmer would not be forced to use it but
if he did it could make subsequent code harder to parse - both for the
compiler and, more importantly, for a programmer because it would be
harder to recognise type names.

So at the moment this stuff is still on the drawing board.
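For comparison only (not a claim about James's design): C++ keeps the two mechanisms separate, with `using` for type aliases and `namespace X = ...` for namespace aliases. A sketch, with all names invented:

```cpp
#include <cstdint>

namespace ns { namespace dns { namespace scl {
    const int personnel_count = 7;   // stand-in for a member of the long path
}}}

namespace S = ns::dns::scl;          // namespace alias, like 'namespace S = ...'
using T1 = std::int8_t;              // type alias, like 'typedef T1 = int 8'

int alias_demo() {
    T1 g = 5;                        // T1 usable anywhere a type is expected
    return g + S::personnel_count;   // S usable anywhere the long path is
}
```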

...

>> uint 64
>>
>> Having said that, what do you make of uns when compared with uint?
>
> Here I agree, uint is better than uns, nat, and nneg! Uint or
> variations is also commonly used so that wouldn't be a bad choice.

OK. Given what you said before that's unexpected! But welcome. :-)


--
James Harris

David Brown

Aug 20, 2021, 2:50:07 PM
On 20/08/2021 17:04, James Harris wrote:

>
>>>>> IOW the iN and uN forms are tempting but they seem to be rather
>>>>> limiting.
>>>>
>>>> Why, what are you planning?
>>>
>>> If possible (and I haven't implemented it yet) I'd rather have the
>>> number of bits as a qualifier which goes after the type name as
>>> follows
>>>
>>>    int 8 a
>>>    int 16 b
>>>    int 32 c
>>
>> This is more flexible
>
> Yes.
>
>
>> (I'd prefer some punctuation or other way of
>> connecting the number with the type)
>
> I'm surprised to hear that you would want additional punctuation. I tend
> to put a lot of effort into trying to make it unnecessary! For example,
> instead of C's
>
>   if (e)
>
> I have just
>
>   if e
>
> Isn't code easier to read without unnecessary punctuation?
>

No. /Excessive/ punctuation or complicated symbols make code hard to
read, especially when rarely used. (So having something like ">>?" for
a "maximum" operator is a terrible idea.)

If you require your "if" statements to use brackets, then the
parentheses are not needed: "if e { ... }". If you don't require
brackets, then add parentheses or other syntax (like a "then") to
make the boundary clear. (My preference is to require the {} brackets.)


>
>> but as I said, you then have to deal with extra possibilities:
>>
>> * Could the number be an expression?
>
> Yes, as long as it was resolvable at compile time. If the width were to
> be specified by an expression, E, then the syntax would be
>
>   int (E) d
>

I'd recommend looking at C++ templates. You might not want to follow
all the details of the syntax, and you want to look at the newer and
better techniques rather than the old ones. But pick a way to give
compile-time parameters to types, and then use that - don't faff around
with special cases and limited options. Pick one good method, then you
could have something like this:

builtin::int<32> x;
using int32 = builtin::int<32>;
int32 y;

That is (IMHO) much better than your version because it will be
unambiguous, flexible, and follows a syntax that you can use for all
sorts of features.


If you want more fun, you could make types first-class objects of your
language. Then you could have a function "int" that takes a single
number as a parameter and returns a type. Then you'd have:

int(32) x;
type int32 = int(32);
int32 y;
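Today's C++ can approximate the `builtin::int<32>` idea with a template that maps a bit count to a built-in type. This is only a sketch: `int_t` is an invented name, and widths are rounded up to the nearest size the hardware supports:

```cpp
#include <cstdint>
#include <type_traits>

// Map a requested width N to the smallest standard type of at least N bits.
template<int N>
using int_t = std::conditional_t<N <= 8,  std::int8_t,
              std::conditional_t<N <= 16, std::int16_t,
              std::conditional_t<N <= 32, std::int32_t,
                                          std::int64_t>>>;

using int32 = int_t<32>;   // the 'using int32 = builtin::int<32>' step

static_assert(sizeof(int_t<8>)  == 1, "8-bit request fits one byte");
static_assert(sizeof(int_t<24>) == 4, "24 rounds up to 32 bits here");
static_assert(sizeof(int32)     == 4, "32-bit request");
```

Invalid sizes (say, a negative N) could be rejected with a `static_assert` inside the template, which answers one of Bart's questions at compile time.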

Bart

Aug 20, 2021, 3:33:38 PM
On 20/08/2021 16:04, James Harris wrote:
> In another thread Bart posed some great questions to which I only have a
> partial answer. As the answers end up getting into a separate topic or
> two I'll start this new thread.
>
> On 20/08/2021 14:19, Bart wrote:

> >
> > This applies to 'int' too:
> >
> >    hnt int jnt ...
> >    ins int inu ...
>
> I am not sure I understand what that is pointing out.
>

Sometimes an innocuous-looking reserved word may be part of a pattern
you want to use for variables. For example, I couldn't understand what
was wrong here:

ref int pi, pj, pk

I'd forgotten that 'pi' was a reserved word (you know, the constant
3.1415926...)

> > (I'd prefer some punctuation or other way of
> > connecting the number with the type)
>
> I'm surprised to hear that you would want additional punctuation.

You need /some/ punctuation, ie. symbols, otherwise source code will
just be a monotonous sequence of names and literals.

I quite like writing f(x,y,z) for example, but some languages will drop
the comma so that you have f(x y z), where you start having to think
about where an argument ends and the next begins, or even:

f x y z

(eg. Haskell). Without boundaries, this can get ambiguous:

f x g y z

Is that f(x,g(y),z) or f(x,g(y,z)) or f(x,y,g,z)?

In this example, I felt it needed something to tie the '16' to the 'int'.

My dynamic language defines some struct members like this:

string*13 barcode
string*36 description

That is, fixed-width string fields (0 to max 13/36 characters). Here, I
don't actually need that *, since 13 or 36 can't be the member name. But
it would look weirdly naked without:

string 13 barcode
string 36 description

That's more suited to a data description format with entries lined up in
3 columns.
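Fixed-width string fields of this kind can be mimicked in C++ with fixed-size char arrays; `Item` and its members are invented here purely for illustration:

```cpp
#include <array>

// Fixed-width string fields, analogous to 'string*13 barcode' etc.
struct Item {
    std::array<char, 13> barcode;      // up to 13 characters
    std::array<char, 36> description;  // up to 36 characters
};

// char arrays need no padding, so the members pack tightly: 13 + 36 bytes.
static_assert(sizeof(Item) == 49, "fields pack with no gaps");
```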

> I tend
> to put a lot of effort into trying to make it unnecessary! For example,
> instead of C's
>
>   if (e)
>
> I have just
>
>   if e
>
> Isn't code easier to read without unnecessary punctuation?

Yes, but this reads like it does in English (or would do if 'e' had a
more meaningful name). But these don't:

int 32 ...
string 32 ...

You'd probably read them out loud with something added between keyword
and number so that it flows better. That's what the punctuation provides.

>
> > but as I said, you then have to deal with extra possibilities:
> >
> > * Could the number be an expression?
>
> Yes, as long as it was resolvable at compile time. If the width were to
> be specified by an expression, E, then the syntax would be
>
>   int (E) d

OK. This can serve as the punctuation I mentioned.


> ATM,
>
>   int a           would declare an integer a of default width
>   int (w) b       would declare an integer b of width w
>   int 16 c        would declare an integer c of width 16
>
> Depending on how other decisions pan out I might end up changing the
> parens to square brackets for consistency. Then the middle one of those
> declarations would become
>
>   int [w] b

For consistency you'd have int [16] too. If you're going to have a
lot of them in any program, then you might end up with int16! (That is,
just drop the space.)

>
> >
> > * What to do about invalid sizes?
>
> In the expressions above, the size would be determined at compile time
> so any invalid size could be rejected.
>

But which /are/ the invalid sizes; would int 24 be OK?

>
> >
> > * Could such a number appear also after a user-defined type; for
> > example if an alias 'T' for 'int' was created, would 'T 8 a'
> > be allowed?
>
> Good question. I hadn't thought of doing that but it might be possible.
> To explain, I have been thinking to declare type names with a syntax like
>
>   typedef T1 = int 8
>
> then
>
>   T1 g
>
> would declare g as of type int 8.
>
> However, as a separate matter I am also toying with the idea of allowing
> short names for other namespaces such as
>
>   namespace S = ns.dns.invalid.scl.personnel

> Then
>
>   S.X
>
> would really refer to
>
>   ns.dns.invalid.scl.personnel.X

(I think I can do that at the minute with macros. My macros only work
when the bodies are well-formed sub-expressions, but your example could
be written like this:

macro S = ns.dns.invalid.scl.personnel

But...

)

>
> Why is that relevant? Because a typename such as int is also a name.
> Therefore I would be able to define
>
>   namespace T2 = int
>
> and subsequently do as you originally suggested by writing
>
>   T2 8 h

( ... my macro wouldn't work here because this is not an expression. It
needs a more general macro system.)

>
> If T2 had been declared to be int then that would do what you asked
> about, above, and declare h to be of type "int 8".
>
> Whether a programmer would want to do that or not is another matter!
>
>
>
> Either way, what I've not bottomed out, yet, is whether there's a need
> for both typedefs and namespace definitions. They are very similar:
>
>   typedef T1 = int 8
>   namespace T2 = int

The right-hand-side of a namespace definition is presumably a series of
dotted names. The new name doesn't mean anything by itself until it is
expanded at each instance site.

The right-hand-side of a type definition would be a type specifier. The
new name is a Type, and can be used anywhere a type is expected.

The only point of similarity is when both typedef and namespace define an
alias to a simple type denoted, at the right end, by a single name
token. But typedef can also construct an arbitrary new type.

anti...@math.uni.wroc.pl

Aug 20, 2021, 4:13:37 PM
James Harris <james.h...@gmail.com> wrote:
> In another thread Bart posed some great questions to which I only have a
> partial answer. As the answers end up getting into a separate topic or
> two I'll start this new thread.
>
> On 20/08/2021 14:19, Bart wrote:
> > On 20/08/2021 12:55, James Harris wrote:
> >> On 20/08/2021 11:47, Bart wrote:
> >>> On 20/08/2021 08:29, James Harris wrote:
>
>
> ...
>
>
> >> Naming integers iN was tempting but I felt that it either took away
> >> too much of the namespace or, as illustrated, would be irregular and
> >> fiddly.
> >
> >
> >
> > I don't use i1 i2 i4, only i8/i16/i32/i64/i128.
>
> You do have similar, though, don't you? In an earlier reply you said "In
> my case however I also have bittypes which I call u1, u2 and u4 (which
> then continue as u8, u16 etc).".
>
> If you have u2 etc then
>
> u1 is a reserved word
> u2 is a reserved word
> u3 is not reserved
> u4 is a reserved word
> u5 is not reserved
> etc

Most languages make a distinction between reserved words and predefined
identifiers. For example, in Pascal 'begin' is a reserved word,
while 'integer' is merely a predefined identifier. If you have
no use for the predefined 'integer' you are allowed to redefine it
and use the new meaning.
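The Pascal behaviour can be imitated in C++: a type alias acts like a predefined identifier in that an inner scope may redefine it, whereas a true keyword such as 'int' cannot be. A sketch with invented names:

```cpp
// 'integer' acts like Pascal's predefined identifier: redefinable.
using integer = int;

double shadow_demo() {
    integer a = 2;                // outer meaning: int
    {
        using integer = double;   // inner scope redefines the name
        integer b = 0.5;
        return a + b;
    }
}
```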

> >
> > And actually to most keywords unless the language has a
> > peculiar enough syntax to allow keywords as identifiers (I
> > think PL/I allowed if if=if ...)
>
> I heard something like that before.
>
> if if = if then then = then else else = else
> if then = else then if = then else then = if
>
> :-o

PL/I took things to an extreme: formally no identifier was reserved,
and you could put declarations after use. Most languages
take an intermediate position: there is a small number of reserved
words and you need to declare variables before use. So
re-using predefined identifiers is easy to implement and safe.

In fact, in the case of PL/I one view is that _all_ non-alphanumeric
"words" are reserved - things like the comma, parentheses,
semicolon, etc. By also reserving some alphanumeric words
one gets a nicer and simpler syntax. But there is no need to
reserve type names.

--
Waldek Hebisch

David Brown

Aug 21, 2021, 5:32:21 AM
On 20/08/2021 21:33, Bart wrote:
> On 20/08/2021 16:04, James Harris wrote:
>> In another thread Bart posed some great questions to which I only have
>> a partial answer. As the answers end up getting into a separate topic
>> or two I'll start this new thread.
>>
>> On 20/08/2021 14:19, Bart wrote:
>
>>  >
>>  > This applies to 'int' too:
>>  >
>>  >    hnt int jnt ...
>>  >    ins int inu ...
>>
>> I am not sure I understand what that is pointing out.
>>
>
> Sometimes an innocuous-looking reserved word may be part of a pattern
> you want to use for variables. For example, I couldn't understand what
> was wrong here:
>
>   ref int pi, pj, pk
>
> I'd forgotten that 'pi' was a reserved word (you know, the constant
> 3.1415926...)
>

And that is one of the reasons why a well-designed programming language
keeps the reserved words to a minimum, and one of the reasons why you
want namespaces (or modules, or packages, or whatever you want to call
them). 99.99% of programs don't need pi, so it should not be forced
upon them unless they choose to use it.

(If it makes you feel any better, the lack of namespaces and modules in
C is one of its major drawbacks for large-scale programming.)

>>  > (I'd prefer some punctuation or other way of
>>  > connecting the number with the type)
>>
>> I'm surprised to hear that you would want additional punctuation.
>
> You need /some/ punctuation, ie. symbols, otherwise source code will
> just be a monotonous sequence of names and literals.

Agreed.

>
> I quite like writing f(x,y,z)

You really should learn to use the space key. "f(x, y, z)" is vastly
easier to read.

> for example, but some languages will drop
> the comma so that you have f(x y z), where you start having to think
> about where an argument ends and the next begins, or even:
>
>   f x y z
>

In Haskell, which is a functional programming language, "f" is not a
function that takes three parameters. It is a function that takes one
parameter, and returns a function that takes one parameter and returns a
function that takes one parameter and returns a number (if that's the
final type, which is not visible in this case).

So it means (((f x) y) z).
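The same curried shape can be written out in C++ with nested lambdas, which makes the "(((f x) y) z)" reading concrete (names invented for illustration):

```cpp
// f takes one argument and returns a function, which takes one argument
// and returns a function, which takes the last argument and returns a value.
auto make_f() {
    return [](int x) {
        return [x](int y) {
            return [x, y](int z) { return x + y + z; };
        };
    };
}

int curry_demo() {
    auto f = make_f();
    return f(1)(2)(3);   // i.e. (((f 1) 2) 3)
}
```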

> (eg. Haskell). Without boundaries, this can get ambiguous:
>
>   f x g y z
>
> Is that f(x,g(y),z) or f(x,g(y,z)) or f(x,y,g,z)?

(f x) ((g y) z)

Functional programming works in a rather different way from imperative
programming, and trying to interpret it as imperative programming will
only cause you confusion. You need to learn the paradigm before you can
understand the code. The same applies to other kinds of programming,
like Forth's RPN notation (which also has much less need of punctuation).

I agree that punctuation is useful in imperative programming, but that
doesn't mean it is needed for every kind of programming.

James Harris

Aug 21, 2021, 5:32:35 AM
On 20/08/2021 19:50, David Brown wrote:
> On 20/08/2021 17:04, James Harris wrote:

...

>>>>     int 8 a
>>>>     int 16 b
>>>>     int 32 c

...

>>> (I'd prefer some punctuation or other way of
>>> connecting the number with the type)
>>
>> I'm surprised to hear that you would want additional punctuation. I tend
>> to put a lot of effort into trying to make it unnecessary! For example,
>> instead of C's
>>
>>   if (e)
>>
>> I have just
>>
>>   if e
>>
>> Isn't code easier to read without unnecessary punctuation?
>>
>
> No. /Excessive/ punctuation, or complicated symbols make code hard to
> read, especially when rarely used. (So having something like ">>?" for
> a "maximum" operator is a terrible idea.)

Hm, I spoke about "unnecessary" punctuation. You disagree and say the
problem is "excessive" punctuation. What's the difference between
unnecessary and excessive??

...

>>> * Could the number be an expression?
>>
>> Yes, as long as it was resolvable at compile time. If the width were to
>> be specified by an expression, E, then the syntax would be
>>
>>   int (E) d
>>
>
> I'd recommend looking at C++ templates. You might not want to follow
> all the details of the syntax, and you want to look at the newer and
> better techniques rather than the old ones. But pick a way to give
> compile-time parameters to types, and then use that - don't faff around
> with special cases and limited options. Pick one good method, then you
> could have something like this :
>
> builtin::int<32> x;
> using int32 = builtin::int<32>;
> int32 y;
>
> That is (IMHO) much better than your version because it will be
> unambiguous, flexible, and follows a syntax that you can use for all
> sorts of features.

My version of that would be

typedef i32 = int 32

int 32 x
i32 y

Your C++ version doesn't seem to be any more precise or flexible. And my
version is shorter, clearer and (at least once you are used to the
syntax) easier to read. In fact, mine is so much more readable that it
shows how weird it looks to use both int 32 and i32 - something that IMO
the C++ version obscures by lots of unnecessary (your term) waffle text!
So I am not sure what your criticism is.

I do agree, however, that I need to look at templates. Are C++
templates, as set out in

https://www.cplusplus.com/doc/oldtutorial/templates/

essentially just about parametrising functions and classes where the
parameters are types and other classes?

Or are they more flexible?

I ask that because I wonder if something based on macros (where the
parameters could be of any form, not just types and classes) could be as
useful but more adaptable to different situations. After all, the
creation of real functions from templated functions is rather like the
instantiation of macros, isn't it?

>
>
> If you want more fun, you could make types first-class objects of your
> language. Then you could have a function "int" that takes a single
> number as a parameter and returns a type. Then you'd have :
>
> int(32) x;
> type int32 = int(32);
> int32 y;
>

Are you talking there about a dynamic language where int is called at
run time?


--
James Harris

David Brown

Aug 21, 2021, 5:41:01 AM
On 20/08/2021 22:13, anti...@math.uni.wroc.pl wrote:

>
> PL/I put things to extreme: formally no identifier was reserved
> and you could put declarations after use. Most languages
> take intermediate position: there is small number of reserved
> words and you need to declare variables before use. So
> re-using predefined identifiers is easy to implement and safe.
>

Forth is the most flexible language I know of in this sense:


$ gforth
Gforth 0.7.3, Copyright (C) 1995-2008 Free Software Foundation, Inc.
Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
Type `bye' to exit
2 2 + . 4 ok
: 2 3 ; ok
2 2 + . 6 ok


The result of "2 2 +" is 4, then I redefine "2" to mean "3", and now the
result of "2 2 +" is 6.

And in Metafont (or Metapost), an identifier like "u8" would mean the
eighth entry in the array "u". In TeX, "u8" could be the macro "u" with
the parameter 8, as digits cannot be part of identifiers. (Of course,
TeX lets you redefine the character class of the digits to make them
letters...)

James Harris

Aug 21, 2021, 10:58:09 AM
On 20/08/2021 20:33, Bart wrote:
> On 20/08/2021 16:04, James Harris wrote:
>> Bart wrote:


...


>>  > (I'd prefer some punctuation or other way of
>>  > connecting the number with the type)
>>
>> I'm surprised to hear that you would want additional punctuation.
>
> You need /some/ punctuation, ie. symbols, otherwise source code will
> just be a monotonous sequence of names and literals.

At least you wouldn't need your hated shift key.

;-)

>
> I quite like writing f(x,y,z) for example, but some languages will drop
> the comma so that you have f(x y z), where you start having to think
> about where an argument ends and the next begins, or even:
>
>   f x y z
>
> (eg. Haskell). Without boundaries, this can get ambiguous:
>
>   f x g y z
>
> Is that f(x,g(y),z) or f(x,g(y,z)) or f(x,y,g,z)?

I remember a Basic where one could type

r = sin x + cos y

It was certainly easily readable.


>
> In this example, I felt it needed something to tie the '16' to the 'int'.

OK.

...

>> I have just
>>
>>    if e
>>
>> Isn't code easier to read without unnecessary punctuation?
>
> Yes, but this reads like it does in English (or would do if 'e' had a
> more meaningful name). But these don't:
>
>    int 32 ...
>    string 32 ...
>
> You'd probably read them out loud with something added between keyword
> and number so that it flows better. That's what the punctuation provides.

YM varies, clearly. When you read your own i32 I expect you read it as

"i thirty-two"

Isn't

"int thirty-two"

sufficiently similar to read aloud?

...

>>    int (E) d
>
> OK. This can serve as the punctuation I mentioned.

...

>>    int [w] b
>
> For consistency you'd have int [16] too.

Well, int [16] would be allowed as what's in the brackets would be a
compile-time expression. But the brackets would be unnecessary.

Could it be a familiarity thing? New C programmers sometimes write

return (x);

because it looks right to them. But after a while they get used to

return x;

Could it be that these things just take time to get used to?

Besides, you likely remember that my declarations are meant to include
/ranges/ of widths (where the width has to be within a certain range).
For example,

int 16..32 b

would mean that b had to be between 16 and 32 bits (inclusive) wide. If
either of those bounds were an expression then it (that bound's
calculation) would need to be bracketed.
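C's least-width types express a related "at least N bits" contract, although without the upper bound of `int 16..32`: int_least16_t is the narrowest convenient type of 16 or more bits, with the implementation choosing the actual width. A sketch:

```cpp
#include <cstdint>
#include <climits>

// 'int 16..32 b' has no direct C++ analogue; int_least16_t gives the
// lower bound only: at least 16 bits, actual width chosen by the compiler.
std::int_least16_t least_demo() {
    std::int_least16_t b = 30000;   // guaranteed to fit in 16 bits
    static_assert(sizeof(std::int_least16_t) * CHAR_BIT >= 16,
                  "at least 16 bits wide");
    return b;
}
```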

>
> Unless you're going to have a
> lot of them in any program, then you might end up with int16! (That is,
> just drop the space.)

:-)

I could do

typedef i16 = int 16

but I am not sure that would be much of a saving. IMO a typedef would be
better reserved for logical type names rather than be used to make
shortcuts for types which are already short.

>
>>
>>  >
>>  > * What to do about invalid sizes?
>>
>> In the expressions above, the size would be determined at compile time
>> so any invalid size could be rejected.
>>
>
> But which /are/ the invalid sizes; would int 24 be OK?

Yes, int 24 would be OK. Negative numbers would be invalid. Probably
zero, too. int 1 would be valid, though unusual.

...

>> Either way, what I've not bottomed out, yet, is whether there's a need
>> for both typedefs and namespace definitions. They are very similar:
>>
>>    typedef T1 = int 8
>>    namespace T2 = int
>
> The right-hand-side of a namespace definition is presumably a series of
> dotted names.

The RHS of a namespace definition would be required only to be an extant
name. It could be dotted or not. For example, if ns means "name system"
and ns.dns refers to the DNS name system then you could have

namespace com = ns.dns.com
namespace gweb = com.google.www

where the definition of gweb uses the "com" defined on the preceding
line. Yes, the above both have dots but you could go on to write

namespace webroot = gweb

making webroot an alias for the gweb name previously defined. So the RHS
would not have to have dots.


> The new name doesn't mean anything by itself until it is
> expanded at each instance site.

I am not sure about it being 'expanded' if you mean as one might expand
a macro. I see it more as an alias.

>
> The right-hand-side of a type definition would be a type specifier. The
> new name is a Type, and can be used anywhere a type is expected.

Yes.

>
> The only point of similarity is when both type and namespace define an
> alias to a simple type denoted, at the right end, by a single name
> token. But typedef can also construct an arbitrary new type.
>

Well, as with C, typedef really just creates a new name for an existing
type but you make a good point that a typedef has to create a type and
could not name a partial type ... so typedef and namedef (sic) should
probably be kept separate even though their forms are almost identical.


--
James Harris

Bart

Aug 21, 2021, 12:31:10 PM
On 21/08/2021 15:58, James Harris wrote:
> On 20/08/2021 20:33, Bart wrote:
>> On 20/08/2021 16:04, James Harris wrote:
> >> Bart wrote:
>
>
> ...
>
>
>>>  > (I'd prefer some punctuation or other way of
>>>  > connecting the number with the type)
>>>
>>> I'm surprised to hear that you would want additional punctuation.
>>
>> You need /some/ punctuation, ie. symbols, otherwise source code will
>> just be a monotonous sequence of names and literals.
>
> At least you wouldn't need your hated shift key.
>
> ;-)

It's not so bad in between tokens; maybe I just don't like interrupting
the typing of a single alphanumeric token.

However I do have considerable problems with typing accurately, so I
still hate the unneeded punctuation you have in C, especially with
simple prints:

printf("A=%d B=%f\n",a,b);

7 shifted symbols, versus none in my equivalent code: println =a, =b




>> I quite like writing f(x,y,z) for example, but some languages will
>> drop the comma so that you have f(x y z), where you start having to
>> think about where an argument ends and the next begins, or even:
>>
>>    f x y z
>>
>> (eg. Haskell). Without boundaries, this can get ambiguous:
>>
>>    f x g y z
>>
>> Is that f(x,g(y),z) or f(x,g(y,z)) or f(x,y,g,z)?
>
> I remember a Basic where one could type
>
>   r = sin x + cos y
>
> It was certainly easily readable.

My syntax also allows 'sin x + cos y', but only because sin and cos are
operators. However I tend to add the parentheses because I think it
looks better, with less reliance on white space. Actually I also write
max(A,B) instead of A max B for that reason.

Operators can otherwise be used with no parentheses as @A, A@B or A@
depending on unary/binary and whether prefix, infix or postfix.

So I will accept the annoyance of some punctuation when there is a
benefit: clearer code, or code that is going to persist for longer than
the 2-minute half-life of a debug print.

>> You'd probably read them out loud with something added between keyword
>> and number so that it flows better. That's what the punctuation provides.
>
> YM varies, clearly. When you read your own i32 I expect you read it as
>
>   "i thirty-two"
>
> Isn't
>
>   "int thirty-two"
>
> sufficiently similar to read aloud?

Well, if I had to transcribe how I'd imagine I'd say those out loud, it
might be as "I-32" or "int-thirty-two"; that is, with the hyphen. (But
nothing extra added as I'd thought.)

After all, we write (or at least I do) "64 bits" or "64-bit", even
though in speech the gap between the two parts is near identical.

The latter would be more of an adjective, but whether a type-specifier
is classed as an adjective is another question.

In the US, they have "Interstate 15", without punctuation, which is also
written compactly as "I-15", suggesting some connection is necessary
otherwise an orphaned 'I' by itself is ambiguous.

Anyway, there's no overwhelming evidence either way. To me it just feels
better if 'int' and '32' had a stronger connection than between '32' and
what follows.

> Besides, you likely remember that my declarations are meant to include
> /ranges/ of widths (where the width has to be within a certain range).
> For example,
>
>   int 16..32 b
>
> would mean that b had to be between 16 and 32 bits (inclusive) wide.

Ok, so this seems more like a range of values (as used in Pascal and
Ada) than a range of bits. Didn't you previously have a range like this
to denote values?

>> But which /are/ the invalid sizes; would int 24 be OK?
>
> Yes, int 24 would be OK.

Just reading that makes me think of all the extra work that's going to
be involved! Doing a simple assignment:

A := B

normally means two instructions on x64: a load into a register and a
store back from it, when A and B are 1, 2, 4 or 8 bytes.

When they are 3 bytes, then it would likely need 4 instructions or
possibly six if concerned about alignment, or you could get away with 3
if you can over-read the value of B (read 1 byte beyond B).

Now think about packed arrays of 24 bits.

Of course, I'm assuming the target hardware doesn't have 24-bit integers
as native types.
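The cost being described can be sketched in Python (the helper names here are illustrative, not from any compiler): a 24-bit copy becomes an explicit mask-and-copy of three bytes, roughly the work a compiler would have to emit on hardware without native 24-bit loads and stores.

```python
# Sketch: copying a 24-bit (3-byte) signed integer between memory
# locations, as a compiler might have to on hardware with no native
# 24-bit load/store. Helper names are illustrative only.

def load_i24(buf: bytearray, off: int) -> int:
    """Read 3 bytes little-endian and sign-extend to a Python int."""
    return int.from_bytes(buf[off:off + 3], "little", signed=True)

def store_i24(buf: bytearray, off: int, value: int) -> None:
    """Write the low 24 bits of value as 3 little-endian bytes."""
    buf[off:off + 3] = (value & 0xFFFFFF).to_bytes(3, "little")

mem = bytearray(8)
store_i24(mem, 0, -5)                # B := -5
store_i24(mem, 3, load_i24(mem, 0))  # A := B, byte by byte
print(load_i24(mem, 3))              # -5
```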


> Negative numbers would be invalid. Probably
> zero, too. int 1 would be valid, though unusual.

Unsigned 1-bit is fine (also called Bool). Signed 1-bit would be
unusual! It would have values of -1 and 0 I think.
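The value set of an N-bit two's-complement type follows from the usual formula; a quick sketch (the helper is hypothetical, not part of either language under discussion) confirms the odd signed 1-bit case:

```python
# Value range of an N-bit two's-complement signed integer.
def signed_range(bits):
    return (-(1 << (bits - 1)), (1 << (bits - 1)) - 1)

print(signed_range(1))    # (-1, 0): just the two values described
print(signed_range(24))   # (-8388608, 8388607)
print(signed_range(32))   # (-2147483648, 2147483647)
```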

>> The right-hand-side of a namespace definition is presumably a series
>> of dotted names.
>
> The RHS of a namespace definition would be required only to be an extant
> name. It could be dotted or not. For example, if ns means "name system"
> and ns.dns refers to the DNS name system then you could have
>
>   namespace com = ns.dns.com

That looks a bit dodgy. So 'com' can appear on both sides?

>   namespace gweb = com.google.www
>
> where the definition of gweb uses the "com" defined on the preceding
> line. Yes, the above both have dots but you could go on to write
>
>   namespace webroot = gweb
>
> making webroot an alias for the gweb name previously defined. So the RHS
> would not have to have dots.
>
>
>> The new name doesn't mean anything by itself until it is expanded at
>> each instance site.
>
> I am not sure about it being 'expanded' if you mean as one might expand
> a macro. I see it more as an alias.

Well, somewhere there needs to be a way for the compiler to trace the
path represented by 'webroot'. But sure, you probably don't need to
expand it at each instance into some sequence of AST nodes that implement
".". There it would differ from an implementation based on macros.

David Brown

Aug 21, 2021, 2:11:04 PM
On 21/08/2021 11:32, James Harris wrote:
> On 20/08/2021 19:50, David Brown wrote:
>> On 20/08/2021 17:04, James Harris wrote:
>
> ...
>
>>>>>      int 8 a
>>>>>      int 16 b
>>>>>      int 32 c
>
> ...
>
>>>> (I'd prefer some punctuation or other way of
>>>> connecting the number with the type)
>>>
>>> I'm surprised to hear that you would want additional punctuation. I tend
>>> to put a lot of effort into trying to make it unnecessary! For example,
>>> instead of C's
>>>
>>>    if (e)
>>>
>>> I have just
>>>
>>>    if e
>>>
>>> Isn't code easier to read without unnecessary punctuation?
>>>
>>
>> No.  /Excessive/ punctuation, or complicated symbols make code hard to
>> read, especially when rarely used.  (So having something like ">>?" for
>> a "maximum" operator is a terrible idea.)
>
> Hm, I speak about "unnecessary" punctuation. You disagree and say the
> problem is "excessive" punctuation. What's the difference between
> unnecessary and excessive??
>

None of the punctuation in that paragraph was necessary - the meaning
would have been clear and unambiguous without any periods, apostrophes,
or quotation marks. Yet only the final double question mark was
excessive. It's a matter of degree. Too little punctuation makes the
language harder to read and write, and offers more scope for ambiguity.
Too much makes it hard to read and write, and makes it difficult to
learn. Somewhere in the middle there is a happy medium - going too far
one way (limiting punctuation to the minimum necessary) is as bad as
going too far the other way (excessive punctuation that detracts from
the flow of the code).

> ...
>
>>>> * Could the number be an expression?
>>>
>>> Yes, as long as it was resolvable at compile time. If the width were to
>>> be specified by an expression, E, then the syntax would be
>>>
>>>    int (E) d
>>>
>>
>> I'd recommend looking at C++ templates.  You might not want to follow
>> all the details of the syntax, and you want to look at the newer and
>> better techniques rather than the old ones.  But pick a way to give
>> compile-time parameters to types, and then use that - don't faff around
>> with special cases and limited options.  Pick one good method, then you
>> could have something like this :
>>
>>     builtin::int<32> x;
>>     using int32 = builtin::int<32>;
>>     int32 y;
>>
>> That is (IMHO) much better than your version because it will be
>> unambiguous, flexible, and follows a syntax that you can use for all
>> sorts of features.
>
> My version of that would be
>
>   typedef i32 = int 32
>
>   int 32 x
>   i32 y
>

Punctuation here is not /necessary/, but it would make the code far
easier to read, and far safer (in that mistakes are more likely to be
seen by the compiler rather than being valid code with unintended meaning).

> Your C++ version doesn't seem to be any more precise or flexible.

What happens when you have a type that should have two parameters - size
and alignment, for example? Or additional non-integer parameters such
as signedness or overflow behaviour? Or for container types with other
types as parameters? C++ has that all covered in a clear and accurate
manner - your system does not.

My intention here is to encourage you to think bigger. Stop thinking
"how do I make integer types?" - think wider and with greater generality
and ambition. Make a good general, flexible system of types, and then
let your integer types fall naturally out of that.

> And my
> version is shorter, clearer and (at least once you are used to the
> syntax) easier to read.

"Shorter" is /not/ an advantage, any more than "longer" is an advantage.

> In fact, mine is so much more readable that it
> shows how weird it looks to use both int 32 and i32 - something that IMO
> the C++ version obscures by lots of unnecessary (your term) waffle text!
> So I am not sure what your criticism is.

It doesn't really matter if you decide that "int32", "int32_t", "i32",
or anything else is going to be the normal way to declare a 32-bit
integer. You have to figure out what you think makes sense and reads
well in your language. But the syntax I suggested for defining the type
is not "waffle" - it is intentional. You don't want this sort of thing to
be short - you want it to be consistent and logical, unambiguous in
syntax, and not conflict with identifiers the programmer might want.

>
> I do agree, however, that I need to look at templates. Are C++
> templates, as set out in
>
>   https://www.cplusplus.com/doc/oldtutorial/templates/
>
> essentially just about parametrising functions and classes where the
> parameters are types and other classes?
>
> Or are they more flexible?

A limited tutorial on a 20+ year old version of the language is not
going to be the best reference. This is a /much/ better site for C++
(and C) information, and works closely with the language standardisation
groups.

<https://en.cppreference.com/w/cpp/language/templates>

It's not a tutorial site, however. It aims to be accurate to the
standards but gives a more reader-friendly format than the standards,
and is excellent at noting the differences between different standards
versions.

Originally, templates were just about functions and classes parametrised
by types. They let you make a "max" function that would work for any
type with a " > " operator, or a list container class that could work
for any type. But they moved on from that. They also include template
aliases, variables, and concepts (which are a way of naming
characteristics of types - a sort of "type of type", except they use
duck-typing instead of structural typing). As well as types, template
parameters can be integers, enumerators, and now pretty much any
"literal" class. For a while, C++ templates were used for compile-time
calculations in C++, but that was an awkward process - the syntax was
seriously ugly and they were limited and inefficient. (Now you use
proper compile-time functions.)

Make sure you look at C++20 for inspiration, not ancient C++98. Look at
concepts - they greatly simplify templates and generic programming.
(They are not the only way to do it - remember that a lot of the way
things are done in an old, evolved language like C++ come from adding
features while retaining backwards compatibility - for a new language,
you don't need to do that, and can jump straight to better designs. You
are looking for inspiration and ideas to copy, not copying all the
weaker parts of older languages.)

Perhaps even look at the metaclasses proposal
<https://www.fluentcpp.com/2018/03/09/c-metaclasses-proposal-less-5-minutes/>.
This will not be in C++ before C++26, maybe even later, but it gives a
whole new way of building code. If metaclasses had been part of C++
from the beginning, there would be no struct, class, enum, or union in
the language - these would have been standard library metaclasses. They
are /that/ flexible.

>
> I ask that because I wonder if something based on macros (where the
> parameters could be of any form, not just types and classes) could be as
> useful but more adaptable to different situations. After all, the
> creation of real functions from templated functions is rather like the
> instantiation of macros, isn't it?
>

To some extent, yes - but it is done in a clearer, cleaner and more
systematic manner.

Pure textual macros, like C's, have lots of limitations (no recursion is
a critical limitation) - as well as being too chaotic because there are
few rules.

But there are other languages with other kinds of macros, with different
possibilities. There are some languages where features like loop
structures are not keywords or fundamental language statements, but just
macros from the standard library.

Ultimately, things like macros, templates, generics, metafunctions,
etc., are just names for high-level compile-time coding constructs.

>>
>>
>> If you want more fun, you could make types first-class objects of your
>> language.  Then you could have a function "int" that takes a single
>> number as a parameter and returns a type.  Then you'd have :
>>
>>     int(32) x;
>>     type int32 = int(32);
>>     int32 y;
>>
>
> Are you talking there about a dynamic language where int is called at
> run time?
>

No - "int" would be a compile-time function here.
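In a dynamic setting the idea can be sketched in Python, where a function genuinely can take a width and return a type. The factory name `make_int` and the bounds-checked wrapper are assumptions for illustration (and avoid shadowing Python's builtin `int`); in the proposal above the call would instead be evaluated at compile time.

```python
from functools import lru_cache

@lru_cache(maxsize=None)          # make_int(32) always yields the same type
def make_int(bits: int) -> type:
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1

    class FixedInt(int):
        def __new__(cls, value):
            if not lo <= value <= hi:
                raise OverflowError(f"{value} out of range for int {bits}")
            return super().__new__(cls, value)

    FixedInt.__name__ = f"int{bits}"
    return FixedInt

int32 = make_int(32)              # like: type int32 = int(32)
y = int32(100)
print(type(y).__name__, y)        # int32 100
print(make_int(32) is int32)      # True: the 'type function' is pure
```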

James Harris

Aug 21, 2021, 2:11:32 PM
On 21/08/2021 17:31, Bart wrote:
> On 21/08/2021 15:58, James Harris wrote:
>> On 20/08/2021 20:33, Bart wrote:
>>> On 20/08/2021 16:04, James Harris wrote:
>>  >> Bart wrote:

...

>> At least you wouldn't need your hated shift key.
>>
>> ;-)
>
> It's not so bad in between tokens, maybe I just don't like interrupting
> the typing of a single alphanumeric token.

Understood.

From memory, web search engines used to (and maybe still do) regard
underscore as being part of a word and hyphen as separating words. So

this-rather_odd-name

would be three 'words'.

>
> However I do have considerable problems with typing accurately, so I
> still hate the unneeded punctuation you have in C, especially with
> simple prints:
>
>   printf("A=%d B=%f\n",a,b);
>
> 7 shifted symbols, versus none in my equivalent code: println =a, =b
>

That's curious given what we have been discussing. You appear to have a
function with two parameters without parens!

I have not yet decided on output mechanisms but since there's some code
to compare I'll have a go. One option is

cout.putrec(a, b)

Another is

cout.putf("a=%M, b=%M\n/", a.string(), b.string())

Another is

debug.vardump(a, b)

which I guess is nearer the intention of your =a form.

...


>> I remember a Basic where one could type
>>
>>    r = sin x + cos y
>>
>> It was certainly easily readable.
>
> My syntax also allows 'sin x + cos y', but only because sin and cos are
> operators. However I tend to add the parentheses because I think it
> looks better, with less reliance on white space. Actually I also write
> max(A,B) instead of A max B for that reason.

OK.

What about the viewpoint that a function call /always/ has the form

f X

where X is /always/ a single argument, and that the single argument
needs to be wrapped in parens if it is a composite of something other
than one element?

I quite like the theory of that given that

f(x)
f (x)

should both mean the same AND that parens are traditionally used for
grouping without changing the meaning.

IOW (x) should mean the same as x.

...


> After all, we write (or at least I do), "64 bits" or "64-bit", even
> though in speech the gap between the parts is near identical.

FWIW, I think Verilog allows N' as meaning N-bit. So one could have

7' 15

meaning 7-bit 15. For hardware programming specific-width values are
fairly common.

...

>> Besides, you likely remember that my declarations are meant to include
>> /ranges/ of widths (where the width has to be within a certain range).
>> For example,
>>
>>    int 16..32 b
>>
>> would mean that b had to be between 16 and 32 bits (inclusive) wide.
>
> Ok, so this seems more like a range of values (as used in Pascal and
> Ada) than a range of bits. Didn't you previously have a range like this
> to denote values?

Not me. I have considered using

int range 0..9

Maybe that's what you are thinking of. But in that, "range" would be an
essential keyword (to indicate that the integer should be restricted to
values in the specified range).
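A minimal sketch of what an "int range 0..9" variable might check, assuming run-time checks and hypothetical names:

```python
# Sketch of an 'int range 0..9' variable: the declared range restricts
# the values, independently of the storage width. Names are illustrative.

class RangedInt:
    def __init__(self, lo, hi, value=0):
        self.lo, self.hi = lo, hi
        self.value = value          # goes through the setter below

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, v):
        if not self.lo <= v <= self.hi:
            raise ValueError(f"{v} outside {self.lo}..{self.hi}")
        self._value = v

d = RangedInt(0, 9)      # int range 0..9 d
d.value = 7              # fine
print(d.value)           # 7
```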

>
>>> But which /are/ the invalid sizes; would int 24 be OK?
>>
>> Yes, int 24 would be OK.
>
> Just reading that makes me think of all the extra work that's going to
> be involved!

... (examples snipped)

I know. I don't pretend it would be easy but IMO it's important.

>
> Of course, I'm assuming the target hardware doesn't have 24-bit integers
> as native types.

The idea is that the computations would be the same irrespective of the
word size of the machine. So normal 16-bit ops would be a challenge on
such a machine. :-(

>
>
>> Negative numbers would be invalid. Probably zero, too. int 1 would be
>> valid, though unusual.
>
> Unsigned 1-bit is fine (also called Bool). Signed 1-bit would be
> unusual! It would have values of -1 and 0 I think.

Indeed. (That'd be worse to work with than int 24...!)

...

>>    namespace com = ns.dns.com
>
> That looks a bit dodgy. So 'com' can appear on both sides?

It isn't intended to work in the way you have in mind. Rather, imagine
that there's a 'current namespace' so that when you type

int b
int c

then b and c will be placed in that namespace. All normal stuff. Now add

namespace d = <something>

The current namespace will then have b, c and d.

So in the example, after

namedef com = ns.dns.com

there will be a name, com, in the current namespace, and the program
should be able to refer to com just as easily as it refers to b or c in
the example above.
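The behaviour described above can be modelled with nested dictionaries: namedef just binds a new name in the current scope to an object found by resolving an already-visible dotted path. The `resolve` helper is a hypothetical stand-in for the compiler's name lookup.

```python
# Model: namespaces as nested dicts; 'namedef com = ns.dns.com' simply
# adds the name 'com' to the current namespace, bound to the object
# found by resolving the dotted path.

ns = {"dns": {"com": {"google": {"www": "<google web root>"}}}}
current = {"ns": ns, "b": 1, "c": 2}   # int b; int c

def resolve(scope, dotted):
    parts = dotted.split(".")
    obj = scope[parts[0]]              # only the first name must be visible
    for part in parts[1:]:
        obj = obj[part]
    return obj

current["com"] = resolve(current, "ns.dns.com")       # namedef com = ns.dns.com
current["gweb"] = resolve(current, "com.google.www")  # uses the new 'com'
print(current["gweb"])    # <google web root>
```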

Does that make more sense?

That said, there may be some problems with the idea that I haven't yet
seen. I certainly have a lot of details to work out in respect of where
name resolution will look if a name is not in the current namespace, and
how to allow the programmer to control that mechanism.


--
James Harris

anti...@math.uni.wroc.pl

Aug 21, 2021, 4:27:47 PM
David Brown <david...@hesbynett.no> wrote:
> On 20/08/2021 22:13, anti...@math.uni.wroc.pl wrote:
>
> >
> > PL/I put things to extreme: formally no identifier was reserved
> > and you you could put declarations after use. Most languages
> > take intermediate position: there is small number of reserved
> > words and you need to declare variables before use. So
> > re-using predefined identifiers is easy to implement and safe.
> >
>
> Forth is the most flexible language I know of in this sense:
>
>
> $ gforth
> Gforth 0.7.3, Copyright (C) 1995-2008 Free Software Foundation, Inc.
> Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
> Type `bye' to exit
> 2 2 + . 4 ok
> : 2 3 ; ok
> 2 2 + . 6 ok
>
>
> The result of "2 2 +" is 4, then I redefine "2" to mean "3", and now the
> result of "2 2 +" is 6.

Forth is weird because it treats integers as identifiers. However,
there are several languages in the "most flexible" camp: each supports
a user-provided scanner, so after an appropriate "prelude" you can
write a completely different programming language. AFAIK Forth allows
this. But so do Lisp and a few other languages.

IIUC this is much more general than what James wants...

--
Waldek Hebisch

Bart

Aug 21, 2021, 8:14:12 PM
On 21/08/2021 19:11, James Harris wrote:
> On 21/08/2021 17:31, Bart wrote:
>> On 21/08/2021 15:58, James Harris wrote:

>> 7 shifted symbols, versus none in my equivalent code: println =a, =b
>>
>
> That's curious given what we have been discussing. You appear to have a
> function with two parameters without parens!

Well, that's because it's classed as a statement, which have dedicated
syntax.

print is also different from a function as it is given an arbitrarily
long list of operands, none of more significance than the other, and it
will consume all of them.

Nested prints like this:

print a, b, print c, d, e

would be parsed in a certain way (print a, b, (print c, d, e)), but will
not compile since 'print' does not return a value that can be printed.

Nested is possible as:

print a, b, (print c, d; e), f

But in typical use, a print statement will consume all its operands, and
will never have nested print statements in a form that will cause issues.

>
> I have not yet decided on output mechanisms but since there's some code
> to compare I'll have a go. One option is
>
>   cout.putrec(a, b)
>
> Another is
>
>   cout.putf("a=%M, b=%M\n/", a.string(), b.string())

Your example here uses formatted print, which I'd write as:

fprintln "a=#, b=#", a, b

Having this stuff as statements means you don't need to deal with
challenging features like:

* Variadic /numbers/ of arguments to a function

* Variadic /types/ of arguments ...

* ... which you've circumvented with an explicit to-string routine,
but now you need overloaded versions for any types, plus you
need to manage the string memory used

> which I guess is nearer the intention of your =a form.

Using '=' requires being able to turn any expression back into a string.
(Which I don't do perfectly, and the form may not match what was in the
source code, so that 'max(a,b)' may come out as 'a max b'.)
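For comparison, Python 3.8 grew a directly analogous shorthand: the "=" specifier in f-strings captures the expression text along with its value, though it reproduces the source text verbatim rather than reconstructing it.

```python
# Python's f-string '=' specifier: prints the expression text and its
# value, much like the 'println =a' form discussed above.
a, b = 10, 2.5
print(f"{a=}, {b=}")          # a=10, b=2.5
print(f"{max(a, 3)=}")        # max(a, 3)=10: the source text is kept verbatim
```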


> What about the viewpoint that a function call /always/ has the form
>
>   f X
>
> where X is /always/ a single argument, and that the single argument
> needs to be wrapped in parens if it is a composite of something other
> than one element?

That's for other languages (everyone is mad now about functional
programming with its currying and lambdas).

This anyway causes problems with my current syntax where consecutive
identifiers do not normally occur. If they do, then the first is assumed
to be the name of a user-type.

My view is that if you want a language to look like a command language
that you'd write a line at a time like a shell, or a language mainly
used interactively via a REPL, then you can make them more friendly and
more informal by doing away with parentheses around command arguments.

After all you don't need to write DEL (file.c).

But if you are constructing a whole source file before submitting it to
a compiler or interpreter, then it can do with being a bit more formal.

However you've seen examples of my syntax; it's not particularly
cluttered, is it, or bristling with punctuation.



>
>>>    namespace com = ns.dns.com
>>
>> That looks a bit dodgy. So 'com' can appear on both sides?
>
> It isn't intended to work in the way you have in mind. Rather, imagine
> that there's a 'current namespace' so that when you type
>
>   int b
>   int c
>
> then b and c will be placed in that namespace. All normal stuff. Now add
>
>   namespace d = <something>
>
> The current namespace will then have b, c and d.
>
> So in the example, after
>
>   namedef com = ns.dns.com
>
> there will be a name, com, in the current namespace, and the program

Yes, I hadn't spotted that they're not the same because you're defining
a top level 'com' name which will not clash with the other 'com', as it
only appears after a "." so is not visible.

James Harris

Aug 22, 2021, 3:03:09 PM
On 22/08/2021 01:14, Bart wrote:
> On 21/08/2021 19:11, James Harris wrote:
>> On 21/08/2021 17:31, Bart wrote:
>>> On 21/08/2021 15:58, James Harris wrote:
>
>>> 7 shifted symbols, versus none in my equivalent code: println =a, =b

...

The whole discussion of print /statements/ and command forms is too big
an issue to go into here. Something to come back to. :-)



>> What about the viewpoint that a function call /always/ has the form
>>
>>    f X
>>
>> where X is /always/ a single argument, and that the single argument
>> needs to be wrapped in parens if it is a composite of something other
>> than one element?
>
> That's for other languages (everyone is mad now about functional
> programming with its currying and lambdas).

It's nothing to do with functional programming. And probably little to
do with command syntax. It's meant to be a plain call. It could be to a
function (returning a result) or a procedure (returning nothing so not a
function).

Consider these two expressions

a
(a)

In the kind of syntax we are familiar with the parentheses do not alter
the meaning so both of those expressions would mean the same thing. So
if we put an identifier before them why should the meaning change? As in

f a
f (a)

Would it be more syntactically consistent (with the examples, above) if
they meant the same as each other?

You mentioned before about how with command syntax one can have the need
to walk through the following arguments, which could include
interpreting their different types and sizes. Consider

a + f(b)

The call would have one argument, (b). With

a + f(b, c, d)

could one also say that the call has one argument (b, c, d), and parse
it as such even if b, c and d were not positional but were a sequence
and had different types and sizes?

...

> However you've seen examples of my syntax; it's not particularly
> cluttered, is it, or bristling with punctuation.

No, I like your syntax. It seems to me to be clean and user friendly.

That said, I think you have a number of special cases which help it
remain so but can be confusing to a newbie. Your =a is a case in point.

...

>> So in the example, after
>>
>>    namedef com = ns.dns.com
>>
>> there will be a name, com, in the current namespace, and the program
>
> Yes, I hadn't spotted that they're not the same because you're defining
> a top level 'com' name which will not clash with the other 'com', as it
> only appears after a "." so is not visible.
>

Yes. To be clear, to apply

namedef com = ns.dns.com

the only name which would need to be visible is ns. Assuming it could
resolve the RHS of the assignment the namedef would make the plain name
com visible as a short form of the longer name.


--
James Harris

Bart

Aug 22, 2021, 3:54:25 PM
(X) doesn't always mean the same thing (I'll elaborate on X below).

In most cases it just means the expression X. But following a term, it
is the arguments for a function call (or cast or operator when they use
function-like syntax):

(expr)
A + (expr)
A(args)
A(args)(args)
(expr)(args)

X can be any of:

Expr:

() Empty list
(x) Simple term
(x,) 1-element list
(x,y,...) List of N terms

Args:

() No args (mandatory)
(x) 1 arg
(x,y,...) N args

As mentioned, some other constructs use the function-like syntax:

max(x,y) Same as (x max y)
clamp(x,y,z) Only uses this syntax
int(x) Cast
date(25,12,20) Record constructor, when 'date' is a type.

So, in your a + f(b,c,d) example, that is 3 arguments not one. For a
single argument, I'd need to write:

a + f((b, c, d))


Bart

Aug 22, 2021, 8:11:04 PM
On 21/08/2021 10:40, David Brown wrote:
> On 20/08/2021 22:13, anti...@math.uni.wroc.pl wrote:
>
>>
>> PL/I put things to extreme: formally no identifier was reserved
>> and you you could put declarations after use. Most languages
>> take intermediate position: there is small number of reserved
>> words and you need to declare variables before use. So
>> re-using predefined identifiers is easy to implement and safe.
>>
>
> Forth is the most flexible language I know of in this sense:
>
>
> $ gforth
> Gforth 0.7.3, Copyright (C) 1995-2008 Free Software Foundation, Inc.
> Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
> Type `bye' to exit
> 2 2 + . 4 ok
> : 2 3 ; ok
> 2 2 + . 6 ok
>
>
> The result of "2 2 +" is 4, then I redefine "2" to mean "3", and now the
> result of "2 2 +" is 6.

-------------------------------
C:\qapps>qq forth
Bart-Forth
Type bye, quit or exit to stop

> 2 2 + .
4
> : 2 3 ;
> 2 2 + .
6
>
-------------------------------

It didn't have a REPL 20 minutes ago; it does now.

I wrote this last year, then when I went to download some test programs
(since I find it impossible to code in myself), I found that each used a
different, incompatible dialect.


(This Forth written in my 'Q' scripting language. 'Q' implemented in my
'M' systems language. 'M' implemented in my 'M' systems language.)

James Harris

Aug 23, 2021, 3:40:19 PM
On 22/08/2021 20:54, Bart wrote:
> On 22/08/2021 20:03, James Harris wrote:

...

>> Consider
>>
>>    a + f(b)
>>
>> The call would have one argument, (b). With
>>
>>    a + f(b, c, d)
>>
>> could one also say that the call has one argument (b, c, d), and parse
>> it as such even if b, c and d were not positional but were a sequence
>> and had different types and sizes?

...

> Expr:
>
>   ()         Empty list
>   (x)        Simple term
>   (x,)       1-element list
>   (x,y,...)  List of N terms

...

> So, in your a + f(b,c,d) example, that is 3 arguments not one. For a
> single argument, I'd need to write:
>
>   a + f((b, c, d))

Speaking of lists, what if a certain language treated every function
call as

F L

where L was a list?

The list could be in one of the forms you show, above, with zero or more
arguments. Any mandatory arguments would come first, followed by any
optional arguments. Is this what you were calling 'command syntax'
except with delimiters rather than whitespace between arguments?

I think of command syntax as more Lisp-like as in

(f b c d)

I was thinking here, however, that such a call would be more
conventionally embeddable in expressions as in

a + f(b,c,d) + e

Further, if b were to be mandatory and c and d were optional then f
would begin with b already assigned to whatever local name was the
formal parameter and would then iterate over - or even, effectively,
/parse/ - the subsequent sequence of arguments.

Parsing a series of arguments in that way would be akin to other
conventional processing patterns where a program is responding to a
series of varying inputs, including processing arguments on a command
line and the classic model of successively taking the next piece of work
from an event queue.

My language is currently much more conventional so perhaps this is just
a thought experiment. I suppose what I am thinking of is being able to
invoke a function with a form of arguments which is more general and
flexible than those we normally use.

One could, for example, create the list (in an arbitrary number of
steps) before making the call such as

Args = (b, c, d)
a + f *Args + e

Or the tail of the list could be generated dynamically by another
process so it would only end when that other process signalled that the
list was complete.

Etc.

There seem to be lots of ways of producing a list as a 'work stream' and
that's what would be fed in as the 'parameter list' to the callee.
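Python's argument unpacking gives a working model of the "f *Args" idea, including a tail generated lazily by another process (all names here are illustrative):

```python
def f(b, *rest):           # b mandatory; the callee walks the rest
    total = b
    for arg in rest:       # iterate over - in effect parse - the sequence
        total += arg
    return total

args = (2, 3, 4)           # Args = (b, c, d)
print(1 + f(*args) + 10)   # a + f *Args + e  ->  1 + 9 + 10 = 20

def produce():             # tail generated by another process
    yield 2
    yield 3
    yield 4

print(f(*produce()))       # 9: the list ends when the producer stops
```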

Maybe you've already tried something like this as it may be better
suited to a dynamic language than a static one.

An early parser of mine built a tree where each node was a list of the form

(nodetype, param ...)

Each list began with a node type and then had any 'parameters' of that
type. In a sense, that had the same structure: a type followed by a
series of properties is akin to a command followed by a series of
arguments.

As I say, it seems to be a common pattern. I just wonder what it would
enable if it were applied to a parameter list.


--
James Harris

Bart

Aug 23, 2021, 5:55:48 PM
With static code, dealing with Win64 ABI is already enough of a
nightmare that it's best to keep things as simple and conventional as
possible.

With interpreted code where there is a software stack, then it's
possible to be more flexible.

But I've looked at some of those ways of passing parameters, eg. like
Python's *A or A*, whichever it is, which expands a list (or object) A
into N arguments. Unfortunately they don't really fit my current
implementations.

It's not impossible, but I would have to keep conventional function
calls to keep things efficient, with expanding argument lists
implemented via new bytecode instructions.

In that case that would just have to join the waiting list of such features!

It's possible in Python because everything is done at runtime (I'm
actually surprised it's not a lot slower), but it makes it harder to
optimise.

It also aspires to be a functional language so it needs to support all
these tricks with argument lists, or returning function objects, or
creating lambda functions; it's all interconnected. My own needs to be
more prosaic.


James Harris

Nov 26, 2021, 1:41:49 PM
On 21/08/2021 19:11, David Brown wrote:
> On 21/08/2021 11:32, James Harris wrote:
>> On 20/08/2021 19:50, David Brown wrote:

...

>>> I'd recommend looking at C++ templates.  You might not want to follow
>>> all the details of the syntax, and you want to look at the newer and
>>> better techniques rather than the old ones.  But pick a way to give
>>> compile-time parameters to types, and then use that - don't faff around
>>> with special cases and limited options.  Pick one good method, then you
>>> could have something like this :
>>>
>>>     builtin::int<32> x;
>>>     using int32 = builtin::int<32>;
>>>     int32 y;
>>>
>>> That is (IMHO) much better than your version because it will be
>>> unambiguous, flexible, and follows a syntax that you can use for all
>>> sorts of features.
>>
>> My version of that would be
>>
>>   typedef i32 = int 32
>>
>>   int 32 x
>>   i32 y
>>
>
> Punctuation here is not /necessary/, but it would make the code far
> easier to read, and far safer (in that mistakes are more likely to be
> seen by the compiler rather than being valid code with unintended meaning).

Noted.

>
>> Your C++ version doesn't seem to be any more precise or flexible.
>
> What happens when you have a type that should have two parameters - size
> and alignment, for example? Or additional non-integer parameters such
> as signedness or overflow behaviour? Or for container types with other
> types as parameters? C++ has that all covered in a clear and accurate
> manner - your system does not.

That's not wholly true. Specific terms and syntax are not yet decided
but I do have the concept of qualifiers. For example,

int 32 x
int 32 alignbits 3 y

In that, y would be required to be aligned such that the bottom 3 bits
of its address were zero.
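The "alignbits 3" requirement (bottom 3 bits of the address zero, i.e. 8-byte alignment) is the usual round-up computation an allocator performs; a small sketch with a hypothetical helper:

```python
# 'alignbits 3' means the low 3 bits of the address are zero,
# i.e. the address is a multiple of 2**3 = 8.

def align_up(addr, alignbits):
    mask = (1 << alignbits) - 1
    return (addr + mask) & ~mask

print(align_up(13, 3))            # 16
print(align_up(16, 3))            # 16: already aligned
print(align_up(13, 3) & 0b111)    # 0: bottom 3 bits are zero
```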

However, the syntax is not yet chosen and if as you suggest the use of
punctuation would not be onerous I would prefer the addition of the
colon as in

int 32: x
int 32 alignbits 3: y

The additional colon would make parsing by compiler and by human easier.
I have omitted it up until now as I could imagine that programmers would
not want to have to type it in simple declarations such as

int: i

but maybe that doesn't look too bad.

>
> My intention here is to encourage you to think bigger. Stop thinking
> "how do I make integer types?" - think wider and with greater generality
> and ambition. Make a good general, flexible system of types, and then
> let your integer types fall naturally out of that.

The goal is that range specifications would apply anywhere relevant, not
just for integers. For example,

array (5..15) floati32: v

would declare an array of between 5 and 15 elements of type floati32.
One might use that as a parameter declaration to require that what gets
passed in has to be an array which matches within certain size limits.

I take your point on board but I don't know that my syntax can be made
more general without getting in to either variants or generics.

...

> Perhaps even look at the metaclasses proposal
> <https://www.fluentcpp.com/2018/03/09/c-metaclasses-proposal-less-5-minutes/>.
> This will not be in C++ before C++26, maybe even later, but it gives a
> whole new way of building code. If metaclasses had been part of C++
> from the beginning, there would be no struct, class, enum, or union in
> the language - these would have been standard library metaclasses. They
> are /that/ flexible.

I've been looking at some material by one of the proposers, Herb Sutter.
A proper discussion about such things needs a topic of its own but I
should say here that my fundamental reaction is not favourable. He
effectively acknowledges that the more such proposals complicate a
language the more a programmer has to depend on tools to help understand
program code and that's a step too far for me at present.

One of Sutter's justifications for metaclasses is that they help remove
a lot of boilerplate from C++ code and, on that, ISTM that the problem
may be the presence of the boilerplate in the first place. I don't know
enough C++ but ATM I am not sure I'd have the boilerplate to remove.

As a wider point, I've previously seen Python become more 'clever'. And
it has become less comprehensible as a result. C++ is already alarmingly
complex. Sutter's proposals would make it more so. In both cases ISTM
that people who are unhealthily immersed in the language (Python, C++)
see ways to be even cleverer and that makes a language worse. I believe
similar has happened with Ada and JavaScript. C, by contrast, despite
some enhancements has remained pleasantly small and focussed.

I'll keep metaclasses in mind but ATM they are in the bucket with other
ideas for code customisation such as those you list here:

> Ultimately, things like macros, templates, generics, metafunctions,
> etc., are just names for high-level compile-time coding constructs.


--
James Harris

David Brown

Nov 27, 2021, 10:17:44 AM
Before you try to think out a syntax here, ask yourself /why/ someone
would want this feature. What use is it? What are the circumstances
when you might need non-standard alignment? What are the consequences
of it? If this is something that is only very rarely useful (and I
believe that is the case here - but /you/ have to figure that out for
your language), there is no point going out of your way to make it easy
to write. Common things should be easy to write - rare things can be
hard to write. What you certainly don't want is a short but cryptic way
to write it.

So for me, your "alignbits 3" is just wrong - it makes no sense. You
are trying to say it should be aligned with 8-byte alignment, also known
as 64-bit alignment. Obviously I can figure out what you meant - there
really isn't any other possibility for "alignbits 3". But if you had
written "alignbits 8", I would take that to mean a packed or unaligned
declaration, not one with 256-byte alignment.

> However, the syntax is not yet chosen and if as you suggest the use of
> punctuation would not be onerous I would prefer the addition of the
> colon as in
>
>   int 32: x
>   int 32 alignbits 3: y
>

The details here are a matter of taste, but you get my point about
improving readability.

> The additional colon would make parsing by compiler and by human easier.
> I have omitted it up until now as I could imagine that programmers would
> not want to have to type it in simple declarations such as
>
>   int: i
>
> but maybe that doesn't look too bad.
>

I only know one person who regularly complains about having to use
punctuation and finds it inconvenient to type symbols. But even he uses
punctuation at times in his languages.

(On the other hand, too much punctuation makes code harder to read and
write. As with most things, you want a happy medium.)

>>
>> My intention here is to encourage you to think bigger.  Stop thinking
>> "how do I make integer types?" - think wider and with greater generality
>> and ambition.  Make a good general, flexible system of types, and then
>> let your integer types fall naturally out of that.
>
> The goal is that range specifications would apply anywhere relevant, not
> just for integers. For example,
>
>   array (5..15) floati32: v
>
> would declare an array of between 5 and 15 elements of type floati32.
> One might use that as a parameter declaration to require that what gets
> passed in has to be an array which matches within certain size limits.
>
> I take your point on board but I don't know that my syntax can be made
> more general without getting in to either variants or generics.
>

Unless a language was just a simple, limited scripting tool, I would not
bother making (or learning) a new language that did not have features
such as variants or generics (noting that these terms are vague and mean
different things to different people). There are perfectly good
languages without such features. Given the vast benefits of C in terms
of existing implementation, code, experience and information, why would
anyone bother with a different compiled language unless it let them do
things that you cannot easily do in C? Being able to make your own
types, with their rules, invariants, methods, operators, etc., is pretty
much a basic level feature for modern languages. Generic programming is
standard. I would no longer consider these as advanced or complex
features of a modern language, I'd consider them foundational.

Note that I am /not/ saying you should copy C++'s templates, or Ada's
classes. Your best plan is to learn from these languages - see what
they can do. And then find a better, nicer, clearer and simpler way to
get the same (or more) power. When you are starting a new language, you
don't have to keep compatibility and build step by step over many years,
you can jump straight to a better syntax.

> ...
>
>> Perhaps even look at the metaclasses proposal
>> <https://www.fluentcpp.com/2018/03/09/c-metaclasses-proposal-less-5-minutes/>.
>>
>>   This will not be in C++ before C++26, maybe even later, but it gives a
>> whole new way of building code.  If metaclasses had been part of C++
>> from the beginning, there would be no struct, class, enum, or union in
>> the language - these would have been standard library metaclasses.  They
>> are /that/ flexible.
>
> I've been looking at some material by one of the proposers, Herb Sutter.
> A proper discussion about such things needs a topic of its own but I
> should say here that my fundamental reaction is not favourable. He
> effectively acknowledges that the more such proposals complicate a
> language the more a programmer has to depend on tools to help understand
> program code and that's a step too far for me at present.

Remember that metaclasses are not for the "average" programmer. They
are for the library builders and the language builders. A good
proportion of modern C++ features are never seen or used by the majority
of programmers, but they are used underneath to implement the features
that /are/ used. Few C++ programmers really understand rvalue
references and move semantics, but they are happy to see that the
standard library container classes are now more efficient - without
caring about the underlying language changes that allow those efficiency
gains. Probably something like 99% of Python programmers have never
even heard of metaclasses, yet they use libraries that make use of them.

>
> One of Sutter's justifications for metaclasses is that they help remove
> a lot of boilerplate from C++ code and, on that, ISTM that the problem
> may be the presence of the boilerplate in the first place. I don't know
> enough C++ but ATM I am not sure I'd have the boilerplate to remove.

When starting with a language from scratch, you can avoid a fair amount
of boilerplate that is necessary when features have evolved over time.
But your language will either develop idioms that need boilerplate, or
it will die out because no one uses it. (There are only two kinds of
programming languages - the ones that people complain about, and the
ones no one uses.)

Metaprogramming and metaclasses do not /remove/ boilerplate code - they
push it one level higher, so that fewer people need to make the
boilerplate code and they need to make less of it.

(Again, I am not saying that you should copy C++'s way of doing things,
or Sutter's proposals here - just that you could learn from it and be
inspired by it.)

>
> As a wider point, I've previously seen Python become more 'clever'. And
> it has become less comprehensible as a result. C++ is already alarmingly
> complex. Sutter's proposals would make it more so. In both cases ISTM
> that people who are unhealthily immersed in the language (Python, C++)
> see ways to be even cleverer and that makes a language worse. I believe
> similar has happened with Ada and JavaScript. C, by contrast, despite
> some enhancements has remained pleasantly small and focussed.
>

C basically has not changed - the new features since C99 have been quite
minor.

Bart

Nov 27, 2021, 1:55:41 PM
On 27/11/2021 15:17, David Brown wrote:
> On 26/11/2021 19:41, James Harris wrote:

>> That's not wholly true. Specific terms and syntax are not yet decided
>> but I do have the concept of qualifiers. For example,
>>
>>   int 32 x
>>   int 32 alignbits 3 y
>>
>> In that, y would be required to be aligned such that the bottom 3 bits
>> of its address were zero.
>>
>
> Before you try to think out a syntax here, ask yourself /why/ someone
> would want this feature. What use is it? What are the circumstances
> when you might need non-standard alignment? What are the consequences
> of it? If this is something that is only very rarely useful (and I
> believe that is the case here - but /you/ have to figure that out for
> your language), there is no point going out of your way to make it easy
> to write. Common things should be easy to write - rare things can be
> hard to write.

Yet C breaks that rule all the time. Just today I needed to type:

   unsigned char            to mean byte or u8
   unsigned long lont int   to mean u64 [typo left in]
   printf("....\n", ...)    to mean println ...

Yes you could use uint8_t and uint64_t, but that still needs:

   #include <stdint.h>

which you have to remember to add at the top of every module


> So for me, your "alignbits 3" is just wrong - it makes no sense. You
> are trying to say it should be aligned with 8-byte alignment, also known
> as 64-bit alignment. Obviously I can figure out what you meant - there
> really isn't any other possibility for "alignbits 3". But if you had
> written "alignbits 8", I would take that to mean a packed or unaligned
> declaration, not one with 256-byte alignment.

I don't get it either, but I guess you're not complaining about having a
way to control the alignment of a type, just that this isn't intuitive?

In my assembler I use:

align N

to force alignment of next data/code byte at a multiple of N bytes,
usually a power-of-two.

My HLL doesn't have that, except that I once used @@ N to control the
alignment of record fields (now I use a $caligned attribute for the
whole record, since emulating C struct layout was the only use of @@).

>> The additional colon would make parsing by compiler and by human easier.
>> I have omitted it up until now as I could imagine that programmers would
>> not want to have to type it in simple declarations such as
>>
>>   int: i
>>
>> but maybe that doesn't look too bad.
>>
>
> I only know one person who regularly complains about having to use
> punctuation and finds it inconvenient to type symbols. But even he uses
> punctuation at times in his languages.

Shifted punctuation is worse.

> (On the other hand, too much punctuation makes code harder to read and
> write. As with most things, you want a happy medium.)
>
>>>
>>> My intention here is to encourage you to think bigger.  Stop thinking
>>> "how do I make integer types?" - think wider and with greater generality
>>> and ambition.  Make a good general, flexible system of types, and then
>>> let your integer types fall naturally out of that.
>>
>> The goal is that range specifications would apply anywhere relevant, not
>> just for integers. For example,
>>
>>   array (5..15) floati32: v
>>
>> would declare an array of between 5 and 15 elements of type floati32.

(No. That's just not what anyone would guess that to mean. It looks like
an array of length 11 indexed from 5 to 15 inclusive.

It's not clear what the purpose of this is, or what a compiler is
supposed to do with that info.)

> Unless a language was just a simple, limited scripting tool, I would not
> bother making (or learning) a new language that did not have features
> such as variants or generics (noting that these terms are vague and mean
> different things to different people). There are perfectly good
> languages without such features. Given the vast benefits of C in terms
> of existing implementation, code, experience and information, why would
> anyone bother with a different compiled language unless it let them do
> things that you cannot easily do in C? Being able to make your own
> types, with their rules, invariants, methods, operators, etc., is pretty
> much a basic level feature for modern languages. Generic programming is
> standard. I would no longer consider these as advanced or complex
> features of a modern language, I'd consider them foundational.

That still leaves a big gap between C, and a language with all those
advanced features, which probably cannot offer the benefits of small
footprint, transparency, and the potential for a fast build process.

Plus there are plenty of things at the level of C that some people (me,
for a start) want but it cannot offer:

* An alternative to that god-forsaken, error prone syntax
* Freedom from case-sensitivity
* 1-based arrays!
* An ACTUAL byte/u8 type without all the behind-the-scenes
  nonsense, or the need for stdint/inttypes etc
* 64-bit integer types as standard
* A grown-up Print feature
* etc etc

What /are/ the actual alternatives available as the next C replacement;
Rust and Zig? You're welcome to them!

James Harris

Nov 28, 2021, 4:33:17 AM
On 27/11/2021 15:17, David Brown wrote:
> On 26/11/2021 19:41, James Harris wrote:
>> On 21/08/2021 19:11, David Brown wrote:

...

>>> What happens when you have a type that should have two parameters - size
>>> and alignment, for example?  Or additional non-integer parameters such
>>> as signedness or overflow behaviour?  Or for container types with other
>>> types as parameters?  C++ has that all covered in a clear and accurate
>>> manner - your system does not.
>>
>> That's not wholly true. Specific terms and syntax are not yet decided
>> but I do have the concept of qualifiers. For example,
>>
>>   int 32 x
>>   int 32 alignbits 3 y
>>
>> In that, y would be required to be aligned such that the bottom 3 bits
>> of its address were zero.
>>
>
> Before you try to think out a syntax here, ask yourself /why/ someone
> would want this feature. What use is it? What are the circumstances
> when you might need non-standard alignment? What are the consequences
> of it? If this is something that is only very rarely useful (and I
> believe that is the case here - but /you/ have to figure that out for
> your language), there is no point going out of your way to make it easy
> to write. Common things should be easy to write - rare things can be
> hard to write. What you certainly don't want is a short but cryptic way
> to write it.

Agreed, and it may change. It's just that for now I have qualifiers
after the base type and if there were a need to align a type then that's
where that particular qualifier would be put. In practice, alignment is
more likely to apply to structures/records than to integers.

>
> So for me, your "alignbits 3" is just wrong - it makes no sense. You
> are trying to say it should be aligned with 8-byte alignment, also known
> as 64-bit alignment. Obviously I can figure out what you meant - there
> really isn't any other possibility for "alignbits 3". But if you had
> written "alignbits 8", I would take that to mean a packed or unaligned
> declaration, not one with 256-byte alignment.

On that, I wonder if I could persuade you to think in terms of the
number of bits. AISI there are two ways one can specify alignment: as a
power-of-two number of bytes that the address has to be a multiple of,
or as the number of zero bits on the RHS of the address.

When specifying constants it's easier to begin with the number of bits
and convert from that. Consider the opposite. Given

constant ALIGN_BYTES = 8

there are these two ways one might convert that to alignment bits.

constant ALIGN_BITS = Log2RoundedUp(ALIGN_BYTES)
constant ALIGN_BITS = Log2ButErrorIfNotPowerOfTwo(ALIGN_BYTES)

IOW (1) there are two possible interpretations of the conversion and,
perhaps worse, (2) either would need a special function to implement it.

By contrast, if we begin with alignment bits then there's a standard
conversion which needs no special function.

constant ALIGN_BYTES = 1 << ALIGN_BITS

Hence I prefer to use bit alignment (number of zero bits on RHS) as the
base constant. Other constants and values can easily be derived from there.

...

>>   int 32 alignbits 3: y

...

> (On the other hand, too much punctuation makes code harder to read and
> write. As with most things, you want a happy medium.)

Yes. I could require

int(32)<alignbits 3> y

which would perhaps be more convenient for the compiler (and more
familiar for a new reader) but in the long term I suspect it would be
more work for a human to write and read.

...

> Remember that metaclasses are not for the "average" programmer. They
> are for the library builders and the language builders. A good
> proportion of modern C++ features are never seen or used by the majority
> of programmers, but they are used underneath to implement the features
> that /are/ used. Few C++ programmers really understand rvalue
> references and move semantics, but they are happy to see that the
> standard library container classes are now more efficient - without
> caring about the underlying language changes that allow those efficiency
> gains. Probably something like 99% of Python programmers have never
> even heard of metaclasses, yet they use libraries that make use of them.

When it comes to language design I have a problem with the conceptual
division of programmers into average and expert. The issue is that it
assumes that each can write their own programs and the two don't mix. In
reality, code is about communication. I see a programming language as a
lingua franca. Part of its value is that everyone can understand it. If
the 'experts' start writing code which the rest of the world cannot
decipher then a significant part of that value is lost.

Hence, AISI, it's better for a language to avoid special features for
experts, if possible.

...

> it will die out because no one uses it. (There are only two kinds of
> programming languages - the ones that people complain about, and the
> ones no one uses.)

:-)


--
James Harris

James Harris

Nov 28, 2021, 5:11:23 AM
On 27/11/2021 18:55, Bart wrote:
> On 27/11/2021 15:17, David Brown wrote:
>> On 26/11/2021 19:41, James Harris wrote:

...

>>>> My intention here is to encourage you to think bigger.  Stop thinking
>>>> "how do I make integer types?" - think wider and with greater
>>>> generality
>>>> and ambition.  Make a good general, flexible system of types, and then
>>>> let your integer types fall naturally out of that.
>>>
>>> The goal is that range specifications would apply anywhere relevant, not
>>> just for integers. For example,
>>>
>>>    array (5..15) floati32: v
>>>
>>> would declare an array of between 5 and 15 elements of type floati32.
>
> (No. That's just not what anyone would guess that to mean. It looks like
> an array of length 11 indexed from 5 to 15 inclusive.
>
> It's not clear what the purpose of this is, or what a compiler is
> supposed to do with that info.)

It's not meant to be a feature. It's the consequence of trying to be
consistent: allowing the parameters of parameters (if you see what I
mean) to be qualified whether the parameters are integers or arrays or
whatever else. I was pointing out to David that I didn't have a special
syntax just for integers.

I may eventually limit what a programmer could do (for
comprehensibility, perhaps!) but for now ISTM best to keep features
orthogonal and universal, even if the combination thereof looks strange
at first.


--
James Harris

David Brown

Nov 28, 2021, 8:24:45 AM
On 27/11/2021 19:55, Bart wrote:
> On 27/11/2021 15:17, David Brown wrote:
>> On 26/11/2021 19:41, James Harris wrote:
>
>>> That's not wholly true. Specific terms and syntax are not yet decided
>>> but I do have the concept of qualifiers. For example,
>>>
>>>    int 32 x
>>>    int 32 alignbits 3 y
>>>
>>> In that, y would be required to be aligned such that the bottom 3 bits
>>> of its address were zero.
>>>
>>
>> Before you try to think out a syntax here, ask yourself /why/ someone
>> would want this feature.  What use is it?  What are the circumstances
>> when you might need non-standard alignment?  What are the consequences
>> of it?  If this is something that is only very rarely useful (and I
>> believe that is the case here - but /you/ have to figure that out for
>> your language), there is no point going out of your way to make it easy
>> to write.  Common things should be easy to write - rare things can be
>> hard to write.
>
> Yet C breaks that rule all the time. Just today I needed to type:
>
>    unsigned char            to mean byte or u8
>    unsigned long lont int   to mean u64 [typo left in]
>    printf("....\n", ...)    to mean println ...

If you needed to type those to mean something else, that is /purely/ a
problem with the programmer, not with the language. The language and
its standard libraries provides everything you need in order to write
these things in the way you want. As long as the language makes that
practically possible (and in the first two examples at least, extremely
simple), the language does all it needs.

No language will /ever/ mean you can trivially write all the code you
want to write! Obviously when you are doing a one-man language with one
designer, one user, and one type of code, and are happy to modify the
language to suit the program you are writing, you can come quite close.
But for real languages developed and implemented by large numbers of
people and used by huge numbers of people, that does not happen.

As so often happens, in your manic obsession to rail against C, you
completely missed the point. Oh, and you also missed that in this
newsgroup I have repeatedly said that people should study popular and
successful languages like C and C++ (and others) in order to learn from
them and take inspiration from them, and to aim to make something
/better/ for their particular purposes and requirements.

>
> Yes you could use uint8_t and uint64_t, but that still needs:
>
>    #include <stdint.h>
>
> to be remembered to add at the top of every module
>

Do you have any idea how pathetic and childish that sounds? Presumably
not, or you wouldn't have written it. So let me inform you, again, that
continually whining and crying about totally insignificant
inconveniences does nothing to help your "down with C" campaign.

>
>> So for me, your "alignbits 3" is just wrong - it makes no sense.  You
>> are trying to say it should be aligned with 8-byte alignment, also known
>> as 64-bit alignment.  Obviously I can figure out what you meant - there
>> really isn't any other possibility for "alignbits 3".  But if you had
>> written "alignbits 8", I would take that to mean a packed or unaligned
>> declaration, not one with 256-byte alignment.
>
> I don't get it either, but I guess you're not complaining of a way to
> control alignment of type, just that this not intuitive?
>

Yes. There are occasions when controlling alignment can be important,
but they are really quite rare in practice. The most common cases I see
in my line of work are use of "packed" structures to go lower than
standard alignments, and the majority (but not all) of such cases I see
are counter-productive and a really bad idea. On bigger systems,
picking higher alignments can sometimes be helpful for controlling the
efficiency of caching.

So being able to control alignment is a good feature of a relatively
low-level language. But it is something that you only need on rare
occasions, so it doesn't have to be a simple or convenient thing to
write and it /does/ have to be something that is obvious to understand
when you see it written. "alignbits 3" does not qualify, IMHO.

> In my assembler I use:
>
>     align N
>
> to force alignment of next data/code byte at a multiple of N bytes,
> usually a power-of-two.
>

That's a perfectly reasonable choice - matching pretty much every
assembler I've ever used (barring details such as "align" vs. ".align").

> My HLL doesn't have that, except that I once used @@ N to control the
> alignment of record fields (now I use a $caligned attribute for the
> whole record as that was only use for @@, to emulate C struct layout).
>

Using "@@" for the purpose is an example of cryptic syntax that is
unhelpful for a feature that is rarely needed. A word such as a
"caligned" attribute will be clearer when people read the code.


>>> The additional colon would make parsing by compiler and by human easier.
>>> I have omitted it up until now as I could imagine that programmers would
>>> not want to have to type it in simple declarations such as
>>>
>>>    int: i
>>>
>>> but maybe that doesn't look too bad.
>>>
>>
>> I only know one person who regularly complains about having to use
>> punctuation and finds it inconvenient to type symbols.  But even he uses
>> punctuation at times in his languages.
>
> Shifted punctuation is worse.
>

So you'd rather write "x - -y" than "x + y", because it avoids the shift
key? That seems like a somewhat questionable choice of priorities.
If there were a significant number of people who wanted a language with
these features, there would be one.

David Brown

Nov 28, 2021, 8:46:24 AM
On 28/11/2021 10:33, James Harris wrote:
> On 27/11/2021 15:17, David Brown wrote:
>> On 26/11/2021 19:41, James Harris wrote:
>>> On 21/08/2021 19:11, David Brown wrote:
>
> ...
>
>
>>
>> So for me, your "alignbits 3" is just wrong - it makes no sense.  You
>> are trying to say it should be aligned with 8-byte alignment, also known
>> as 64-bit alignment.  Obviously I can figure out what you meant - there
>> really isn't any other possibility for "alignbits 3".  But if you had
>> written "alignbits 8", I would take that to mean a packed or unaligned
>> declaration, not one with 256-byte alignment.
>
> On that, I wonder if I could persuade you to think in terms of the
> number of bits. AISI there are two ways one can specify alignment: a
> power of two number of bytes that the alignment has to be a multiple of
> and the number of zero bits on the RHS.

Those are equivalent in terms of the actual implementation, but not in
the way a programmer is likely to think (or want to think). The whole
point of a programming language above the level of assembly is that the
programmer doesn't think in terms of underlying representations in bits
and bytes, but at a higher level, in terms of values and the meanings of
the values. If I write "int * p = &x;", I think of "p" as a pointer to
the variable "x". I don't think about whether it is 64-bit or 32-bit,
or whether it is an absolute address or relative to a base pointer, or
how it is translated via page tables. Considering the number of zero
bits in the representation of the address is at a completely different
level of abstraction from what I would see as relevant in a programming
language.

>
> When specifying constants it's easier to begin with and convert from the
> number of bits. Consider the opposite. Given
>
>   constant ALIGN_BYTES = 8
>
> there are these two ways one might convert that to alignment bits.
>
>   constant ALIGN_BITS = Log2RoundedUp(ALIGN_BYTES)
>   constant ALIGN_BITS = Log2ButErrorIfNotPowerOfTwo(ALIGN_BYTES)
>
> IOW (1) there are two possible interpretations of the conversion and,
> perhaps worse, (2) either would need a special function to implement it.
>

This is all completely trivial to implement in your
compiler/interpreter. Users are not interested in the number of zero
bits in addresses - and they are not interested in the effort it takes
to implement a feature. If you want a programming language that is more
than a toy, a learning experiment, or a one-man show, then you must
prioritise the effort of the user by many orders of magnitude over the
convenience of the implementer.
That would be an incorrect assumption.

Prioritise readability over writeability - you write a piece of code
once, but it can be read many times. It is entirely to be expected that
there is code that people will read and understand, but know they could
not have written it themselves.

There is always going to be a huge spread between beginners (including
those that never get beyond beginner stages no matter how long they
spend), average and expert programmers. This is perhaps an unusual
aspect of programming as a profession and hobby. Imagine there were
such a spread amongst professional football ("soccer", for those living
in the ex-colonies) players. On the same team as Maradonna you'd have
someone who insists on picking up and carrying the ball, since it
clearly works, and someone who could be outrun by an asthmatic snail.

Unless you are designing a language to compete with Bart for the record
of fewest users, expect this difference in competences. Embrace it and
make use of it, rather than futilely fighting it.

James Harris

Nov 28, 2021, 9:54:07 AM
On 28/11/2021 13:46, David Brown wrote:
> On 28/11/2021 10:33, James Harris wrote:
>> On 27/11/2021 15:17, David Brown wrote:


>>> So for me, your "alignbits 3" is just wrong - it makes no sense.  You
>>> are trying to say it should be aligned with 8-byte alignment, also known
>>> as 64-bit alignment.  Obviously I can figure out what you meant - there
>>> really isn't any other possibility for "alignbits 3".  But if you had
>>> written "alignbits 8", I would take that to mean a packed or unaligned
>>> declaration, not one with 256-byte alignment.

BTW, you may be assuming octet addressing. That's not always the case.

>>
>> On that, I wonder if I could persuade you to think in terms of the
>> number of bits. AISI there are two ways one can specify alignment: a
>> power of two number of bytes that the alignment has to be a multiple of
>> and the number of zero bits on the RHS.
>
> Those are equivalent in terms of the actual implementation, but not in
> the way a programmer is likely to think (or want to think). The whole
> point of a programming language above the level of assembly is that the
> programmer doesn't think in terms of underlying representations in bits
> and bytes, but at a higher level, in terms of values and the meanings of
> the values.

That's all very well but if you are thinking about alignment of values
or structures you are already working at a low level.

Further, say you had your alignment in the way you prefer, i.e. as a
number of bytes such as 8. What would you write if you wanted to apply a
commensurate shift? To get from 8 to 3 you'd need some sort of
log-base-2 function of the type I showed earlier. Which would you want a
language to provide?

All in all, I put it to you that going from 3 to 8 is easier. :-)


> If I write "int * p = &x;", I think of "p" as a pointer to
> the variable "x". I don't think about whether it is 64-bit or 32-bit,
> or whether it is an absolute address or relative to a base pointer, or
> how it is translated via page tables. Considering the number of zero
> bits in the representation of the address is at a completely different
> level of abstraction from what I would see as relevant in a programming
> language.

Well, if an address is guaranteed to be at a certain alignment then asm
programmers may store flags in the lower bits. AFAICS that's not too
easy to do in C so C programmers don't think in those terms - perhaps
putting the flags in a separate integer on their own. But there can be
value in using such bits and some hardware structures already include
them. Specifying an address as aligned can make low bits available.

At the end of the day, confining layout details to /declarations/ means
that the body of an algorithm can still work with the elements which
make sense for the task in hand, leaving the compiler to handle the
details of accessing them.


>
>>
>> When specifying constants it's easier to begin with and convert from the
>> number of bits. Consider the opposite. Given
>>
>>   constant ALIGN_BYTES = 8
>>
>> there are these two ways one might convert that to alignment bits.
>>
>>   constant ALIGN_BITS = Log2RoundedUp(ALIGN_BYTES)
>>   constant ALIGN_BITS = Log2ButErrorIfNotPowerOfTwo(ALIGN_BYTES)
>>
>> IOW (1) there are two possible interpretations of the conversion and,
>> perhaps worse, (2) either would need a special function to implement it.
>>
>
> This is all completely trivial to implement in your
> compiler/interpreter. Users are not interested in the number of zero
> bits in addresses - and they are not interested in the effort it takes
> to implement a feature. If you want a programming language that is more
> than a toy, a learning experiment, or a one-man show, then you must
> prioritise the effort of the user by many orders of magnitude over the
> convenience of the implementer.

I was talking about the facilities being made available to programmers,
not just those I would use internally!


--
James Harris

Bart

unread,
Nov 28, 2021, 11:26:13 AM
to
On 28/11/2021 13:24, David Brown wrote:
> On 27/11/2021 19:55, Bart wrote:

>> Yet C breaks that rule all the time. Just today I needed to type:
>>
>>    unsigned char            to mean byte or u8
>>    unsigned long lont int   to mean u64 [typo left in]
>>    printf("....\n", ...)    to mean println ...
>
> If you needed to type those to mean something else, that is /purely/ a
> problem with the programmer, not with the language. The language and
> its standard libraries provides everything you need in order to write
> these things in the way you want. As long as the language makes that
> practically possible (and in the first two examples at least, extremely
> simple), the language does all it needs.

No it doesn't, not in a way I'd consider acceptable.

You don't get a 'byte' type, which was commonly used 40+ years ago but
is still missing from the world's most popular lower-level language, by
insisting users define it themselves!

Until C99 that wasn't even possible in a reliable manner. Even now, it
means mucking about with typedefs and special headers and maybe
conditional code, but is still full of complications:

* You still need to interact with other people's code that uses unsigned
char or uint8_t or sometimes plain char

* You may need to interact with code from other people who have all
created their alternate solutions (_byte, Byte, _Byte, ubyte, u8 etc).

To get back to /your/ point, which was:

"Common things should be easy to write - rare things can be
hard to write."

I'm saying that you want to express a u8 type as simply as possible:

byte x;

and in way that is compatible with anyone else's u8 type who's using the
same language.

I managed this in 1981 in my very first language.

Your comments anyway don't address my printf example. In 1981 I could
also write:

print x

That, you still can't do in C. You can't even do it in C++ (another
favourite of yours):

std::cout << x; // or some such thing

C++ however may provide means to emulate such a feature, which brings us
back to my comments above.


> No language will /ever/ mean you can trivially write all the code you
> want to write! Obviously when you are doing a one-man language with one
> designer, one user, and one type of code, and are happy to modify the
> language to suit the program you are writing, you can come quite close.
> But for real languages developed and implemented by large numbers of
> people and used by huge numbers of people, that does not happen.

I'm talking about the basics, the things you /commonly/ want to do.


> As so often happens, in your manic obsession to rail against C, you
> completely missed the point. Oh, and you also missed that in this
> newsgroup I have repeatedly said that people should study popular and
> successful languages like C and C++ (and others) in order to learn from
> them and take inspiration from them, and to aim to make something
> /better/ for their particular purposes and requirements.

What do you think /I/ should take away from C?

I have actually taken some things from it, but it's not many:

* Use f() for function calls with no arguments, not f
* Allow f to mean a function pointer, as well as &f
* Switched to 0xABC for hex literals, not 0ABCH

Otherwise C has little to teach me about devising system languages.

>>
>> Yes you could use uint8_t and uint64_t, but that still needs:
>>
>>    #include <stdint.h>
>>
>> to be remembered to add at the top of every module
>>
>
> Do you have any idea how pathetic and childish that sounds? Presumably
> not, or you wouldn't have written it. So let me inform you, again, that
> continually whining and crying about totally insignificant
> inconveniences does nothing to help your "down with C" campaign.

Do /you/ have any idea how incredibly crass it is to have to explicitly
incorporate headers to enable the most fundamental language features?

Suppose you needed:

#include <upper.h> # to allow upper case in identifiers
#include <lower.h> # to allow lower case in identifiers

So every program now needs lower.h. You're working away, and at some
point it fails because you've used an upper case macro name, and the
compiler throws an error.

No problem! Just edit the file to add the include for upper.h at the top.

Ridiculous, yes? That's exactly how I see most of C's standard headers.
Just enable the lot by default, and increase the productivity of the
world's C programmers by a couple of percentage points.


>> Shifted punctuation is worse.
>>
>
> So you'd rather write "x - -y" than "x + y", because it avoids the shift
> key? That seems like a somewhat questionable choice of priorities.

Some is unavoidable. But a lot of it is.

for (i = 1; i<=N; ++i) {
    printf("%d %f\n", i, sqrt(i));   // 16 shifted keys
}

for i to n do # 0 shifted keys
println i, sqrt i
od

In real code, C may use 30-50% more shifted keys than the equivalent in
my syntax, not counting shifted alphabetics because of mixed or upper
case (mine is case-insensitive, so I can choose to write 'messagebox',
not 'MessageBox').

But I tend to use them most often in temporary debug code, which does
use lots of prints and loops like my example.

>>
>>     * An alternative to that god-forsaken, error prone syntax
>>     * Freedom from case-sensitivity
>>     * 1-based arrays!
>>     * An ACTUAL byte/u8 type without all the behind-the-scenes
>>       nonsense, and the need for stdint/inttypes etc
>>     * 64-bit integer types as standard
>>     * A grown-up Print feature
>>     * etc etc
>>
>> What /are/ the actual alternatives available as the next C replacement;
>> Rust and Zig? You're welcome to them!
>>
>
> If there were a significant number of people who wanted a language with
> these features, there would be one.

There are a few with such features, unfortunately not all in the same
language! (Ada has the first 3 of my list, but also has an impossible
type system.)



David Brown

unread,
Nov 28, 2021, 12:18:38 PM
to
On 28/11/2021 15:54, James Harris wrote:
> On 28/11/2021 13:46, David Brown wrote:
>> On 28/11/2021 10:33, James Harris wrote:
>>> On 27/11/2021 15:17, David Brown wrote:
>
>
>>>> So for me, your "alignbits 3" is just wrong - it makes no sense.  You
>>>> are trying to say it should be aligned with 8-byte alignment, also
>>>> known
>>>> as 64-bit alignment.  Obviously I can figure out what you meant - there
>>>> really isn't any other possibility for "alignbits 3".  But if you had
>>>> written "alignbits 8", I would take that to mean a packed or unaligned
>>>> declaration, not one with 256-byte alignment.
>
> BTW, you may be assuming octet addressing. That's not always the case.

It /is/ the case on any system where your language will be used.
Non-octet addressing is only used on legacy mainframes for which almost
no new code is written, and on a few niche devices such as some DSPs.
Don't kid yourself - the chances of your language being used on any of
these is not low, it is zero. Trying to add any kind of flexibility or
support for anything other than 8-bit bytes would be a disservice to
your potential users.

>
>>>
>>> On that, I wonder if I could persuade you to think in terms of the
>>> number of bits. AISI there are two ways one can specify alignment: a
>>> power of two number of bytes that the alignment has to be a multiple of
>>> and the number of zero bits on the RHS.
>>
>> Those are equivalent in terms of the actual implementation, but not in
>> the way a programmer is likely to think (or want to think).  The whole
>> point of a programming language above the level of assembly is that the
>> programmer doesn't think in terms of underlying representations in bits
>> and bytes, but at a higher level, in terms of values and the meanings of
>> the values.
>
> That's all very well but if you are thinking about alignment of values
> or structures you are already working at a low level.
>

That is somewhat true. But there is no need to go lower than necessary.

> Further, say you had your alignment in the way you prefer, i.e. as a
> number of bytes such as 8. What would you write if you wanted to apply a
> commensurate shift? To get from 8 to 3 you'd need some sort of
> log-base-2 function of the type I showed earlier. Which would you want a
> language to provide?
>

Again, I want /you/ to think about what /your/ users will actually need
and use. When would they need this? Is it really something people will
need? I believe you are trying to optimise for non-existent use-cases,
instead of realistic ones. If you believe otherwise, please say so -
perhaps with examples or justification. (It's your language, you don't
/have/ to justify your choice of features, but it makes it easier to
give helpful suggestions.)

> All in all, I put it to you that going from 3 to 8 is easier. :-)
>

I agree it is easier to go that way. But since I don't think that is
something that will often be needed, I don't see its ease as being
important.

And of course there is nothing to stop you doing the equivalent of

#define struct_align_needed 3
alignas(1 << struct_align_needed) struct S s;

or whatever. In other words, if you really need to go from 3 to 8, then
you can happily do that even if your "align" method takes an 8 rather
than a 3.

>
>> If I write "int * p = &x;", I think of "p" as a pointer to
>> the variable "x".  I don't think about whether it is 64-bit or 32-bit,
>> or whether it is an absolute address or relative to a base pointer, or
>> how it is translated via page tables.  Considering the number of zero
>> bits in the representation of the address is at a completely different
>> level of abstraction from what I would see as relevant in a programming
>> language.
>
> Well, if an address is guaranteed to be at a certain alignment then asm
> programmers may store flags in the lower bits. AFAICS that's not too
> easy to do in C so C programmers don't think in those terms - perhaps
> putting the flags in a separate integer on their own. But there can be
> value in using such bits and some hardware structures already include
> them. Specifying an address as aligned can make low bits available.

It is quite rare that it makes sense to use those extra bits like that.
And no, it is not particularly hard to do so in C - you just need to be
a little careful (and it's likely to be somewhat non-portable).

>
> At the end of the day, confining layout details to /declarations/ means
> that the body of an algorithm can still work with the elements which
> make sense for the task in hand, leaving the compiler to handle the
> details of accessing them.
>

Yes - that is why you very rarely need to specify alignment. The
compiler should know the rules for what makes sense on the platform and
the ABI in use.

>
>>
>>>
>>> When specifying constants it's easier to begin with and convert from the
>>> number of bits. Consider the opposite. Given
>>>
>>>    constant ALIGN_BYTES = 8
>>>
>>> there are these two ways one might convert that to alignment bits.
>>>
>>>    constant ALIGN_BITS = Log2RoundedUp(ALIGN_BYTES)
>>>    constant ALIGN_BITS = Log2ButErrorIfNotPowerOfTwo(ALIGN_BYTES)
>>>
>>> IOW (1) there are two possible interpretations of the conversion and,
>>> perhaps worse, (2) either would need a special function to implement it.
>>>
>>
>> This is all completely trivial to implement in your
>> compiler/interpreter.  Users are not interested in the number of zero
>> bits in addresses - and they are not interested in the effort it takes
>> to implement a feature.  If you want a programming language that is more
>> than a toy, a learning experiment, or a one-man show, then you must
>> prioritise the effort of the user by many orders of magnitude over the
>> convenience of the implementer.
>
> I was talking about the facilities being made available to programmers,
> not just those I would use internally!
>

Programmers don't need these - it's not something they have to do. And
if they /do/, then they can do so with :

constant ALIGN_BITS = 3
constant ALIGN_BYTES = 1 << ALIGN_BITS

or maybe:

constant ALIGN_BYTES = 8
constant ALIGN_BITS = 3
static_assert(ALIGN_BYTES == 1 << ALIGN_BITS,
"Failed alignment sanity check")

You are inventing non-existent problems here.

David Brown

unread,
Nov 28, 2021, 12:24:35 PM
to
On 28/11/2021 17:26, Bart wrote:

> There are a few with such features, unfortunately not all in the same
> language! (Ada has the first 3 of my list, but also has an impossible
> type system.)
>

You really do think the world should revolve around /you/, don't you?
You probably also write letters to your local newspaper complaining that
the breakfast cereals you personally prefer are on the top shelf rather
than at a more convenient height.

Most people would be very happy to be in the position where the most
difficult part of their job was having to press the shift key several
times per day.

Bart

unread,
Nov 28, 2021, 2:25:35 PM
to
Look: when I first started programming, then these characteristics were
common:

* Case insensitive (in code, file system and CLIs)

* 1-based indexing, with A[i,j] for 2D accesses

* Keyword-based block delimiters (do...end, not {...})

* Proper Read A, B, C and Print A, B, C features ...

* ... and line-based processing of text files

* Linear, left-to-right type specifiers

I liked those, they worked well, and I incorporated them into my own
stuff (I used N-based indexing, which defaulted to 1-based).

But I didn't think them remarkable, until years later when the
combination of Unix+C started to take over the world, and I first came
across the alternatives that that combo was trying to inflict on
everyone else:

* Case sensitive (in code, file system and CLIs)

* 0-based indexing, with A[i][j] for 2D accesses

* Brace-based delimiters for everything (all statements, all data)

* Off-language, library-based I/O with 'format specifiers' ...

* ... and character-based processing of text files

* For C, convoluted inside-out type specifiers that even the designers
admitted were a mistake (with everyone else pretending they were a
good idea)

In every case, for reasons I won't go into here, I found those inferior.

Why SHOULDN'T I be allowed to have my own preferences, and why SHOULDN'T
I complain when those have been marginalised in favour of inferior
practices?

Getting back to what this is about, which was your suggestion that C is
so perfect, it is pointless to create something new unless it comes with
a raft of advanced, heavy features, then why SHOULDN'T there be an
alternative systems language with its OWN set of characteristics?

Yes, C has pretty much won the war for ubiquitous systems language,
although I don't remember there being any viable /mainstream/ contenders.

It doesn't mean it's great; it means it's what most are stuck with.

My own is a private language created years before I encountered C, and
now it does have plenty of significant features that are not in C.

(Like: proper value-array types, modules, keyword/optional parameters,
proper for-loops, 64-bit default types, a true 'char' type, an actual
BYTE type, strinclude, tabledata, proper switch...

All ones I used every single day.)

Why would I use C when I can use mine? Why shouldn't anyone be able to
do the same?

You of course just want to be patronising and insist any such project
can only ever be a toy that someone does for fun until they come to
their senses.

This group is also about discussing aspect of language design. If you
want to talk about some of your own ideas, regardless of whether you're
going to implement them, then that would be great.

However you seem intent on trashing everyone's ideas and aspirations.

Andy Walker

unread,
Nov 28, 2021, 4:27:25 PM
to
On 28/11/2021 13:46, David Brown wrote:
> There is always going to be a huge spread between beginners (including
> those that never get beyond beginner stages no matter how long they
> spend), average and expert programmers. This is perhaps an unusual
> aspect of programming as a profession and hobby. Imagine there were
> such a spread amongst professional football ("soccer", for those living
in the ex-colonies) players. On the same team as Maradona you'd have
> someone who insists on picking up and carrying the ball, since it
> clearly works, and someone who could be outrun by an asthmatic snail.

Hm. I'm not sure this is the best analogy you've ever
produced! Every sport and hobby has beginners and experts;
and it is at least as unusual in programming as in most other
spheres to find world-class experts and novices in the same
team [to the extent to which programming is even a team game!].

Somewhat OTOH, in two of my principal interests, it is
not unusual for rank amateurs to encounter top players. A few
years back, I found myself playing against one of the world's
top chess grandmasters; I did at least last longer than our
board 2, whose opponent was a mere international master. We
have had GMs and IMs playing in our local league. In music,
it is quite common for student/amateur orchestras and other
ensembles to engage top pianists/violinists/singers to play
concertos and perhaps give lessons.

In a third main interest, a friend who was organising
a lower-league local cricket match was rather surprised to be
contacted by the manager of the New Zealand tourists: "I've
been told you have a match this afternoon?" "Yes." "Well,
we have [top Test player] recovering from injury, and we'd be
extremely grateful if he could play for you." "Um, you do
realise we're not very good?" "That's fine, he just needs the
exercise in a real match." So tTp did play, and apparently he
was a really great guy, very friendly, joined in at the bar,
and gave lots of top tips.

--
Andy Walker, Nottingham.
Andy's music pages: www.cuboid.me.uk/andy/Music
Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Ravel

Bart

unread,
Nov 28, 2021, 5:05:26 PM
to
On 28/11/2021 13:46, David Brown wrote:
> On 28/11/2021 10:33, James Harris wrote:

> Unless you are designing a language to compete with Bart for the record
> of fewest users,

The smallest number would be zero; there must be 1000s of dead languages
used by nobody.

And there must also be a number of personal or in-house or just rare
instances of languages that only happen to be used by one person.

That doesn't mean they are worthless, or not any good.

Some of them may be implemented on top of more mainstream languages and
tools, so they are not completely insular.

I dare say you've also chosen and configured your tools so that
you're effectively working with a personal dialect of C or whatever;
some of us just take it a bit further.

Having a language become popular and in widespread use is simply not one
of my aims and never has been. I know how fanatical people are about
languages, and I'm not interested in persuading anyone to switch.

David Brown

unread,
Nov 29, 2021, 4:05:08 AM
to
On 28/11/2021 20:25, Bart wrote:
> On 28/11/2021 17:24, David Brown wrote:
>> On 28/11/2021 17:26, Bart wrote:
>>
>>> There are a few with such features, unfortunately not all in the same
>>> language! (Ada has the first 3 of my list, but also has an impossible
>>> type system.)
>>>
>>
>> You really do think the world should revolve around /you/, don't you?
>> You probably also write letters to your local newspaper complaining that
>> the breakfast cereals you personally prefer are on the top shelf rather
>> than at a more convenient height.
>>
>> Most people would be very happy to be in the position where the most
>> difficult part of their job was having to press the shift key several
>> times per day.
>>
>
>
> Look: when I first started programming, then these characteristics were
> common:

Why should anyone care?

When I started watching TV, there were three channels, and most TVs
were black and white and about as deep as they were high. Does that
mean I think everyone should go back to such boxes? Do I moan and whine
that manufacturers are breaking some magical unwritten contract because
now you can't put a pot-plant on top of the TV?

When I started programming, I used BASIC. It was a great language for a
8-year old kid to learn programming. It was a shite language for
serious code and useful programs, and is totally unsuitable for anything
but a glorified "Word" macro in comparison to today's languages and
tools. I learned from it, and moved on.

When I started assembly programming, I had to hand-assemble everything
into hex using tables of opcodes. I debugged using the sound of the
power supply on my Spectrum. I learned a lot from those days too, and
moved on.

Do I think anyone here cares what /I/ used when /I/ learned to program?
No. Why should someone making new languages today care what /you/ used?

The past is gone. We can learn from it - look at what worked back then,
and what did not work. Look at what people kept, look at what changed.
Look at what concepts remained constant, and what has not. Look at
which fashions came and went, look at which went away then came back.
Ask why.

But only a fool would want to go back to the past.

>
>   * Case insensitive (in code, file system and CLIs)

That stems from a time when computers had six bits for a character
because 8 bits would cost too much, and people used teletype instead of
screens and keyboards. If you have trouble getting your cases right,
you are in the wrong job.

>
>   * 1-based indexing, with A[i,j] for 2D accesses

1-based counting is good for everyday counting, not for programming.
You want to program? Learn some maths. (For higher level languages,
arrays that are indexable by different ranges, types or tuples is good.)

>
>   * Keyword-based block delimiters (do...end, not {...})

That comes from a time when keyboards with symbols such as { } were
considered advanced and modern. (Hence those monstrosities, the
trigraph and digraph.) Oh, I forgot - you find it such an effort to
press the "shift" key on your keyboard.

>
>   * Proper Read A, B, C and Print A, B, C features ...

What a pointless and meaningless statement. There are a hundred and one
different ways to do "proper" read and print, with everyone having their
own ideas about what is best. Most people, of course, realise that
programming languages are designed for more than one programmer and thus
such features are invariably a compromise.

> Why SHOULDN'T I be allowed to have my own preferences, and why SHOULDN'T
> I complain when those have been marginalised in favour of inferior
> practices?

Because you are wrong.

And you are a margin of /one/, because you believe that languages should
follow exactly what /you/ want in all aspects, with a total disregard
for anyone else.

Because - and I really can't emphasise this enough - we've heard it all
before. Many, /many/ times. Endlessly, repeatedly. You think the
gates of hell opened up the day C was conceived and the first release of
Unix was the start of Ragnarök. We know. Get over yourself.


Yes, you can have your opinion. Yes, you can make your own language the
way /you/ want it. Yes, you can give suggestions and ideas based on
these in a discussion about languages. No, you can't tell people that
you alone are right, and the rest of the world is wrong, and expect to
be taken seriously.


>
> Getting back to what this is about, which was your suggestion that C is
> so perfect, it is pointless to create something new unless it comes with
> a raft of advanced, heavy features, then why SHOULDN'T there be an
>> alternative systems language with its OWN set of characteristics?
>

You consistently demonstrate that you have no clue as to what threads
here are about. You have such a fanatic and unreasoned loathing of C
that you are unable to understand what people write - you make totally
unwarranted assumptions and then fly off your handle to attack the
mirages of your mind.

I /could/ explain what I had written earlier. But what would be the
point? It would just repeat the same things I wrote before. You didn't
read them then, why should I think you'll read them now?

> This group is also about discussing aspect of language design. If you
> want to talk about some of your own ideas, regardless of whether you're
> going to implement them, then that would great.
>
> However you seem intent on trashing everyone's ideas and aspirations.

I /have/ been discussing ideas and suggestions here. Some have been of
interest to James, others not - which is fine. And I write comments
aimed at making him (or anyone else interested in languages) think about
how the language might be used - because that's what's important. I am
not the one who thinks every thread is an opportunity for a new anti-C rant.

David Brown

unread,
Nov 29, 2021, 6:09:36 AM
to
On 28/11/2021 22:27, Andy Walker wrote:
> On 28/11/2021 13:46, David Brown wrote:
>> There is always going to be a huge spread between beginners (including
>> those that never get beyond beginner stages no matter how long they
>> spend), average and expert programmers.  This is perhaps an unusual
>> aspect of programming as a profession and hobby.  Imagine there were
>> such a spread amongst professional football ("soccer", for those living
>> in the ex-colonies) players.  On the same team as Maradona you'd have
>> someone who insists on picking up and carrying the ball, since it
>> clearly works, and someone who could be outrun by an asthmatic snail.
>
>     Hm.  I'm not sure this is the best analogy you've ever
> produced!  Every sport and hobby has beginners and experts;
> and it is at least as unusual in programming as in most other
> spheres to find world-class experts and novices in the same
> team [to the extent to which programming is even a team game!].
>

The difference, I think, is that in programming you /do/ get a wide
range even within teams. And you certainly get a very wide range of
people working as programmers in different places.

In particular, it is not just a mix of amateurs and professionals - as
your experience shows, you can get that in many fields. In programming,
you get people making a living as programmers despite being completely
incompetent. And even amongst people who do a reasonable job, you can
get an order of magnitude difference in productivity.

Dmitry A. Kazakov

unread,
Nov 29, 2021, 6:40:50 AM11/29/21
to
On 2021-11-29 12:09, David Brown wrote:

> In programming,
> you get people making a living as programmers despite being completely
> incompetent.

Reminds me of politicians, pop musicians, journalists, economists,
environmentalists... (put quotation marks as appropriate)

> And even amongst people who do a reasonable job, you can
> get an order of magnitude difference in productivity.

That is the 80/20 law.

But I agree with you, incompetence is strong with programmers...

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

Bart

unread,
Nov 29, 2021, 8:06:05 AM11/29/21
to
On 29/11/2021 09:05, David Brown wrote:
> On 28/11/2021 20:25, Bart wrote:

> Why should anyone care?
>
> When I started watching TV, there were three channels, and most TVs
> were black and white and about as deep as they were high.

In 1962 we had a TV with only 1 channel, in black and white and with 405
lines.

In the same year, Lawrence of Arabia was released in cinemas in 70mm
format (somewhat beyond 4K quality). The screen was pretty flat too!

Not sure how such things are relevant, my remarks are about
characteristics of programming languages, not hardware.

>>   * Case insensitive (in code, file system and CLIs)
>
> That stems from a time when computers had six bits for a character
> because 8 bits would cost too much, and people used teletype instead of
> screens and keyboards.

All languages and all OSes were using the same hardware. Yet Unix+C went
for case-sensitivity, others made a different choice.

A /choice/. That doesn't make it right and the others wrong.

I used machines with both upper and lower case capability from 1982; I
still preferred case-insensitivity because it was generally better and
more user-friendly.

> If you have trouble getting your cases right,
> you are in the wrong job.

If you have trouble thinking up distinct identifiers in examples like this:

Abc abc = ABC;

then /you're/ in the wrong job!

>>
>>   * 1-based indexing, with A[i,j] for 2D accesses
>
> 1-based counting is good for everyday counting, not for programming.

Bollocks. In any case, you snipped my remark that I implement N-based
arrays, so that I can use 0-based /as needed/, and have always done.

You haven't explained why A[i][j] is better than A[i,j].

>>   * Keyword-based block delimiters (do...end, not {...})
>
> That comes from a time when keyboards with symbols such as { } were

So you see it as progress that {,} with their innumerable issues were
introduced. Because this:

} else {

or:

}
else {

or:

} else
{

or:

}
else
} # (oops!)

etc. is SO much better than just:

else

You must be delusional.


> considered advanced and modern. (Hence those monstrosities, the
> trigraph and digraph.) Oh, I forgot - you find it such an effort to
> press the "shift" key on your keyboard.
>
>>
>>   * Proper Read A, B, C and Print A, B, C features ...
>
> What a pointless and meaningless statement. There are a hundred and one
> different ways to do "proper" read and print, with everyone having their
> own ideas about what is best.

This is just pure jealousy. Show me the C code needed to do the
equivalent of this (without knowing the types of a, b, c other than they
are numeric):

print "?"
readln a, b, c
println a, b, c

Here the language provides informal line-based i/o, as might be useful
for interactive programs, or reading/writing files, while still allowing
more precise control as needed.

> Because you are wrong.

> And you are a margin of /one/, because you believe that languages should
> follow exactly what /you/ want in all aspects, with a total disregard
> for anyone else.

What exactly are the choices for someone in 2021 who wants to use (or is
required to use) a language like C, but favours even one of my
characteristics?

> Yes, you can have your opinion. Yes, you can make your own language the
> way /you/ want it. Yes, you can give suggestions and ideas based on
> these in a discussion about languages. No, you can't tell people that
> you alone are right, and the rest of the world is wrong, and expect to
> be taken seriously.

But YOU are allowed to say that:

* Case-insensitivity is wrong
* 1-based is wrong
* A[i,j] is wrong
* Anything other than {...} blocks is wrong
* Easy read/print statements in a language are wrong
* Line-based i/o is wrong
* Left-to-right type syntax is wrong. (Did you say that, or decided not
to mention that one?!)

All things that C doesn't have.

Actually, Ada and Fortran are still around, are case-insensitive, are
N-based, and don't use brace syntax.

Lua doesn't use braces for blocks. It is also 1-based.

Also 1-based are Julia, Mathematica, and Matlab ("Learn some maths"? Sure!)

Julia doesn't use braces either.

These are all characteristics that still exist across languages, but not
necessarily within one systems language, which is where I need them.


>
>>
>> Getting back to what this is about, which was your suggestion that C is
>> so perfect, it is pointless to create something new unless it comes with
>> a raft of advanced, heavy features, then why SHOULDN'T there be an
>> alternative systems language with its OWN set of characteristics?
>>

> I /could/ explain what I had written earlier. But what would be the
> point? It would just repeat the same things I wrote before. You didn't
> read them then, why should I think you'll read them now?

I'll repeat what you said:

"Given the vast benefits of C in terms
of existing implementation, code, experience and information, why would
anyone bother with a different compiled language unless it let them do
things that you cannot easily do in C?"

You are clearly saying, don't bother creating an alternative to C unless
it actually does something different.

I disagreed: you CAN have an alternative that, while it does the same
things, can achieve that differently.

I listed some things out of many dozens. You of course will disagree
with every one of them.

Doesn't matter what it is:

C does X C's way is perfect
Bart does Y Bart is WRONG, and in a minority of one

Even when I show that other languages, old or new, also do Y. Or when I
give an example of Y clearly being better than X.

My dislike of C is rational. Your loyalty to it, and hatred of anyone
who dares to badmouth it, is irrational.
> I /have/ been discussing ideas and suggestions here. Some have been of
> interest to James, others not - which is fine. And I write comments
> aimed at making him (or anyone else interested in languages) think about
> how the language might be used - because that's what's important. I am
> not the one who thinks every thread is an opportunity for a new anti-C rant.

No. Your message is 'Just don't bother trying to rewrite C', presumably
because it is perfect.

C is still VERY widely used, but you don't believe in an alternative.
You want people to continue driving a Model T, unless the new car can
also fly!

David Brown

Nov 29, 2021, 10:19:50 AM

On 29/11/2021 14:06, Bart wrote:
> On 29/11/2021 09:05, David Brown wrote:
>> On 28/11/2021 20:25, Bart wrote:
>
>> Why should anyone care?
>>
>> When I started watching TV, there were three channels, and most TV's
>> were black and white and about as deep as they were high.
>
> In 1962 we had a TV with only 1 channel, in black and white and with 405
> lines.
>
> In the same year, Lawrence of Arabia was released in cinemas in 70mm
> format (somewhat beyond 4K quality). The screen was pretty flat too!
>
> Not sure how such things are relevant, my remarks are about
> characteristics of programming languages, not hardware.
>
>>>    * Case insensitive (in code, file system and CLIs)
>>
>> That stems from a time when computers had six bits for a character
>> because 8 bits would cost too much, and people used teletype instead of
>> screens and keyboards.
> All languages and all OSes were using the same hardware. Yet Unix+C went
> for case-sensitivity, other made a different choice.
>
> A /choice/. That doesn't make it right and the others wrong.
>

Most programming languages in use today are case-sensitive. Those that
are not are mostly leftovers from the days when computers SHOUTED at you
because they didn't support lower case letters.

Most filesystems in use today are case-sensitive. Those that are not
are mostly leftovers from those same days. Even NTFS on Windows is a
fully case-sensitive filesystem, and can happily support "readme.txt"
and "Readme.txt" as different files in the same directory. The OS has a
layer in its API to make the filesystem appear case-preserving but
case-insensitive.

Case insensitive doesn't work when you go beyond the UK/US alphabet.
The complications for various languages are immense. In German, the
letter ß traditionally capitalises as SS - one letter turns into two.
In Turkish, "i" and "I" are two completely different letters, with their
opposite cases being "İ" and "ı". It quickly becomes ridiculous when
you need to support multiple languages. On the other hand,
case-sensitive naming is usually just done as binary comparison.

So unless you think that everyone should be forced to write a limited
form of UK or US English and that ASCII is good enough for everyone,
case-sensitive is the only sane choice for file systems.


You can reasonably argue that the majority choice is not necessarily
right. But you have a much harder time trying to argue that an outdated
minority choice is right.


> I used machines with both upper and lower case capability from 1982; I
> still preferred case-insensitivity because it was generally better and
> more user-friendly.
>
>> If you have trouble getting your cases right,
>> you are in the wrong job.
>
> If you have trouble thinking up distinct identifiers in examples like this:
>
>    Abc abc = ABC;
>
> then /you're/ in the wrong job!
>

That's a strawman, and you know it. Or do you think it's fine to write:

OO0O1I II1IIlI1 = OIOII1IIl0I;

The ability to write sensible identifiers - or confusing ones - is not
dependent on case sensitivity. (And please don't give us the tired old
bullshit about having seen poor coding in some C code you found online
that left you confused. It would merely show that you prefer
cherry-picking to rational arguments, or that you are easily confused.)

>>>
>>>    * 1-based indexing, with A[i,j] for 2D accesses
>>
>> 1-based counting is good for everyday counting, not for programming.
>
> Bollocks. In any case, you snipped my remark that I implement N-based
> arrays, so that I can use 0-based /as needed/, and have always done.
>

As I said, it can be good to have more flexible array indexes in a
higher level language.

But if you have just one starting point, 0 is the sensible one. You
might not like the way C handles arrays (and I'm not going to argue
about it - it certainly has its cons as well as its pros), but even you
would have to agree that defining "A[i]" to be the element at "address
of A + i * the size of the elements" is neater and clearer than
one-based indexing. Again, 0 is the common choice, especially amongst
lower level languages. (The worst possible choice, of course, is to
have a configurable default starting number.)

> You haven't explained why A[i][j] is better than A[i,j].
>

I didn't "explain" it because I don't agree - the two choices have their
pros and cons. One views arrays as purely linear - so A is a linear
array of elements, each of which is a linear array. The other views
arrays like A as being a single object with multiple dimensions.
Sometimes one viewpoint is better than the other.

I can, however, note that I dislike C's comma operator. One of its
disadvantages is that it means "A[i, j]" is interpreted as "evaluate i
for its side effects, then treat as A[j]", which is not remotely helpful.

>>>    * Keyword-based block delimiters (do...end, not {...})
>>
>> That comes from a time when keyboards with symbols such as { } were
>
> So you see it as progress that {,} with their innumerable issues were
> introduced. Because this:
>
>   } else {
>
> or:
>
>   }
>   else {
>
> or:
>
>   } else
>   {
>
> or:
>
>   }
>   else
>   }        # (oops!)
>
> etc. is SO much better than just:
>
>   else
>
> You must be delusional.
>

No, I am not delusional - the use of brackets is hugely better than
relying on line-endings or spacing for block structuring. (And yes, I
am fully aware that I use Python, which uses indentation for structuring.)
Mistakes like the one you made there are easily diagnosed by tools -
unlike mistakes for when you don't have delimiting symbols.

However, the choice you gave was not between brackets and nothing, but
between brackets and keywords for delimiters. I find brackets
convenient and light-weight, and very easy to see and use correctly when
combined with a reasonable indentation strategy. I don't see it as a
particularly big issue - "begin"/"end", or whatever, work fine too. But
I see no advantage in them.

(I /do/ see advantage in /requiring/ block delimiters in, for example,
conditionals and loops. Making them optional is a source of errors,
regardless of how they are spelt.)


>
>> considered advanced and modern.  (Hence those monstrosities, the
>> trigraph and digraph.)  Oh, I forgot - you find it such an effort to
>> press the "shift" key on your keyboard.
>>
>>>
>>>    * Proper Read A, B, C and Print A, B, C features ...
>>
>> What a pointless and meaningless statement.  There are a hundred and one
>> different ways to do "proper" read and print, with everyone having their
>> own ideas about what is best.
>
> This is just pure jealousy. Show me the C code needed to do the
> equivalent of this (without knowing the types of a, b, c other than they
> are numeric):
>
>    print "?"
>    readln a, b, c
>    println a, b, c

In C, you don't work with variables whose types are unknown.

You are under the delusion that there is one "correct" interpretation
here. You think that /your/ ideas are the only "obvious" or "proper"
way to handle things. In reality, there are dozens of questions that
could be asked here, including:

Does there have to be a delimiter between the inputs? Does it have to
be comma, or space, or newline? Are these ignored if there are more
than one? Are numbers treated differently in the input? Would an input
of "true" be treated as a string or a boolean? Are there limits to the
sizes? How are errors in the input, such as end-of-file or ctrl-C
treated? How do you handle non-ASCII strings?

Should there be spaces between the outputs? Newlines? Should the
newline be a CR, an LF, CR+LF, or platform specific? What resolution or
format should be used for the numbers? If someone had entered "0x2c"
for one of the inputs, is that a string or a number - and if it is a
number, should it be printed in hex or in decimal?

Should the output go to the "standard out" stream, assuming that is
supported by the language and the OS? The "standard error" stream? A
printer? A debug port? A text box in a gui? Should it be determined
by a wider context in some way, such as via functions that redirect the
output of "println" statements?


No matter how you implement such things, it will not be the right choice
for some people in some cases. A language (and/or standard library) can
make a reasonable starting point that is appropriate for a variety of
uses of the language. And it can fully /document/ and /specify/ the
behaviour. That's all - that's the best that can be done.


(And note that I am /not/ saying that C is "perfect" here. C's "printf"
solution has a lot of advantages, which is why it has often been copied
in other languages, but it has a lot of disadvantages too. The same
applies to your language's print statements.)

>
> Here the language provides informal line-based i/o, as might be useful
> for interactive programs, or reading/writing files, while still allowing
> more precise control as needed.
>
>> Because you are wrong.
>
>> And you are a margin of /one/, because you believe that languages should
>> follow exactly what /you/ want in all aspects, with a total disregard
>> for anyone else.
>
> What exactly are the choices for someone in 2021 who wants to use (or is
> required to use) a language like C, but favours even one of my
> characteristics?
>

The same choices /everybody/ makes in /every/ aspect of their lives -
you find the most suitable compromise. No one programs in C because
they think it is a perfect language - they program in C because it is
the best choice for their needs at the time, weighing up the advantages
and disadvantages.

You don't go to the bakers and say "I'd like a loaf of bread just like
that one, except 30% longer". You choose between a smaller loaf than
you wanted, or buying two and having too much bread, or buying a
different loaf that is the right size but a different texture.

If you are really keen on getting exactly the loaf you want but the
bakers don't stock it, then you can learn to make bread yourself and
make your own loaves that are exactly what /you/ want. But you don't
expect them to be popular with other people.

If you think that lots of people would like loaves that are 30% longer,
then you can try and start a business making and selling them. That's
fine too - though not easy.

What you don't do, however, is go to the butcher's shop and complain to
the butcher that the baker's loaves are so terrible.


Picking a programming language is not really any different from any
other kind of choice in life.


>> Yes, you can have your opinion.  Yes, you can make your own language the
>> way /you/ want it.  Yes, you can give suggestions and ideas based on
>> these in a discussion about languages.  No, you can't tell people that
>> you alone are right, and the rest of the world is wrong, and expect to
>> be taken seriously.
>
> But YOU are allowed to say that:
>
> * Case-insensitivity is wrong
> * 1-based is wrong
> * A[i,j] is wrong
> * Anything other than {...} blocks is wrong
> * Easy read/print statements in a language are wrong
> * Line-based i/o is wrong
> * Left-to-right type syntax is wrong. (Did you say that, or decided not
> to mention that one?!)
>

Yes, I am allowed to say that (though I most certainly did /not/ say
that). But I am not allowed to expect everyone to agree with me just
because I say so. See the difference? If I want anyone to take my
opinions seriously (and I don't always expect that), I have to be able
to justify them. "Case insensitivity is clearly better because I like
it" is not a justification.

> All things that C doesn't have.

Only you are arguing about C here - only you seem to imagine people
think it is perfect. It is far and away the most successful programming
language, massively used and massively popular, so it makes a good
yardstick for comparisons and discussions. But nobody suggests it is an
ideal language (I certainly have not done so).

>
> Actually, Ada and Fortran are still around, are case-insensitive, are
> N-based, and don't use brace syntax.
>

If you take a sample of a thousand programmers, you can count on one
hand the number that have any concept of those languages beyond "Ada is
used by the US DoD" and "Fortran was used in the early days of
programming". (Usenet is not a good sample, given its demographics.)

> Lua doesn't use braces for blocks. It is also 1-based.
>
> Also 1-based are Julia, Mathematica, and Matlab ("Learn some maths"? Sure!)
>
> Julia doesn't use braces either.
>
> These are all characteristics that still exist across languages, but not
> necessarily within one systems language, which is where I need them.
>
>
>>
>>>
>>> Getting back to what this is about, which was your suggestion that C is
>>> so perfect, it is pointless to create something new unless it comes with
>>> a raft of advanced, heavy features, then why SHOULDN'T there be an
>>> alternative systems language with its OWN set of characteristics?
>>>
>
>> I /could/ explain what I had written earlier.  But what would be the
>> point?  It would just repeat the same things I wrote before.  You didn't
>> read them then, why should I think you'll read them now?
>
> I'll repeat what you said:
>
> "Given the vast benefits of C in terms
> of existing implementation, code, experience and information, why would
> anyone bother with a different compiled language unless it let them do
> things that you cannot easily do in C?"
>
> You are clearly saying, don't bother creating an alternative to C unless
> it actually does something different.

Yes. Surely that is obvious? There is no point in re-inventing the
same wheel everyone else already uses - you have to bring something new
to the table. (Or you are doing this all for fun and education.) And
given how many people already use C, how many tools there are, how much
code there is, you need /serious/ advantages over it in order for anyone
to choose your language over C.

>
> I disagreed: you CAN have an alternative that, while it does the same
> things, can achieve that differently.

No one will use it. So what's the point?

It would not be impossible to design a new programming language that is
of a similar level to C but has a fair number of technical improvements.
(It is certainly possible to have lots of technical /differences/, but
being different does not make it better just because one person prefers
the change.)

But can you make one that has enough technical improvements to gain any
kind of following?

Let's say that I agree that your language's "println" system is the
bee's knees, that I have always found writing "int * p" confusing, and
that I'd be much happier if I was able to write my identifiers in small
letters when I am in a good mood and in capitals when I am feeling
angry. Would that persuade me to throw away my existing compilers,
debuggers, editors and change to your language? Should I change the
tiny, cheap microcontrollers we use to embedded Windows systems as that
is the only target you support? For C, I have the standards documents
and reference sites, and compilers and libraries that follow these
specifications, and an endless supply of knowledgeable users for help,
advice, or hire - for your language, we have one guy off the internet
who regularly fails to answer simple questions about the language he
wrote without trying it to see the result.


So, again, what is the point of a language that is roughly like C but
with a few technical improvements and perhaps a nicer syntax (in some
people's opinion) ?

There is plenty of scope for making a good new programming language, but
if it is going to be used, it needs to let people do what they are
already doing, things they can do by moving to other established
languages, /and/ something new.

That means it doesn't just have to be a massive technical improvement
over C. It also has to beat C++, Ada, Rust, D, Go, C#, OCaml, and even
oldies like Forth and FORTRAN and "weird" choices like Haskell or Eiffel.

>
> I listed some things out of many dozens. You of course will disagree
> with every one of them.
>

Again, you merely demonstrate your clouded prejudice that hinders you
from reading anything people write.

> Doesn't matter what it is:
>
>  C does X            C's way is perfect
>  Bart does Y         Bart is WRONG, and in a minority of one
>
> Even when I show that other languages, old or new, also do Y. Or when I
> give an example of Y clearly being better than X.
>
> My dislike of C is rational. Your loyalty to it, and hatred of anyone
> who dares to badmouth it, is irrational.

If someone thinks that C is perfect, or that your language was always
wrong, or was blindly loyal to C, then I agree that would be irrational
(or at the very least, ignorant). But I have expressed none of these
things. I most certainly have not expressed hatred of you or anyone
else. (I have accused you of hating /C/, not of hating any person.)

Your problem here is that you cannot appreciate that someone can explain
how C works, or why it is the way it is, or how to use it. You cannot
grasp that people can find C useful, practical and enjoyable without
treating them as though they view C as "perfect" and the paradigm of
programming languages.

I use C a lot - I know the language well and find it very useful for the
programming tasks I have. I'll drop it in a heartbeat when I have a
better alternative. (Indeed I have done so - when it is practical for a
project, taking into account a wide range of factors, I use C++. And
for PC programming I almost never use C.)

These threads would be a lot more pleasant if you could wrap your head
around that.

Oh, and you should get over your delusion that your dislike of C is
rational. Your dislike of /some/ aspects of C are rational (again, no
one who has used C significantly likes it all - and the same applies to
all programming languages). Some is purely a matter of taste (and
that's fine). Much of it, however, is due to your wilful and stubborn
insistence on making life hard for yourself. Such martyrdom is not
becoming.

>> I /have/ been discussing ideas and suggestions here.  Some have been of
>> interest to James, others not - which is fine.  And I write comments
>> aimed at making him (or anyone else interested in languages) think about
>> how the language might be used - because that's what's important.  I am
>> not the one who thinks every thread is an opportunity for a new anti-C
>> rant.
>
> No. Your message is 'Just don't bother trying to rewrite C', presumably
> because it is perfect.
>

You presume incorrectly.

I have written a great many things in James' threads - few of which were
about C. (And most of those were of the form "C does it this way - you
might want to do it differently".)

But this particular message was "Don't bother trying to rewrite C" -
because C is already here. If you want to design a language, make a new
one.

> C is still VERY widely used, but you don't believe in an alternative.
> You want people to continue driving a Model T, unless the new car can
> also fly!

I hope you were not suggesting that /your/ language is somehow more
modern than C! But perhaps you just wanted to end on a joke.

James Harris

Nov 29, 2021, 4:55:20 PM

On 28/11/2021 17:18, David Brown wrote:
> On 28/11/2021 15:54, James Harris wrote:
>> On 28/11/2021 13:46, David Brown wrote:
>>> On 28/11/2021 10:33, James Harris wrote:
>>>> On 27/11/2021 15:17, David Brown wrote:

...

>> Further, say you had your alignment in the way you prefer, i.e. as a
>> number of bytes such as 8. What would you write if you wanted to apply a
>> commensurate shift? To get from 8 to 3 you'd need some sort of
>> log-base-2 function of the type I showed earlier. Which would you want a
>> language to provide?
>>
>
> Again, I want /you/ to think about what /your/ users will actually need
> and use. When would they need this? Is it really something people will
> need? I believe you are trying to optimise for non-existent use-cases,
> instead of realistic ones. If you believe otherwise, please say so -
> perhaps with examples or justification. (It's your language, you don't
> /have/ to justify your choice of features, but it makes it easier to
> give helpful suggestions.)

Maybe we are talking about slightly different things but the question
above wasn't asking for a suggestion but was an attempt to point out
that your position (AIUI) wasn't pragmatic. I think you are focussing
only on alignment but I am saying that whatever the use case (alignment
or otherwise) it is more flexible if the defining value for a power of
two is a bit offset rather than an integer. That's because the former
can always be converted to the latter but not vice versa - unless you
are going to invent some special log2 functions. My point was that I'd
recommend you don't do that but define all powers of two via bit
offsets, instead.

I work that way all the time. A master file will have lines such as

constant blocksizebits = 9

then, later, possibly somewhere separate, there may be the definition of
a /derived/ constant,

constant blocksize = 1 << blocksizebits

Then I only have to change one constant to adjust the system.

...

> And of course there is nothing to stop you doing the equivalent of
>
> #define struct_align_needed 3
> alignas(1 << struct_align_needed) struct S s;
>
> or whatever. In other words, if you really need to go from 3 to 8, then
> you can happily do that even if your "align" method takes an 8 rather
> than a 3.

That illustrates my point: it's better to make the bit offset the master
piece of info.

...

>>>> When specifying constants it's easier to begin with and convert from the
>>>> number of bits. Consider the opposite. Given
>>>>
>>>>    constant ALIGN_BYTES = 8
>>>>
>>>> there are these two ways one might convert that to alignment bits.
>>>>
>>>>    constant ALIGN_BITS = Log2RoundedUp(ALIGN_BYTES)
>>>>    constant ALIGN_BITS = Log2ButErrorIfNotPowerOfTwo(ALIGN_BYTES)

...

> Programmers don't need these - it's not something they have to do. And
> if they /do/, then they can do so with :
>
> constant ALIGN_BITS = 3
> constant ALIGN_BYTES = 1 << ALIGN_BITS
>
> or maybe:
>
> constant ALIGN_BYTES = 8
> constant ALIGN_BITS = 3
> static_assert(ALIGN_BYTES == 1 << ALIGN_BITS,
> "Failed alignment sanity check")
>
> You are inventing non-existent problems here.

I see only a different opinion. I don't see anyone inventing a problem.


--
James Harris

James Harris

Nov 29, 2021, 5:15:12 PM

On 29/11/2021 11:40, Dmitry A. Kazakov wrote:
> On 2021-11-29 12:09, David Brown wrote:
>
>> In programming,
>> you get people making a living as programmers despite being completely
>> incompetent.
>
> Reminds me of politicians, pop musicians, journalists, economists,
> environmentalists... (put quotation marks as appropriate)
>
>> And even amongst people who do a reasonable job, you can
>> get an order of magnitude difference in productivity.
>
> That is the 80/20 law.
>
> But I agree with you, incompetence is strong with programmers...

That's no good. We cannot have agreement on Usenet. ;-) So let me
suggest that both of you have gone off the point (fine and permissible
but a deviation nonetheless).

What we were talking about was David espousing a language feature which
was "not for the average programmer" and saying (AIUI) that it was fine
to have average and expert programmers use different features. I
disagree with that premise.

I'd suggest to you that it's OK to have average and expert programmers
but that that should relate to the quality of their output and how
quickly it is produced. Different programmers should not, however, use
different parts of the same language. Instead, a language should
(ideally) be simple enough that both average and expert programmers can
work with the same code.

This is, again, about a language being a medium in which a programmer
communicates. That communication can be with other programmers, not just
with a compiler. (A lofty goal and perhaps unachievable but a very
important goal to keep in mind, IMO.)


--
James Harris

Bart

Nov 29, 2021, 5:40:45 PM

On 29/11/2021 15:19, David Brown wrote:
> On 29/11/2021 14:06, Bart wrote:

>> A /choice/. That doesn't make it right and the others wrong.

> Case insensitive doesn't work when you go beyond the UK/US alphabet.
> The complications for various languages are immense. In German, the
> letter ß traditionally capitalises as SS - one letter turns into two.
> In Turkish, "i" and "I" are two completely different letters, with their
> opposite cases being "İ" and "ı". It quickly becomes ridiculous when
> you need to support multiple languages. On the other hand,
> case-sensitive naming is usually just done as binary comparison.
>
> So unless you think that everyone should be forced to write a limited
> form of UK or US English and that ASCII is good enough for everyone,
> case-sensitive is the only sane choice for file systems.

URLs are case-insensitive for the first part. So are email addresses and
usernames. And usually, people's names when stored in a computer system.
And addresses and postcodes. And movie and book titles. Etc.

Those I guess are immune to the problems of Unicode.

I feel that file names, which could be used to represent all those
examples, and the commands of CLIs, should be the same.


>> If you have trouble thinking up distinct identifiers in examples like this:
>>
>>    Abc abc = ABC;
>>
>> then /you're/ in the wrong job!
>>
>
> That's a strawman, and you know it.

I see it all the time in C. Example from raylib.h:

typedef struct CharInfo {
    int value;            // Character value (Unicode)
    int offsetX;          // Character offset X when drawing
    int offsetY;          // Character offset Y when drawing
    int advanceX;         // Character advance position X
    Image image;          // Character image data
} CharInfo;

'Image image'; just try saying it out loud!

> But if you have just one starting point, 0 is the sensible one. You
> might not like the way C handles arrays (and I'm not going to argue
> about it - it certainly has its cons as well as its pros), but even you
> would have to agree that defining "A[i]" to be the element at "address
> of A + i * the size of the elements" is neater and clearer than
> one-based indexing.

That's a crude way of defining arrays. A[i] is simply the i'th element
of N slots, you don't need to bring offsets into it.

With 0-based, there's a disconnect between the ordinal number of the
element you want, and the index that needs to be used. So A[2] for the
3rd element.

>>    print "?"
>>    readln a, b, c
>>    println a, b, c
>
> In C, you don't work with variables whose types are unknown.

You may know the types, but they shouldn't affect how you write Read and
Print. In C it does, and needs extra maintenance.

> You are under the delusion that there is one "correct" interpretation
> here. You think that /your/ ideas are the only "obvious" or "proper"
> way to handle things. In reality, there are dozens of questions that
> could be asked here, including:

Print is one of the most diverse features among languages. Your
objections would apply to every language. Many have Print similar to
mine, for example there might be 'println'; so what newline should be
used (one of your questions)?

What I'm showing is a sensible set of defaults.

> Does there have to be a delimiter between the inputs? Does it have to
> be comma, or space, or newline?

Think about it: this is for user input, it needs to be fairly forgiving.
Using character-based input as C prefers is another problem when trying
to do interactive input. Programs can appear to hang as they silently
wait for that missing number on the line, while extra numbers screw up
the following line.


> Are these ignored if there are more
> than one? Are numbers treated differently in the input? Would an input
> of "true" be treated as a string or a boolean? Are there limits to the
> sizes? How are errors in the input, such as end-of-file or ctrl-C
> treated? How do you handle non-ASCII strings?

Yeah, carry on listing so many objections, that in the end the language
provides nothing. And requires a million programmers to each reinvent
line-based i/o from first principles.

You are just making excuses why your favourite languages don't provide
such features.

> Should there be spaces between the outputs? Newlines? Should the
> newline be a CR, an LF, CR+LF, or platform specific? What resolution or
> format should be used for the numbers? If someone had entered "0x2c"
> for one of the inputs, is that a string or a number - and if it is a
> number, should it be printed in hex or in decimal?

Usually such input is not language source code, it might be something
like a config file or maybe a log or transaction file.

If your requirements demand a full-blown language tokeniser, then you're
doing it wrong; you don't parse source code using a Read statement!

> Should the output go to the "standard out" stream, assuming that is
> supported by the language and the OS?

It goes to the console unless the program specifies otherwise; that's
pretty standard.

> The "standard error" stream?

stderr is an invention of C (or Unix, one of those) and is actually
quite difficult to make use of outside that language.

>> All things that C doesn't have.
>
> Only you are arguing about C here - only you seem to imagine people
> think it is perfect. It is far and away the most successful programming
> language, massively used and massively popular,

Great. That means there is still a place for a systems language at this
crude, lower level.

It also means there is room for alternatives. Even if it means the
alternative is something that is implemented on top of C because all the
tools are in place.

>> You are clearly saying, don't bother creating an alternative to C unless
>> it actually does something different.
>
> Yes. Surely that is obvious? There is no point in re-inventing the
> same wheel everyone else already uses - you have to bring something new
> to the table.

I invented /my/ first wheel because I didn't have any!

Then I found that my wheel was smaller, simpler, faster and generally a
better fit than other solutions.

>> I disagreed: you CAN have an alternative that, while it does the same
>> things, can achieve that differently.
>
> No one will use it. So what's the point?

/I/ will use it. And I will get a kick out of using it. After all not
many get to use their own languages for 100% of their work.

> Should I change the
> tiny, cheap microcontrollers we use to embedded Windows systems as that
> is the only target you support?

Mine aren't general purpose in the sense that they are for my own use,
and they target whatever hardware I happen to be using, which currently
is Win64. So I'm not here to flog my language. Only discussing portable
ideas.

However previous versions have targeted:

CPU      Size    OS

PDP10    36-bit  TOPS10(?) (Not my lang, but first time self-hosted)
Z80      8-bit   None
Z80              OS/M (CP/M ripoff)
Z80              (PCW)
8088/86  16-bit  MSDOS (plus None for some projects)
80386    32-bit  MSDOS/Windows
x64      64-bit  Windows (current)
(C32)    32-bit  Windows/Linux, x86/ARM32 (Versions with C targets)
(C64)    64-bit  Windows/Linux, x64/ARM64

If I wanted, I could adapt my language to a small device, but it would
have to work as a cross-compiler, and I'd need a minimum spec.

BTW, would any of C#, Java, D, Rust, Zig, Odin, Go (or Algol68) work on
those microcontrollers of yours?


> For C, I have the standards documents
> and reference sites, and compilers and libraries that follow these
> specifications, and an endless supply of knowledgeable users for help,
> advice, or hire - for your language, we have one guy off the internet
> who regularly fails to answer simple questions about the language he
> wrote without trying it to see the result.

Yeah, and on Reddit, there's an endless stream of the same questions
about C due to all its quirks!

And for you, I bet I could find a choice macro whose output you probably
wouldn't be able to guess without trying it out.

>
> So, again, what is the point of a language that is roughly like C but
> with a few technical improvements and perhaps a nicer syntax (in some
> people's opinion) ?

Well, the language exists. It is a joy to use. It is easy to navigate.
It is easy to type. It has modules. You don't need declarations, just
definitions. It has out-of-order everything. It fixes 100 annoyances of
C. It provides a whole-program compiler. It builds programs very
quickly. It has a self-contained one-file implementation. It has a
companion scripting language in the same syntax and with higher level types.

So, I should just delete a language I've used for 40 years and just code
in C with all those brackets, braces and semicolons like every other
palooka who needs to use an off-the-shelf language?

I should sell my familiar, comfortable car and drive that Model T? It
would be more like getting the bus; I'd rather walk!

> I hope you were not suggesting that /your/ language is somehow more
> modern than C!
Actually it is. Why, in what way is C (the language, not all those shiny
new IDEs), more modern than the language I have?


Dmitry A. Kazakov

Nov 30, 2021, 2:46:30 AM
On 2021-11-29 23:15, James Harris wrote:
> On 29/11/2021 11:40, Dmitry A. Kazakov wrote:
>> On 2021-11-29 12:09, David Brown wrote:
>>
>>> In programming,
>>> you get people making a living as programmers despite being completely
>>> incompetent.
>>
>> Reminds me of politicians, pop musicians, journalists, economists,
>> environmentalists... (put quotation marks as appropriate)
>>
>>> And even amongst people who do a reasonable job, you can
>>> get an order of magnitude difference in productivity.
>>
>> That is the 80/20 law.
>>
>> But I agree with you, incompetence is strong with programmers...
>
> That's no good. We cannot have agreement on Usenet. ;-) So let me
> suggest that both of you have gone off the point (fine and permissible
> but a deviation nonetheless).
>
> What we were talking about was David espousing a language feature which
> was "not for the average programmer" and saying (AIUI) that it was fine
> to have average and expert programmers use different features. I
> disagree with that premise.

It is OK even for experts to use different language features. It depends
on the task. Furthermore there are SW development roles like SW
architect etc requiring higher qualification and the corresponding
language parts for these.

> Instead, a language should
> (ideally) be simple enough that both average and expert programmers can
> work with the same code.

If the underlying concepts are inherently complex the language cannot
simplify them enough.

> This is, again, about a language being a medium in which a programmer
> communicates. That communication can be with other programmers, not just
> with a compiler. (A lofty goal and perhaps unachievable but a very
> important goal to keep in mind, IMO.)

Nobody ever objected that.

James Harris

Nov 30, 2021, 3:14:35 AM
On 30/11/2021 07:46, Dmitry A. Kazakov wrote:
> On 2021-11-29 23:15, James Harris wrote:

...

>> What we were talking about was David espousing a language feature
>> which was "not for the average programmer" and saying (AIUI) that it
>> was fine to have average and expert programmers use different
>> features. I disagree with that premise.
>
> It is OK even for experts to use different language features. It depends
> on the task. Furthermore there are SW development roles like SW
> architect etc requiring higher qualification and the corresponding
> language parts for these.

AISI that's only true if the language is complex enough to have parts
which are not needed in normal programming. Are those parts really needed?

>
>> Instead, a language should (ideally) be simple enough that both
>> average and expert programmers can work with the same code.
>
> If the underlying concepts are inherently complex the language cannot
> simplify them enough.

Interesting comment. Which concepts cannot be simplified?


--
James Harris

Dmitry A. Kazakov

Nov 30, 2021, 4:22:26 AM
On 2021-11-30 09:14, James Harris wrote:
> On 30/11/2021 07:46, Dmitry A. Kazakov wrote:
>> On 2021-11-29 23:15, James Harris wrote:
>
> ...
>
>>> What we were talking about was David espousing a language feature
>>> which was "not for the average programmer" and saying (AIUI) that it
>>> was fine to have average and expert programmers use different
>>> features. I disagree with that premise.
>>
>> It is OK even for experts to use different language features. It
>> depends on the task. Furthermore there are SW development roles like
>> SW architect etc requiring higher qualification and the corresponding
>> language parts for these.
>
> AISI that's only true if the language is complex enough to have parts
> which are not needed in normal programming. Are those parts really needed?

Normal programming for you, abnormal for others.

>>> Instead, a language should (ideally) be simple enough that both
>>> average and expert programmers can work with the same code.
>>
>> If the underlying concepts are inherently complex the language cannot
>> simplify them enough.
>
> Interesting comment. Which concepts cannot be simplified?

- Concurrency, synchronization, tasking, active objects, protected
objects, rendezvous, barriers, scheduling

- Memory management, pools, collectors

- Generic programming, classes, polymorphism

- Interfacing with other languages and the OS

- Representation control, memory layout, alignment, packing,
volatile/atomic access and operations

David Brown

Nov 30, 2021, 8:58:59 AM
On 29/11/2021 23:40, Bart wrote:
> On 29/11/2021 15:19, David Brown wrote:
>> On 29/11/2021 14:06, Bart wrote:
>
>>> A /choice/. That doesn't make it right and the others wrong.
>
>> Case insensitive doesn't work when you go beyond the UK/US alphabet.
>> The complications for various languages are immense.  In German, the
>> letter ß traditionally capitalises as SS - one letter turns into two.
>> In Turkish, "i" and "I" are two completely different letters, with their
>> opposite cases being "İ" and "ı".  It quickly becomes ridiculous when
>> you need to support multiple languages.  On the other hand,
>> case-sensitive naming is usually just done as binary comparison.
>>
>> So unless you think that everyone should be forced to write a limited
>> form of UK or US English and that ASCII is good enough for everyone,
>> case-sensitive is the only sane choice for file systems.
>
> URLs are case-insensitive for the first part. So are email addresses and
> usernames. And usually, people's names when stored in a computer system.
> And addresses and postcodes. And movie and book titles. Etc.

You are basing this all on your limited experience of English language
usage. Usernames in English language Windows logins are
case-insensitive - that does not apply to all kinds of usernames, all
kinds of systems, all kinds of languages. The first part of email
addresses were originally ASCII only, and could be case-insensitive.
Now some email servers can support different characters with UTF-8
(encoded in some way over ASCII, I believe) - these could be
case-sensitive or case-insensitive. Most English-language names, words,
titles, etc., are case-insensitive, but not all.

You are generalising too much here. I fully agree that on many things
in daily life, we don't care much about letter case when distinguishing
things - and we very rarely choose to have two things that are
distinguished only by capitalisation. (Primarily because it is hard to
hear the difference.) And some computer-related things are also
case-insensitive, and sometimes that is convenient.

But lots of things are case-sensitive, even in normal usage. "bart",
"ADA" and "Fortran" are all misspelt. If I refer to you as "BART",
you'd feel differently than if I use "Bart".

And in the computer world, it is just vastly easier to have
case-sensitive (and in general to distinguish based on the underlying
bytes - so that, say, a space and a non-breaking space are considered
different). If you try to
be case-insensitive, you are going to get things wrong sooner or later,
especially if you go outside English.

>
> Those I guess are immune to the problems of Unicode.

Unicode has nothing to do with it. Non-Latin alphabets are an important
factor here, but they are not the only thing, and the encoding of the
characters is irrelevant.

For other joys to consider outside of Latin - some languages have
multiple different writing systems or alphabets. Some languages have
different versions of letters depending on their position in words as
well as their "capitalisation". The very concept of "letter case" is
based entirely around the way we write Latin alphabet characters from
the middle ages onwards (the etymology of the term is from Gutenberg
printing presses).
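
These mappings are easy to verify; a quick sketch in Python 3, whose
str.upper() and str.lower() apply the default, locale-independent
Unicode case mappings:

```python
# Unicode case mappings are not one-to-one, so naive case-insensitive
# comparison breaks as soon as you leave plain ASCII.

# German: ß uppercases to the two-letter sequence "SS".
assert "straße".upper() == "STRASSE"   # one letter became two

# Turkish: dotted capital İ (U+0130) lowercases, under the default
# mapping, to "i" plus a combining dot - two code points.
lowered = "İ".lower()
assert len(lowered) == 2

# And the default mapping pairs i with I, which is wrong for Turkish,
# where the correct pairs are i/İ and ı/I.
assert "i".upper() == "I"
```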

>
> I feel that file names, which could be used to represent all those
> examples, and the commands of CLIs, should be the same.
>

You are a product of your environment (this is not a criticism, it is
human nature). You whole experience, especially in anything
computer-related, has been in ASCII. A large proportion of it has been
from a time when computers couldn't do anything else, and all the rest
has been using English-language systems with keyboards that are simple
ASCII only. You've never been able to type "naïve" or "café" with the
correct English-language spelling, without jumping through hoops with a
"character map" program. You've (almost) never used a command line
terminal that can work with non-ASCII characters.

This means that for you, working with case-insensitive strings is easy.
It's a bit of extra effort in the code, but not much. And it doesn't
cause confusion or annoyance. In the wider world, however, it is a very
different matter - it is either language-specific and potentially
complicated, or if it is supposed to support multiple languages it is
/extremely/ complicated. And that means it is usually wrong.

So having established that case-sensitivity is the only sensible
choice for a lot of computer-related uses, given that it is the only
choice that gets things right, what are the disadvantages? What are the
advantages of being case /insensitive/ ?

Sometimes it is nicer to see sorted lists as case-insensitive. But
that's a whole new can of worms - sorting is again highly
language-specific, and is another topic on its own. Even regardless of
case, sorting is complicated. A good example here is the letter "Å" in
Norwegian. It is regarded as a separate letter, and comes at the end of
the alphabet. But sometimes it is transliterated to "AA". So in an
alphabetic list of names, "Aaron" might come first, while "Aase" is
sorted beside "Åse" at the end of the list. Even in English, in a list
of names "MacDonald" is sorted beside "McDonald".
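
The code-point-versus-locale gap is easy to demonstrate; a minimal
Python sketch (the locale-aware alternative is only described in a
comment, since its result depends on which locales are installed):

```python
names = ["Åse", "Aaron", "Zebra", "Aase"]

# The default sort compares Unicode code points: 'Å' is U+00C5, which
# is greater than 'Z' (U+005A), so "Åse" lands at the very end.
print(sorted(names))
# ['Aaron', 'Aase', 'Zebra', 'Åse']

# A locale-aware sort (e.g. sorted(names, key=locale.strxfrm) after
# setting a Norwegian locale, if installed) would also put "Åse" last,
# but because å is the final letter of that alphabet - while an
# English locale would instead file it next to "A".
```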


But for commands, file names, program identifiers? Why would you want
them to be case insensitive? I mean, I agree that you don't want
commands "Copy", "copy" and "COPY" that all do different things. But
given a command "copy", why would you ever want to type "COPY" ? Given
a variable "noOfWhatsits", what is the benefit of letting "NOofwhatSitS"
mean the same? A language tool could easily have a warning about having
two identifiers that differ only in their letter case - that's no reason
to want case-insensitive identifiers.

>
>>> If you have trouble thinking up distinct identifiers in examples like
>>> this:
>>>
>>>     Abc abc = ABC;
>>>
>>> then /you're/ in the wrong job!
>>>
>>
>> That's a strawman, and you know it.
>
> I see it all the time in C. Example from raylib.h:
>
>   typedef struct CharInfo {
>     int value;              // Character value (Unicode)
>     int offsetX;            // Character offset X when drawing
>     int offsetY;            // Character offset Y when drawing
>     int advanceX;           // Character advance position X
>     Image image;            // Character image data
>   } CharInfo;
>
> 'Image image'; just try saying it out loud!
>

If a language (or project) uses a convention that types start with a
capital and variables start with a small letter, then this is perfectly
clear.

>> But if you have just one starting point, 0 is the sensible one.  You
>> might not like the way C handles arrays (and I'm not going to argue
>> about it - it certainly has its cons as well as its pros), but even you
>> would have to agree that defining "A[i]" to be the element at "address
>> of A + i * the size of the elements" is neater and clearer than
>> one-based indexing.
>
> That's a crude way of defining arrays. A[i] is simply the i'th element
> of N slots, you don't need to bring offsets into it.
>

Zero-based indexing is simple, clear, consistent and easy to build on
for something more advanced (either in the language, or in user code).
It is certainly low-level, but that's where you want to start.

> With 0-based, there's a disconnect between the ordinal number of the
> element you want, and the index that needs to be used. So A[2] for the
> 3rd element.

That can seem a little odd at first, depending on where you are starting
- if you are used to lower level work then 0 is the obvious and natural
starting point. (Bit 0 is the least significant bit in almost all
bit-level work, except for PowerPC and related architectures.) People
quickly get used to it.

Some of the things you complain about in C are issues that seem to
bother a number of C programmers - some of them even bug me! But I
don't feel zero-based arrays are one of them - it really is not a
problem for people, and it makes life simpler when you want to do
something less common. (In contrast, the decay of array expressions to
pointer expressions is something that is often surprising to beginners
of C.)

>
>>>     print "?"
>>>     readln a, b, c
>>>     println a, b, c
>>
>> In C, you don't work with variables whose types are unknown.
>
> You may know the types, but they shouldn't affect how you write Read and
> Print. In C it does, and needs extra maintenance.
>

C is not a language with overloads, OOP or generics (beyond the limited
_Generic expression). You /always/ need to track your types, and you
/always/ need to write things in different ways for different types.

>> You are under the delusion that there is one "correct" interpretation
>> here.  You think that /your/ ideas are the only "obvious" or "proper"
>> way to handle things.  In reality, there are dozens of questions that
>> could be asked here, including:
>
> Print is one of the most diverse features among languages. Your
> objections would apply to every language. Many have Print similar to
> mine, for example there might be 'println'; so what newline should be
> used (one of your questions)?
>
> What I'm showing is a sensible set of defaults.

No, you are not. What you are showing is what /you/ think are useful
defaults for /your/ use in /your/ language and /your/ programs. That's
different.

>
>> Does there have to be a delimiter between the inputs?  Does it have to
>> be comma, or space, or newline?
>
> Think about it: this is for user input, it needs to be fairly forgiving.
> Using character-based input as C prefers is another problem when trying
> to do interactive input. Programs can appear to hang as they silently
> wait for that missing number on the line, while extra numbers screw up
> the following line.
>
>
>> Are these ignored if there are more than one?  Are numbers treated
>> differently in the input?  Would an input
>> of "true" be treated as a string or a boolean?  Are there limits to the
>> sizes?  How are errors in the input, such as end-of-file or ctrl-C
>> treated?  How do you handle non-ASCII strings?
>
> Yeah, carry on listing so many objections, that in the end the language
> provides nothing. And requires a million programmers to each reinvent
> line-based i/o from first principles.
>
> You are just making excuses why your favourite languages don't provide
> such features.

What language would that be? And why would you think I'd be making
excuses? Are you are so utterly obsessed with hating C that you think
the world is split into those like you that hate it, and those that
think it is perfect and use nothing else? Your continued
misunderstanding and misrepresentation of me is getting quite tedious.

We all know that /you/ have a favourite language (or two) that you think
is perfect - you wrote the bloody thing, for your own use according to
your own needs and preferences. Like any other serious programmer, I
use different languages at different times according to a range of
requirements. And like any other programmer, I know the languages I use
have their strengths and weaknesses, as well as things that I personally
like or dislike (without expecting everyone else to agree on those points).

I don't write "What is your name? Hello <name>" programs - I haven't
done that since I was ten. But if I did, I wouldn't write it in C - as
C is a terrible language for handling general input. I guess a rough
equivalent to your program, written in my favourite language for such
tasks, might be :

a, b, c = input("? ").split()
print(a, b, c)

But usually when I have a program that takes input, it's a bit more
sophisticated and has string handling (and thus less likely to be in C).
There certainly is scope for low-level systems languages. Yours is not
going to win many hearts - nor is any alternative whose only claim to
fame is that it matches one person's personal preferences. If someone
wants to make a low-level systems language that people will want to use,
it has to do things they can't do in existing languages. It really is
not that difficult to understand.

> BTW, would any of C#, Java, D, Rust, Zig, Odin, Go (or Algol68) work on
> those microcontrollers of yours?

D will work on many, and perhaps Rust or Go - I haven't checked. C++,
Ada and Forth are definitely fine. Micropython and Lua can work on
bigger microcontrollers.  But the point is simply that your language is
not a contender - not which other languages are contenders.

>
>> For C, I have the standards documents
>> and reference sites, and compilers and libraries that follow these
>> specifications, and an endless supply of knowledgeable users for help,
>> advice, or hire - for your language, we have one guy off the internet
>> who regularly fails to answer simple questions about the language he
>> wrote without trying it to see the result.
>
> Yeah, and on Reddit, there's an endless stream of the same questions
> about C due to all its quirks!
>

And there is a lack of questions about /your/ language. That must mean
your language is obvious, natural and fault-free. Or perhaps there is
something else going on here?

> And for you, I bet I could find a choice macro whose output you probably
> wouldn't be able to guess without trying it out.
>

So what? If I knew the syntax of your language, it would probably take
only a few minutes to write code that you couldn't figure out. Writing
incomprehensible code is not difficult. Writing /comprehensible/ code
is usually not that difficult either - most people who write
incomprehensible C macros would write illegible code no matter what the
language. (This is a different thing from writing code that uses
advanced features of a language that are hard for newcomers to understand.)

>>
>> So, again, what is the point of a language that is roughly like C but
>> with a few technical improvements and perhaps a nicer syntax (in some
>> people's opinion) ?
>
> Well, the language exists. It is a joy to use. It is easy to navigate.
> It is easy to type. It has modules. You don't need declarations, just
> definitions. It has out-of-order everything. It fixes 100 annoyances of
> C. It provides a whole-program compiler. It builds programs very
> quickly. It has a self-contained one-file implementation. It has a
> companion scripting language in the same syntax and with higher level
> types.
>
> So, I should just delete a language I've used for 40 years and just code
> in C with all those brackets, braces and semicolons like every other
> palooka who needs to use an off-the-shelf language?
>
> I should sell my familiar, comfortable car and drive that Model T? It
> would be more like getting the bus; I'd rather walk!
>
>> I hope you were not suggesting that /your/ language is somehow more
>> modern than C!
> Actually it is. Why, in what way is C (the language, not all those shiny
> new IDEs), more modern than the language I have?
>
>

A language is its ecosystem - specifications, references,
implementations, tools, code, users, knowledge. C has its history
stretching way back, but the way modern C is written is not the way it
was written long ago. But as someone who insists on throwing out much
of the language - the bits designed to make it easier to write clear,
flexible and maintainable code - you might have missed that.

Bart

Nov 30, 2021, 10:22:42 AM
On 30/11/2021 13:58, David Brown wrote:
> On 29/11/2021 23:40, Bart wrote:

> You are a product of your environment (this is not a criticism, it is
> human nature). Your whole experience, especially in anything
> computer-related, has been in ASCII.

That's not true. My commercial apps worked in Dutch, German and French
as well as English. Some users worked also from a digitizer [2D input
device] and I had to devise a keyboard layout that could cope with the
needs of those languages.

(My first product that was sold, did quite well in Norway in the
mid-80s. It wasn't internationalised then, but Norwegians seemed to
manage with English.)

> A large proportion of it has been
> from a time when computers couldn't do anything else, and all the rest
> has been using English-language systems with keyboards that are simple
> ASCII only. You've never been able to type "naïve" or "café" with the
> correct English-language spelling, without jumping through hoops with a
> "character map" program. You've (almost) never used a command line
> terminal that can work with non-ASCII characters.

See above. Why do you make these silly assumptions? I happen to live in
the UK, largely read and write English, and write software for my own
use, so have little need for internationalisation, which is also now
much harder with Unicode compared with dedicated 8-bit character sets,
so I don't bother about it.

> This means that for you, working with case-insensitive strings is easy.
> It's a bit of extra effort in the code, but not much. And it doesn't
> cause confusion or annoyance. In the wider world, however, it is a very
> different matter - it is either language-specific and potentially
> complicated, or if it is supposed to support multiple languages it is
> /extremely/ complicated. And that means it is usually wrong.

Yet as you explained elsewhere, case-insensitivity IS used in many
situations. Eg. in Google searches, which would make life difficult
otherwise.

For the billion or two of us who use the Roman alphabet (and some others
like Greek) there presumably are ways to normalise case. If so, why
can't I employ that in a programming language where keywords and
identifiers are anyway limited to A-Z and a-z?

Instead people try to argue that I mustn't have a conversion between A-Z
and a-z because one letter in the Turkish alphabet doesn't have an
equivalent in the opposite case!
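
To be fair, with identifiers limited to A-Z/a-z (plus digits and
underscores) the folding really is trivial and unambiguous; a
hypothetical sketch - fold_ident is an invented name, not from any
real compiler:

```python
def fold_ident(name: str) -> str:
    """Case-fold an identifier, assuming the language only permits
    ASCII letters, digits and underscores in identifiers - exactly
    the restriction that sidesteps the Turkish-i problem."""
    if not all(c.isascii() and (c.isalnum() or c == "_") for c in name):
        raise ValueError(f"not a plain ASCII identifier: {name!r}")
    return name.lower()   # ASCII-only, so lower() is a 1:1 mapping

# All spellings of the same name compare equal after folding:
assert fold_ident("GetStdHandle") == fold_ident("getstdhandle")
assert fold_ident("NOofwhatSitS") == fold_ident("noOfWhatsits")
```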

> But for commands, file names, program identifiers? Why would you want
> them to be case insensitive? I mean, I agree that you don't want
> commands "Copy", "copy" and "COPY" that all do different things. But
> given a command "copy", why would you ever want to type "COPY"?

Why? Because you may have forgotten the caps lock on!

In Unix, if you want to do 'cp ABC def', isn't it a nuisance having to keep
switching caps? And suppose you forget the second switch and type:

cp ABC DEF

but don't notice. Now you try and do something with file def, and it
doesn't work; what the hell happened to that copy!

It's ridiculous. Windows retains the case of your filenames, but will
match regardless of case. So you can still use 'def', but you might
still have to rename if it's that important that it's DEF.

(I've spent a year or two doing telephone support for non-technical users.

This involved walking them through typing things on their machine, which
of course I couldn't see. Imagine if what they typed had to be exactly
the right case. Or they'd created a file as ABC and were trying to use
it as abc, but I wouldn't know that.)


> Given
> a variable "noOfWhatsits", what is the benefit of letting "NOofwhatSitS"
> mean the same?

You've got the wrong end of the stick. The purpose of case-insensitivity
isn't so you use a different version of noofwhatsits at each instance (I
think there are 4096 variants); it's so that IT DOESN'T MATTER.

Was it this camelCase or that CamelCase or camelcase or Camelcase? It
doesn't matter; choose your own preferences and stick with them.

Mine is to use all-lower-case, with ALL-CAPS used for temporary debug
code.

For example I may import GetStdHandle, but I will use getstdhandle
without needing to remember the exact capitalisation.

While other people who've used my languages liked to capitalise the
first letter of keywords, or apply camelcase to my flat function names.

> If a language (or project) uses a convention that types start with a
> capital and variables start with a small letter, then this is perfectly
> clear.

Sure. Except when the name clashes.

> C is not a language with overloads, OOP or generics (beyond the limited
> _Generic expression). You /always/ need to track your types, and you
> /always/ need to write things in different ways for different types.

Not true. You can write a=b, a+b, a==b without needing to know exact
types, other than they are valid for those ops.

I just extend it to tostr(a) (used by Print).

> I don't write "What is your name? Hello <name>" programs - I haven't
> done that since I was ten.

You don't write programs that read or write text files?


> But if I did, I wouldn't write it in C - as
> C is a terrible language for handling general input.

OK, you agree with me then! My example works fine in my static language.

> I guess a rough
> equivalent to your program, written in my favourite language for such
> tasks, might be :
>
> a, b, c = input("? ").split()
> print(a, b, c)

Not a bad attempt, but it's not great. If I try this version:

a, b, c, d = input("? ").split()

for x in (a, b, c, d):
    print(x, type(x))

Then for input of '123 45.67 abc "def"' it displays

123 <class 'str'>
45.67 <class 'str'>
abc <class 'str'>
"def" <class 'str'>

They're all strings! And the last still has its quotes.

If I do '123,45.67,abc,"def"', it goes wrong.

If I do '123, 45.67, abc, "def"', then the first 3 items retain that
trailing comma!

If I do "def ghi" instead of "def" it goes wrong (it reads a 5th item
'ghi"').

If I do just '123' it goes wrong; it must be exactly 4 items.

So it's fragile. It needs a lot of work to make robust.
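
For what it's worth, most of that robustness is available in Python via
the standard shlex module; a sketch (readln and coerce are invented
names here) that handles quotes, commas and missing or extra items,
though not the typed placeholders:

```python
import shlex

def coerce(tok: str):
    """Turn a token into an int or float where possible, else keep it
    as a string."""
    for cast in (int, float):
        try:
            return cast(tok)
        except ValueError:
            pass
    return tok

def readln(line: str, n: int):
    """Read n items from a line: spaces or commas delimit, double
    quotes group, missing items become "", extras are ignored."""
    lex = shlex.shlex(line, posix=True)
    lex.whitespace += ","         # treat commas as delimiters too
    lex.whitespace_split = True   # only split on whitespace/commas
    items = [coerce(tok) for tok in lex]
    items += [""] * (n - len(items))   # pad out missing items
    return items[:n]                   # and drop any extras

print(readln('123, 45.67, abc, "def ghi"', 4))
# [123, 45.67, 'abc', 'def ghi']
print(readln("123", 4))
# [123, '', '', '']
```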

The equivalent program in my dynamic language is this:

print "?"
readln a,b,c,d

for x in (a,b,c,d) do
    fprintln "# <type #>", x, x.type
end

Then input of '123 45.67 abc "def"' shows:

123 <type int>
45.670000 <type real>
abc <type string>
def <type string>

The numbers are actual numbers! "def" has lost its quotes.

If I do '123,45.67,abc,"def"' it still works.

If I do '123, 45.67, abc, "def"' it still works.

If I do "def ghi" for the last, it shows:

def ghi <type string>

(So I can have embedded spaces, commas etc in one item)

If I do only '123', it shows:

123 <type int>
<type string>
<type string>
<type string>

Missing items are read as "", unless I tell it what type:

readln a:"i", b:"i", c:"i", d:"i"

Then '123' returns:

123 <type int>
0 <type int>
0 <type int>
0 <type int>

Extra items are ignored; on Python it would go wrong.

But, let me guess, this cuts no ice at all.

All I can say is, I find it jolly useful, and I have used this kind of
thing for years.

If any language doesn't have it, or doesn't want it, then it's their loss.



Bart

Nov 30, 2021, 3:36:55 PM
On 30/11/2021 13:58, David Brown wrote:
> On 29/11/2021 23:40, Bart wrote:

> But for commands, file names, program identifiers? Why would you want
> them to be case insensitive? I mean, I agree that you don't want
> commands "Copy", "copy" and "COPY" that all do different things. But
> given a command "copy", why would you ever want to type "COPY" ? Given
> a variable "noofwhatsits", what is the benefit of letting "noofwhatsits"
> mean the same?

I've normalised both of your 'noofwhatsits' to have the same
capitalisation, ie. none.

Can you remember what the original was?

No? That's the problem.

The thing is, I can remember words and phrases, but I easily forget
capitalisation, and underscores, another thing I avoid.

Yet one or two languages make underscores non-significant; a bit like
making letter case for A-Z/a-z non-significant. (And exactly like making
underscores in numeric literals non-significant.)

And also, a bit like Algol68 making white space in identifiers
non-significant: abc, def and abc def are three distinct identifiers;
abcdef, abc def and a b c d e f are the same one.

So, my treatment of capitalisation is like Nim's(?) underlines, and
Algol68's embedded spaces; it is an optional style that can be used to
enhance readability, or enforce naming guidelines.

No one suggests that Nim users will spend their time writing umpteen
variations of the same name by playing with "_"; or whether Algol68
users will do the same with spaces.

BTW, C is also case-insensitive in a few areas:

  X ABCDEF P   Any mix can be used in hex literals
  x abcdef p

  E/e          Exponents
  U/u L/l      Numeric suffix

A bit radical of it to let the user choose between upper and lower case,
and let that make no difference!

David Brown

Nov 30, 2021, 4:44:48 PM
to
On 30/11/2021 21:36, Bart wrote:
> On 30/11/2021 13:58, David Brown wrote:
>> On 29/11/2021 23:40, Bart wrote:
>
>> But for commands, file names, program identifiers?  Why would you want
>> them to be case insensitive?  I mean, I agree that you don't want
>> commands "Copy", "copy" and "COPY" that all do different things.  But
>> given a command "copy", why would you ever want to type "COPY" ?  Given
>> a variable "noofwhatsits", what is the benefit of letting "noofwhatsits"
>> mean the same?
>
> I've normalised both of your 'noofwhatsits' to have the same
> capitalisation, ie. none.
>
> Can you remember what the original was?

Yes. noOfWhatsits.

>
> No? That's the problem.

No problem. The capitalisation was part of the identifier, and done
intentionally. Perhaps you think that when a language is
case-sensitive, people pick their capitalisations randomly?

>
> The thing is, I can remember words and phrases, but I easily forget
> capitalisation, and underscores, another thing I avoid.
>

Eh, okay. I find that hard to relate to - most people prefer some
indication of words in multi-word identifiers, and the two most common
techniques are underscore_between_words (snake case) and camelCase.

> Yet one or two languages make underscores non-significant; a bit like
> making letter case for A-Z/a-z non-significant. (And exactly like making
> underscores in numeric literals non-significant.)

There are some languages with odd rules, yes. TeX and LaTeX do not
allow digits in identifiers (so "x3" is "x 3", or, depending on the
context, "x{3}"). MetaPost and Metafont (note the significance of the
capitalisation in these names) consider "x3" as though x is an array,
with "x3" being the element you might normally think of as "x[3]".

>
> And also, a bit like Algol68 making white space in identifiers
> non-significant:  abc, def and abc def are three distinct identifiers;
> abcdef, abc def and a b c d e f are the same one.
>
> So, my treatment of capitalisation is like Nim's(?) underlines, and
> Algol68's embedded spaces; it is an optional style that can be used to
> enhance readability, or enforce naming guidelines.
>
> No one suggests that Nim users will spend their time writing umpteen
> variations of the same name by playing with "_"; or whether Algol68
> users will do the same with spaces.

In the same way, no one using a case-sensitive language spends their
time making mixed-up case identifiers just to cause confusion.

And having worked with badly written code in case-insensitive languages
(such as Pascal), I can tell you it is /seriously/ confusing when the
same identifier is cased in different ways. I count that as much worse
than having different identifiers distinguished only by case.
(Especially if the cases are used for a convention or style, such as
initial capitals for types.)

It doesn't matter what restrictions you make on the identifiers you can
use - there will always be people who make a mess with it. They will
spell things inconsistently, or call all their variables "temp1, temp2,
temp3" (that's particularly common in languages that encourage declaring
all variables at the start of a function, rather than having decent
scoping and mixing of statements and variable declarations). You can't
force good style by restricting how people can write code.

>
> BTW, C is also case-insensitive in a few areas:
>
>   X ABCDEF P   Any mix can be used in hex literals
>   x abcdef p
>
>   E/e          Exponents
>   U/u L/l      Numeric suffix
>
> A bit radical of it to let the user choose between upper and lower case,
> and let that make no difference!

Those are not identifiers.

I'd personally have been happy to stick to lower case only here, at
least for the "x" and "e" (I'd rather not have numeric suffixes at all).
But I didn't design C, I just use it.

Bart

Nov 30, 2021, 6:53:00 PM
to
On 30/11/2021 21:44, David Brown wrote:
> On 30/11/2021 21:36, Bart wrote:
>> On 30/11/2021 13:58, David Brown wrote:
>>> On 29/11/2021 23:40, Bart wrote:
>>
>>> But for commands, file names, program identifiers?  Why would you want
>>> them to be case insensitive?  I mean, I agree that you don't want
>>> commands "Copy", "copy" and "COPY" that all do different things.  But
>>> given a command "copy", why would you ever want to type "COPY" ?  Given
>>> a variable "noofwhatsits", what is the benefit of letting "noofwhatsits"
>>> mean the same?
>>
>> I've normalised both of your 'noofwhatsits' to have the same
>> capitalisation, ie. none.
>>
>> Can you remember what the original was?
>
> Yes. noOfWhatsits.
>
>>
>> No? That's the problem.
>
> No problem. The capitalisation was part of the identifier, and done
> intentionally. Perhaps you think that when a language is
> case-sensitive, people pick their capitalisations randomly?

The Windows API contains 10,000 functions with specific capitalisations.

> In the same way, no one using a case-sensitive language spends their
> time making mixed-up case identifiers just to cause confusion.

Didn't I give you an example? When I use my tool to convert C headers to
my syntax, I need to go through and fix all the clashes.

> And having worked with badly written code in case-insensitive languages
> (such as Pascal), I can tell you it is /seriously/ confusing when the
> same identifier is cased in different ways.

And you don't find other people's C code confusing at all?

I've just done an interesting experiment on two of my programs: I took
the source code and a 100% lower case version and 100% upper case.

One program still worked with either version. The other worked after
tweaking one line to do with char constants.

I then tried the same experiment with a C version of the same program;
both versions failed, one on line 1, the other on line 4.

So, which version was more resilient to changes of case?

The experiment shows that you can much more easily refactor the
case-insensitive language to use consistent capitalisation in a style
that you prefer, than case-sensitive.

I just have my preferences and you have yours.

David Brown

Dec 1, 2021, 3:15:19 AM
to
On 01/12/2021 00:52, Bart wrote:
> On 30/11/2021 21:44, David Brown wrote:
>> On 30/11/2021 21:36, Bart wrote:
>>> On 30/11/2021 13:58, David Brown wrote:
>>>> On 29/11/2021 23:40, Bart wrote:
>>>
>>>> But for commands, file names, program identifiers?  Why would you want
>>>> them to be case insensitive?  I mean, I agree that you don't want
>>>> commands "Copy", "copy" and "COPY" that all do different things.  But
>>>> given a command "copy", why would you ever want to type "COPY" ?  Given
>>>> a variable "noofwhatsits", what is the benefit of letting
>>>> "noofwhatsits"
>>>> mean the same?
>>>
>>> I've normalised both of your 'noofwhatsits' to have the same
>>> capitalisation, ie. none.
>>>
>>> Can you remember what the original was?
>>
>> Yes.  noOfWhatsits.
>>
>>>
>>> No? That's the problem.
>>
>> No problem.  The capitalisation was part of the identifier, and done
>> intentionally.  Perhaps you think that when a language is
>> case-sensitive, people pick their capitalisations randomly?
>
> The Windows API contains 10,000 functions with specific capitalisations.

If you say so (I've avoided it). I assume that they have had good
reasons for the capitalisations they picked (though with code that has
developed over such a long time, it can be hard to keep consistency).

>
>> In the same way, no one using a case-sensitive language spends their
>> time making mixed-up case identifiers just to cause confusion.
>
> Didn't I give you an example?

No. You gave an example of when capitalisation was used appropriately
and helpfully to add meaning to the code.

The fact that /you/ seem to find all capitalisation confusing does not
mean that the code authors wrote it specifically to cause confusion.

> When I use my tool to convert C headers to
> my syntax, I need to go through and fix all the clashes.

Do you /really/ expect sympathy for that? Honestly?

Every programming language has its idiosyncrasies, rules, and semantics.
There are always differences - some big, some small. If you want to
translate from one language to another, you have to take those into
account. Just as you cannot copy blindly from your language to C when
the arithmetic semantics are different, you cannot copy blindly from C
to your language if your identifiers are more restricted.

This happens all the time in language wrappers and interface generators,
as well as in transcompilers. When the swig folks made a tool for
generating interfaces to C++ code in Python (amongst the many
combinations they support), they had to find a way to automate
generation of functions or methods that are distinguished in C++ by
overloads, as Python does not support these. And if they generate for a
case-insensitive language like Pascal or Ada, they must handle case
translation too.

/You/ wrote your language, you designed it, you put in its limitations
and restrictions, you picked its types and semantics. That's fine,
that's your choice. But you can't blame other languages because they
did something different! You can't expect anyone to feel sorry for you
here - even if they happen to prefer case-insensitive languages themselves.

>
>> And having worked with badly written code in case-insensitive languages
>> (such as Pascal), I can tell you it is /seriously/ confusing when the
>> same identifier is cased in different ways.
>
> And you don't find other people's C code confusing at all?
>

As I said, people can (and do) write bad code in all languages, all
styles, and regardless of identifier rules. Yes, I have seen lots of
horrible, confusing or hard to comprehend C code. No, case sensitivity
was not an issue for confusion, though badly capitalised identifiers can
make code ugly. Inconsistent spelling causes an order of magnitude more
annoyance than poor capitalisation.

> I've just done an interesting experiment on two of my programs: I took
> the source code and a 100% lower case version and 100% upper case.
>
> One program still worked with either version. The other worked after
> tweaking one line to do with char constants.
>
> I then tried the same experiment with a C version of the same program;
> both versions failed, one on line 1, the other on line 4.
>
> So, which version was more resilient to changes of case?

If you find such a completely meaningless experiment interesting, then
go ahead - knock yourself out. Everyone else knows that C is a
case-sensitive language and would not expect changing everything to
upper or lower case to work any more than changing all vowels to "e".

>
> The experiment shows that you can much more easily refactor the
> case-insensitive language to use consistent capitalisation in a style
> that you prefer, than case-sensitive.

No, it does nothing of the sort. We already know that with a
case-insensitive language, you can change the capitalisation at will, so
the "experiment" does not show that. You made no attempt to refactor
(perhaps you don't know the meaning of that word?) any programs to a
consistent capitalisation style, so your "experiment" shows nothing there.

>
> I just have my preferences and you have yours.
>

Yes, and that's fine. But drop the delusion that your unusual
collection of personal opinions is the absolute truth of how programming
languages should be.

Bart

Dec 1, 2021, 5:44:14 AM
to
Here's a summary of what I've been talking about:

                       C (eg)      Me

  Case-sensitive       Yes         No
  0-based              Yes         No (both 1-based and N-based)
  Braces               Yes         No (keyword block delimiters)
  Library Read/Print   Yes         No (read/print *statements*)
  Char-based text i/o  Yes         No (line-oriented i/o)
  Millions of ";"      Yes         No (line-oriented source)


It's just struck me that all the languages corresponding to the
left-hand column are generally more rigid and inflexible**.

The ones having attributes from the right are more forgiving, more
tolerant, and therefore more user-friendly. That would be a desirable
attribute of a scripting language.

Since I develop both a compiled and scripting language which have the
same syntax, it's natural they should share the same attributes.

(** Except that when it comes to C, compilers for it tend to be too lax
in unsafe ways. You've got to get that semicolon just right, but never
mind that you've missed out a return statement in that non-void function!)




David Brown

Dec 1, 2021, 7:12:33 AM
to
On 01/12/2021 11:44, Bart wrote:
> On 01/12/2021 08:15, David Brown wrote:
>> On 01/12/2021 00:52, Bart wrote:
>
>>> I just have my preferences and you have yours.
>>>
>>
>> Yes, and that's fine.  But drop the delusion that your unusual
>> collection of personal opinions is the absolute truth of how programming
>> languages should be.
>>
>
> Here's a summary of what I've been talking about:
>
>                        C (eg)      Me
>
>   Case-sensitive       Yes         No
>   0-based              Yes         No (both 1-based and N-based)
>   Braces               Yes         No (keyword block delimiters)
>   Library Read/Print   Yes         No (read/print *statements*)
>   Char-based text i/o  Yes         No (line-oriented i/o)
>   Millions of ";"      Yes         No (line-oriented source)
>
>
> It's just struck me that all the languages corresponding to the
> left-hand column are generally more rigid and inflexible**.
>
> The ones having attributes from the right are more forgiving, more
> tolerant, and therefore more user-friendly. That would be a desirable
> attribute of a scripting language.
>

Words like "flexible" are of questionable value - they mean different
things to different people. To me, C (and other languages with similar
attributes to those you list here - C is an example, nothing more) is
more flexible. Being case-sensitive is more flexible than
case-insensitive because it lets you choose capitalisation that conveys
more information to the user. Having 0-based arrays is more flexible
than 1-based arrays because it is easier to work with more complex
structures. (Allowing a choice of starting values, and other indexing
types, is much more flexible.) Use of statement terminators or
separators (C has statement terminators, Pascal has statement
separators) makes the language more flexible because your source code
layout can match the structure of the code you want to express in the
way you want to write it, rather than being forced into lines.

(I don't think the other points on your list affect "flexibility".)


Then there is the question of "tolerance" and "forgiving", and whether
that makes a language "user friendly". Here I can accept that your
language may be more "tolerant" and "forgiving", but that's based
entirely on your judgement - nothing on the list here is, IMHO, a matter
of "tolerance".

But having long experience with more static and rigid languages such as
C and C++, and also with more tolerant and dynamic languages such as
Python, I think it would be wrong to say one is more "user-friendly" than
the other. It is better to say that they are more suited for different
tasks. Python is much more user-friendly for some cases, C for other
cases. A key point here is the dynamic nature of Python - it lets you
write code at a higher level (user-friendly), but can't do the kind of
compile-time error checking and control that is possible in C (making it
user-unfriendly).


> Since I develop both a compiled and scripting language which have the
> same syntax, it's natural they should share the same attributes.
>

I disagree. They might share some aspects, but you have different uses
and different needs for a compiled language and a scripting language
(otherwise, why have two languages at all?). Gratuitous similarities
are no better than gratuitous differences.

> (** Except that when it comes to C, compilers for it tend to be too lax
> in unsafe ways. You've got to get that semicolon just right, but never
> mind that you've missed out a return statement in that non-void function!)
>

I agree that a lot of C compilers are far too lenient by default. But
that's easy to solve - don't use the default settings. Adding "-Wall"
or "/W" or whatever flag you need, is not rocket science.

Bart

Dec 1, 2021, 8:55:55 AM
to
You can use exactly the same capitalisation, say of "AbcDef", in
case-insensitive syntax. But it's optional, so more flexible.

> Having 0-based arrays is more flexible
> than 1-based arrays because it is easier to work with more complex
> structures.

I'd dispute that, but I won't go into it, since I also do N-based which
/includes/ 0-based. More flexible.

> (Allowing a choice of starting values, and other indexing
> types, is much more flexible.)

Yeah...


> Use of statement terminators or
> separators (C has statement terminators, Pascal has statement
> separators) makes the language more flexible because your source code
> layout can match the structure of the code you want to express in the
> way you want to write it, rather than being forced into lines.

Actually the brace thing is not that much more flexible one way or
another. But my terminators do provide a choice, for example 'end', 'end
if', 'endif' or 'fi' can terminate an if-statement. 'end' can terminate
any block; the others must match the statement.

My users liked using 'End' and other camelcase:

  Proc RuimOP =
      For I = 1 To NidTot Do
          DrawItem(NiewId[I],0)
          DeleteItem(NiewId[I])
      End
  End

Their choice.

>
> Then there is the question of "tolerance" and "forgiving", and whether
> that makes a language "user friendly". Here I can accept that your
> language may be more "tolerant" and "forgiving", but that's based
> entirely on your judgement - nothing on the list here is, IMHO, a matter
> of "tolerance".

I'm tolerant of semicolons; I use them as separators, but 99% don't need
to be typed as they coincide with end-of-line.

And extra semicolons are harmless; in C they can have dramatic consequences.

If you're reading 3 numbers from ONE line of input using scanf, then
that line needs to have exactly 3 numbers, or it'll go wrong. My
line-based Read tolerates fewer or more numbers on that line, so it is
much better for interactive input.

If I do 'print a,b,c', then change the types of those expressions, nothing
needs to change in that statement. I can also copy and paste that line
to print the local a,b,c expressions elsewhere. In C you have to rewrite
the format string.


>
>> Since I develop both a compiled and scripting language which have the
>> same syntax, it's natural they should share the same attributes.
>>
>
> I disagree. They might share some aspects, but you have different uses
> and different needs for a compiled language and a scripting language
> (otherwise, why have two languages at all?).

Why shouldn't they have the same syntax? Or near enough the same (one
will need more type annotations and so on).


This is fibonacci in one language:

  function fib(int n)int=
      if n<3 then
          return 1
      else
          return fib(n-1)+fib(n-2)
      fi
  end

And in the other:

  function fib(n)=
      if n<3 then
          return 1
      else
          return fib(n-1)+fib(n-2)
      fi
  end

(This demonstrates one reason why I like to have declarations out of the
way of the main code: the body of the function can be more easily ported
to the other language; it keeps the code clean.)

This is the driver code for a test program:

  for i to 36 do
      println i,fib(i)
  od

which works in either language (except that the static language will go
beyond 36!).

(The static language needs it wrapped in a function; it will still
work as dynamic, but there it is optional.)


Bart

Dec 6, 2021, 1:59:52 PM
to
On 29/11/2021 15:19, David Brown wrote:
> On 29/11/2021 14:06, Bart wrote:

>> This is just pure jealousy. Show me the C code needed to do the
>> equivalent of this (without knowing the types of a, b, c other than they
>> are numeric):
>>
>>    print "?"
>>    readln a, b, c
>>    println a, b, c

> You are under the delusion that there is one "correct" interpretation
> here. You think that /your/ ideas are the only "obvious" or "proper"
> way to handle things. In reality, there are dozens of questions that
> could be asked here, including:
>
> Does there have to be a delimiter between the inputs? Does it have to
> be comma, or space, or newline? Are these ignored if there are more
> than one? Are numbers treated differently in the input? Would an input
> of "true" be treated as a string or a boolean? Are there limits to the
> sizes?


This also comes up in command-line parameters to shell commands.

There you can also ask all your questions. The difference is that rather
than not make itemised parameters available at all (eg. as a single
string), it decides on sensible defaults.

But they are still not as sensible as what I use for Read. In Windows or
Linux, a command like:

prog a b c

returns those three args as 3 string parameters "a", "b", "c". This:

prog a,b,c

returns one arg of "a,b,c". Here:

prog a, b, c

it results in 3 parameters "a," "b," "c" (notice the trailing commas).

prog "a b" c

gives 2 params "a b" and "c", without the quotes. So better than your
Python example. Here however:

prog *.c

Windows gives one parameter "*.c", Linux gives *240* parameters. When I
try this:

prog *.c -option

Windows gives me "*.c" and "-option", but Linux now gives 241
parameters; information has been lost.

Anyway, imagine what a nuisance it would have been if C's main() had
been defined like this:

int main(char* cmdline) {}

Just one string that you had to parse yourself.

When C decides to do something that is convenient, then that is great.
If I decide to do that, you have 101 reasons why it is a terrible idea.

NIH syndrome?

David Brown

Dec 7, 2021, 2:15:57 AM
to
Information has not been "lost". In the *nix world, there is a strong
tradition for avoiding duplication of work. The shell knows how to
expand wildcards, so it does that job - letting "prog" concentrate on
its own job. It makes it possible to keep things simpler and more
consistent (while retaining the flexibility to be overly complex and
inconsistent - I'm not saying everything in the *nix world is perfect).
In the Windows world, every program has to re-implement its own wheels
from scratch, every time.

If you want to pass "*.c" as an option in *nix, write :

prog \*.c -option

or

prog "*.c" -option

It's quite simple.

> Anyway, imagine what a nuisance it would have been in C's main() was
> defined like this:
>
>    int main(char* cmdline) {}
>
> Just one string that you had to parse yourself.
>
> When C decides to do something that is convenient, then that is great.
> If I decide to do that, you have 101 reasons why it is a terrible idea.
>
> NIH syndrome?

It is not "C" that decides this. It is the OS conventions that is in
charge of passing information to the start of the program, in
cooperation with the runtime startup code. It is no coincidence that
you get the same arguments in argv in C and in sys.argv in Python.

So the /shell/ is the bit that does the wildcard expansion. It is
normal for shells on *nix to expand wildcards, and normal for the more
minimal "command prompt" in DOS and Windows not to expand wildcards.
But you can have a shell in *nix that does not expand them, and a shell
in Windows that does.

It is the /OS/ and the link-loader that splits command lines into
parts and passes them to the start of the program, regardless of the
language of the program.

Could this have been done differently? Sure. Could it have been done
better? Other ways might have had some advantages, and some
disadvantages - better in some ways, less good in others. Is this part
of the great conspiracy where everything in the computer world is made
because of C, is worse because of C, and designed that way with the sole
intention of annoying Bart? No.

Dmitry A. Kazakov

Dec 7, 2021, 2:56:22 AM
to
On 2021-12-07 08:14, David Brown wrote:

> So the /shell/ is the bit that does the wildcard expansion.

That was quite a problem back then. The damn thing ran out of memory
expanding *'s on the i386 25 MHz machines we used.

> Could this have been done differently? Sure.

In a well-designed system you would have a standard system library to
process the command line in a unified way. UNIX was a mixed bag trying
and failing to do both. In the end nobody respected any conventions and
UNIX utilities have totally unpredictable syntax of arguments.

[ Though I think UNIX missed an opportunity to make it even worse.
Consider if it not only expanded file lists but also opened the files
and passed the file descriptors to the process! ]

> Is this part
> of the great conspiracy where everything in the computer world is made
> because of C, is worse because of C, and designed that way with the sole
> intention of annoying Bart? No.

It is much bigger than Bart, the conspiracy, I mean... (:-))

Bart

Dec 7, 2021, 4:24:02 AM
to
That might be the case on Unix, where the OS and C are so
intertwined that you don't know where one ends and the other begins.

On Windows, that doesn't happen automatically. If you see that behaviour
then it's due to the language runtime.

That 'int main(int argc, char** argv)' entry-point doesn't magically happen!

I need to call __getmainargs() in msvcrt.dll to get those arguments
expected of C programs. Before I knew about __getmainargs(), I used
GetCommandLine() from WinAPI to get the commands as one string, that I
had to parse myself.

> Could this have been done differently? Sure. Could it have been done
> better? Other ways might have had some advantages, and some
> disadvantages - better in some ways, less good in others. Is this part
> of the great conspiracy where everything in the computer world is made
> because of C, is worse because of C, and designed that way with the sole
> intention of annoying Bart? No.

I use a more sophisticated version of what happens with command-line
params (without turning the latter into a whole language like some
shells), and make it available via Readln on every line of console or
file input, not just the bit following the command invocation.

You said that is unworkable. C's argc/argv scheme shows that it can be.

David Brown

Dec 7, 2021, 4:45:35 AM
to
On 07/12/2021 08:54, Dmitry A. Kazakov wrote:
> On 2021-12-07 08:14, David Brown wrote:
>
>> So the /shell/ is the bit that does the wildcard expansion.
>
> That was a quite a problem back then. The damn thing ran out of memory
> expanding *'s on i368 25Mz machines we used.

With big enough sets of files or command lines, /something/ is going to
run out of memory! Yes, sometimes command lines on *nix get too long,
and you have to use something like xargs. On the other side, because
DOS and Windows don't expand wildcards, the systems were made with much
shorter limits on the length of command lines which can lead to problems
with long filenames, lots of files (such as for linking), or lots of
flags. This is mostly a thing of the past, however, on both *nix and
Windows.

>
>> Could this have been done differently?  Sure.
>
> In a well-designed system you would have a standard system library to
> process the command line in a unified way. UNIX was a mixed bag trying
> and failing to do both. In the end nobody respected any conventions and
> UNIX utilities have totally unpredictable syntax of arguments.
>

It is far from being "totally unpredictable" - there are conventions
that are followed by most programs. But these are not enforced in any
way by *nix.

> [ Though I think UNIX missed an opportunity to make it even worse.
> Consider if it not only expanded file lists but also opened the files
> and passed the file descriptors to the process! ]
>

That would not make any sense - command line parameters are not
necessarily files!

David Brown

Dec 7, 2021, 4:57:30 AM
to
On 07/12/2021 10:22, Bart wrote:

> That might be the case on Unix where where the OS and C are so
> intertwined that that you don't know where one ends and the other begins.

No, /you/ don't know where one ends and the other begins - because you
have worked yourself into such an obsessive hatred of both that you
refuse to learn anything about either. Please don't judge others by
your own wilful ignorance - there are countless millions who manage to
program in C and work with *nix (the two being independent in practice).
There's nothing wrong with preferring other languages and/or other
OS's, but your struggles with C and your dislike of *nix are a personal
matter for you.

Please, find something new and interesting to post about rather than
your misconceptions, misunderstandings and FUD about languages and
systems that you don't like. It would be nice to get back to some
positive discussions.

Bart

Dec 7, 2021, 5:06:00 AM
to
On 07/12/2021 09:44, David Brown wrote:
> On 07/12/2021 08:54, Dmitry A. Kazakov wrote:
>> On 2021-12-07 08:14, David Brown wrote:
>>
>>> So the /shell/ is the bit that does the wildcard expansion.
>>
>> That was a quite a problem back then. The damn thing ran out of memory
>> expanding *'s on i368 25Mz machines we used.
>
> With big enough sets of files or command lines, /something/ is going to
> run out of memory!

But that needn't be the case! Suppose you had a million files in the
current directory, then input of "*" or "*.*" will try and create
1000000 strings; something is likely to break.

On Windows, it will just see one string. If the application actually
expected to work on multiple files, then with "*", it could iterate over
the files one by one, without needing to first create a list of them all.


>> [ Though I think UNIX missed an opportunity to make it even worse.
>> Consider if it not only expanded file lists but also opened the files
>> and passed the file descriptors to the process! ]
>>
>
> That would not make any sense - command line parameters are not
> necessarily files!

It doesn't make sense anyway: an input like "*" might mean something
specific to an application, but the shell will turn it into an arbitrary
list of strings, or into nothing.

Even if the app works with files, it can see that "* *" are two items
(perhaps two sets of files for different purposes), but on Linux, it
will turn it into one giant list, with duplicate files.

Dmitry A. Kazakov

Dec 7, 2021, 5:14:12 AM
to
On 2021-12-07 10:44, David Brown wrote:
> On 07/12/2021 08:54, Dmitry A. Kazakov wrote:
>> On 2021-12-07 08:14, David Brown wrote:
>>
>>> So the /shell/ is the bit that does the wildcard expansion.
>>
>> That was a quite a problem back then. The damn thing ran out of memory
>> expanding *'s on i368 25Mz machines we used.
>
> With big enough sets of files or command lines, /something/ is going to
> run out of memory!

Normally you would just walk the list of files without expanding it in
memory.

[ The secret lore lost to younger generations: there is no need to
load a whole document into memory in order to read or edit it. (:-)) ]

>> In a well-designed system you would have a standard system library to
>> process the command line in a unified way. UNIX was a mixed bag trying
>> and failing to do both. In the end nobody respected any conventions and
>> UNIX utilities have totally unpredictable syntax of arguments.
>
> It is far from being "totally unpredictable" - there are conventions
> that are followed by most programs.

Like in the case of dd?

>> [ Though I think UNIX missed an opportunity to make it even worse.
>> Consider if it not only expanded file lists but also opened the files
>> and passed the file descriptors to the process! ]
>
> That would not make any sense - command line parameters are not
> necessarily files!

They are, unless introduced by an option symbol, e.g. /a, -a, --a etc;
such was the "convention." I never liked it, BTW.

Bart

unread,
Dec 7, 2021, 5:33:46 AM12/7/21
to
On 07/12/2021 09:57, David Brown wrote:
> On 07/12/2021 10:22, Bart wrote:
>
>> That might be the case on Unix where the OS and C are so
>> intertwined that you don't know where one ends and the other begins.
>
> No, /you/ don't know where one ends and the other begins - because you
> have worked yourself into such an obsessive hatred of both

Where's the hatred above? I'm still stating what I see.

> that you
> refuse to learn anything about either. Please don't judge others by
> your own wilful ignorance - there are countless millions who manage to
> program in C and work with *nix (the two being independent in practice).
> There's nothing wrong with preferring other languages and/or other
> OS's, but your struggles with C and your dislike of *nix are a personal
> matter for you.
>
> Please, find something new and interesting to post about rather than
> your misconceptions, misunderstandings and FUD about languages and
> systems that you don't like. It would be nice to get back to some
> positive discussions.
>

You said this:

> It is the /OS/ and the the link-loader that splits command lines into
> parts and passes them to the start of the program, regardless of the
> language of the program.

That appears to be incorrect.

It might be happy coincidence that on Unix, at the entry point to a
program in any language, the parameter stack happens to contain suitable
values of argc and argv, just like you get with C's main(); what a bit
of luck!

But I haven't seen that in Windows (or any previous OSes I've used).

Can you point me to a link which says that Windows does exactly the
same, or please say you were mistaken.

/You/ seem to have an obsessive hatred of anything I say or do.

Dmitry A. Kazakov

unread,
Dec 7, 2021, 5:56:46 AM12/7/21
to
On 2021-12-07 11:33, Bart wrote:
> On 07/12/2021 09:57, David Brown wrote:

> You said this:
>
> > It is the /OS/ and the the link-loader that splits command lines into
> > parts and passes them to the start of the program, regardless of the
> > language of the program.
>
> That appears to be incorrect.

Nope, it is perfectly correct. If external command line parsing happens,
then it is ultimately the OS that pushes the results to the program. It
is a part of the interface between C's run-time and the OS. Other
languages do more or less the same, e.g. see Ada RM A.15 The package
Command_Line.

> It might be happy coincidence that on Unix, at the entry point to a
> program in any language, the parameter stack happens to contain suitable
> values of argc and argv, just like you get with C's main(); what a bit
> of luck!
>
> But I haven't seen that in Windows (or any previous OSes I've used).
>
> Can you point me to a link which says that Windows does exactly the
> same, or please say you were mistaken.

https://docs.microsoft.com/en-us/cpp/c-runtime-library/argc-argv-wargv?view=msvc-170

https://docs.microsoft.com/en-us/cpp/c-language/parsing-c-command-line-arguments?view=msvc-170

Bart

unread,
Dec 7, 2021, 6:22:10 AM12/7/21
to
On 07/12/2021 10:56, Dmitry A. Kazakov wrote:
> On 2021-12-07 11:33, Bart wrote:
>> On 07/12/2021 09:57, David Brown wrote:
>
>> You said this:
>>
>>  > It is the /OS/ and the the link-loader that splits command lines into
>>  > parts and passes them to the start of the program, regardless of the
>>  > language of the program.
>>
>> That appears to be incorrect.
>
> Nope, it is perfectly correct. If external command line parsing happens,
> then it is ultimately the OS that pushes the results to the program.

On Windows it pushes nothing. Language start-up code needs to do the
work behind the scenes so that user-programs can use entry-point
functions like:

    main(argc, argv)
    WinMain(Hinstance, etc)

> It
> a part of the interface between C's run-time and the OS.

Yes, it is part of the language. It is NOT the OS as David Brown stated.
It might be on Unix since Unix and C are so chummy.


> Other languages
> do more or less the same, e.g. see Ada RM A.15 The package Command_Line.
>
>> It might be happy coincidence that on Unix, at the entry point to a
>> program in any language, the parameter stack happens to contain
>> suitable values of argc and argv, just like you get with C's main();
>> what a bit of luck!
>>
>> But I haven't seen that in Windows (or any previous OSes I've used).
>>
>> Can you point me to a link which says that Windows does exactly the
>> same, or please say you were mistaken.
>
> https://docs.microsoft.com/en-us/cpp/c-runtime-library/argc-argv-wargv?view=msvc-170
>
>
> https://docs.microsoft.com/en-us/cpp/c-language/parsing-c-command-line-arguments?view=msvc-170

Those two links are specific to C implementations. So of course they
will set up main's arguments for you! That's what I have to do too
in my C implementation:


#include <stdio.h>

int main (int n, char** a) {
    for (int i=1; i<=n; ++i) {
        printf("%d: %s\n",i,*a);
        ++a;
    }
}

This generates the asm code below. The user's 'main' function is renamed
'.main'.

A new 'main' function is generated which calls __getmainargs. (There are
also __argc and __argv exported by msvcrt.dll, but I don't use those.)

Once obtained, it calls .main() with those new parameters. The program
thinks the OS put those on the stack; apparently so does everyone else!

Below this code is the import list used by a version of this C program
compiled with tcc. Tcc also has a startup routine (bigger than mine)
which loads the arguments that become main's argc/argv. But notice it
imports __getmainargs too.

This routine is not the OS. The msvcrt.dll library is not the OS.


!x64 output for showargs.c
align 16
!------------------------------------
`main::
sub Dstack,160
lea D0,[Dstack+8]
push D0
sub Dstack,32
lea D0,[Dstack+196]
mov [Dstack],D0
lea D0,[Dstack+184]
mov [Dstack+8],D0
lea D0,[Dstack+176]
mov [Dstack+16],D0
mov A0,0
mov [Dstack+24],A0
mov D10,[Dstack]
mov D11,[Dstack+8]
mov D12,[Dstack+16]
mov D13,[Dstack+24]
call __getmainargs*
add Dstack,16
mov A0,[Dstack+180]
mov [Dstack],A0
mov D0,[Dstack+168]
mov [Dstack+8],D0
mov D10,[Dstack]
mov D11,[Dstack+8]
call .main
mov A10,A0
call exit*

.main::
push Dframe
mov Dframe, Dstack
sub Dstack, 16
mov [Dframe+16], D10
mov [Dframe+24], D11
! -------------------------------------------------
mov word32 [Dframe-8], 1
jmp L4
L5:
sub Dstack, 32
mov D0, [Dframe+24]
push word64 [D0]
mov D10, KK1
mov A11, [Dframe-8]
pop D12
call `printf*
add Dstack, 32
add word64 [Dframe+24], 8
L2:
inc word32 [Dframe-8]
L4:
mov A0, [Dframe-8]
cmp A0, [Dframe+16]
jle L5
L3:
L1:
! -------------------------------------------------
sub Dstack, 32
mov D10, 0
call exit*

!String Table
segment idata
align 8
KK1:db "%d: %s",10,0


Tcc imports:

Name: msvcrt.dll
Import Addr RVA: 2038
Import: 20d3 0 printf
Import: 20dc 0 __set_app_type
Import: 20ed 0 _controlfp
Import: 20fa 0 __argc
Import: 2103 0 __argv
Import: 210c 0 _environ
Import: 2117 0 __getmainargs
Import: 2127 0 exit

Bart

unread,
Dec 7, 2021, 6:40:51 AM12/7/21
to
On 07/12/2021 07:14, David Brown wrote:

> So the /shell/ is the bit that does the wildcard expansion. It is
> normal for shells on *nix to expand wildcards, and normal for the more
> minimal "command prompt" in DOS and Windows not to expand wildcards.
> But you can have a shell in *nix that does not expand them, and a shell
> in Windows that does.

This is not quite right either.

If I take this line that I use to obtain the command params on Windows:

__getmainargs(&nargs, &args, &env,0, &startupinfo)

and change that 0 to 1, then I will get expanded wildcards too!

(And the first message I saw was: "Too many params"! I have a limit of
128 parameters, which seems more than adequate for normal shell use, but
there were 160 matching files. Such expansion is just inappropriate
at this point.)

Anyway, this bit is clearly not a shell.

Interesting the things you find out when you implement languages instead
of merely using them.

Dmitry A. Kazakov

unread,
Dec 7, 2021, 7:00:13 AM12/7/21
to
On 2021-12-07 12:22, Bart wrote:
> On 07/12/2021 10:56, Dmitry A. Kazakov wrote:
>> On 2021-12-07 11:33, Bart wrote:
>>> On 07/12/2021 09:57, David Brown wrote:
>>
>>> You said this:
>>>
>>>  > It is the /OS/ and the the link-loader that splits command lines into
>>>  > parts and passes them to the start of the program, regardless of the
>>>  > language of the program.
>>>
>>> That appears to be incorrect.
>>
>> Nope, it is perfectly correct. If external command line parsing
>> happens, then it is ultimately the OS that pushes the results to the
>> program.
>
> On Windows it pushes nothing. Language start-up code needs to do the
> work behinds the scenes so that user-programs can use entry-point
> functions like:
>
>     main(argc, argv)
>     WinMain(Hinstance, etc)

Yes and that was the point. This happens *before* main() is called.

> Yes, it is part of the language. It is NOT the OS as David Brown stated.
> It might be on Unix since Unix and C are so chummy.

It is specified by the language and fulfilled by the OS. Maybe you
meant something like the context where the parsing happens:

Linux - The caller process
Windows - The callee process
xxx - The system kernel, maybe VxWorks would fall into this
category, I am not sure

?

This is kind of pointless distinction, especially because processes in
Linux and Windows are very different.

Bart

unread,
Dec 7, 2021, 8:58:13 AM12/7/21
to
On 07/12/2021 12:00, Dmitry A. Kazakov wrote:
> On 2021-12-07 12:22, Bart wrote:
>> On 07/12/2021 10:56, Dmitry A. Kazakov wrote:
>>> On 2021-12-07 11:33, Bart wrote:
>>>> On 07/12/2021 09:57, David Brown wrote:
>>>
>>>> You said this:
>>>>
>>>>  > It is the /OS/ and the the link-loader that splits command lines
>>>> into
>>>>  > parts and passes them to the start of the program, regardless of the
>>>>  > language of the program.
>>>>
>>>> That appears to be incorrect.
>>>
>>> Nope, it is perfectly correct. If external command line parsing
>>> happens, then it is ultimately the OS that pushes the results to the
>>> program.
>>
>> On Windows it pushes nothing. Language start-up code needs to do the
>> work behinds the scenes so that user-programs can use entry-point
>> functions like:
>>
>>      main(argc, argv)
>>      WinMain(Hinstance, etc)
>
> Yes and that was the point. This happens *before* main() is called.

But *after* execution commences at the program's official entry point,
by the language's startup code.

What it comes down to is that, if you are implementing a language, these
argc/argv values don't magically appear on the stack, not on Windows.
The language must arrange for that to happen.


>> Yes, it is part of the language. It is NOT the OS as David Brown
>> stated. It might be on Unix since Unix and C are so chummy.
>
> It is specified by the language and fulfilled by the OS.

The only WinAPI function I know of that gives that info is
GetCommandLine, which delivers a single string you have to process.

If you know of a better WinAPI function on Windows, or of some exported
data from a Windows DLL that provides the same info, or perhaps of some
data block within the loaded PE image that contains it, then I will use
that instead.

> Maybe, you
> meant something like the context where parsing to happen:
>
>    Linux   - The caller process
>    Windows - The callee process
>    xxx     - The system kernel, maybe VxWorks would fall into this
> category, I am not sure
>
> ?
>
> This is kind of pointless distinction, especially because processes in
> Linux and Windows are very different.

Well, I was responding to this:

DB:
> It is the /OS/ and the the link-loader that splits command lines into
> parts and passes them to the start of the program, regardless of the
> language of the program.

The distinction was important since it is this very process that is
commonly done on Unix and/or C, which is equivalent to what I do with
Read on every line. Apparently it's OK when C (and/or Unix) does it on
the command line, but not OK when a language does it on any input line.

Correction: when /my/ language does it.

Dmitry A. Kazakov

unread,
Dec 7, 2021, 10:26:43 AM12/7/21
to
On 2021-12-07 14:57, Bart wrote:
> On 07/12/2021 12:00, Dmitry A. Kazakov wrote:
>> On 2021-12-07 12:22, Bart wrote:
>>> On 07/12/2021 10:56, Dmitry A. Kazakov wrote:
>>>> On 2021-12-07 11:33, Bart wrote:
>>>>> On 07/12/2021 09:57, David Brown wrote:
>>>>
>>>>> You said this:
>>>>>
>>>>>  > It is the /OS/ and the the link-loader that splits command lines
>>>>> into
>>>>>  > parts and passes them to the start of the program, regardless of
>>>>> the
>>>>>  > language of the program.
>>>>>
>>>>> That appears to be incorrect.
>>>>
>>>> Nope, it is perfectly correct. If external command line parsing
>>>> happens, then it is ultimately the OS that pushes the results to the
>>>> program.
>>>
>>> On Windows it pushes nothing. Language start-up code needs to do the
>>> work behinds the scenes so that user-programs can use entry-point
>>> functions like:
>>>
>>>      main(argc, argv)
>>>      WinMain(Hinstance, etc)
>>
>> Yes and that was the point. This happens *before* main() is called.
>
> But *after* execution commences at the program's official entry point,
> by the language's startup code.

The official entry point of a C console program is main().

> What it comes down is that, if you are implementing a language, these
> argc/argv values don't magically appear on the stack, not on Windows.
> The language must arrange for that to happen.

The linker: there is a switch to instruct the MS linker which CRT to
link into the executable. E.g. one can link one that skips command line
parsing altogether.

> The only WinAPI function I know of that gives that info is
> GetCommandLine, which delivers a single string you have to process.

> If you know of a better WinAPI function on Windows, or of some exported
> data from a Windows DLL that provides the same info, or perhaps of some
> data block within the loaded PE image that contains it, then I will use
> that instead.

CommandLineToArgv[A|W]

Why do you think it should be physically stored in the process address
space in the first place? It might have been the case for the very
primitive OS that UNIX was when it was developed. These days it could be
anywhere. You
know there is GetCommandLineA and GetCommandLineW, which one is a fake?
Why do you even care?

>> Maybe, you meant something like the context where parsing to happen:
>>
>>     Linux   - The caller process
>>     Windows - The callee process
>>     xxx     - The system kernel, maybe VxWorks would fall into this
>> category, I am not sure
>>
>> ?
>>
>> This is kind of pointless distinction, especially because processes in
>> Linux and Windows are very different.
>
> Well, I was responding to this:
>
> DB:
> > It is the /OS/ and the the link-loader that splits command lines into
> > parts and passes them to the start of the program, regardless of the
> > language of the program.
>
> The distinction was important since it is this very process that is
> commonly done on Unix and/or C, which is equivalent to what I do with
> Read on every line. Apparently it's OK when C (and/or Unix) does it on
> the command line, but not OK when a language does it on any input line.

I have no idea what this is supposed to mean.

Bart

unread,
Dec 7, 2021, 11:54:47 AM12/7/21
to
On 07/12/2021 15:25, Dmitry A. Kazakov wrote:
> On 2021-12-07 14:57, Bart wrote:

>>> Yes and that was the point. This happens *before* main() is called.
>>
>> But *after* execution commences at the program's official entry point,
>> by the language's startup code.
>
> The official entry point of a C console program is main().

No. You don't quite understand how it works, that's fine.

But if your C program's entry point is 'main', and the EXE's
entry-point name is also 'main', then this code must be executed within
main(), by specially injected code.

Actually, with gcc, it changes the EXE's entry point to something else,
probably some injected code if it is not some function in the runtime.
It then eventually calls the user-code main(), with argc/argv as parameters.

In any case, fetching the command params is done after the application
has started running.


>> What it comes down is that, if you are implementing a language, these
>> argc/argv values don't magically appear on the stack, not on Windows.
>> The language must arrange for that to happen.
>
> The linker, there is a switch to instruct the MS linker which CRT to
> link to the executable. E.g. one can link one that skips command line
> parsing altogether.

I don't use a linker...

But if what you say is correct, then this is still code that is within
the application.

>> The only WinAPI function I know of that gives that info is
>> GetCommandLine, which delivers a single string you have to process.
>
>> If you know of a better WinAPI function on Windows, or of some
>> exported data from a Windows DLL that provides the same info, or
>> perhaps of some data block within the loaded PE image that contains
>> it, then I will use that instead.
>
> CommandLineToArgv[A|W]

This would be the next step /after/ calling GetCommandLine.

I might use this, but instead I will probably apply my own parsing since
the C-style processing is not quite up to scratch. For example it would
be nice to do:

readln @cmdline, a, b, c

when a, b, c are numbers, without all the usual palaver of checking argc
and applying atoi and the rest, or having to employ some library to do
that simple task.


> Why do you think it should be physically stored in the process address
> space in the first place? It might be the case for a very primitive OS
> UNIX was when it was developed. These days it could be anywhere. You
> know there is GetCommandLineA and GetCommandLineW, which one is a fake?
> Why do you even care?
>
>>> Maybe, you meant something like the context where parsing to happen:
>>>
>>>     Linux   - The caller process
>>>     Windows - The callee process
>>>     xxx     - The system kernel, maybe VxWorks would fall into this
>>> category, I am not sure
>>>
>>> ?
>>>
>>> This is kind of pointless distinction, especially because processes
>>> in Linux and Windows are very different.
>>
>> Well, I was responding to this:
>>
>> DB:
>>  > It is the /OS/ and the the link-loader that splits command lines into
>>  > parts and passes them to the start of the program, regardless of the
>>  > language of the program.
>>
>> The distinction was important since it is this very process that is
>> commonly done on Unix and/or C, which is equivalent to what I do with
>> Read on every line. Apparently it's OK when C (and/or Unix) does it on
>> the command line, but not OK when a language does it on any input line.
>
> I have no idea what this is supposed to mean.

This subthread is about the similarity between:

- Command line parsing (chopping one input line into separate args)
- My language's Readln which reads separate items from one input line

DB said the latter can't possibly work because of too many unknowns. Yet
it hasn't stopped shells using command line parameters.


Dmitry A. Kazakov

unread,
Dec 7, 2021, 2:31:39 PM12/7/21
to
On 2021-12-07 17:54, Bart wrote:

> In any case, fetching the command params is done after the application
> has started running.

No, the process /= application. There are a lot of things happening
between creation of a process and the application running in the context
of that process (or processes).

>>> What it comes down is that, if you are implementing a language, these
>>> argc/argv values don't magically appear on the stack, not on Windows.
>>> The language must arrange for that to happen.
>>
>> The linker, there is a switch to instruct the MS linker which CRT to
>> link to the executable. E.g. one can link one that skips command line
>> parsing altogether.
>
> I don't use a linker...
>
> But if what you say is correct, then this is still code that is within
> the application.

No, it is within the C run-time, and furthermore nothing prevents an
implementation of the run-time from calling into the system kernel
and/or other processes and services.

>>> The only WinAPI function I know of that gives that info is
>>> GetCommandLine, which delivers a single string you have to process.
>>
>>> If you know of a better WinAPI function on Windows, or of some
>>> exported data from a Windows DLL that provides the same info, or
>>> perhaps of some data block within the loaded PE image that contains
>>> it, then I will use that instead.
>>
>> CommandLineToArgv[A|W]
>
> This would be the next step /after/ calling GetCommandLine.

Calling GetCommandLine is in no way obligatory, and it tells nothing
about the implementation. When a Windows process is created, the caller
can specify a command line parameter either as an ASCII or a UTF-16
encoded string. What happens with that parameter, e.g. whether it is
marshaled to the process address space, converted, or whatever else, is
up to Windows. You are making groundless assumptions about the
implementation of the Windows API. You shall not, as it is subject to
change at any time MS finds appropriate.

> This subthread is about the similarity between:
>
>   - Command line parsing (chopping one input line into separate args)
>   - My language's Readln which reads separate items from one input line
>
> DB said the latter can't possibly work because of too many unknowns. Yet
> it hasn't stopped shells using command line parameters.

He is right.

1. If the OS does not impose a specific way of treating parameters,
there is no safe way to process arguments.

2. If the OS, as Linux does, requires parameters pre-parsed outside the
process, there are again limits to what could be done. E.g. in Linux
there is no reliable way to get the original command line; it simply
might not exist.

In both cases there is absolutely no guarantee of any correspondence
between the "perceived" command line and what the process gets.

Bart

unread,
Dec 7, 2021, 3:49:49 PM12/7/21
to
On 07/12/2021 19:31, Dmitry A. Kazakov wrote:
> On 2021-12-07 17:54, Bart wrote:
>
>> In any case, fetching the command params is done after the application
>> has started running.
>
> No, the process /= application. There is a lot of things happening
> between creation of a process and the application running on the context
> of that process (or processes).

OK, it has to load the PE and do a bunch of fixups, but eventually it
will pass control to the entry point.

Then, where are the command line parameters to be found?

I can tell you they are not on the stack, which is where they must be if
C's main(argc, argv) is to work properly.

They have to be put there. Windows will not do that. The application's
language's startup code must do it.

That's the bit /I/ write.


>> DB said the latter can't possibly work because of too many unknowns.
>> Yet it hasn't stopped shells using command line parameters.
>
> He is right.
>
> 1. If the OS does not impose a specific way of treating parameters,
> there is no safe way to process arguments.

> 2. If the OS, as Linux does, requires parameters pre-parsed outside the
> process, there are again limits to what could be done. E.g. in Linux
> there is no reliable way to get the original command line it simply
> might not exist.
>
> In both cases there is absolutely no guarantee of any correspondence
> between the "perceived" command line and what the process gets.

The point is that C, somehow, ended up with a scheme where that one line
of commands WAS processed into convenient chunks for the application
to work with.

Despite there being 'too many variables' for it to work; too many
possible ways that different programmers might want that command line
parsed.

So, why can't a language also specify a set of defaults for proper
line-reading routines:

readln a, b, c

But I can see that I'm banging my head against a brick wall:

* No one here is ever going to admit that Bart's Readln statements
might actually be a good idea, despite C command-line processing
doing pretty much the same thing.

* And apparently no one is going to admit that that command-line
processing is not actually done automatically by Windows; it is up
to the startup code of a language implementation to get it sorted




Dmitry A. Kazakov

unread,
Dec 7, 2021, 4:36:31 PM12/7/21
to
On 2021-12-07 21:49, Bart wrote:
> On 07/12/2021 19:31, Dmitry A. Kazakov wrote:
>> On 2021-12-07 17:54, Bart wrote:
>>
>>> In any case, fetching the command params is done after the
>>> application has started running.
>>
>> No, the process /= application. There is a lot of things happening
>> between creation of a process and the application running on the
>> context of that process (or processes).
>
> OK, it has to load the PE and do a bunch of fixups, but eventually it
> will pass control to the entry point.
>
> Then, where are the command line parameters to be found?

> I can tell you they are not on the stack, which is where they must be
> if C's main(argc, argv) is to work properly.

If the calling conventions are to use the stack, then both argc and
argv (a pointer) are on the stack; if not, they are in registers. Why
should I care?

You are trying to make a point about some imaginary implementation. Even
if your musings were true that would not prove or disprove anything. The
API are as they are. The OS can send the command line to another end of
the universe and back using quantum entanglement. So?

> They have to be put there; Windows will not do that. The
> application's language's startup code must do it.
>
> That's the bit /I/ write.

There was never any objection to that. Prior to calling main() a lot of
things happen, and?

> The point is that C, somehow, ended up with a scheme where that one
> line of commands WAS processed into convenient chunks for the
> application to work with.

Nope. You can call a program without having any commands at all:

1. Windows:

https://docs.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-createprocessa

2. Linux:

https://www.man7.org/linux/man-pages/man3/posix_spawn.3.html

Again, these are interfaces to bring C's argc and argv into a desired
state, nothing more. There is no need to parse anything, just pass what
you want to be done with that.

> But I can see that I'm banging my head against a brick wall:

A quite unhealthy custom...

> * No one here is ever going to admit that Bart's Readln statements
> might actually be a good idea, despite C command-line processing
> doing pretty much the same thing.

It is not. The real problems with this stuff are its lack of typing and
its low-level nature. Consider passing another process as a parameter
accompanied by access rights, an end of a stream, an event etc. These
produce incredibly ugly code both under Windows and Linux.

> * And apparently no one is going to admit that that command-line
> processing is not actually done automatically by Windows; it is up to
> the startup code of a language implementation to get it sorted

It is done automatically by the CRT.

David Brown

unread,
Dec 8, 2021, 3:37:13 AM12/8/21
to
On 07/12/2021 21:49, Bart wrote:
> So, why can't a language also specify a set of defaults for proper
> line-reading routines:
>
>    readln a, b, c
>
> But I can see that I'm banging my head against a brick wall:
>
>  * No one here is ever going to admit that Bart's Readln statements
>    might actually be a good idea, despite C command-line processing
>    doing pretty much the same thing.
>

You are not the only one who feels like he is banging his head against a
wall. You misinterpret everything through your paranoia.

Your "readln" could be a perfectly good solution - for /your/ language,
and /your/ needs. Something similar could be useful in wider contexts.

Where you are wrong, however, is in your belief that it is somehow
better than any other solution, or that it covers other people's needs,
or that it is somehow "fundamental" and something that should be part of
any language.

It doesn't cover typing, formatting, separators, syntax matching,
errors, or any of a dozen other possible requirements. If you don't
need any of that - you've just got a little script or a dedicated
program run in a specific way - then that's fine. People who need
something more, can't use it.

And no, just because you don't like C's input facilities does not mean
your methods are naturally superior.

And just because someone says your readln is too limited and simplistic,
does not mean they think C's alternatives are good or complete.

A low-level language needs basic, raw input facilities that you can use
to build the high-level input concepts that you need for your use. It
does not need high-level input facilities - those should be in libraries
so the user can choose what they need at the time (either from standard
libraries of common solutions, or roll their own specialist one).

A simple, limited high-level language can get away with saying "this is
what you get - take it or leave it". Much of the programming world left
such philosophies behind decades ago, but if you want to hang onto it
with your own language, that's up to you - that's an advantage of having
your own language.




Bart

unread,
Dec 8, 2021, 6:22:37 AM12/8/21
to
Having easy-to-use Read/Print statements doesn't mean more advanced or
more customised ways of doing i/o are off the table. (For a start, I can
call scanf/printf etc via the FFI of my language, if absolutely
necessary, but it rarely is.)

One important difference with mine is that they are line-oriented. C
input especially is character-oriented, so it will see \n as white space,
which gives rise to all sorts of synchronisation issues when input
(interactive from the keyboard, or text files) /is/ strongly line-oriented.

This is an idea I had yesterday which I've now implemented:

proc start =
    int a,b,c

    readln @cmdline, a, b, c

    println "Args:", a, b, c
    println "Total:", a + b + c
end

If I invoke this program like this:

prog 10 20 30

it will read those as numbers and print their sum. (If they're not
numbers or just missing, it reads zeros.)

But look, I can do this too:

prog 10,20,30
prog 10, 20, 30

it still works! (Because it's using the same code as my Readln uses
elsewhere.)

Here's the C equivalent:

#include <stdio.h> // for printf
#include <stdlib.h> // for atoi

int main(int n, char** argv) {
int a = atoi(argv[1]);
int b = atoi(argv[2]);
int c = atoi(argv[3]);

printf("Args: %d %d %d\n", a, b, c);
printf("Total: %d\n", a + b + c);
}

It works with "10 20 30", and with "10, 20, 30". But with "10,20,30" or
"10" or no input, it crashes.

BTW for input of "10 20 30", n has the value 4, obviously!


One more trick: I decide to make a, b, c floats. In my code, I just
change "int" to "real", and it just works.

In the C, I need to change "int" to "double", all "atoi" to "strtod",
and all "%d" to "%f".

Is that it? Not quite: strtod needs a NULL second argument. Or maybe
'atof' could have been used?

Yeah, you can clearly see how C is superior here; you need that precise
control!

James Harris

unread,
Dec 13, 2021, 12:42:53 PM12/13/21
to
On 07/12/2021 10:14, Dmitry A. Kazakov wrote:
> On 2021-12-07 10:44, David Brown wrote:
>> On 07/12/2021 08:54, Dmitry A. Kazakov wrote:

...

>>> [ Though I think UNIX missed an opportunity to make it even worse.
>>> Consider if it not only expanded file lists but also opened the files
>>> and passed the file descriptors to the process! ]
>>
>> That would not make any sense - command line parameters are not
>> necessarily files!
>
> They are, unless introduced by a symbol of a key, e.g. /a, -a, --a etc,
> so was the "convention." I never liked it, BTW.
>

What convention would you prefer? I have tried to come up with something
better but without success.

BTW, words on a command line don't have to be file names.


--
James Harris

Dmitry A. Kazakov

Dec 13, 2021, 2:45:53 PM
On 2021-12-13 18:42, James Harris wrote:
> On 07/12/2021 10:14, Dmitry A. Kazakov wrote:
>> On 2021-12-07 10:44, David Brown wrote:
>>> On 07/12/2021 08:54, Dmitry A. Kazakov wrote:
>
> ...
>
>>>> [ Though I think UNIX missed an opportunity to make it even worse.
>>>> Consider if it not only expanded file lists but also opened the files
>>>> and passed the file descriptors to the process! ]
>>>
>>> That would not make any sense - command line parameters are not
>>> necessarily files!
>>
>> They are, unless introduced by a symbol of a key, e.g. /a, -a, --a
>> etc, so was the "convention." I never liked it, BTW.
>>
>
> What convention would you prefer? I have tried to come up with something
> better but without success.

Same here. I prefer something that resembles a sentence, but it is
difficult to remember too.

> BTW, words on a command line don't have to be file names.

Yes, which is why expanding filename wildcards was a bad idea from the
start.

I think there is no solution. A command-line language is a
problem-domain language, and all problem-domain languages are bad no
matter how you design them. It is a law of nature. So any command-line
language is necessarily doomed.

An alternative has existed since the early days, when the OS came from a
single vendor. Many things were configurable through a unified UI, which
reduced the need for command-line languages. File managers helped get
rid of command-line file operations. IDEs helped with compiler switches.
Around the mid-90s all UIs switched to OO: you clicked on an object and
got the list of "virtual" functions applicable to it. [This does not
work well with many objects; that pesky multiple dispatch is in the
way.]

Actually both Windows and Linux follow this path, deploying registries,
XML, SQLite DBs, UIs, managers etc. to keep and modify parameters
instead of using commands. The success is sort of questionable.

P.S. The younger generation seems to be unaware of command-line
interfaces. It was fun to watch the Linus Tech Tips (LTT) Linux
challenge series on YouTube. Two guys in their 30s, capable of
installing NAS servers and tinkering with hardware, could not configure
and use a Linux box, fighting through macabre Linux GUIs when a command
line would have done the work in 5 minutes.