
C Plagiarism


Bart

Nov 19, 2022, 11:01:57 AM

On 16/11/2022 16:50, David Brown wrote:
> Yes, but for you, a "must-have" list for a programming language would be
> mainly "must be roughly like ancient style C in functionality, but with
> enough change in syntax and appearance so that no one will think it is
> C". If that's what you like, and what pays for your daily bread, then
> that's absolutely fine.

On 18/11/2022 07:12, David Brown wrote:
> Yes, it is a lot like C. It has a number of changes, some that I think
> are good, some that I think are bad, but basically it is mostly like C.

The above remarks imply strongly that my systems language is a rip-off
of C.

This is completely untrue.

I started devising languages in 1981, but didn't write my first C
program until at least 1992 (which is when I acquired a compiler).

I did buy the K&R book around 1982, but was disappointed enough by the
language that I sold it to a colleague (at a significant loss too). That
1992 Visual C compiler was itself given away for nothing.

I didn't write any significant C program until about 2010, and even for
that, I had to create a syntax wrapper, preprocessed with a script, to
make the process more palatable.

My influences were these languages:

Algol 68 (this one I'd never used, only read about it avidly)
Algol 60
Pascal
Fortran IV
Babbage (a machine-oriented language I'd implemented for PDP10)
ASM for PDP10 and for Z80

There were some things that were eventually adopted from C, but much
later on:

* 0xABC notation for hex constants (instead of 0ABCH)
* F() notation for calling functions with no parameters
(instead of just F)
* Allowing F as well as &F to create function references
* '$caligned' option for structs, to force C-style member
layouts, but mainly to be in line with such structs in APIs

The design of my systems language, especially its type system, was
largely driven by machine architecture, so necessarily had to have
similarities with other lower level languages.

C itself has a considerable number of differences from my approach:

* An utterly different syntax for a start (and a crazy one too)

* Character- not line-oriented

* It's case-sensitive

* Arrays are indexed always from 0

* It uses 3 'char' types (which are really small integers)

* It has very loosely defined integer types

* It has fixed-width types often defined on top of those loose
types, with unwelcome consequences (eg. you cannot use %lld or 123LL
for those u/intN_t types, or int32_t may or may not be a synonym for
either int or long; see the example after this list)

* It conflates arrays and pointers (a fact that has significant
consequences)

* It doesn't support value arrays in expressions or for direct
parameter passing (only when contained within structs)

* It has separate statements and expressions

* It has block scopes

* It has the concept of struct tags

* It uses different namespaces for labels, tags and everything else

* How it interprets hex floating point constants is a dramatic
difference, being a mix of hex, decimal and binary (mine is pure
hex)

* There are very peculiar rules for how many {} pairs are needed in
data initialisers

* It uses some 30 system headers, needed even for basic functionality
(eg. it needs a header to enable 'printf', 'int8_t', 'NULL')

* It doesn't allow out-of-order declarations, so forward declarations
are needed

* Augmented assignments like `a += b` return a value

* It doesn't have a conventional for-loop

* It has a 'const' attribute on types

* It doesn't have default parameter values, or keyword parameters,
or language-supported reference parameters

* It makes signed integer overflow, and lots of other aspects,
undefined behaviour

* The rules for mixing signed and unsigned ints in binary operations,
in terms of what signedness will be used in evaluation and result,
are elaborate

* It uses primitive means for printing values (to print an elaborate
expression that may involve opaque types, you need to know the
exact type)

* There is no easy way to get the length of a fixed array type

* It has a token-based macro system on which it relies extensively for
added functionality.
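
To make the fixed-width complaint above concrete, here is a minimal
sketch (assuming only standard C99 headers). Whether int64_t is long or
long long is the implementation's choice, so a plain %lld is not
portable; the standard's own workaround is a format macro:

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    int main(void) {
        int64_t n = 123;
        /* printf("%lld\n", n);  -- may be wrong where int64_t is long,
           e.g. 64-bit Linux; likewise 123LL is not necessarily int64_t */
        printf("%" PRId64 "\n", n);    /* the portable spelling */
        return 0;
    }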

Lots of other things besides - all of them things my 'ripoff' has
inexplicably never bothered to replicate. Plus there's all the bigger
stuff I listed elsewhere that I have and C doesn't.

From the lofty, condescending viewpoint of a functional language, this
may all seem petty squabbling about inconsequential details.

But for somebody involved in this space of devising and implementing
these lower level languages (the kind that actually run things) these
are all significant.

James Harris

Nov 19, 2022, 3:17:32 PM
On 19/11/2022 16:01, Bart wrote:
>
> On 16/11/2022 16:50, David Brown wrote:
> > Yes, but for you, a "must-have" list for a programming language would be
> > mainly "must be roughly like ancient style C in functionality, but with
> > enough change in syntax and appearance so that no one will think it is
> > C".  If that's what you like, and what pays for your daily bread, then
> > that's absolutely fine.
>
> On 18/11/2022 07:12, David Brown wrote:
> > Yes, it is a lot like C.  It has a number of changes, some that I think
> > are good, some that I think are bad, but basically it is mostly like C.
>
> The above remarks imply strongly that my systems language is a rip-off
> of C.

I don't think anyone could accuse /you/ of copying C! Your view of it is
consistently negative. IIRC you even produced a long list of things
which are wrong with C.

...

> My influences were these languages:
>
>     Algol 68 (this one I'd never used, only read about it avidly)
>     Algol 60
>     Pascal
>     Fortran IV
>     Babbage (a machine-oriented language I'd implemented for PDP10)
>     ASM for PDP10 and for Z80

I try to keep my main influences to hardware and various assembly
languages I've used over the years. But even though we try not to be
influenced by C I don't think any of us can help it. Two reasons: C
became the base for so many languages which came after it, and C so well
fits the underlying machine.

I even suspect that the CPUs we use today are also as they are in part
due to C. It has been that influential.


--
James Harris


Bart

Nov 19, 2022, 3:30:15 PM
On 19/11/2022 20:17, James Harris wrote:

>
> I try to keep my main influences to hardware and various assembly
> languages I've used over the years. But even though we try not to be
> influenced by C I don't think any of us can help it. Two reasons: C
> became the base for so many languages which came after it, and C so well
> fits the underlying machine.
>
> I even suspect that the CPUs we use today are also as they are in part
> due to C. It has been that influential.

Well, there's a lot of C code around that needs to be kept working.

However, what aspects of today's processors do you think owe anything to C?

The progression from 8 to 16 to 32 to 64 bits and beyond has long been
on the cards, irrespective of languages.

Actually C is lagging behind since most implementations are stuck with a
32-bit int type. Which means lots of software, for those lazily using
'int' everywhere, will perpetuate the limitations of that type.

C famously also doesn't like to pin down its types. It doesn't even have
a `byte` type, and its `char` type, apart from not having a specified
signedness, could have any width of 8 bits or more.
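
As a small sketch of how loose both properties are (what it prints is
whatever the implementation happened to choose):

    #include <stdio.h>
    #include <limits.h>

    int main(void) {
        printf("char is %d bits\n", CHAR_BIT);  /* guaranteed >= 8, not == 8 */
        printf("char min is %d\n", CHAR_MIN);   /* 0 if plain char is unsigned,
                                                   negative if it is signed */
        return 0;
    }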

James Harris

Nov 19, 2022, 4:02:55 PM
On 19/11/2022 20:30, Bart wrote:
> On 19/11/2022 20:17, James Harris wrote:
>
>>
>> I try to keep my main influences to hardware and various assembly
>> languages I've used over the years. But even though we try not to be
>> influenced by C I don't think any of us can help it. Two reasons: C
>> became the base for so many languages which came after it, and C so
>> well fits the underlying machine.
>>
>> I even suspect that the CPUs we use today are also as they are in part
>> due to C. It has been that influential.
>
> Well, there's a lot of C code around that needs to be kept working.

Yes.

>
> However, what aspects of today's processors do you think owe anything to C?

Things like the 8-bit byte, 2's complement, and the lack of segmentation.

>
> The progression from 8 to 16 to 32 to 64 bits and beyond has long been
> on the cards, irrespective of languages.
>
> Actually C is lagging behind since most implementations are stuck with a
> 32-bit int type. Which means lots of software, for those lazily using
> 'int' everywhere, will perpetuate the limitations of that type.
>
> C famously also doesn't like to pin down its types. It doesn't even have
> a `byte` type, and its `char` type, apart from not having a specified
> signedness, could have any width of 8 bits or more.

Pre C99 yes. But AIUI since C99 C has had very precise types such as

int64_t

but it only allows specific sizes.


--
James Harris


Bart

Nov 19, 2022, 4:49:05 PM
On 19/11/2022 21:02, James Harris wrote:
> On 19/11/2022 20:30, Bart wrote:
>> On 19/11/2022 20:17, James Harris wrote:
>>
>>>
>>> I try to keep my main influences to hardware and various assembly
>>> languages I've used over the years. But even though we try not to be
>>> influenced by C I don't think any of us can help it. Two reasons: C
>>> became the base for so many languages which came after it, and C so
>>> well fits the underlying machine.
>>>
>>> I even suspect that the CPUs we use today are also as they are in
>>> part due to C. It has been that influential.
>>
>> Well, there's a lot of C code around that needs to be kept working.
>
> Yes.
>
>>
>> However, what aspects of today's processors do you think owe anything
>> to C?
>
> Things like the 8-bit byte, 2's complement, and the lack of segmentation.

Really? C was pretty much the only language in the world that does not
specify the size of a byte. (It doesn't even have a 'byte' type.)

And it's a language that, even now (until C23) DOESN'T stipulate that
integers use two's complement.

As for segmentation, or lack of, that was very common across machines.

It is really nothing at all to do with C. (How would it have influenced
that anyway, given that C implementations were adept at dealing with any
memory model?)

>
>>
>> The progression from 8 to 16 to 32 to 64 bits and beyond has long been
>> on the cards, irrespective of languages.
>>
>> Actually C is lagging behind since most implementations are stuck with
>> a 32-bit int type. Which means lots of software, for those lazily
>> using 'int' everywhere, will perpetuate the limitations of that type.
>>
>> C famously also doesn't like to pin down its types. It doesn't even
>> have a `byte` type, and its `char` type, apart from not having a
>> specified signedness, could have any width of 8 bits or more.
>
> Pre C99 yes. But AIUI since C99 C has had very precise types such as
>
>   int64_t

I'm sure the byte type, its size and byte-addressability, was
influenced more by IBM, such as with its 360 mainframes from the 1960s
BC (Before C). The first byte-addressed machine I used was a 360-clone.

In any case, I would dispute that C even now properly has fixed-width
types. First, you need to do this to enable them:

#include <stdint.h>

Otherwise it knows nothing about them. Second, if you look inside a
typical stdint.h file (this one is from gcc/TDM on Windows), you might
well see:

typedef signed char int8_t;
typedef unsigned char uint8_t;

Nothing here guarantees that int8_t will be an 8-bit type; these
'exact-width' types are defined on top of those loosely-defined types.
They're an illusion.


James Harris

Nov 19, 2022, 5:23:36 PM
On 19/11/2022 21:49, Bart wrote:
> On 19/11/2022 21:02, James Harris wrote:
>> On 19/11/2022 20:30, Bart wrote:
>>> On 19/11/2022 20:17, James Harris wrote:

...

>>>> I even suspect that the CPUs we use today are also as they are in
>>>> part due to C. It has been that influential.
>>>
>>> Well, there's a lot of C code around that needs to be kept working.
>>
>> Yes.
>>
>>>
>>> However, what aspects of today's processors do you think owe anything
>>> to C?
>>
>> Things like the 8-bit byte, 2's complement, and the lack of segmentation.
>
> Really? C was pretty much the only language in the world that does not
> specify the size of a byte. (It doesn't even have a 'byte' type.)
>
> And it's a language that, even now (until C23) DOESN'T stipulate that
> integers use two's complement.

That's not what I was thinking. Rather, it was C's lower-level approach
to storage which helped cement in programmers' minds memory as an array
of bytes. Kernighan's C text even included an allocator which used
standard C to manage memory.

Don't get me wrong: I am not saying C was the main driver, or even that
we wouldn't have had 2's complement and 8-bit bytes without it. But C
gave programmers access to implementation details, and the logic of
chars using 8 bits encouraged programmers and IT people in general to
think in terms of octet-addressable storage.

>
> As for segmentation, or lack of, that was very common across machines.

I remember reading that when AMD wanted to design a 64-bit architecture
they asked programmers (especially at Microsoft) what they wanted. One
thing was 'no segmentation'. The C model had encouraged programmers to
think in terms of flat address spaces, and the mainstream segmented
approach for x86 was a nightmare that people didn't want to repeat.

...

>>> C famously also doesn't like to pin down its types. It doesn't even
>>> have a `byte` type, and its `char` type, apart from not having a
>>> specified signedness, could have any width of 8 bits or more.
>>
>> Pre C99 yes. But AIUI since C99 C has had very precise types such as
>>
>>    int64_t
>
> I'm sure the byte type, its size and byte-addressability, was
> influenced more by IBM, such as with its 360 mainframes from the 1960s
> BC (Before C). The first byte-addressed machine I used was a 360-clone.

I used a 6502 and a Z80 before starting work but probably like you I
began work on S360. IIRC IBM pioneered different architectures
(including various byte sizes) on their Stretch product.

>
> In any case, I would dispute that C even now properly has fixed-width
> types. First, you need to do this to enable them:
>
>     #include <stdint.h>
>
> Otherwise it knows nothing about them.

Types don't have to be inbuilt to be provided as part of the standard.

> Second, if you look inside a
> typical stdint.h file (this one is from gcc/TDM on Windows), you might
> well see:
>
>     typedef signed char int8_t;
>     typedef unsigned char uint8_t;
>
> Nothing here guarantees that int8_t will be an 8-bit type; these
> 'exact-width' types are defined on top of those loosely-defined types.
> They're an illusion.

The header is built to match the distribution.


--
James Harris


Bart

Nov 19, 2022, 7:35:02 PM
On 19/11/2022 22:23, James Harris wrote:
> On 19/11/2022 21:49, Bart wrote:

>> Really? C was pretty much the only language in the world that does not
>> specify the size of a byte. (It doesn't even have a 'byte' type.)
>>
>> And it's a language that, even now (until C23) DOESN'T stipulate that
>> integers use two's complement.
>
> That's not what I was thinking. Rather, it was C's lower-level approach
> to storage which helped cement in programmers' minds memory as an array
> of bytes. Kernighan's C text even included an allocator which used
> standard C to manage memory.

> Don't get me wrong: I am not saying C was the main driver, or even that
> we wouldn't have had 2's complement and 8-bit bytes without it. But C
> gave programmers access to implementation details, and the logic of
> chars using 8 bits encouraged programmers and IT people in general to
> think in terms of octet-addressable storage.


>>
>> As for segmentation, or lack of, that was very common across machines.
>
> I remember reading that when AMD wanted to design a 64-bit architecture
> they asked programmers (especially at Microsoft) what they wanted. One
> thing was 'no segmentation'. The C model had encouraged programmers to
> think in terms of flat address spaces, and the mainstream segmented
> approach for x86 was a nightmare that people didn't want to repeat.


I think you're ascribing too much to C. In what way did any other
languages (Algol, Pascal, Cobol, Fortran, even Ada by then) encourage
the use of segmented memory?

Do you mean because C required the use of different kinds of pointers,
and people were fed up with that? Whereas other languages hid that
detail better.

You might as well say then that Assembly was equally responsible since
it was even more of a pain to deal with segments!


(Actually, aren't the segments still there on x86? Except they are 4GB
in size instead of 64KB.)

>> Nothing here guarantees that int8_t will be an 8-bit type; these
>> 'exact-width' types are defined on top of those loosely-defined types.
>> They're an illusion.
>
> The header is built to match the distribution.

Still, it is something when the language most famous for being 'close to
the metal' doesn't allow you to use a byte type, unless you enable it.


David Brown

Nov 21, 2022, 10:30:29 AM
On 19/11/2022 17:01, Bart wrote:
>
> On 16/11/2022 16:50, David Brown wrote:
> > Yes, but for you, a "must-have" list for a programming language would be
> > mainly "must be roughly like ancient style C in functionality, but with
> > enough change in syntax and appearance so that no one will think it is
> > C".  If that's what you like, and what pays for your daily bread, then
> > that's absolutely fine.
>
> On 18/11/2022 07:12, David Brown wrote:
> > Yes, it is a lot like C.  It has a number of changes, some that I think
> > are good, some that I think are bad, but basically it is mostly like C.
>
> The above remarks imply strongly that my systems language is a rip-off
> of C.
>

No, it does not. You can infer what you want from what I write, but I
don't see any such implications from my remark. If anyone were to write
a (relatively) simple structured language for low level work, suitable
for "direct" compilation to assembly on a reasonable selection of common
general-purpose processors, and with the aim of giving a "portable
alternative to writing in assembly", then the result would inevitably
have a good deal in common with C. There can be plenty of differences
in the syntax and details, but the "ethos" or "flavour" of the language
will be similar.

Note that I have referred to Pascal as C-like in this sense.

David Brown

Nov 21, 2022, 12:56:04 PM
On 19/11/2022 22:49, Bart wrote:
> On 19/11/2022 21:02, James Harris wrote:
>> On 19/11/2022 20:30, Bart wrote:
>>> On 19/11/2022 20:17, James Harris wrote:
>>>
>>>>
>>>> I try to keep my main influences to hardware and various assembly
>>>> languages I've used over the years. But even though we try not to be
>>>> influenced by C I don't think any of us can help it. Two reasons: C
>>>> became the base for so many languages which came after it, and C so
>>>> well fits the underlying machine.
>>>>
>>>> I even suspect that the CPUs we use today are also as they are in
>>>> part due to C. It has been that influential.

C is /massively/ influential to the general purpose CPUs we have today.
The prime requirement for almost any CPU design is that you should be
able to use it efficiently for C. After all, the great majority of
software is written in languages that, at their core, are similar to C
(in the sense that once the compiler front-end has finished with them,
you have variables, imperative functions, pointers, objects in memory,
etc., much like C). Those languages that are significantly different
rely on run-times and libraries that are written in C.

>>>
>>> Well, there's a lot of C code around that needs to be kept working.
>>
>> Yes.
>>
>>>
>>> However, what aspects of today's processors do you think owe anything
>>> to C?
>>
>> Things like the 8-bit byte, 2's complement, and the lack of segmentation.
>
> Really? C was pretty much the only language in the world that does not
> specify the size of a byte. (It doesn't even have a 'byte' type.)
>

8-bit byte and two's complement were, I think, inevitable regardless of
C. But while the C standard does not require them, their popularity has
grown along with C.

> And it's a language that, even now (until C23) DOESN'T stipulate that
> integers use two's complement.
>
> As for segmentation, or lack of, that was very common across machines.
>

There are plenty of architectures that did not have linear addressing,
and there are many advantages of not allowing memory to be viewed and
accessed as one continuous address space (primarily, it can make buffer
overruns and out of bounds accesses almost impossible). C's model does
not /require/ a simple linear memory space, but such a setup makes C far
easier.

> It is really nothing at all to do with C. (How would it have influenced
> that anyway, given that C implementations were adept at dealing with any
> memory model?)
>

C implementations are /not/ good at dealing with non-linear memory, and
lots of C software assumes memory is linear (and also that bytes are
8-bit, and integers are two's complement). Having the C standard
/allow/ more varied systems does not imply that other systems are good
for C.

But of course C was not the only influence on processor evolution.

>>
>>>
>>> The progression from 8 to 16 to 32 to 64 bits and beyond has long
>>> been on the cards, irrespective of languages.
>>>
>>> Actually C is lagging behind since most implementations are stuck
>>> with a 32-bit int type. Which means lots of software, for those
>>> lazily using 'int' everywhere, will perpetuate the limitations of
>>> that type.
>>>
>>> C famously also doesn't like to pin down its types. It doesn't even
>>> have a `byte` type, and its `char` type, apart from not having a
>>> specified signedness, could have any width of 8 bits or more.
>>
>> Pre C99 yes. But AIUI since C99 C has had very precise types such as
>>
>>    int64_t
>
> I'm sure the byte type, its size and byte-addressability, was
> influenced more by IBM, such as with its 360 mainframes from the 1960s
> BC (Before C). The first byte-addressed machine I used was a 360-clone.
>
> In any case, I would dispute that C even now properly has fixed-width
> types. First, you need to do this to enable them:

Dispute all you want - it does not change a thing.

>
>     #include <stdint.h>
>
> Otherwise it knows nothing about them. Second, if you look inside a
> typical stdint.h file (this one is from gcc/TDM on Windows), you might
> well see:
>
>     typedef signed char int8_t;
>     typedef unsigned char uint8_t;
>
> Nothing here guarantees that int8_t will be an 8-bit type; these
> 'exact-width' types are defined on top of those loosely-defined types.
> They're an illusion.
>

Sorry, you are completely wrong here. Feel free to look it up in the C
standards if you don't believe me.


One of the biggest influences C had on processor design was the idea of
a single stack for return addresses and data, with stack pointer +
offset and frame pointer + offset addressing. C is not the only
language that works well with that setup, but it can't really take any
kind of advantage of more advanced setups with multiple stacks or linked
stack frames. Languages that have local functions, such as Pascal or
Ada, could benefit from more sophisticated stack models. Better stack
models on processors would also greatly reduce the risk of stack
overflows, corruption (intentionally or unintentionally) of return
addresses on stacks, and other bugs in software.
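
A sketch of that single-stack model (the assembly in the comment is
typical unoptimised x86-64 output, illustrative rather than any
particular compiler's exact listing):

    /* The spilled argument and the local both live at fixed offsets from
       the frame pointer, on the same stack as the return address:

           push   rbp
           mov    rbp, rsp
           mov    DWORD PTR [rbp-20], edi    ; argument n
           mov    eax, DWORD PTR [rbp-20]
           add    eax, 1
           mov    DWORD PTR [rbp-4], eax     ; local 'result'
           mov    eax, DWORD PTR [rbp-4]
           pop    rbp
           ret
    */
    int increment(int n) {
        int result = n + 1;
        return result;
    }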

However, any kind of guesses as to how processors would have looked
without C, and therefore what influence C /really/ had, are always going
to be speculative.

Bart

Nov 21, 2022, 1:44:06 PM
On 21/11/2022 17:56, David Brown wrote:
> On 19/11/2022 22:49, Bart wrote:

>>>>> I even suspect that the CPUs we use today are also as they are in
>>>>> part due to C. It has been that influential.
>
> C is /massively/ influential to the general purpose CPUs we have today.

"Massively" influential? Why, how do you think CPUs would have ended up
without C?

Two of the first machines I used were PDP10 and PDP11, developed by DEC
in the 1960s, both using linear memory spaces. While the former was
word-based, the PDP11 was byte-addressable, just like the IBM 360 also
from the 1960s.

The early microprocessors I used (6800, Z80) also had a linear memory
space, at a time when it was unlikely C implementations existed for
them, or that people even thought that much about C outside of Unix.

>  The prime requirement for almost any CPU design is that you should be
> able to use it efficiently for C.

And not Assembly, or Fortran or any other language? Don't forget that at
the point it all began to change, mid-70s to mid-80s, C wasn't that
dominant. Any C implementations for microprocessors were incredibly slow
and produced indifferent code.

The OSes I used (for PDP10, PDP11, ICL 4/72, Z80) had no C involvement.
When x86 popularised segmented memory, EVERYBODY hated it, and EVERY
language had a problem with it.

The REASON for segmented memory was that 16 bits and address spaces
larger than 64K words didn't mix. When this was eventually fixed on
the 80386, x86 was able to use 32-bit registers.

According to you, without C, we would have been using 64KB segments even
with 32 bit registers, or we maybe would never have got to 32 bits at
all. What nonsense!

(I was designing paper CPUs with linear addressing long before then,
probably like lots of people.)


> After all, the great majority of
> software is written in languages that, at their core, are similar to C
> (in the sense that once the compiler front-end has finished with them,
> you have variables, imperative functions, pointers, objects in memory,
> etc., much like C).

I wish people would just accept that C does not have and never has had a
monopoly on lower level languages.

It's a shame that people now associate 'close-to-the-metal' programming
with a language where a function pointer type is written as
`void(*)(void)`, and that's in the simplest case.
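
For anyone unfamiliar with the complaint, a short sketch (the names are
made up); the usual escape hatch is a typedef:

    void (*handler)(void);               /* pointer to function, raw C     */
    typedef void (*callback_fn)(void);   /* name the type once...          */
    callback_fn table[8];                /* ...vs: void (*table[8])(void); */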

>

>> Really? C was pretty much the only language in the world that does not
>> specify the size of a byte. (It doesn't even have a 'byte' type.)
>>
>
> 8-bit byte and two's complement were, I think, inevitable regardless of
> C.

So were lots of things. It didn't take a clairvoyant to guess that the
next progression of 8 -> 16 was going to be 32 and then 64.

(The Z8000 came out in 1979. It was a 16-bit processor with a register
set that could be accessed as 8, 16, 32 or 64-bit chunks. Actually you
can also look at 68000 from that era, and the NatSemi 32032. I was an
engineer at the time and very familiar with this stuff.

C didn't figure in that world at all as far as I was concerned.)

>> It is really nothing at all to do with C. (How would it have
>> influenced that anyway, given that C implementations were adept at
>> dealing with any memory model?)
>>
>
> C implementations are /not/ good at dealing with non-linear memory,

It no longer likes it, as I said.

> But of course C was not the only influence on processor evolution.

OK, you admit now it was not '/massive/'; good!

>>
>>      #include <stdint.h>
>>
>> Otherwise it knows nothing about them. Second, if you look inside a
>> typical stdint.h file (this one is from gcc/TDM on Windows), you might
>> well see:
>>
>>      typedef signed char int8_t;
>>      typedef unsigned char uint8_t;
>>
>> Nothing here guarantees that int8_t will be an 8-bit type; these
>> 'exact-width' types are defined on top of those loosely-defined types.
>> They're an illusion.
>>
>
> Sorry, you are completely wrong here.  Feel free to look it up in the C
> standards if you don't believe me.

The above typedefs are from a C compiler you may have heard of: 'gcc'.
Some may well use internal types such as `__int8`, but the above is the
actual content of stdint.h, and makes `int8_t` a direct synonym for
`signed char`.



> However, any kind of guesses as to how processors would have looked
> without C, and therefore what influence C /really/ had, are always going
> to be speculative.

Without C, another lower-level systems language would have dominated,
since such a language was necessary.

More interesting however is what Unix would have looked like without C.

Dmitry A. Kazakov

Nov 21, 2022, 3:20:22 PM
On 2022-11-21 19:44, Bart wrote:

> Two of the first machines I used were PDP10 and PDP11, developed by DEC
> in the 1960s, both using linear memory spaces. While the former was
> word-based, the PDP11 was byte-addressable, just like the IBM 360 also
> from the 1960s.

PDP-11 was not linear. The internal machine address was 24-bit. But the
effective address in the program was 16-bit. The address space was 64K
for data and 64K for code mapped by the virtual memory manager. Some
machines had a third 64K space.

> And not Assembly, or Fortran or any other language?

Assembler is not portable. FORTRAN had no pointers. Programmers
implemented memory management on top of an array (e.g. LOGICAL*1, since
it had no bytes or characters either (:-)). Since FORTRAN was totally
untyped, you don't even need to cast anything! (:-))

> The REASON for segmented memory was that 16 bits and address spaces
> larger than 64K words didn't mix. When this was eventually fixed on
> the 80386, x86 was able to use 32-bit registers.

Segmented memory requires fewer memory registers because the segment
size may vary. A potential advantage, as was already mentioned, is that
you could theoretically implement bounds checking on top of it. One
example of such techniques was the VAX debugger, which ran programs at
normal speed between breakpoints. The trick was to place active
breakpoints on no-access pages. I don't advocate segmented memory, BTW.

> More interesting however is what Unix would have looked like without C.

Though I hate both, I don't think C influenced UNIX much.

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

David Brown

Nov 21, 2022, 3:22:51 PM
On 21/11/2022 19:44, Bart wrote:
> On 21/11/2022 17:56, David Brown wrote:
>> On 19/11/2022 22:49, Bart wrote:
>
>>>>>> I even suspect that the CPUs we use today are also as they are in
>>>>>> part due to C. It has been that influential.
>>
>> C is /massively/ influential to the general purpose CPUs we have today.
>
> "Massively" influential? Why, how do you think CPUs would have ended up
> without C?

As I said at the end of my previous post, it's very difficult to tell.
Maybe they would be more varied. Maybe we'd have more stacks. Maybe
we'd be freed from the idea that a "pointer" is nothing more than a
linear address - it could have bounds, or access flags. Registers and
memory could hold type information as well as values. Processors could
have had support for multi-threading or parallel processing. They could
have been designed around event models and signal passing, or have
hardware acceleration for accessing code or data by name. They could
have been better at handling coroutines. There are all kinds of
different things hardware /could/ do, at least some of which would
greatly suit some of the many different kinds of programming languages
we have seen through the years.

A few of these have turned up - there are processors with multiple
stacks optimised for Forth, there were early massively parallel
processors designed alongside the Occam language, the company Linn Smart
Computing made a radical new processor design for more efficient
implementation of their own programming language. Some ARM cores had
hardware acceleration for Java virtual machines.

But I have no specific thoughts - predictions about possible parallel
pasts are just as hard as predictions about the future!

>
> Two of the first machines I used were PDP10 and PDP11, developed by DEC
> in the 1960s, both using linear memory spaces. While the former was
> word-based, the PDP11 was byte-addressable, just like the IBM 360 also
> from the 1960s.
>

C was developed originally for these processors, and was a major reason
for their long-term success.

C was designed with some existing processors in mind - I don't think
anyone is suggesting that features such as linear memory came about
solely because of C. But there was more variety of processor
architectures in the old days, while almost all we have now are
processors that are good for running C code.

> The early microprocessors I used (6800, Z80) also had a linear memory
> space, at a time when it was unlikely C implementations existed for
> them, or that people even thought that much about C outside of Unix.
>
>>   The prime requirement for almost any CPU design is that you should
>> be able to use it efficiently for C.
>
> And not Assembly, or Fortran or any other language?

Not assembly, no - /very/ little code is now written in assembly.
FORTRAN efficiency used to be important for processor design, but not
for a very long time. (FORTRAN is near enough the same programming
model as C, however.)

> Don't forget that at
> the point it all began to change, mid-70s to mid-80s, C wasn't that
> dominant. Any C implementations for microprocessors were incredibly slow
> and produced indifferent code.
>
> The OSes I used (for PDP10, PDP11, ICL 4/72, Z80) had no C involvement.
> When x86 popularised segmented memory, EVERYBODY hated it, and EVERY
> language had a problem with it.
>

Yes - the choice of the 8086 for PC's was a huge mistake. It was purely
economics - the IBM designers wanted a 68000 processor. But IBM PHB's
said that since the IBM PC was just a marketing exercise and they would
never make more than a few thousand machines, technical benefits were
irrelevant and the 8086 devices were cheaper. (By the same logic, they
bought the cheapest OS they could get, despite everyone saying it was
rubbish.)

> The REASON for segmented memory was that 16 bits and address spaces
> larger than 64K words didn't mix. When this was eventually fixed on
> the 80386, x86 was able to use 32-bit registers.
>
> According to you, without C, we would have been using 64KB segments even
> with 32 bit registers, or we maybe would never have got to 32 bits at
> all. What nonsense!
>

Eh, no. I did not say anything /remotely/ like that.

> (I was designing paper CPUs with linear addressing long before then,
> probably like lots of people.)
>
>
>> After all, the great majority of software is written in languages that, at
>> their core, are similar to C (in the sense that once the compiler
>> front-end has finished with them, you have variables, imperative
>> functions, pointers, objects in memory, etc., much like C).
>
> I wish people would just accept that C does not have and never has had a
> monopoly on lower level languages.
>

It does have, and has had for 40+ years, a /near/ monopoly on low-level
languages. You can dislike C as much as you want, but you really cannot
deny that!

> It's a shame that people now associate 'close-to-the-metal' programming
> with a language where a function pointer type is written as
> `void(*)(void)`, and that's in the simplest case.
>

I don't disagree that it is a shame, or that better (for whatever value
of "better" you like) low-level languages exist or can be made. That
doesn't change the facts.

>>
>
>>> Really? C was pretty much the only language in the world that does
>>> not specify the size of a byte. (It doesn't even have a 'byte' type.)
>>>
>>
>> 8-bit byte and two's complement were, I think, inevitable regardless
>> of C.
>
> So were lots of things. It didn't take a clairvoyant to guess that the
> next progression of 8 -> 16 was going to be 32 and then 64.
>

Agreed.

> (The Z8000 came out in 1979. It was a 16-bit processor with a register
> set that could be accessed as 8, 16, 32 or 64-bit chunks. Actually you
> can also look at 68000 from that era, and the NatSemi 32032. I was an
> engineer at the time and very familiar with this stuff.
>
> C didn't figure in that world at all as far as I was concerned.)
>
>>> It is really nothing at all to do with C. (How would it have
>>> influenced that anyway, given that C implementations were adept at
>>> dealing with any memory model?)
>>>
>>
>> C implementations are /not/ good at dealing with non-linear memory,
>
> It no longer likes it, as I said.
>
>> But of course C was not the only influence on processor evolution.
>
> OK, you admit now it was not '/massive/'; good!
>

Would you please stop making things up and pretending I said them?

C was a /massive/ influence on processor evolution and the current
standardisation of general-purpose processors as systems for running C
code efficiently. But it was not the only influence, or the sole reason
for current processor design.

>>>
>>>      #include <stdint.h>
>>>
>>> Otherwise it knows nothing about them. Second, if you look inside a
>>> typical stdint.h file (this one is from gcc/TDM on Windows), you
>>> might well see:
>>>
>>>      typedef signed char int8_t;
>>>      typedef unsigned char uint8_t;
>>>
>>> Nothing here guarantees that int8_t will be an 8-bit type; these
>>> 'exact-width' types are defined on top of those loosely-defined
>>> types. They're an illusion.
>>>
>>
>> Sorry, you are completely wrong here.  Feel free to look it up in the
>> C standards if you don't believe me.
>
> The above typedefs are from a C compiler you may have heard of: 'gcc'.
> Some may well use internal types such as `__int8`, but the above is the
> actual content of stdint.h, and makes `int8_t` a direct synonym for
> `signed char`.
>

They are part of C - specified precisely in the C standards. It does
not matter how any particular implementation defines them. The C
standards say they are part of C, and the type names are introduced into
the current namespace using "#include <stdint.h>" (Or "#include
<inttypes.h>".) The standards also say that "int8_t" is an 8-bit type,
with no padding, and two's complement representation. This has been the
case since C99 - there is no "looseness" or "illusions" in these types.
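
A minimal sketch of that guarantee (using C11 _Static_assert; none of
these checks can fail on a conforming implementation that provides the
types, whatever typedefs its stdint.h happens to use):

    #include <stdint.h>
    #include <limits.h>

    _Static_assert(CHAR_BIT == 8, "int8_t only exists where bytes are 8 bits");
    _Static_assert(sizeof(int32_t) == 4, "int32_t is exactly 32 bits");
    _Static_assert(INT8_MIN == -128 && INT8_MAX == 127,
                   "two's complement, no padding");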

>
>
>> However, any kind of guesses as to how processors would have looked
>> without C, and therefore what influence C /really/ had, are always
>> going to be speculative.
>
> Without C, another lower-level systems language would have dominated,
> since such a language was necessary.

Perhaps - perhaps not. Domination of a particular market or niche does
not always happen. Perhaps we would instead have Forth, Ada, and
compiled BASIC in an equal balance.

>
> More interesting however is what Unix would have looked like without C.

How do you think it would have looked?


Bart

Nov 21, 2022, 4:38:20 PM
On 21/11/2022 20:20, Dmitry A. Kazakov wrote:
> On 2022-11-21 19:44, Bart wrote:
>
>> Two of the first machines I used were PDP10 and PDP11, developed by
>> DEC in the 1960s, both using linear memory spaces. While the former
>> was word-based, the PDP11 was byte-addressable, just like the IBM 360
>> also from the 1960s.
>
> PDP-11 was not linear. The internal machine address was 24-bit. But the
> effective address in the program was 16-bit. The address space was 64K
> for data and 64K for code mapped by the virtual memory manager. Some
> machines had a third 64K space.


My PDP11/34 probably didn't have that much memory. But if you couldn't
access more than 64K per task (say for code or data, if treated
separately), then I would still call that linear from the task's point
of view.


>> And not Assembly, or Fortran or any other language?
>
> Assembler is not portable.

That is not relevant. The suggestion was that keeping C happy was a
motivation for CPU designers, but a lot of ASM code was still being run too.


> FORTRAN had no pointers. Programmers
> implemented memory management on top of an array

But those arrays work better in linear memory. There was a lot of
Fortran code around too (probably a lot more than C at the time I got
into it), and /that/ code needed to stay efficient too.

So I was questioning whether C was that big a factor at that period when
the architectures we have now were just beginning to be developed.


Dmitry A. Kazakov

Nov 21, 2022, 5:03:12 PM
On 2022-11-21 22:38, Bart wrote:
> On 21/11/2022 20:20, Dmitry A. Kazakov wrote:
>> On 2022-11-21 19:44, Bart wrote:
>>
>>> Two of the first machines I used were PDP10 and PDP11, developed by
>>> DEC in the 1960s, both using linear memory spaces. While the former
>>> was word-based, the PDP11 was byte-addressable, just like the IBM 360
>>> also from the 1960s.
>>
>> PDP-11 was not linear. The internal machine address was 24-bit. But
>> the effective address in the program was 16-bit. The address space was
>> 64K for data and 64K for code mapped by the virtual memory manager.
>> Some machines had a third 64K space.
>
> My PDP11/34 probably didn't have that much memory. But if you couldn't
> access more than 64K per task (say for code or data, if treated
> separately), then I would still call that linear from the task's point
> of view.

So is segmented memory if you have a single segment. Once you needed
more than 64K of data or code, your linearity would end.

>> FORTRAN had no pointers. Programmers implemented memory management on
>> top of an array
>
> But those arrays work better in linear memory.

FORTRAN was not high-level enough to support memory mapping on
indexing. The method of handling data structures and code larger than
the address space was the loader's overlay trees, a kind of precursor
of paging/swap. Segmented or paged made no difference.

Bart

Nov 22, 2022, 7:38:18 AM
On 21/11/2022 20:22, David Brown wrote:
> On 21/11/2022 19:44, Bart wrote:

>> Two of the first machines I used were PDP10 and PDP11, developed by
>> DEC in the 1960s, both using linear memory spaces. While the former
>> was word-based, the PDP11 was byte-addressable, just like the IBM 360
>> also from the 1960s.
>>
>
> C was developed originally for these processors, and was a major reason
> for their long-term success.

Of the PDP10 and IBM 360? Designed in the 1960s and discontinued in 1983
and 1979 respectively. C only came out in a first version in 1972.

The PDP11 was superseded around this time (either side of 1980) by the
VAX-11, a 32-bit version, no doubt inspired by the C language, one that
was well known for not specifying the sizes of its types - it adapted to
the size of the hardware.

Do you really believe this stuff?

> C was designed with some existing processors in mind - I don't think
> anyone is suggesting that features such as linear memory came about
> solely because of C.  But there was more variety of processor
> architectures in the old days, while almost all we have now are
> processors that are good for running C code.

As I said, C is the language that adapts itself to the hardware, and in
fact still is the primary language now that can and does run on every
odd-ball architecture.

Which is why it is an odd candidate for a language that was supposed to
drive the evolution of hardware because of its requirements.

>> The early microprocessors I used (6800, Z80) also had a linear memory
>> space, at a time when it was unlikely C implementations existed for
>> them, or that people even thought that much about C outside of Unix.
>>
>>>   The prime requirement for almost any CPU design is that you should
>>> be able to use it efficiently for C.
>>
>> And not Assembly, or Fortran or any other language?
>
> Not assembly, no - /very/ little code is now written in assembly.

Now, yes. I'm talking about that formative period of mid-70s to mid-80s
when everything changed. From being dominated by mainframes, to 32-bit
microprocessors which are only one step behind the 64-bit ones we have now.


> FORTRAN efficiency used to be important for processor design, but not
> for a very long time.  (FORTRAN is near enough the same programming
> model as C, however.)

Oh, right. In that case could it possibly have been the need to run
Fortran efficiently that was a driving force in that period?

(I spent a year in the late 70s writing Fortran code in two scientific
establishments in the UK. No one used C.)

>> Don't forget that at the point it all began to change, mid-70s to
>> mid-80s, C wasn't that dominant. Any C implementations for
>> microprocessors were incredibly slow and produced indifferent code.
>>
>> The OSes I used (for PDP10, PDP11, ICL 4/72, Z80) had no C
>> involvement. When x86 popularised segmented memory, EVERYBODY hated it,
>> and EVERY language had a problem with it.
>>
>
> Yes - the choice of the 8086 for PC's was a huge mistake.  It was purely
> economics - the IBM designers wanted a 68000 processor.

When you looked at the 68000 more closely, it had nearly as much
non-orthogonality as the 8086. (I was trying at that time to get my
company to switch to a processor like the 68k.)

(The 8086 was bearable, but it had one poor design choice that had huge
implications: forming an address by shifting a 16-bit segment address by
4 bits instead of 8.

That meant an addressing range of only 1MB instead of 16MB, leading to a
situation later where you could cheaply install 4MB or 8MB of memory,
but you couldn't easily make use of it.)
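
The arithmetic behind that point, as a sketch (the function names are
illustrative):

    #include <stdint.h>

    /* 8086 real mode: a 16-bit segment shifted left 4 bits plus a 16-bit
       offset gives a 20-bit physical address - about 1 MB. */
    uint32_t phys_8086(uint16_t seg, uint16_t off) {
        return ((uint32_t)seg << 4) + off;    /* max 0xFFFF0 + 0xFFFF */
    }

    /* An 8-bit shift would have given 24 bits - 16 MB. */
    uint32_t phys_shift8(uint16_t seg, uint16_t off) {
        return ((uint32_t)seg << 8) + off;
    }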


>> According to you, without C, we would have been using 64KB segments
>> even with 32 bit registers, or we maybe would never have got to 32
>> bits at all. What nonsense!
>>
>
> Eh, no.  I did not say anything /remotely/ like that.

It sounds like it! Just accept that C had no more nor less influence
than any other language /at that time/.


> I does have, and has had for 40+ years, a /near/ monopoly on low-level
> languages.  You can dislike C as much as you want, but you really cannot
> deny that!

It's also the fact that /I/ at least have also successfully avoided
using C for 40+ years (and, probably fairly uniquely, have used private
languages). I'm sure there are other stories like mine that you don't
hear about.


>>> But of course C was not the only influence on processor evolution.
>>
>> OK, you admit now it was not '/massive/'; good!
>>
>
> Would you please stop making things up and pretending I said them?

You actually said this:

> C is /massively/ influential to the general purpose CPUs we have today.

Which suggests that you don't think any other language comes close.

I don't know which individual language, if any, was most influential,
but I doubt C played a huge part because it came out too late, and was
not that popular in those formative years, by which time the way
processors were going to evolve was becoming clear anyway.

(That is, still dominated by von Neumann architectures, as has been the
case since long before C.)

But C probably has influenced modern 64-bit ABIs, even though they are
supposed to be language-independent.

>> More interesting however is what Unix would have looked like without C.
>
> How do you think it would have looked?

Case insensitive? Or maybe that's just wishful thinking.

David Brown

Nov 22, 2022, 10:29:45 AM
On 22/11/2022 13:38, Bart wrote:
> On 21/11/2022 20:22, David Brown wrote:
>> On 21/11/2022 19:44, Bart wrote:
>
>>> Two of the first machines I used were PDP10 and PDP11, developed by
>>> DEC in the 1960s, both using linear memory spaces. While the former
>>> was word-based, the PDP11 was byte-addressable, just like the IBM 360
>>> also from the 1960s.
>>>
>>
>> C was developed originally for these processors, and was a major
>> reason for their long-term success.
>
> Of the PDP10 and IBM 360? Designed in the 1960s and discontinued in 1983
> and 1979 respectively. C only came out in a first version in 1972.
>

I was thinking primarily of the PDP11, which was the first real target
for C (assuming I have my history correct - this was around the time I
was born). And by "long-term success" of these systems, I mean their
successors that were built in the same style - such as the VAX.

> The PDP11 was superseded around this time (either side of 1980) by the
> VAX-11, a 32-bit version, no doubt inspired by the C language, one that
> was well known for not specifying the sizes of its types - it adapted to
> the size of the hardware.
>
> Do you really believe this stuff?
>
>> C was designed with some existing processors in mind - I don't think
>> anyone is suggesting that features such as linear memory came about
>> solely because of C.  But there was more variety of processor
>> architectures in the old days, while almost all we have now are
>> processors that are good for running C code.
>
> As I said, C is the language that adapts itself to the hardware, and in
> fact still is the primary language now that can and does run on every
> odd-ball architecture.
>

C does not "adapt itself to the hardware". It is specified with some
details of features being decided by the implementer. (Some of these
details are quite important.) Part of the reason for this is to allow
efficient implementations on a wide range of hardware, but it also
determines a balance between implementer freedom, and limits that a
programmer can rely upon. There are plenty of cases where different
implementations on the same hardware make different choices of the
details. (Examples include the size of "long" on 64-bit x86 systems
being different for Windows and the rest of the world, or some compilers
for the original m68k having 16-bit int while others had 32-bit int.)
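
A trivial sketch of that split: the same program on the same x86-64
hardware prints 8 under the LP64 model (Linux, macOS) and 4 under LLP64
(Win64):

    #include <stdio.h>

    int main(void) {
        printf("long is %zu bytes\n", sizeof(long));
        return 0;
    }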

> Which is why it is an odd candidate for a language that was supposed to
> drive the evolution of hardware because of its requirements.

There is a difference between a language being usable on a range of
systems, and being very /efficient/ on a range of systems. You can use
C on an 8-bit AVR processor - there is a gcc port. But it is not a good
processor design for C - there are few pointer registers, 16-bit
manipulation is inefficient, there are separate address spaces for flash
and ram, there is no stack pointer + offset addressing mode. So while C
is far and away the most popular language for programming AVR's, AVR's
are not good processors for C. (Other 8-bit cores such as the 8051 are
even worse, and that is a reason for them being dropped as soon as
32-bit ARM cores became cheap enough.)

>
>>> The early microprocessors I used (6800, Z80) also had a linear memory
>>> space, at a time when it was unlikely C implementations existed for
>>> them, or that people even thought that much about C outside of Unix.
>>>
>>>>   The prime requirement for almost any CPU design is that you should
>>>> be able to use it efficiently for C.
>>>
>>> And not Assembly, or Fortran or any other language?
>>
>> Not assembly, no - /very/ little code is now written in assembly.
>
> Now, yes. I'm talking about that formative period of mid-70s to mid-80s
> when everything changed. From being dominated by mainframes, to 32-bit
> microprocessors which are only one step behind the 64-bit ones we have now.
>

OK, for a time the ability to program efficiently in assembly was
important. But that was already in decline by the early 1980's in big
systems, as we began to see a move towards RISC processors optimised for
compiler output rather than CISC processors optimised for human assembly
coding. (The continued existence of CISC was almost entirely due to the
IBM PC's choice of the 8088 processor.)

>
>> FORTRAN efficiency used to be important for processor design, but not
>> for a very long time.  (FORTRAN is near enough the same programming
>> model as C, however.)
>
> Oh, right. In that case could it possibly have been the need to run
> Fortran efficiently that was a driving force in that period?

That would have been important too, but C quickly overwhelmed FORTRAN in
popularity. FORTRAN was used in scientific and engineering work, but C
was the choice for systems programming and most application programming.

>
> (I spent a year in the late 70s writing Fortran code in two scientific
> establishments in the UK. No one used C.)
>
>>> Don't forget that at the point it all began to change, mid-70s to
>>> mid-80s, C wasn't that dominant. Any C implementations for
>>> microprocessors were incredibly slow and produced indifferent code.
>>>
>>> The OSes I used (for PDP10, PDP11, ICL 4/72, Z80) had no C
>>> involvement. When x86 popularised segmented memory, EVERYBODY hated it,
>>> and EVERY language had a problem with it.
>>>
>>
>> Yes - the choice of the 8086 for PC's was a huge mistake.  It was
>> purely economics - the IBM designers wanted a 68000 processor.
>
> When you looked at the 68000 more closely, it had nearly as much
> non-orthogonality as the 8086. (I was trying at that time to get my
> company to switch to a processor like the 68k.)

No, it does not. (Yes, I have looked at it closely, and used 68k
processors extensively.)

>
> (The 8086 was bearable, but it had one poor design choice that had huge
> implications: forming an address by shifting a 16-bit segment address by
> 4 bits instead of 8.
>
> That meant an addressing range of only 1MB instead of 16MB, leading to a
> situation later where you could cheaply install 4MB or 8MB of memory,
> but you couldn't easily make use of it.)

The 8086 was horrible in all sorts of ways. Comparing a 68000 with an
8086 is like comparing a Jaguar E-type with a bathtub with wheels. And
for the actual chip used in the first PC, an 8088, half the wheels were
removed.

>
>
>>> According to you, without C, we would have been using 64KB segments
>>> even with 32 bit registers, or we maybe would never have got to 32
>>> bits at all. What nonsense!
>>>
>>
>> Eh, no.  I did not say anything /remotely/ like that.
>
> It sounds like it! Just accept that C had no more nor less influence
> than any other language /at that time/.
>

The most successful (by a huge margin - like it or not) programming
language evolved, spread and conquered the programming world, at the
same time as the basic processor architecture evolved and solidified
into a style that is very good at executing C programs, and is missing
countless features that would be useful for many other kinds of
programming languages. Coincidence? I think not.

Of course there were other languages that benefited from those same
processors, but none were or are as popular as C and its clear descendants.

>
>> It does have, and has had for 40+ years, a /near/ monopoly on low-level
>> languages.  You can dislike C as much as you want, but you really
>> cannot deny that!
>
> It's also the fact that /I/ at least have also successfully avoided
> using C for 40+ years (and, probably fairly uniquely, have used private
> languages). I'm sure there are other stories like mine that you don't
> hear about.

Sure. But for every person like you that has made a successful career
with your own language, there are perhaps 100,000 other programmers who
have used other languages as the basis of their careers. 90% of them at
least will have C or its immediate descendants (C++, Java, C#, etc.) as
their main language.

You can have your opinions about quality, but in terms of /quantity/
there is no contest.

>
>
>>>> But of course C was not the only influence on processor evolution.
>>>
>>> OK, you admit now it was not '/massive/'; good!
>>>
>>
>> Would you please stop making things up and pretending I said them?
>
> You actually said this:
>
> > C is /massively/ influential to the general purpose CPUs we have today.
>
> Which suggests that you don't think any other language comes close.

That is correct. But it also means I don't think it was the only reason
processors are the way they are. And I most certainly did not "admit
now that it was not massive".

>
> I don't know which individual language, if any, was most influential,
> but I doubt C played a huge part because it came out too late, and was
> not that popular in those formative years, but which time the way
> processors were going to evolve was becoming clear anyway.
>
> (That is, still dominated by von Neumann architectures, as has been the
> case since long before C.)
>
> But C probably has influenced modern 64-bit ABIs, even though they are
> supposed to be language-independent.
>

What makes you think they are supposed to be language independent? What
makes you think they are not? What makes you care?

The types and terms from C are a very convenient way to describe an ABI,
since it is a language familiar to any programmer who might be
interested in the details of an ABI. Such ABI's only cover a
(relatively) simple common subset of possible interfaces, but do so in a
way that can be used from any language (with wrappers if needed) and can
be extended as needed.

People make ABIs for practical use. MS made the ABI for Win64 to suit
their own needs and uses. AMD and a range of *nix developers (both OS
and application developers) and compiler developers got together to
develop the 64-bit x86 ABI used by everyone else, designed to suit
/their/ needs and uses.

If a language needs something more for its ABI - such as Python
wanting support for Python objects - then that can be built on top of
the standard ABI.
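
For instance, a library entry point kept to the plain C ABI can be bound
from almost any language's FFI; a minimal sketch (the function name
'lib_add' is invented):

    /* lib_add.c - exported using only the platform's standard ABI */
    #include <stdint.h>

    int64_t lib_add(int64_t a, int64_t b)
    {
        return a + b;
    }

A Python wrapper (via ctypes, say) would then layer its own object
handling on top of a call like this.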


>>> More interesting however is what Unix would have looked like without C.
>>
>> How do you think it would have looked?
>
> Case insensitive? Or maybe that's just wishful thinking.
>

Case insensitivity is a mistake, born from the days before computers
were advanced enough to have small letters as well as capitals. It
leads to ugly inconsistencies, wastes the opportunity to convey useful
semantic information, and is an absolute nightmare as soon as you stray
from the simple English-language alphabet.

I believe Unix's predecessor, Multics, was case-sensitive. But I could
be wrong.

Andy Walker
Nov 22, 2022, 11:27:11 AM
On 22/11/2022 15:29, David Brown wrote:
> Case insensitivity is a mistake, born from the days before computers
> were advanced enough to have small letters as well as capitals.

I don't believe I have ever used a computer that did not "have
small letters". There has been some discussion over in "comp.compilers"
recently, but it's basically the difference between punched cards and
paper tape. The Flexowriter can be traced back to the 1920s, and its
most popular form was certainly being used by computers in the 1950s,
so there really weren't many "days before" to be considered.

--
Andy Walker, Nottingham.
Andy's music pages: www.cuboid.me.uk/andy/Music
Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Hertel

Bart
Nov 22, 2022, 12:13:32 PM
On 22/11/2022 15:29, David Brown wrote:
> On 22/11/2022 13:38, Bart wrote:

>> When you looked at the 68000 more closely, it had nearly as much
>> non-orthogonality as the 8086. (I was trying at that time to get my
>> company to switch to a processor like the 68k.)
>
> No, it does not.  (Yes, I have looked at it closely, and used 68k
> processors extensively.)

As a compiler writer? The first thing you notice is that you have to
decide whether to use D-registers or A-registers, as they had different
characteristics, but the 3-bit register field of instructions could only
use one or the other.

That made the 8086 simpler because there was no choice! The registers
were limited and only one was general purpose.

>> But C probably has influenced modern 64-bit ABIs, even though they are
>> supposed to be language-independent.
>>
>
> What makes you think they are supposed to be language independent?  What
> makes you think they are not?  What makes you care?

Language A can talk to language B via the machine's ABI. Where does C
come into it?

Language A can talk to a library or OS component that resides in a DLL,
via the ABI. The library might have been implemented in C, or assembler,
or in anything else, but in binary form it is pure machine code anyway.

What makes /you/ think that such ABIs were invented purely for the use
of C programs? Do you think the designers of the ABI simply assumed that
only programs written in the C language could call into the OS?

When you download a shared library DLL, do you think they have different
versions depending on what language will be using the DLL?

> The types and terms from C are a very convenient way to describe an ABI,

They're pretty terrible actually. The types involved in the SYS V ABI can be
expressed as follows in a form that everyone understands and many
languages use:

i8 i16 i32 i64 i128
u8 u16 u32 u64 u128
f32 f64 f128

This document (https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf)
lists the C equivalents as follows (only signed integers shown):

i8 char, signed char
i16 short, signed short
i32 int, signed int
i64 long, signed long, long long, signed long long
i128 __int128, signed __int128

(No use of int8_t etc., despite the document being dated 2012.)

This comes up in APIs too, where it is 100 times more relevant (only
compiler writers care about the ABI). The C denotations shown here are
not fit for purpose for language-neutral interfaces.

(Notice also that 'long' and 'long long' are both 64 bits, and that
'char' is assumed to be signed. In practice the C denotations would vary
across platforms, while those i8-i128 would stay constant, provided only
that the machine uses conventional register sizes.)

So it's more like, such interfaces were developed /despite/ C.
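
To illustrate with a sketch (the functions are invented), the same kind
of export reads very differently in the two notations:

    #include <stdint.h>

    /* C denotations - vary from platform to platform: */
    long long scale_c(long long x, int k);

    /* fixed-width denotations - constant everywhere: */
    int64_t scale_fw(int64_t x, int32_t k);   /* i.e. i64(i64, i32) */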

> since it is a language familiar to any programmer who might be
> interested in the details of an ABI.  Such ABIs only cover a
> (relatively) simple common subset of possible interfaces, but do so in a
> way that can be used from any language (with wrappers if needed) and can
> be extended as needed.
>
> People make ABIs for practical use.  MS made the ABI for Win64 to suit
> their own needs and uses.  AMD and a range of *nix developers (both OS
> and application developers) and compiler developers got together to
> develop the 64-bit x86 ABI used by everyone else, designed to suit
> /their/ needs and uses.

x86-32 used a number of different ABIs depending on language and
compiler. x86-64 tends to use one ABI, which is a strong indication that
that ABI was intended to work across languages and compilers.


>> Case insensitive? Or maybe that's just wishful thinking.
>>
>
> Case insensitivity is a mistake, born from the days before computers
> were advanced enough to have small letters as well as capitals.  It
> leads to ugly inconsistencies, wastes the opportunity to convey useful
> semantic information, and is an absolute nightmare as soon as you stray
> from the simple English-language alphabet.

Yet Google searches are case-insensitive. How is that possible, given
that search strings can use Unicode which you say does not define case
equivalents across most alphabets?

As are email addresses and domain names.

As are most things in everyday life, even now that it is all tied up
with computers and smartphones and tablets with everything being online.

(Actually, most people's exposure to case-sensitivity is in online
passwords, which is also the worst place to have it, as usually you
can't see them!)

Your objections make no sense at all. Besides which, plenty of
case-insensitive languages, file-systems and shell programs and
applications exist.

> I believe Unix's predecessor, Multics, was case-sensitive.  But I could
> be wrong.

I'm surprised the Unix and C developers even had a terminal that could
do upper and lower case. I was stuck with upper case for the first year
or two. File-systems and global linker symbols were also restricted in
length and case for a long time, to minimise space.

Case-sensitivity was a luxury into the 80s.

Dmitry A. Kazakov
Nov 22, 2022, 12:47:10 PM
On 2022-11-22 18:13, Bart wrote:

> Language A can talk to language B via the machine's ABI. Where does C
> come into it?

Data types of arguments including padding/gaps in structures, and calling
conventions. E.g. Windows' native calling convention is stdcall, while C
deploys cdecl.
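
On 32-bit Windows the difference had to be spelled out in every
declaration; a sketch (the function names are invented):

    /* callee pops the arguments off the stack: */
    int __stdcall f_std(int x);

    /* caller pops the arguments - which is what permits varargs: */
    int __cdecl f_c(int x, ...);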

> Language A can talk to a library or OS component that resides in a DLL,
> via the ABI. The library might have been implemented in C, or assembler,
> or in anything else, but in binary form, is pure machine code anyway.

Same as above. Data types, calling conventions.

> What makes /you/ think that such ABIs were invented purely for the use
> of C programs? Do you think the designers of the ABI simply assumed that
> only programs written in the C language could call into the OS?

That depends on the OS.

- VMS used MACRO-11 and unified calling conventions. That was DEC and
that was the time people really cared, before the Dark Age of Computing.

- Windows was stdcall, but then some of its parts gave way to C.

- UNIXes used C's conventions, naturally.

> When you download a shared library DLL, do you think they have different
> versions depending on what language will be using the DLL?

That is certainly a possibility. There are lots of libraries having
language-specific adapters. If you use a higher-level language you would
like to take advantage of this. Usually there are quite complicated
elaboration protocols upon library loading, ensuring initialization of
complex objects, versioning consistency, and all the things the stupid
loaders cannot do. The price is that you might not be able to use it with
C or another language.

> I'm surprised the Unix and C developers even had a terminal that could
> do upper and lower case.

No idea, but even the DEC VT52 had lower case.

(Of course, case-sensitivity was an incredibly stupid choice)

Andy Walker
Nov 22, 2022, 3:14:19 PM
On 22/11/2022 17:13, Bart wrote:
> I'm surprised the Unix and C developers even had a terminal that
> could do upper and lower case. I was stuck with upper case for the
> first year or two. [...]
> Case-sensitivity was a luxury into the 80s.

As per my nearby article, lower case was available for paper
tape long before [electronic] computers existed. It's difficult to
do word processing without a decent character set; I was doing it
[admittedly in a rather primitive way] in the mid-60s. There were
some peripherals [esp many lineprinters, card punches and teletypes]
that were restricted to upper case, but lower case was scarcely a
"luxury" when many secretaries were using electric typewriters.

Bart
Nov 22, 2022, 3:17:52 PM
On 22/11/2022 15:29, David Brown wrote:
> On 22/11/2022 13:38, Bart wrote:

>> As I said, C is the language that adapts itself to the hardware, and
>> in fact still is the primary language now that can and does run on
>> every odd-ball architecture.
>>
>
> C does not "adapt itself to the hardware".

It will work on a greater range of hardware than one of my languages.

For example, mine would have trouble on anything which is not
byte-addressable, has anything other than 8-bit bytes, or supports
primitive types that are not powers of two.

This is important because you are claiming that it is the less fussy C
language which is driving those characteristics, whereas it is more
likely other languages that are more demanding.

You may in fact be partly right in that the existence of controllers
with odd word sizes may actually be due to there being a language which
was conducive to writing suitable custom compilers. If C wasn't around,
they may have needed to be more conforming.

But then you can't have a language which is responsible both for 64-bit
desktop processors, and 24-bit signal processors.

Or perhaps we can, let's just rewrite history so that C is
single-handedly responsible for the design of all hardware, even that
devised a decade before C came out. No other languages matter.


Bart
Nov 22, 2022, 3:42:18 PM
On 22/11/2022 17:47, Dmitry A. Kazakov wrote:
> On 2022-11-22 18:13, Bart wrote:
>
>> Language A can talk to language B via the machine's ABI. Where does C
>> come into it?
>
> Data types of arguments including padding/gaps in structures, and calling
> conventions.

Actually the Win64 ABI doesn't go into types much at all. The main types
are integers which are 1/2/4/8 bytes, each of which occupies one 64-bit
GP register or one 64-bit stack slot; and floats of 4/8 bytes which are
passed in the bottom end of a 128-bit XMM register, or via one 64-bit
stack slot.
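
For example, the first four arguments map by position onto either a GP
or an XMM register; a sketch (the function is invented):

    double mix(int a, double b, void *c, float d);
    /* a -> RCX, b -> XMM1, c -> R8, d -> XMM3; result in XMM0 */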

Surely you're not going to claim that this is all thanks to C? That the
hardware uses 64/128-bit GP/XMM registers and requires a 64/128-bit
aligned stack couldn't possibly be the reason?

Or are you going to claim like David Brown that the hardware is like
that solely due to the need to run C programs? (Because any other
language would run perfectly fine with 37-bit integers implemented on a
29-bit-addressable memory.)

It doesn't go into struct layouts much either, but those are mainly
driven by alignment needs, which again are due to hardware, not C.
(Several structs which occur in Win32 API actually aren't strictly aligned.)
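
A sketch of alignment driving layout (the struct is invented):

    #include <stddef.h>

    struct s {
        char   c;   /* offset 0; 7 bytes of padding follow           */
        double d;   /* offset 8, since doubles want 8-byte alignment */
    };
    /* on x64: offsetof(struct s, d) == 8, sizeof(struct s) == 16 */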

> E.g. Windows' native calling convention is stdcall, while C
> deploys cdecl.

That all disappears with 64 bits. With 32-bit DLLs there was still just
one DLL, but you needed to know the calling convention in use; this would
have been part of the API. But while there were 100s of languages, there
were only a handful of call conventions.

>
>> When you download a shared library DLL, do you think they have
>> different versions depending on what language will be using the DLL?
>
> That is certainly a possibility.

My Win64 machine has 3300 DLL files in \windows\system32. Which language
should they be for? It would be crazy to cater to every possible language.

Plus, DLLs tend to include other DLLs; when the OS loads a DLL A, which
imports DLL B, it will not know which language version of B to look for
(and they would all be called B.DLL). All it might do is look for 32-bit
and 64-bit versions which are stored in different places.


Bart
Nov 22, 2022, 3:54:43 PM
On 22/11/2022 20:14, Andy Walker wrote:
> On 22/11/2022 17:13, Bart wrote:
>> I'm surprised the Unix and C developers even had a terminal that
>> could do upper and lower case. I was stuck with upper case for the
>> first year or two. [...]
>> Case-sensitivity was a luxury into the 80s.
>
>     As per my nearby article, lower case was available for paper
> tape long before [electronic] computers existed.  It's difficult to
> do word processing without a decent character set;  I was doing it
> [admittedly in a rather primitive way] in the mid-60s.  There were
> some peripherals [esp many lineprinters, card punches and teletypes]
> that were restricted to upper case, but lower case was scarcely a
> "luxury" when many secretaries were using electric typewriters.
>

Perhaps the computer department of my college, and the places I worked
at, were poorly equipped then. We used ASR33s and video terminals that
emulated those teletypes, so upper case only.

All the Fortran I wrote was in upper case that I can remember.

The file systems of my PDP10 machine at least used 'sixbit' encoding, so
could only do upper-case. The 'radix50' encoding of the PDP11 linker
also restricted things to upper case.

The bitmap fonts of early screens and dot-matrix printers may also have
been limited to upper case (the first video display of my own was).

I think the Tektronix 4010 vector display I used was upper case only.

My point was, there were so many restrictions, how did people manage to
write C code? It was only into the 1980s that I could reliably make use
of mixed case.

Dmitry A. Kazakov
Nov 22, 2022, 5:24:16 PM
On 2022-11-22 21:42, Bart wrote:
> On 22/11/2022 17:47, Dmitry A. Kazakov wrote:
>> On 2022-11-22 18:13, Bart wrote:
>>
>>> Language A can talk to language B via the machine's ABI. Where does C
>>> come into it?
>>
>> Data types of arguments including padding/gaps in structures, and
>> calling conventions.
> Actually the Win64 ABI doesn't go into types much at all.

It is all about types. The funny thing is, it even specifies endianness,
thanks to C's stupidity with unions; see how LARGE_INTEGER is defined.
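
Roughly (condensed from winnt.h):

    typedef union _LARGE_INTEGER {
        struct {
            DWORD LowPart;    /* this overlay lines up with QuadPart */
            LONG  HighPart;   /* only on a little-endian machine     */
        } u;
        LONGLONG QuadPart;
    } LARGE_INTEGER;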

> Or are you going to claim like David Brown that the hardware is like
> that solely due to the need to run C programs?

Nobody would ever use any hardware if there were no C compiler. So David
is certainly right.

Long ago, there existed Lisp machines, machines designed for tagging
data with types etc. All this sank when C took the reins. Today the
situation is slowly changing with FPGAs and the NN hype foaming over...

> That all disappears with 64 bits. With 32-bit DLLs, while there was
> still one DLL, you needed to know the call-convention in use; this would
> have been part of the API. But while there were 100s of languages, there
> were only a handful of call conventions.

There are as many conventions as languages because complex types and
closures require techniques unknown to plain C.

> Plus, DLLs tend to include other DLLs; when the OS loads a DLL A, which
> imports DLL B, it will not know which language version of B to look for
> (and they would all be called B.DLL).

This is why there exists the callback on DLL load. Elaboration stuff is
performed from there. In the case of Ada, lots of things happen there
because all library-level objects are initialized there, library-level
tasks start there, etc.
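
On Windows that callback is DllMain; a minimal sketch:

    #include <windows.h>

    BOOL WINAPI DllMain(HINSTANCE inst, DWORD reason, LPVOID reserved)
    {
        if (reason == DLL_PROCESS_ATTACH) {
            /* the language runtime's elaboration goes here:
               initialize library-level objects, start tasks, etc. */
        }
        return TRUE;
    }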

Bart
Nov 22, 2022, 7:03:17 PM
On 22/11/2022 22:24, Dmitry A. Kazakov wrote:
> On 2022-11-22 21:42, Bart wrote:
>> On 22/11/2022 17:47, Dmitry A. Kazakov wrote:
>>> On 2022-11-22 18:13, Bart wrote:
>>>
>>>> Language A can talk to language B via the machine's ABI. Where does
>>>> C come into it?
>>>
>>> Data types of arguments including padding/gaps in structures, and
>>> calling conventions.
>> Actually the Win64 ABI doesn't go into types much at all.
>
> It is all about types. The funny thing, it even specifies endianness
> thanks to the C's stupidity of unions, see how LARGE_INTEGER is defined.

LARGE_INTEGER is not mentioned in the ABI and is not listed here:
https://learn.microsoft.com/en-us/windows/win32/winprog/windows-data-types.

The ABI really doesn't care about types other than it needs to know how
many bytes values occupy, and whether they need to go into GP or FLOAT
registers. It is quite low-level.

>> Or are you going to claim like David Brown that the hardware is like
>> that solely due to the need to run C programs?
>
> Nobody would ever use any hardware if there were no C compiler. So David
> is certainly right.

You're both certainly wrong. People used hardware before C; they used
hardware without C. And I spent a few years building bare computer
boards that I programmed from scratch, with no C compiler in sight.


> Long ago, there existed Lisp machines, machines designed for tagging
> data with types etc. All this sank when C took the reins. Today the
> situation is slowly changing with FPGAs and the NN hype foaming over...

It sank because nobody used Lisp.


>> That all disappears with 64 bits. With 32-bit DLLs, while there was
>> still one DLL, you needed to know the call-convention in use; this
>> would have been part of the API. But while there were 100s of
>> languages, there were only a handful of call conventions.
>
> There are as many conventions as languages because complex types and
> closures require techniques unknown to plain C.

If complex language X wants to talk to complex language Y, then they
have to agree on a common way to represent the concepts that they share.
Then they can either build on top of the platform ABI, or devise a
private ABI.

The platform ABI is still needed if they want to make use of a third
party library that exports functions in a lower level API.

Some libraries I can't use even via a DLL and via the ABI because they
use complex C++ types or things like COM. But this will be clear when
you look at their APIs.

I can only use lower-level APIs using more primitive types. But I get
angry when people suggest that such interfaces that I used for many
years within my own languages, even my own hardware, are now being
claimed as an invention of C that wouldn't exist otherwise. What
nonsense!

Give people a /choice/ of lower level languages and there would have
been more possibilities for writing libraries with such interfaces.

If I export a function F taking an i64 type and returning an i64 type,
it is thanks to C that that is possible? Nothing to do with the hardware
implementing a 64-bit type and making use of that fact.

This is 50% of why I hate C (the other 50% is because it does things in
such a crappy manner - 'unsigned long long int' indeed, which is still
actually 74 bits rather than 64), because people credit it with too much.

To me it still looks like something a couple of students threw together
over a drunken weekend for a laugh.


Andy Walker
Nov 22, 2022, 7:27:06 PM
On 22/11/2022 20:54, Bart wrote:
> Perhaps the computer department of my college, and the places I
> worked at, were poorly equipped then. We used ASR33s and video
> terminals that emulated those teletypes, so upper case only.
> All the Fortran I wrote was in upper case that I can remember.

Yes, Fortran was usually upper case, and equally usually on
punched cards rather than paper tape. I can't usefully comment on what
your college did. But in the UK, paper tape equipment was reasonably
common, so were Flexowriters; and Algol [all dialects; amongst others]
was always printed using lower/mixed case [usually with some stropping
regime to allow for upper-case environments]. Lower case may not have
been universal, but it was not your claimed "luxury". Not everyone was
tied to Fortran and "scientific computing".

> The file systems of my PDP10 machine at least used 'sixbit' encoding,
> so could only do upper-case. The 'radix50' encoding of the PDP11
> linker also restricted things to upper case.

File systems? Now /that/ was a luxury! I had been computing for
more than a decade before we got our hands on one. Or an editor. We had
to cut and splice our paper tapes by hand. As for "'sixbit' encoding",
perhaps worth noting that the Flexowriter was six-bit plus parity, and
[eg] Atlas stored eight six-bit characters per 48-bit word. There were
"shift" characters to switch between cases, as on a typewriter.

> The bitmaps fonts of early screens and dot-matrix printers may also
> have been limited to upper case (the first video display of my own
> was).

Some were; and many early screens/printers had really ugly lower
case fonts. Unlike, for example, daisy-wheel printers.

> I think the Tektronix 4010 vector display I used was upper case
> only.
> My point was, there were so many restrictions, how did people manage
> to write C code? It was only into the 1980s that I could reliably
> make use of mixed case.

Well, obviously people with little or no access to peripherals
that could use lower case were SOL. But Bell Labs had such peripherals
and were not, in 1970-odd, planning for Unix/C to take over the world.
So they just used what was available /to them/.

Dmitry A. Kazakov
Nov 23, 2022, 4:04:32 AM
On 2022-11-23 01:03, Bart wrote:
> On 22/11/2022 22:24, Dmitry A. Kazakov wrote:
>> On 2022-11-22 21:42, Bart wrote:
>>> On 22/11/2022 17:47, Dmitry A. Kazakov wrote:
>>>> On 2022-11-22 18:13, Bart wrote:
>>>>
>>>>> Language A can talk to language B via the machine's ABI. Where does
>>>>> C come into it?
>>>>
>>>> Data types of arguments including padding/gaps in structures, and
>>>> calling conventions.
>>> Actually the Win64 ABI doesn't go into types much at all.
>>
>> It is all about types. The funny thing, it even specifies endianness
>> thanks to the C's stupidity of unions, see how LARGE_INTEGER is defined.
>
> LARGE_INTEGER is not mentioned in the ABI and is not listed here:
> https://learn.microsoft.com/en-us/windows/win32/winprog/windows-data-types.

It is defined in winnt.h

> The ABI really doesn't care about types other than it needs to know how
> many bytes values occupy, and whether they need to go into GP or FLOAT
> registers. It is quite low-level.

At this point, I must ask, did you ever use any OS API at all? Or maybe
you just do not know what a datatype is?

>>> Or are you going to claim like David Brown that the hardware is like
>>> that solely due to the need to run C programs?
>>
>> Nobody would ever use any hardware if there is no C compiler. So David
>> is certainly right.
>
> You're both certainly wrong. People used hardware before C; they used
> hardware without C. And I spent a few years building bare computer
> boards that I programmed from scratch, with no C compiler in sight.

We are not talking about hobby projects.

>>> That all disappears with 64 bits. With 32-bit DLLs, while there was
>>> still one DLL, you needed to know the call-convention in use; this
>>> would have been part of the API. But while there were 100s of
>>> languages, there were only a handful of call conventions.
>>
>> There are as many conventions as languages because complex types and
>> closures require techniques unknown to plain C.
>
> If complex language X wants to talk to complex language Y,

They just don't. Most more or less professionally designed languages
provide interfacing to and from C. That limits the things that could be
interfaced to a bare minimum. Dynamic languages are slightly better
because of their general primitivism and because they are actually
written in C. But dealing with real languages like C++ is almost
impossible, e.g. handling virtual tables etc. So nobody cares.

> If I export a function F taking an i64 type and returning an i64 type,
> it is thanks to C that that is possible?

For the machine you are using, the answer is yes.

> Nothing to do with the hardware
> implementing a 64-bit type and making use of that fact.

Not even with the power supply unit and the screws holding the motherboard...

David Brown
Nov 23, 2022, 5:58:37 AM
On 22/11/2022 17:27, Andy Walker wrote:
> On 22/11/2022 15:29, David Brown wrote:
>> Case insensitivity is a mistake, born from the days before computers
>> were advanced enough to have small letters as well as capitals.
>
>     I don't believe I have ever used a computer that did not "have
> small letters".  There has been some discussion over in "comp.compilers"
> recently, but it's basically the difference between punched cards and
> paper tape.  The Flexowriter can be traced back to the 1920s, and its
> most popular form was certainly being used by computers in the 1950s,
> so there really weren't many "days before" to be considered.
>

Computers were using 6-bit character encodings well into the 1970s,
before ASCII and EBCDIC became dominant (at least in the Western World).
There are lots of older programming languages where all keywords were
in capitals. When modernising to support small letters, many of these
chose to be case insensitive - allowing people to write code using
small letters instead of ugly, shouty capitals, but keeping
compatibility with existing code.

Such history is not the only reason for a given programming language to
be case insensitive, but it is certainly part of it for some languages.

Bart
Nov 23, 2022, 6:56:00 AM
On 23/11/2022 09:04, Dmitry A. Kazakov wrote:
> On 2022-11-23 01:03, Bart wrote:

>> LARGE_INTEGER is not mentioned in the ABI and is not listed here:
>> https://learn.microsoft.com/en-us/windows/win32/winprog/windows-data-types.
>
> It is defined in winnt.h

In my 'windows.h' header for my C compilers, it is defined in windows.h.
So are myriad other types.

Did you look at my link? It says:

"The following table contains the following types: character, integer,
Boolean, pointer, and handle."

LARGE_INTEGER is not in there; it's something that is used for a handful
of functions out of 1000s. Maybe it was an early kind of type for
dealing with 64 bits and kept for compatibility.


>> The ABI really doesn't care about types other than it needs to know
>> how many bytes values occupy, and whether they need to go into GP or
>> FLOAT registers. It is quite low-level.
>
> At this point, I must ask, did you ever use any OS API at all? Or maybe
> you just do not know what a datatype is?

Have you ever looked at the Win64 ABI? Have you ever written compilers
that generate ABI-compliant code?

You are basically manipulating dumb chunks of data that are usually 64
bits wide; you just need to ensure correct alignment, and to know
whether, if using registers, they need to go into float registers
instead.

Apart from that, ABIs really, really don't care what that chunk
represents. They are mainly concerned with where things go.

>> You're both certainly wrong. People used hardware before C; they used
>> hardware without C. And I spent a few years building bare computer
>> boards that I programmed from scratch, with no C compiler in sight.
>
> We do not talk about hobby projects.

Who said they were hobby projects? I was an engineer developing business
computers as well as working on dozens of speculative, experimental
projects. I used some 4 kinds of processors in the designs, and
investigated many more as possibilities.

Those were not the days of downloading free compilers off the internet
and having a super-computer on your own desk to run them on.

The point is, C did not figure in any of this AT ALL. I doubt I was the
only one either.

Why are you two trying to rewrite history?

>>>> That all disappears with 64 bits. With 32-bit DLLs, while there was
>>>> still one DLL, you needed to know the call-convention in use; this
>>>> would have been part of the API. But while there were 100s of
>>>> languages, there were only a handful of call conventions.
>>>
>>> There are as many conventions as languages because complex types and
>>> closures require techniques unknown to plain C.
>>
>> If complex language X wants to talk to complex language Y,
>
> They just don't. Most more or less professionally designed languages
> provide interfacing to and from C.

They need to provide an FFI to be able to deal with myriad libraries that
use a C-style API. I call it C-style because unfortunately there is no
other name that can describe a type system based around primitive
machine types.

Odd, because you find the same machine types used in a dozen other
contemporary languages.

For some reason, people think a type like int32 was popularised by C.
I'm sure I used such a type in the 1980s without any help from C. So did a
million other people. But C gets the credit, EVEN THOUGH IT DIDN'T HAVE
FIXED WIDTH TYPES UNTIL 1999. Go figure.

> That limits the things that could be
> interfaced to a bare minimum. Dynamic languages are slightly better
> because of their general primitivism and because they are actually
> written in C. But dealing with real languages like C++ is almost
> impossible, e.g. handing virtual tables etc. So nobody cares.
>
>> If I export a function F taking an i64 type and returning an i64 type,
>> it is thanks to C that that is possible?
>
> For the machine you are using, the answer is yes.

If you really believe that, then both you and David Brown are deluded. I
long suspected that C worshipping was more akin to a religious cult; it
now seems it's more widespread than I thought with people being
brainwashed into believing any old rubbish.

>> Nothing to do with the hardware implementing a 64-bit type and making
>> use of that fact.
>
> Not even with power supply unit and the screws holding the motherboard...

Now I know this is actually a wind-up. Fuck C.


Bart
Nov 23, 2022, 7:21:44 AM
On 21/11/2022 15:30, David Brown wrote:
> On 19/11/2022 17:01, Bart wrote:
>>
>> On 16/11/2022 16:50, David Brown wrote:
>>  > Yes, but for you, a "must-have" list for a programming language
>> would be
>>  > mainly "must be roughly like ancient style C in functionality, but
>> with
>>  > enough change in syntax and appearance so that no one will think it is
>>  > C".  If that's what you like, and what pays for your daily bread, then
>>  > that's absolutely fine.
>>
>> On 18/11/2022 07:12, David Brown wrote:
>>  > Yes, it is a lot like C.  It has a number of changes, some that I
>> think
>>  > are good, some that I think are bad, but basically it is mostly
>> like C.
>>
>> The above remarks implies strongly that my systems language is a
>> rip-off of C.
>>
>
> No, it does not.  You can infer what you want from what I write, but I
> don't see any such implications from my remark.

I haven't responded before because I thought people could draw their own
conclusions from your remarks. But it seems it needs to be pointed out;
you wrote:

"must be roughly like ancient style C in functionality, but with enough
change in syntax and appearance so that no one will think it is C"

This is a /very/ thinly veiled suggestion that my language was a rip-off
of C, by copying the language and changing the syntax so that it looked
like a new language.

>  If anyone were to write
> a (relatively) simple structured language for low level work, suitable
> for "direct" compilation to assembly on a reasonable selection of common
> general-purpose processors, and with the aim of giving a "portable
> alternative to writing in assembly", then the result will inevitably
> have a good deal in common with C.  There can be plenty of differences
> in the syntax and details, but the "ethos" or "flavour" of the language
> will be similar.
>
> Note that I have referred to Pascal as C-like in this sense.

Now you're having a go at Pascal; maybe Pascal was a rip-off of C too
(even though it predated it).

I've got a better idea; why not call such languages 'Machine Oriented'?
That was an actual thing in the 1970s; I even implemented one such language.

(A summary of my various compilers is here:
https://github.com/sal55/langs/blob/master/mycompilers.md)

So, 'machine oriented' languages are a kind of language that I
independently discovered, through the demands of my work, were needed
and useful.

I used that to my advantage to create in-house tools to give us an edge.
Probably the same happened in lots of companies.

But, unfortunately for everyone, it was C that popularised that kind
of language, with a crude, laughable implementation that we are now
stuck with. And that we all now have to kow-tow to. Fuck that.



Dmitry A. Kazakov
Nov 23, 2022, 7:40:31 AM
On 2022-11-23 12:55, Bart wrote:
> On 23/11/2022 09:04, Dmitry A. Kazakov wrote:
>> On 2022-11-23 01:03, Bart wrote:
>
>>> LARGE_INTEGER is not mentioned in the ABI and is not listed here:
>>> https://learn.microsoft.com/en-us/windows/win32/winprog/windows-data-types.
>>
>> It is defined in winnt.h
>
> In my 'windows.h' header for my C compilers, it is defined in windows.h.
> So are myriad other types.

Now you have an opportunity to look at it.

> LARGE_INTEGER is not in there; it's something that is used for a handful
> of functions out of 1000s. Maybe it was an early kind of type for
> dealing with 64 bits and kept for compatibility.

LARGE_INTEGER is massively used in the Windows API.

> Apart from that, ABIs really, really don't care what that chunk
> represents. They are mainly concerned with where things go.

Try to actually program using the Windows API. Then you will know, or at
least read some MS documentation. Start with something simple:

https://learn.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-token_access_information

(:-))

> Why are you two trying to rewrite history?

Maybe because we lived it through and saw it happen?...

> If you really believe that, then both you and David Brown are deluded. I
> long suspected that C worshipping was more akin to a religious cult; it
> now seems it's more widespread than I thought with people being
> brainwashed into believing any old rubbish.

I do not know about David, but I hate C and consider it a very bad
language. That does not alter the fact that C influenced and shaped both
hardware and software as well as corrupted minds of several generations
and keeps doing so.

Bart
Nov 23, 2022, 9:04:19 AM
On 23/11/2022 12:40, Dmitry A. Kazakov wrote:
> On 2022-11-23 12:55, Bart wrote:
>> On 23/11/2022 09:04, Dmitry A. Kazakov wrote:
>>> On 2022-11-23 01:03, Bart wrote:
>>
>>>> LARGE_INTEGER is not mentioned in the ABI and is not listed here:
>>>> https://learn.microsoft.com/en-us/windows/win32/winprog/windows-data-types.
>>>
>>> It is defined in winnt.h
>>
>> In my 'windows.h' header for my C compilers, it is defined in
>> windows.h. So are myriad other types.
>
> Now you have an opportunity to look at it.
>
>> LARGE_INTEGER is not in there; it's something that is used for a
>> handful of functions out of 1000s. Maybe it was an early kind of type
>> for dealing with 64 bits and kept for compatibility.
>
> LARGE_INTEGER is massively used in Windows API.
>
>> Apart from that, ABIs really, really don't care what that chunk
>> represents. They are mainly concerned with where things go.
>
> Try to actually program using Windows API. You will know or at least
> read some MS documentation. Start with something simple:
>
> https://learn.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-token_access_information
>

I've used the WinAPI since the early 90s, but also kept my interactions
with it to a minimum: just enough to be able to write GUI applications
used by 1000s of clients.

The reasons for keeping it minimal are simple: when you use such a thing
from a private language, then you have to manually create bindings in
your language for every Type, Struct, Enum, Define, Macro and Function,
which would have been a huge undertaking as there are many thousands. So
you translate only what is necessary.

>> Why are you two trying to rewrite history?
>
> Maybe because we lived it through and saw it happen?...

Are you that much older than me, or did you start as a toddler? I lived
through it too, and for the first 16 years, had virtually nothing to do
with C at all. Until I had to use Other People's Software, which in the
case of WinAPI, was defined as C headers.

(Now renamed by MS as C++; for some reason, they want to break the
association with C.)

>
>> If you really believe that, then both you and David Brown are deluded.
>> I long suspected that C worshipping was more akin to a religious cult;
>> it now seems it's more widespread than I thought with people being
>> brainwashed into believing any old rubbish.
>
> I do not know about David, but I hate C and consider it a very bad
> language.

Which languages don't you hate?

> That does not alter the fact that C influenced and shaped both
> hardware and software as well as corrupted minds of several generations
> and keeps doing so.

How can it have shaped hardware which was devised before it existed? How
could it have influenced processors like the 8080, Z80, 8086, 68000
which were devised when the use of C was still limited?

From somewhere on the internet:

"What was the leading programming language from 1965 to 1979?"

"From 1965 to 1980, Fortran kept the 1st place. In 1980, Pascal got
first and kept on top for five years. The C programming language got its
main popularity from 1985 until 2001."

But also from the internet:

"C (1972) was the very first high-level language"

It looks like people can just make stuff up. But I can very well believe
that C started to become dominant in the mid-80s, partly because I was
there.

And by the mid-80s, we already had 32-bit microprocessors.


David Brown
Nov 23, 2022, 9:53:06 AM
On 22/11/2022 18:13, Bart wrote:
> On 22/11/2022 15:29, David Brown wrote:
>> On 22/11/2022 13:38, Bart wrote:
>
>>> When you looked at the 68000 more closely, it had nearly as much
>>> non-orthoganality as the 8086. (I was trying at that time to get my
>>> company to switch to a processor like the 68k.)
>>
>> No, it does not.  (Yes, I have looked at it closely, and used 68k
>> processors extensively.)
>
> As a compiler writer?

As an assembly programmer and C programmer.

> The first thing you noticed is that you have to
> decide whether to use D-registers or A-registers, as they had different
> characteristics, but the 3-bit register field of instructions could only
> use one or the other.
>

Yes, although they share quite a bit in common too. You have 8 data
registers that are all orthogonal and can be used for any data
instructions as source and destination, all 32 bit. You have 8 address
registers that could all be used for all kinds of addressing modes (and
a few kinds of calculations, and as temporary storage) - the only
special one was A7 that was used for stack operations (as well as being
available for all the other addressing modes).

How does that even begin to compare to the 8086 with its 4 schizophrenic
"main" registers that are sometimes 16-bit, sometimes two 8-bit
registers, with a wide range of different dedicated usages for each
register? Then you have 4 "index" registers, each with different
dedicated uses. And 4 "segment" registers, each with different
dedicated uses.

Where the 68000 has wide-spread, planned and organised orthogonality and
flexibility, the 8086 is a collection of random dedicated bits and pieces.

> That made the 8086 simpler because there was no choice! The registers
> were limited and only one was general purpose.
>

A design like the 8086 might feel nicer for some assembly programmers.
I've worked in assembly on a range of systems - including 8-bit CISC
devices with only a few dedicated registers, 8-bit RISC processors,
16-bit devices, and 32-bit devices. Without a doubt, the m68k
architecture is the nicest I have used for assembly programming. The
msp430 is also good, but as a 16-bit device it is a bit more limited.
(It has 16 registers, of which 12 are fully orthogonal.) At the high
end, PowerPC is extremely orthogonal but quite tedious to program - it's
hard to track so many registers (it has 32 registers) manually. A small
number of dedicated registers is okay if you are only doing very simple
and limited assembly programming. For a bit more advanced stuff, you
want more registers. (And for very advanced stuff you don't want to use
assembly at all.)

As you know, I personally have not written a compiler - but I know a
good deal more about compilers than most programmers. There is not a
shadow of a doubt that serious compiler writers prefer processors with a
reasonable number of orthogonal general-purpose registers to those with
a small number of specialised registers.

You can understand this by looking at the compiler market - there are
many compilers available for orthogonal processors, and multi-target
compilers commonly support many orthogonal processors. (This is not
just gcc and clang/llvm - it includes Metrowerks, Green Hills, Wind
River, etc.) Compilers that target CISC devices with specialised
registers are typically more dedicated and specialised tools, and often
very expensive. The only exception is the x86, which is so common that
lots of compilers support it.

You can also understand it by looking at the processor market. Real
CISC with dedicated and specialised registers is dead. In the progress
of x86 through 32-bit and then 64-bit, the architecture became more and
more orthogonal - the old registers A, B, C, D, SI, DI, etc., are now no
more than legacy alternative names for r0, r1, etc., general purpose
registers.



>>> But C probably has influenced modern 64-bit ABIs, even though they
>>> are supposed to be language-independent.
>>>
>>
>> What makes you think they are supposed to be language independent?
>> What makes you think they are not?  What makes you care?
>
> Language A can talk to language B via the machine's ABI. Where does C
> come into it?
>
> Language A can talk to a library or OS component that resides in a DLL,
> via the ABI. The library might have been implemented in C, or assembler,
> or in anything else, but in binary form, is pure machine code anyway.
>
> What makes /you/ think that such ABIs were invented purely for the use
> of C programs? Do you think the designers of the ABI simply assumed that
> only programs written in the C language could call into the OS?

As so often happens, you are making up stuff that you think I think. I
think you find it easier than reading what I actually write.

>
> When you download a shared library DLL, do you think they have different
> versions depending on what language will be using the DLL?
>
>> The types and terms from C are a very convenient way to describe an ABI,
>
> They're pretty terrible actually. The types involved in SYS V ABI can be
> expressed as follows in a form that everyone understands and many
> languages use:
>
>     i8 i16 i32 i64 i128
>     u8 u16 u32 u64 u128
>     f32 f64 f128

Or they can be expressed in a form that everyone understands, like
"char", "int", etc., that are defined in the ABI, and that everybody and
every language /does/ use when integrating between different languages.

I don't disagree that size-specific types might have been a better
choice to standardise on - but the world has standardised on C types for
the purpose. They do a good enough job, and everyone but you is happy
with them.

>
> This document (https://refspecs.linuxbase.org/elf/x86_64-abi-0.99.pdf)
> lists the C equivalents as follows (only signed integers shown):
>
>    i8    char, signed char
>    i16   short, signed short
>    i32   int, signed int
>    i64   long, signed long, long long, signed long long
>    i128  __int128, signed __int128
>
> (No use of int8_t etc despite the document dated 2012.)

That document has no mention anywhere of your personal short names for
size-specific types. It has a table stating the type names and sizes.
Think of it as just a definition of the technical terms used in the
document, no different from when one processor reference might define
"word" to mean 16 bits and another define "word" to mean 32 bits.

Why does it not use <stdint.h> types like "int16_t" ? Even now, in
2022, people still use C90 standard C - for good reasons or bad reasons.
And C++ did not standardise the <stdint.h> types until C++11 (though
every C++ implementation supported them long before that).

>
> This comes up in APIs too where it is 100 times more relevant (only
> compiler writers care about the API). The C denotations shown here are
> not fit for purpose for language-neutral interfaces.
>
> (Notice also that 'long' and 'long long' are both 64 bits, and that
> 'char' is assumed to be signed. In practice the C denotations would vary
> across platforms, while those i8-i128 would stay constant, provided only
> that the machine uses conventional register sizes.)
>
> So it's more like, such interfaces were developed /despite/ C.
>
>> since it is a language familiar to any programmer who might be
>> interested in the details of an ABI.  Such ABIs only cover a
>> (relatively) simple common subset of possible interfaces, but do so in
>> a way that can be used from any language (with wrappers if needed) and
>> can be extended as needed.
>>
>> People make ABIs for practical use.  MS made the ABI for Win64 to
>> suit their own needs and uses.  AMD and a range of *nix developers
>> (both OS and application developers) and compiler developers got
>> together to develop the 64-bit x86 ABI used by everyone else, designed
>> to suit /their/ needs and uses.
>
> x86-32 used a number of different ABIs depending on language and
> compiler. x86-64 tends to use one ABI, which is a strong indication that
> that ABI was intended to work across languages and compilers.
>

There are so many x86-32 ABIs that it doesn't have an ABI - Intel never
bothered trying to make one for their processors, and MS never bothered
making one for their OS. (The various *nix systems for x86-32 agreed on
an ABI.)

For x86-64, there are two ABIs - the one developed by AMD, *nix
vendors, and compiler developers based on what would work efficiently
for real code, and the one MS picked based on... well, I don't know if
anyone really knows what they based it on. It has some pretty silly
differences from the one everyone else had standardised on before they
even started thinking about it.

But yes, even in the MS world the ABI situation is vastly better for
x86-64 than it was for x86-32, and it works across languages (limited by
the lowest common denominator) and compilers.


>
>>> Case insensitive? Or maybe that's just wishful thinking.
>>>
>>
>> Case insensitivity is a mistake, born from the days before computers
>> were advanced enough to have small letters as well as capitals.  It
>> leads to ugly inconsistencies, wastes the opportunity to convey useful
>> semantic information, and is an absolute nightmare as soon as you
>> stray from the simple English-language alphabet.
>
> Yet Google searches are case-insensitive. How is that possible, given
> that search strings can use Unicode which you say does not define case
> equivalents across most alphabets?

Human language is often case insensitive - certainly in speech. So
natural human language interfaces have to take that into account.
Programming is not a natural human language.

How does Google manage case-insensitive searches with text in Unicode in
many languages? By being /very/ smart. I didn't say it was impossible
to be case-insensitive beyond plain English alphabet, I said it was an
"absolute nightmare". It is. It is done where it has to be done -
you'll find all major databases have support for doing sorting,
searching, and case translation for large numbers of languages and
alphabets. It is a /huge/ undertaking to handle it all. You don't do
it if it is not important.

Name just /one/ real programming language that supports case-insensitive
identifiers but is not restricted to ASCII. (Let's define "real
programming language" as a programming language that has its own
Wikipedia entry.)

There are countless languages that have case-sensitive Unicode
identifiers, because that's easy to implement and useful for programmers.

>
> As are email addresses and domain names.

The email standards say that email addresses are case-sensitive, but
originally encouraged servers to be lenient in how they check them.

In current email standards, this is referred to as "unwise in practice":

<https://datatracker.ietf.org/doc/html/rfc6530#section-10.1>


Domain names are case insensitive if they are in ASCII. For other
characters, it gets complicated.

>
> As are most things in everyday life, even now that it is all tied up
> with computers and smartphones and tablets with everything being online.
>
> (Actually, most people's exposure to case-sensitivity is in online
> passwords, which is also the worst place to have it, as usually you
> can't see them!)

Programmers are not "most people". Programs are not "most things in
everyday life".

Most people are quite tolerant of spelling mistakes in everyday life -
do you think programming languages should be too?

>
> Your objections make no sense at all. Besides which, plenty of
> case-insensitive languages, file-systems and shell programs and
> applications exist.
>

They do exist, yes. That does not make them a good idea.

>> I believe Unix's predecessor, Multics, was case-sensitive.  But I
>> could be wrong.
>
> I'm surprised the Unix and C developers even had a terminal that could
> do upper and lower case. I was stuck with upper case for the first year
> or two. File-systems and global linker symbols were also restricted in
> length and case for a long time, to minimise space.
>
> Case-sensitivity was a luxury into the 80s.
>

Perhaps they were forward-thinking people.

David Brown
Nov 23, 2022, 10:03:21 AM
On 23/11/2022 15:04, Bart wrote:
> On 23/11/2022 12:40, Dmitry A. Kazakov wrote:

>>
>> I do not know about David, but I hate C and consider it a very bad
>> language.
>
> Which languages don't you hate?
>

You asked the very question I was thinking!

He hates functional programming languages, because he didn't like using
a couple of ancient languages that were not functional programming
languages but happened to be declarative rather than imperative.

He hates any use of generics, templates, or other advanced language features.

He likes imperative programming, and only imperative programming.

I think he likes object oriented programming, but it could just be that
he thinks he knows about it.

He hates C.


I know /you/ have a similar set of features that you like and dislike,
with the overriding rule of "if it is in C, it is bad". And I know the
only languages you really like are the ones you made yourself (though
you think ALGOL was not too bad).

But I can't figure out what Dmitry might like - unless he too has his
own personal language.

Dmitry A. Kazakov
Nov 23, 2022, 10:23:44 AM
On 2022-11-23 16:03, David Brown wrote:

> But I can't figure out what Dmitry might like - unless he too has his
> own personal language.

No, I am not that megalomaniac. (:-))

I want stuff useful for software engineering. To me it is a DIY shop. I
choose techniques I find useful in the long term and reject others. I
generally avoid academic exercises, hobby languages,
big-tech/corporate/vendor-lock bullshit. You can guess which of your pet
languages falls into which category. (:-))

Dmitry A. Kazakov
Nov 23, 2022, 10:34:12 AM
On 2022-11-23 15:53, David Brown wrote:

> Name just /one/ real programming language that supports case-insensitive
> identifiers but is not restricted to ASCII.  (Let's define "real
> programming language" as a programming language that has its own
> Wikipedia entry.)

1. https://en.wikipedia.org/wiki/Ada_(programming_language)

2. Ada Reference Manual 2.3:

Two identifiers are considered the same if they consist of the same
sequence of characters after applying locale-independent simple case
folding, as defined by documents referenced in the note in Clause 1 of
ISO/IEC 10646:2011.
After applying simple case folding, an identifier shall not be
identical to a reserved word.
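
An ASCII-only approximation of that equality rule, as a C sketch (real
simple case folding covers far more of Unicode than tolower does):

    #include <ctype.h>

    static int ident_eq(const char *a, const char *b)
    {
        for (; *a && *b; a++, b++)
            if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
                return 0;
        return *a == *b;   /* equal only if both ended together */
    }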

James Harris
Nov 23, 2022, 10:56:11 AM
On 20/11/2022 00:35, Bart wrote:
> On 19/11/2022 22:23, James Harris wrote:

...

>> I remember reading that when AMD wanted to design a 64-bit
>> architecture they asked programmers (especially at Microsoft) what
>> they wanted. One thing was 'no segmentation'. The C model had
>> encouraged programmers to think in terms of flat address spaces, and
>> the mainstream segmented approach for x86 was a nightmare that people
>> didn't want to repeat.
>
>
> I think you're ascribing too much to C. In what way did any other
> languages (Algol, Pascal, Cobol, Fortran, even Ada by then) encourage
> the use of segmented memory?

I wouldn't say they did. What I would say is that probably none of them
had C's influence on what programming became. Yes, Cobol was widespread
for a long time but its design didn't get incorporated into later
languages. Conversely, much of Algol's approach was adopted by nearly
all later languages but it itself never achieved the widespread use of
C. Only C had widespread use as well as strong influence on others. Much
of the programming community today still thinks in C terms even 50 years
(!!!) after its release.

>
> Do you mean because C required the use of different kinds of pointers,
> and people were fed up with that? Whereas other languages hid that
> detail better.

I am not sure what you mean but while some languages restricted /where/
a pointer could point, C allowed a single pointer to point anywhere. (I
may be wrong but I think the only split that would work is between data
and code because the language can tell which of the two any reference is.)

On pointers to data consider the subexpression

f(p)

where p is a pointer. Even on a segmented machine that call has no
concept of whether p is pointing to, say, the stack or one of many data
segments. In general, all pointers have to be flat: any pointer can
point anywhere; that's the C model.

>
> You might as well say then that Assembly was equally responsible since
> it was even more of a pain to deal with segments!

There are lots of assembly languages, one per CPU!

But if you are thinking of Intel then you are right that their
half-hearted approach gave segmentation a bad name.

Consider a segment of memory as a simple range from byte 'first' to byte
'last'. With such ranges:

* all accesses can be range checked automatically
* no access outside the range would be permitted
* the range could be extended or shortened
* the memory used could be moved around as needed

all without impacting a program which accesses them.
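
A sketch of the idea in code (the names are invented, and this is
nothing like Intel's actual descriptor format):

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        uint8_t *first;   /* lowest valid byte  */
        uint8_t *last;    /* highest valid byte */
    } segment;

    static uint8_t seg_load(segment s, size_t offset)
    {
        assert(s.first + offset <= s.last);   /* automatic range check */
        return s.first[offset];
    }

Growing, shrinking or moving the segment means updating only 'first' and
'last'; the offsets a program holds remain valid.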

>
>
> (Actually, aren't the segments still there on x86? Except they are 4GB
> in size instead of 64KB.)

Intel could, instead, have said that a 32-bit address was split into a
range and an offset, such that the CPU in hardware would use the range
to find 'first' and 'last', then add the offset to the 'first' and use
the 'last' for range checking.


--
James Harris


Bart
Nov 23, 2022, 11:36:39 AM
On 23/11/2022 15:56, James Harris wrote:
> On 20/11/2022 00:35, Bart wrote:
>> On 19/11/2022 22:23, James Harris wrote:
>
> ...
>
>>> I remember reading that when AMD wanted to design a 64-bit
>>> architecture they asked programmers (especially at Microsoft) what
>>> they wanted. One thing was 'no segmentation'. The C model had
>>> encouraged programmers to think in terms of flat address spaces, and
>>> the mainstream segmented approach for x86 was a nightmare that people
>>> didn't want to repeat.
>>
>>
>> I think you're ascribing too much to C. In what way did any other
>> languages (Algol, Pascal, Cobol, Fortran, even Ada by then) encourage
>> the use of segmented memory?
>
> I wouldn't say they did. What I would say is that probably none of them
> had C's influence on what programming became.

Examples? Since the current crop of languages all have very different
ideas from C.

> Yes, Cobol was widespread
> for a long time but its design didn't get incorporated into later
> languages. Conversely, much of Algol's approach was adopted by nearly
> all later languages but it itself never achieved the widespread use of
> C. Only C had widespread use as well as strong influence on others. Much
> of the programming community today still thinks in C terms even 50 years
> (!!!) after its release.

Is it really C terms, or does that just happen to be the hardware model?

Yes, C is a kind of lingua franca that lots of people know, but notice
that people talk about a 'u64' type, something everyone understands,
rather than 'unsigned long long int' (which C does not even define to be
exactly 64 bits), or `uint64_t` (which even C programs don't recognise
unless they include stdint.h or inttypes.h!).
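
For instance, this is the minimum ceremony a complete C program needs
to declare and print one:

#include <stdint.h>     /* without this, uint64_t is not declared */
#include <inttypes.h>   /* PRIu64, the matching format macro      */
#include <stdio.h>

int main(void) {
    uint64_t x = 123;               /* 123ULL if a suffix is wanted; 123LL is signed */
    printf("%" PRIu64 "\n", x);     /* "%llu" is only correct where uint64_t
                                       happens to be unsigned long long */
    return 0;
}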


> On pointers to data consider the subexpression
>
>   f(p)
>
> where p is a pointer. Even on a segmented machine that call has no
> concept of whether p is pointing to, say, the stack or one of many data
> segments. In general, all pointers have to be flat: any pointer can
> point anywhere; that's the C model.

Why the C model? Do you have any languages in mind with a different model?

Pointers or references occur in many languages (from that time period,
Pascal, Ada, Algol68); I don't recall them being restricted in their
model of memory.

C, on the other hand, had lots of restrictions:

* Having FAR and NEAR pointer types

* Having distinct object and function pointers (you aren't even allowed
to directly cast between them)

* Not being able to compare pointers to two different objects

It is the only one that I recall which exposes the fact that these could
all exist in different, non-compatible and /non-linear/ regions of memory.

So your calling linear memory the 'C model' is backwards!

> Consider a segment of memory as a simple range from byte 'first' to byte
> 'last'. With such ranges:
>
> * all accesses can be range checked automatically
> * no access outside the range would be permitted
> * the range could be extended or shortened
> * the memory used could be moved around as needed
>
> all without impacting a program which accesses them.

See my comments above.


James Harris

Nov 23, 2022, 11:51:17 AM
On 23/11/2022 16:36, Bart wrote:
> On 23/11/2022 15:56, James Harris wrote:
>> On 20/11/2022 00:35, Bart wrote:
>>> On 19/11/2022 22:23, James Harris wrote:
>>
>> ...
>>
>>>> I remember reading that when AMD wanted to design a 64-bit
>>>> architecture they asked programmers (especially at Microsoft) what
>>>> they wanted. One thing was 'no segmentation'. The C model had
>>>> encouraged programmers to think in terms of flat address spaces, and
>>>> the mainstream segmented approach for x86 was a nightmare that
>>>> people didn't want to repeat.
>>>
>>>
>>> I think you're ascribing too much to C. In what way did any other
>>> languages (Algol, Pascal, Cobol, Fortran, even Ada by then) encourage
>>> the use of segmented memory?
>>
>> I wouldn't say they did. What I would say is that probably none of
>> them had C's influence on what programming became.
>
> Examples? Since the current crop of languages all have very different
> ideas from C.

Cobol and Algol:

>
>> Yes, Cobol was widespread for a long time but its design didn't get
>> incorporated into later languages. Conversely, much of Algol's
>> approach was adopted by nearly all later languages but it itself never
>> achieved the widespread use of C. Only C had widespread use as well as
>> strong influence on others. Much of the programming community today
>> still thinks in C terms even 50 years (!!!) after its release.
>
> Is it really C terms, or does that just happen to be the hardware model?
>
> Yes, C is a kind of lingua franca that lots of people know, but notice
> that people talk about a 'u64' type, something everyone understands, but
> not 'unsigned long long int' (which is not even defined by C to be
> exactly 64 bits), nor even `uint64_t` (which not even C programs
> recognise unless you use stdint.h or inttypes.h!).

u64 is just a name.

>
>
>> On pointers to data consider the subexpression
>>
>>    f(p)
>>
>> where p is a pointer. Even on a segmented machine that call has no
>> concept of whether p is pointing to, say, the stack or one of many
>> data segments. In general, all pointers have to be flat: any pointer
>> can point anywhere; that's the C model.
>
> Why the C model? Do you have any languages in mind with a different model?

Yes, the C model is as stated: any pointer can point anywhere. A C
pointer must be able to point to rodata, stack, and anywhere in the data
section.

>
> Pointers or references occur in many languages (from that time period,
> Pascal, Ada, Algol68); I don't recall them being restricted in their
> model of memory.
>
> C, on the other, had lots of restrictions:
>
> * Having FAR and NEAR pointer types

Are you sure that FAR and NEAR were part of C?


--
James Harris


Bart

Nov 23, 2022, 1:12:10 PM
On 23/11/2022 16:51, James Harris wrote:
> On 23/11/2022 16:36, Bart wrote:

>>> I wouldn't say they did. What I would say is that probably none of
>>> them had C's influence on what programming became.
>>
>> Examples? Since the current crop of languages all have very different
>> ideas from C.
>
> Cobol and Algol:

I was asking about C's influence, but those two languages predated C.

>>> well as strong influence on others. Much of the programming community
>>> today still thinks in C terms even 50 years (!!!) after its release.
>>
>> Is it really C terms, or does that just happen to be the hardware model?
>>
>> Yes, C is a kind of lingua franca that lots of people know, but notice
>> that people talk about a 'u64' type, something everyone understands,
>> but not 'unsigned long long int' (which is not even defined by C to be
>> exactly 64 bits), nor even `uint64_t` (which not even C programs
>> recognise unless you use stdint.h or inttypes.h!).
>
> u64 is just a name.

So what are the 'C terms' you mentioned? If talking about primitive
types, for example, u64 or uint64 or whatever are common ways of
referring to a 64-bit unsigned integer type; unless the discussion is
specifically about C, you wouldn't use C denotations for it.

>> Why the C model? Do you have any languages in mind with a different
>> model?
>
> Yes, the C model is as stated: any pointer can point anywhere. A C
> pointer must be able to point to rodata, stack, and anywhere in the data
> section.

And that is different from any other language that had pointers, how?

Because I'm having trouble in understanding how you can attribute linear
memory models to C and only C, when it is the one language that exposes
the limitations of non-linear memory.


>>
>> Pointers or references occur in many languages (from that time period,
>> Pascal, Ada, Algol68); I don't recall them being restricted in their
>> model of memory.
>>
>> C, on the other, had lots of restrictions:
>>
>> * Having FAR and NEAR pointer types
>
> Are you sure that FAR and NEAR were part of C?

They were part of implementations of it for 8086. There were actually
'near', 'far' and 'huge'. I think a 'far' pointer had a fixed segment part.
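
For illustration, the dialect looked roughly like this (Borland- and
Microsoft-style 16-bit DOS extensions, not ISO C, so modern compilers
will reject it):

char near *np;   /* 16-bit offset within the current data segment */
char far  *fp;   /* 32-bit segment:offset; arithmetic wraps within
                    the 64 KB segment                              */
char huge *hp;   /* like far, but kept normalised so arithmetic
                    can cross 64 KB segment boundaries             */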

(I can't remember my own arrangements for 8086, but I certainly don't
remember having multiple pointer types. Probably I used only one type, a
32-bit segment+offset value, of which the offset part was usually
normalised to be in the range 0 to 15.)


Andy Walker

Nov 23, 2022, 1:32:56 PM
On 23/11/2022 16:36, Bart wrote:
> C, on the other, had lots of restrictions:
> * Having FAR and NEAR pointer types

Never part of C. [Non-standard extension in some implementations.]

> * Having distinct object and function pointers (you aren't even
> allowed to directly cast between them)

Correctly so. In a proper HLL, type punning should, in general,
be forbidden. A case could be made for casting between two
structures that are identical apart from the names of their components;
otherwise it is a recipe for hard-to-find bugs.

> * Not being able to compare pointers to two different objects

Of course you can. Such pointers compare as unequal. You can
also reliably subtract pointers in some cases. What more can you
reasonably expect?

> It is the only one that I recall which exposes the fact that these
> could all exist in different, non-compatible and /non-linear/ regions
> of memory.

"Exposes"? How? Where? Examples? [In either K&R C or standard
C, of course, not in some dialect implementation.]

--
Andy Walker, Nottingham.
Andy's music pages: www.cuboid.me.uk/andy/Music
Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Marpurg

David Brown

Nov 23, 2022, 1:38:29 PM
On 23/11/2022 16:23, Dmitry A. Kazakov wrote:
> On 2022-11-23 16:03, David Brown wrote:
>
>> But I can't figure out what Dmitry might like - unless he too has his
>> own personal language.
>
> No, I am not that megalomaniac. (:-))
>
> I want stuff useful for software engineering. To me it is a DIY shop. I
> choose techniques I find useful from a long-term perspective and reject
> others. I generally avoid academic exercises, hobby languages,
> big-tech/corporate/vendor-lock bullshit. You can guess which of your pet
> languages falls into which category. (:-))
>

The languages I mostly use are C, C++ and Python, depending on the task
and the target system. (And while I enjoy working with each of these,
and see their advantages in particular situations, I also appreciate
that they are not good in other cases and they all have features I
dislike.) Your criteria would not rule out any of these - I too
generally avoid languages with vendor lock-in, and small developer or
user communities. Academic exercise languages are of course no use
unless you are doing academic exercises.

Your criteria would also not rule out several key functional programming
languages, including Haskell, OCaml, and Scala.

It would rule out C#, VB, Bart's languages, and possibly Java.  Pascal
is in theory open and standard, but in practice it is fragmented into
vendor-specific variants.  (There's FreePascal, which has no lock-in.)

You would still have Ada, D, Erlang, Fortran, Forth, JavaScript, Lua,
Rust, Modula-2, Perl, and PHP.

I think that covers most of the big languages (I assume you also don't
like ones that have very small user bases).



David Brown

Nov 23, 2022, 1:38:31 PM
On 23/11/2022 17:36, Bart wrote:
> On 23/11/2022 15:56, James Harris wrote:

>> On pointers to data consider the subexpression
>>
>>    f(p)
>>
>> where p is a pointer. Even on a segmented machine that call has no
>> concept of whether p is pointing to, say, the stack or one of many
>> data segments. In general, all pointers have to be flat: any pointer
>> can point anywhere; that's the C model.
>
> Why the C model? Do you have any languages in mind with a different model?

Languages that don't have pointers can have their data organised any way
they want. (I don't have any particular languages in mind.)

>
> Pointers or references occur in many languages (from that time period,
> Pascal, Ada, Algol68); I don't recall them being restricted in their
> model of memory.
>
> C, on the other, had lots of restrictions:
>
> * Having FAR and NEAR pointer types

The C language has never had any such thing.  A few implementations of C
(such as some DOS compilers, as well as compilers for some brain-dead
8-bit CISC microcontrollers like the 8051, or microcontrollers whose
memory has outgrown their 16-bit address space) have had such features
as extensions, used to let people write efficient, powerful code at
the expense of portability.  Other languages had various methods
of dealing with the same kind of issues - some had pointers that were
fixed as always "near" pointers (for efficient code but limited memory
size), some were fixed as always "far" pointers, some used compiler
flags to choose the "memory model", some used compiler directives, and
some supported both kinds in some manner.  It is the same today for some
embedded processor toolchains.

>
> * Having distinct object and function pointers (you aren't even allowed
> to directly cast between them)
>

That seems both normal and sensible. Neither Pascal nor Ada will let
you mix object and function pointers or convert between them, at least
not without a great deal more effort than in C.
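
A small sketch of the C rule (hypothetical function name):

#include <stdio.h>

void hello(void) { puts("hello"); }

int main(void) {
    void (*fp)(void) = hello;   /* fine: function pointer to function */
    fp();

    /* void *op = (void *)hello;
       Not defined by ISO C: no conversion exists between function
       pointers and object pointers. POSIX adds one as an extension
       (dlsym needs it), and many compilers accept the cast anyway. */
    return 0;
}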

> * Not being able to compare pointers to two different objects

I don't know other languages' standards well enough to be sure. I doubt
if you do either. (I don't even know if their standards consider such
details.)

However, I did find this:
<https://en.wikibooks.org/wiki/Pascal_Programming/Pointers> which says
that ordering comparison operators like < and >= do not apply to
pointers - only "=" and "<>" are allowed (as they are in C for any
pointers, regardless of where they point). There are many variants of
Pascal, however, with very significant differences - some may happily
allow pointer comparison. (Just as some C implementations may happily
allow any pointer comparisons.)



Bart

Nov 23, 2022, 1:53:33 PM
On 23/11/2022 18:31, Andy Walker wrote:
> On 23/11/2022 16:36, Bart wrote:
>> C, on the other, had lots of restrictions:
>> * Having FAR and NEAR pointer types
>
>     Never part of C.  [Non-standard extension in some implementations.]
>
>> * Having distinct object and function pointers (you aren't even
>> allowed to directly cast between them)
>
>     Correctly so.  In a proper HLL, type punning should, in general,
> be forbidden.  There could be a case made out for casting between two
> structures that are identical apart from the names of the components,
> otherwise it is a recipe for hard-to-find bugs.
>
>> * Not being able to compare pointers to two different objects
>
>     Of course you can.  Such pointers compare as unequal.  You can
> also reliably subtract pointers in some cases.  What more can you
> reasonably expect?

C doesn't allow relative comparison or subtraction of pointers to
different objects. Or rather, it makes those operations
implementation-defined or UB, simply because the pointers could in
fact refer to incompatible memory regions.

This goes against the suggestion that C is more conducive to linear
memory than any other languages.


>> It is the only one that I recall which exposes the fact that these
>> could all exist in different, non-compatible and /non-linear/ regions
>> of memory.
>
>     "Exposes"?  How?  Where?  Examples?  [In either K&R C or standard
> C, of course, not in some dialect implementation.]


What is being claimed is that it is largely C that has been responsible
for linear memory layouts in hardware.

What I've been trying to establish is how exactly it managed that; what
did other languages with pointers do differently?

So far no one has managed to answer that; it's just a C love-fest.

All I know is that when there IS segmented memory, then C will make you
aware of it. On the IBM PC x86 machines, then if you were writing in C,
then you still had to grapple with those kinds of pointers.

Actually I've lost track of what is being claimed, and now I'm highly
sceptical. So far:

* C was responsible for the success of hardware that predated C

* C influenced the design of microprocessors in the mid to late 70s,
when C was in its early days

* C single-handedly was responsible for us having linear memory today (and
nothing to do with machines having more address bits)

* C was responsible for us having power-of-two word sizes now.

This despite C not being mainstream until the mid-80s when there were
already machines with power-of-two word sizes and linear memory.




Andy Walker

Nov 23, 2022, 3:25:02 PM
On 23/11/2022 18:53, Bart wrote:
>>> * Not being able to compare pointers to two different objects
>>      Of course you can.  Such pointers compare as unequal.  You can
>> also reliably subtract pointers in some cases.  What more can you
>> reasonably expect?
> C doesn't allow relative comparison or subtraction of pointers to
> different objects. Or rather, it makes those operations
> implementation-defined or UB, simply because the pointers could in
> fact refer to incompatible memory regions.

N2478 [other standards are available], section 6.5.6.10:

" When two pointers are subtracted, both shall point to elements of
" the same array object, or one past the last element of the array
" object; the result is the difference of the subscripts of the two
" array elements. The size of the result is implementation-defined,
" and its type (a signed integer type) is ptrdiff_t defined in the
" <stddef.h> header. If the result is not representable in an object
" of that type, the behavior is undefined. "

So the behaviour is undefined only if the subtraction overflows, and is
implementation defined only to the extent of what size of signed integer
the implementation prefers. It's difficult to see what other behaviour
could reasonably be specified in the Standard.

Section 6.5.8.6:

" When two pointers are compared, the result depends on the relative
" locations in the address space of the objects pointed to. If two
" pointers to object types both point to the same object, or both
" point one past the last element of the same array object, they
" compare equal. If the objects pointed to are members of the same
" aggregate object, pointers to structure members declared later
" compare greater than pointers to members declared earlier in the
" structure, and pointers to array elements with larger subscript
" values compare greater than pointers to elements of the same
" array with lower subscript values. All pointers to members of the
" same union object compare equal. If the expression P points to an
" element of an array object and the expression Q points to the last
" element of the same array object, the pointer expression Q+1
" compares greater than P. In all other cases, the behavior is
" undefined. "

Well, it's rather verbose, but it all seems common sense to me. No
mention anywhere of "incompatible memory regions", so I suspect that
you're making it up based on what you think C is like rather than how
it is defined in reality.
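
A minimal sketch of where that line falls (hypothetical arrays):

#include <stddef.h>

int a[10], b[10];

void demo(void) {
    int *p = &a[2], *q = &a[7];

    ptrdiff_t d = q - p;     /* defined: same array, d == 5         */
    int lt = (p < q);        /* defined: same array, true           */

    int *r = &b[0];
    int eq = (p == r);       /* defined: unequal, different objects */
    /* p < r and p - r are undefined: p and r point into different
       objects, so the standard says nothing about the result.      */
    (void)d; (void)lt; (void)eq;
}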

May be worth noting that [eg] Algol defines only the relations
"is" and "isn't" between pointers; C is at least more "helpful" than
that. But that is largely driven by C's use of pointers in arrays.

> This goes against the suggestion that C is more conducive to linear
> memory than any other languages.

Well, /I/ have made no such suggestion, and don't really even
know what that is claimed to mean. Most HLLs specifically hide the
layout and structure of memory from ordinary programmers; no doubt a
Good Thing.

>>> It is the only one that I recall which exposes the fact that these
>>> could all exist in different, non-compatible and /non-linear/ regions
>>> of memory.
>>      "Exposes"?  How?  Where?  Examples?  [In either K&R C or standard
>> C, of course, not in some dialect implementation.]
> What is being claimed is that it is largely C that has been
> responsible for linear memory layouts in hardware.

Not a claim I have ever made, nor even seen before. There is
remarkably little in [standard] C that relates to memory layouts. So
I take it that you have no examples of this claimed exposure?

FTAOD, I'm not a great fan of C. But that's another matter.

Dmitry A. Kazakov

Nov 23, 2022, 3:25:58 PM
Narrow user base is no reason to reject a language. However there is a
danger that the language might go extinct.

To me most important is the language toolbox:

- modules, separate compilation, late bindings
- abstract data types
- generic programming (i.e. in terms of sets of types)
- formal verification, contracts, correctness proofs
- some object representation control
- interfacing to C and thus system and other libraries
- high level concurrency support
- program readability, reasonable syntax AKA don't be APL (:-))
- standard library abstracting the underlying OS
- some type introspection

Things not important or ones I actively avoid are

- lambdas
- relational algebra
- patterns
- recursive types
- closures
- dynamic/duck/weak/no-typing
- macros/preprocessor/templates/generics
- standard container library (like std or boost)
- standard GUI library

David Brown

Nov 23, 2022, 3:33:58 PM
OK, that's one!


David Brown

Nov 23, 2022, 3:46:58 PM
On 23/11/2022 19:53, Bart wrote:
> On 23/11/2022 18:31, Andy Walker wrote:
>> On 23/11/2022 16:36, Bart wrote:
>>> C, on the other, had lots of restrictions:
>>> * Having FAR and NEAR pointer types
>>
>>      Never part of C.  [Non-standard extension in some implementations.]
>>
>>> * Having distinct object and function pointers (you aren't even
>>> allowed to directly cast between them)
>>
>>      Correctly so.  In a proper HLL, type punning should, in general,
>> be forbidden.  There could be a case made out for casting between two
>> structures that are identical apart from the names of the components,
>> otherwise it is a recipe for hard-to-find bugs.
>>
>>> * Not being able to compare pointers to two different objects
>>
>>      Of course you can.  Such pointers compare as unequal.  You can
>> also reliably subtract pointers in some cases.  What more can you
>> reasonably expect?
>
> C doesn't allow relative compares, or subtracting operators. Or rather,
> it will make those operations implementation defined or UB, simply
> because pointers could in fact refer to incompatible memory regions.
>

It doesn't allow them because they don't make sense. When would you
want to subtract two unrelated pointers? What would it give you? At
what time might you want to compare two unrelated pointers for anything
other than equality? (Note that an implementation can support what it
likes in implementation-specific code, such as the guts of its standard
library.)

> This goes against the suggestion that C is more conducive to linear
> memory than any other languages.
>

Linear memory makes it easier to implement C. You can also have C for a
target that does not have linear memory. There is no contradiction there.

>
>>> It is the only one that I recall which exposes the fact that these
>>> could all exist in different, non-compatible and /non-linear/ regions
>>> of memory.
>>
>>      "Exposes"?  How?  Where?  Examples?  [In either K&R C or standard
>> C, of course, not in some dialect implementation.]
>
>
> What is being claimed is that it is largely C that has been responsible
> for linear memory layouts in hardware.
>

Who claimed that?  All that has been said is that C is influential in
the way modern computer and processor architecture has developed - with
varying, somewhat vague estimates of how influential it has been, and
examples of CPU features that fit C well but are not necessarily ideal
for some other languages.



David Brown

unread,
Nov 23, 2022, 3:59:28 PM11/23/22
to
Yes. (I said "very small user bases".)

> To me most important is the language toolbox:
>
> - modules, separate compilation, late bindings
> - abstract data types
> - generic programming (i.e. in terms of sets of types)

I thought you didn't like that?

> - formal verification, contracts, correctness proofs

Yet you reject functional programming? You can do a bit of formal
proofs with SPARK, but people doing serious formal correctness proofs
tend to prefer pure functional programming languages.

> - some object representation control
> - interfacing to C and thus system and other libraries
> - high level concurrency support
> - program readability, reasonable syntax AKA don't be APL (:-))
> - standard library abstracting the underlying OS
> - some type introspection

I think Haskell would fit for all of that. And C++ is as good as Ada.

>
> Things not important or ones I actively avoid are
>
> - lambdas
> - relational algebra
> - patterns
> - recursive types
> - closures
> - dynamic/duck/weak/no-typing
> - macros/preprocessor/templates/generics

So generics are important to you, but you actively avoid them?

Bart

Nov 23, 2022, 4:02:00 PM
So, basically, everything is fully defined when both pointers refer to
the same object, which is what I said, more briefly.

>  No
> mention anywhere of "incompatible memory regions", so I suspect that
> you're making it up based on what you think C is like rather than how
> it is defined in reality.

This part of it is implied by those restrictions, when you think of the
reasons why they might apply.

Except that C applies those restrictions whether or not pointers to the
memory regions involved would be compatible.

In fact, you /can/ have distinct kinds of memory, though more common on
older hardware, or in microcontrollers.

But this is all by the by; my quest was trying to figure out what it was
about how C (and only C) does pointers that made architecture designers
decide they needed linear rather than segmented memory.

My opinion is that C had very little if anything to do with it; it's
just natural evolution when you move from 16 address bits to 32 and then
64, and we already had 32 address bits on the 80386 in the mid-80s, and
on lesser-known machines before that.

C wasn't mature enough for that much influence, and I don't believe
languages, or any one in particular, were that influential.

Computers had to be made to continue running the dozens of other
languages also in use, and most would equally benefit from the same
developments: speed, memory size, word sizes, more registers. Linear
memory is a consequence of having a big enough word size to address all
the code and data for a task.

The stuff about C being solely responsible may just have been a wind-up.


>     May be worth noting that [eg] Algol defines only the relations
> "is" and "isn't" between pointers;  C is at least more "helpful" than
> that.  But that is largely driven by C's use of pointers in arrays.
>
>> This goes against the suggestion that C is more conducive to linear
>> memory than any other languages.
>
>     Well, /I/ have made no such suggestion, and don't really even
> know what that is claimed to mean.  Most HLLs specifically hide the
> layout and structure of memory from ordinary programmers;  no doubt a
> Good Thing.

At least 3 people in the group were claiming all sorts of unlikely
things of C.

>>>> It is the only one that I recall which exposes the fact that these
>>>> could all exist in different, non-compatible and /non-linear/ regions
>>>> of memory.
>>>      "Exposes"?  How?  Where?  Examples?  [In either K&R C or standard
>>> C, of course, not in some dialect implementation.]
>> What is being claimed is that it is largely C that has been
>> responsible for linear memory layouts in hardware.
>
>     Not a claim I have ever made, nor even seen before.  There is
> remarkably little in [standard] C that relates to memory layouts.  So
> I take it that you have no examples of this claimed exposure?

It's not me making the claims.


Dmitry A. Kazakov

Nov 23, 2022, 4:34:01 PM
Generic programming can be achieved without parametric/static
polymorphism. It is only one, inferior, way of constructing sets of
types. I prefer dynamic polymorphism.

>> - formal verification, contracts, correctness proofs
>
> Yet you reject functional programming?

Sure.

> You can do a bit of formal
> proofs with SPARK, but people doing serious formal correctness proofs
> tend to prefer pure functional programming languages.

It is about priorities. I need to prove correctness of parts of real-life
programs. The most difficult problem with proofs is that you must bend
the program to make it provable, potentially introducing bugs, e.g. in
the contracts. I'd like to see partial and conditional proofs rather than
absolutist approaches.

>> - some object representation control
>> - interfacing to C and thus system and other libraries
>> - high level concurrency support
>> - program readability, reasonable syntax AKA don't be APL (:-))
>> - standard library abstracting the underlying OS
>> - some type introspection
>
> I think Haskell would fit for all of that.  And C++ is as good as Ada.

C++ has problems with high-level concurrency and massive syntax issues.
Looking at modern C++ code I am not sure whether it is plain text or
Base64-encoded. Early C++ was in some respects an admirable language,
before Stepanov poured poison in the ear of poor Bjarne... (:-))

>> Things not important or ones I actively avoid are
>>
>> - lambdas
>> - relational algebra
>> - patterns
>> - recursive types
>> - closures
>> - dynamic/duck/weak/no-typing
>> - macros/preprocessor/templates/generics
>
> So generics are important to you, but you actively avoid them?

See above. Generic programming /= programming using generics.

James Harris

Nov 23, 2022, 5:20:38 PM
On 22/11/2022 17:13, Bart wrote:

...

> That made the 8086 simpler because there was no choice! The registers
> were limited and only one was general purpose.

One was /almost/ general purpose! :-)


--
James Harris


James Harris

Nov 23, 2022, 5:38:43 PM
On 23/11/2022 18:12, Bart wrote:
> On 23/11/2022 16:51, James Harris wrote:
>> On 23/11/2022 16:36, Bart wrote:
>
>>>> I wouldn't say they did. What I would say is that probably none of
>>>> them had C's influence on what programming became.
>>>
>>> Examples? Since the current crop of languages all have very different
>>> ideas from C.
>>
>> Cobol and Algol:
>
> I was asking about C's influence, but those two languages predated C.

You mean the languages which C's design has influenced? Many such as
Java, C#, C++, Objective-C, D, Go, etc.

>
>>>> well as strong influence on others. Much of the programming
>>>> community today still thinks in C terms even 50 years (!!!) after
>>>> its release.
>>>
>>> Is it really C terms, or does that just happen to be the hardware model?
>>>
>>> Yes, C is a kind of lingua franca that lots of people know, but
>>> notice that people talk about a 'u64' type, something everyone
>>> understands, but not 'unsigned long long int' (which is not even
>>> defined by C to be exactly 64 bits), nor even `uint64_t` (which not
>>> even C programs recognise unless you use stdint.h or inttypes.h!).
>>
>> u64 is just a name.
>
> So what are the 'C terms' you mentioned?

I mean things like pointers, memory as an array of bytes, etc.

> Since if talking about
> primitive types for example, u64 or uint64 or whatever are common ways
> of refering to a 64-bit unsigned integer type, then unless the
> discussion specically about C, you wouldn't use C denotations for it.
>
>>> Why the C model? Do you have any languages in mind with a different
>>> model?
>>
>> Yes, the C model is as stated: any pointer can point anywhere. A C
>> pointer must be able to point to rodata, stack, and anywhere in the
>> data section.
>
> And that is different from any other language that had pointers, how?
>
> Because I'm having trouble in understanding how you can attribute linear
> memory models to C and only C, when it is the one language that exposes
> the limitations of non-linear memory.

Earlier in this discussion you seemed to understand that I was saying C
had a primary influence. When did that change to C being the only influence?

>
>
>>>
>>> Pointers or references occur in many languages (from that time
>>> period, Pascal, Ada, Algol68); I don't recall them being restricted
>>> in their model of memory.
>>>
>>> C, on the other, had lots of restrictions:
>>>
>>> * Having FAR and NEAR pointer types
>>
>> Are you sure that FAR and NEAR were part of C?
>
> They were part of implementations of it for 8086. There were actually
> 'near', 'far' and 'huge'. I think a 'far' pointer had a fixed segment part.

Then they weren't part of C. Perhaps their inclusion in certain
/implementations/ backs up my assertion that programmers viewed C's
pointers as unsegmented.


--
James Harris


Bart

Nov 23, 2022, 5:43:01 PM
On 23/11/2022 14:53, David Brown wrote:
> On 22/11/2022 18:13, Bart wrote:

>> As a compiler writer?
>
> As an assembly programmer and C programmer.
>
>> The first thing you noticed is that you have to decide whether to use
>> D-registers or A-registers, as they had different characteristics, but
>> the 3-bit register field of instructions could only use one or the other.
>>
>
> Yes, although they share quite a bit in common too.  You have 8 data
> registers that are all orthogonal and can be used for any data
> instructions as source and designation, all 32 bit.  You have 8 address
> registers that could all be used for all kinds of addressing modes (and
> a few kinds of calculations, and as temporary storage) - the only
> special one was A7 that was used for stack operations (as well as being
> available for all the other addressing modes).
>
> How does that even begin to compare to the 8086 with its 4 schizophrenic
> "main" registers that are sometimes 16-bit, sometimes two 8-bit
> registers, with a wide range of different dedicated usages for each
> register?  Then you have 4 "index" registers, each with different
> dedicated uses.  And 4 "segment" registers, each with different
> dedicated uses.
>
> Where the 68000 has wide-spread, planned and organised orthogonality and
> flexibility, the 8086 is a collection of random dedicated bits and pieces.

It's too big an effort to dig into now, many decades on, what gave me
that impression about the 68K. But the big one /is/ those two kinds of
registers.

Current machines already have GP and float registers, which makes things
difficult enough, but here there are separate registers for integers -
and for integers that might be used as memory addresses.

So you would have instructions that operated on one set but not the
other. You'd need to decide whether functions returned values in D0 or A0.

Glancing at the instruction set now, you have ADD which adds to
everything except A regs; ADDA which /only/ adds to AREGS.

ADDI which adds immed values to everything except AREGS, and ADDQ which
adds small values (1..8) to everything /including/ AREGS.

Similarly with ANDI, which works for every dest except AREGS, but there
is no version for AREGS (so if you were playing with tagged pointers and
needed to clear the bottom bits then use them for an address, it gets
awkward).
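
For reference, the operation in question expressed in portable C, with
a hypothetical tagging scheme that assumes 8-byte-aligned objects:

#include <stdint.h>

enum { TAG_MASK = 7 };   /* low 3 bits carry a type tag */

static inline int   tag_of(void *p) { return (int)((uintptr_t)p & TAG_MASK); }
static inline void *untag(void *p)  { return (void *)((uintptr_t)p & ~(uintptr_t)TAG_MASK); }

It is the AND in untag() that the 68000 cannot apply directly to an
address register.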

With a compiler, you had to make decisions on whether it's best to start
evaluating in DREGS or AREGS and then move across, if it involved mixed
operations that were only available for one set.

Note that the 80386 processor, which apparently first appeared in 1985,
removed many of the restrictions of the 8086, also widening the
registers but not adding any more. Further, these 32-bit additions and
new address modes were available while running in 16-bit mode within a
16-bit application.


> You can also understand it by looking at the processor market.  Real
> CISC with dedicated and specialised registers is dead.  In the progress
> of x86 through 32-bit and then 64-bit, the architecture became more and
> more orthogonal - the old registers A, B, C, D, SI, DI, etc., are now no
> more than legacy alternative names for r0, r1, etc., general purpose
> registers.

What became completely unorthogonal on x86 is the register naming. It's
a zoo of mismatched names of mixed lengths. The mapping is also bizarre,
with the stack pointer somewhere below the middle.

(However that is easy to fix as I can use my own register names and
ordering as well as the official names. My 64-bit registers are called
D0 to D15, with D15 (aka Dstack) being the stack pointer.)

>>      i8 i16 i32 i64 i128
>>      u8 u16 u32 u64 u128
>>      f32 f64 f128
>
> Or they can be expressed in a form that everyone understands, like
> "char", "int", etc., that are defined in the ABI, and that everybody and
> every language /does/ use when integrating between different languages.

Sorry, but C typenames using C syntax are NOT good enough, not for
cross-language use. You don't really want to see 'int long unsigned
long'; you want 'uint64' or 'u64'.

Even C decided that `int` and `char` were not good enough by adding types
like `int32_t` and ... sorry, I can't even tell you what `char`
corresponds to. That is how rubbish C type designations are.



> That document has no mention anywhere of your personal short names for
> size-specific types.

It uses names of its own like 'unsigned eightbyte' which unequivocally
describes the type. However you will see `u64` all over forums; you will
never see `unsigned eightbyte`, and never 'unsigned long long int'
outside of C forums or actual C code.

>  It has a table stating the type names and sizes.

Yes, that helps too. What doesn't help is just using 'long'.

> Think of it as just a definition of the technical terms used in the
> document, no different from when one processor reference might define
> "word" to mean 16 bits and another define "word" to mean 32 bits.

So defining a dozen variations on `unsigned long long int` is better
than just using `u64` or `uint64`?

That must be the reason why a dozen different languages have all adopted
those C designations because they work so well and are so succinct and
unambiguous. Oh, hang on...



> How does Google manage case-insensitive searches with text in Unicode in
> many languages?  By being /very/ smart.  I didn't say it was impossible
> to be case-insensitive beyond plain English alphabet, I said it was an
> "absolute nightmare".  It is.


No, it really isn't. Now you're making things up. You don't need to be
very smart at all, it's actually very easy.



> It is done where it has to be done -
> you'll find all major databases have support for doing sorting,
> searching, and case translation for large numbers of languages and
> alphabets.  It is a /huge/ undertaking to handle it all.  You don't do
> it if it is not important.

Think about the 'absolute nightmare' if /everything/ was case sensitive
and a database has 1000 variations of people called 'David Brown'.
(There could be 130,000 with my name.)

Now imagine talking over the phone to someone, they create an account in
the name you give them, but they use or omit capitalisation you weren't
aware of. How would you log in?


> Name just /one/ real programming language that supports case-insensitive
> identifiers

I'm not talking about Unicode identifiers. I wouldn't go there because
there are too many issues. For a start, which of the 1.1 million
characters should be allowed at the beginning, and which within an
identifier?

> but is not restricted to ASCII.  (Let's define "real
> programming language" as a programming language that has its own
> Wikipedia entry.)
>
> There are countless languages that have case-sensitive Unicode
> identifiers, because that's easy to implement and useful for programmers.

And also a nightmare, since there are probably 20 distinct characters
that share the same glyph as 'A'.

Adding Unicode to identifiers is too easy to do badly.


>
> Domain names are case insensitive if they are in ASCII.

Because?

> For other
> characters, it gets complicated.

So, the same situation with language keywords and commands in CLIs.

But hey, think of the advantage of having Sort and sorT working in
decreasing/increasing order; no need to specify that separately. Plus
you have 14 more variations to apply meanings to. Isn't this the point
of being case-sensitive?

Because if it isn't, then I don't get it. On Windows, I can type 'sort'
or `SORT`, it doesn't matter. I don't even need to look at the screen or
have to double-check caps lock.

BTW my languages (2 HLLs and one assembler) use case-insensitive
identifiers and keywords, but allow case-sensitive names when they are
needed, mainly for working with FFIs.

It really isn't hard at all.

> Programmers are not "most people".  Programs are not "most things in
> everyday life".
>
> Most people are quite tolerant of spelling mistakes in everyday life -
> do you think programming languages should be too?

Using case is not a spelling mistake; it's a style. In my languages,
someone can write 'int', 'Int' or 'INT' according to preference.

Or that can use CamelCase if they like that, but someone importing such
a function can just write camelcase if they hate the style.

I use upper case when writing debug code so that I can instantly
identify it.


> They do exist, yes.  That does not make them a good idea.

Yes, it does. How do you explain to somebody why using exact case is
absolutely essential, when it clearly shouldn't matter?


Look: I create my own languages, yes? And I could have chosen at any
time to make them case sensitive, yes?

So why do you think I would choose to make life an 'absolute nightmare'
for myself?

The reason is obvious: because case insensitivity just works better and
is far more useful.

James Harris

Nov 23, 2022, 5:45:17 PM
On 23/11/2022 21:01, Bart wrote:

...

> The stuff about C being solely responsible may just have been a wind-up.

Maybe I've missed it but I've not noticed anyone claim C was solely
responsible.

...

> It's not me making the claims.

It might be you making up the claims. ;-)


--
James Harris


Bart

Nov 23, 2022, 5:47:58 PM
On 23/11/2022 21:33, Dmitry A. Kazakov wrote:
> On 2022-11-23 21:59, David Brown wrote:

>> I think Haskell would fit for all of that.  And C++ is as good as Ada.
>
> C++ has problems with high-level concurrency and massive syntax issues.
> Looking at a modern C++ code I am not sure whether it is plain text or
> Base64-encoded.

I've long had that problem in C, which, partly thanks to
case-sensitivity forcing people to write correctly cased names
(like macros), often looks like a sea of MIME-encoded text.

I can't hack it. C++, I just wouldn't bother with it; 90% seems to be
pointless punctuation.


Bart

Nov 23, 2022, 6:25:16 PM
On 22/11/2022 15:29, David Brown wrote:
> The 8086 was horrible in all sorts of ways.  Comparing a 68000 with an
> 8086 is like comparing a Jaguar E-type with a bathtub with wheels.  And
> for the actual chip used in the first PC, an 8088, half the wheels were
> removed.

You've forgotten the 68008.

Bart

Nov 23, 2022, 6:31:29 PM
Here's a selection of quotes from the thread (BC is me):

JH:
I even suspect that the CPUs we use today are also as they are in part
due to C. It has been that influential.

BC:

> However, what aspects of today's processors do you think owe anything
> to C?

JH:

Things like the 8-bit byte, 2's complement, and the lack of segmentation.


JH:
I remember reading that when AMD wanted to design a 64-bit architecture
they asked programmers (especially at Microsoft) what they wanted. One
thing was 'no segmentation'. The C model had encouraged programmers to
think in terms of flat address spaces, and the mainstream segmented
approach for x86 was a nightmare that people didn't want to repeat.


DB:
C is /massively/ influential to the general purpose CPUs we have today.
The prime requirement for almost any CPU design is that you should be
able to use it efficiently for C. After all, the great majority of
software is written in languages that, at their core, are similar to C


BC:
> Two of the first machines I used were PDP10 and PDP11, developed by
> DEC in the 1960s, both using linear memory spaces. While the former was
> word-based, the PDP11 was byte-addressable, just like the IBM 360 also
> from the 1960s.
>

DB:
C was developed originally for these processors, and was a major reason
for their long-term success.

BC:
> Of the PDP10 and IBM 360? Designed in the 1960s and discontinued in
> 1983 and 1979 respectively. C only came out in a first version in 1972.
>

DB:
I was thinking primarily of the PDP11, which was the first real target
for C (assuming I have my history correct - this was around the time I
was born). And by "long-term success" of these systems, I mean their
successors that were built in the same style - such as the VAX.



DB:
C was a /massive/ influence on processor evolution and the current
standardisation of general-purpose processors as systems for running C
code efficiently. But it was not the only influence, or the sole reason
for current processor design.


BC:
>> Or are you going to claim like David Brown that the hardware is like
>> that solely due to the need to run C programs?

DAK:
> Nobody would ever use any hardware if there is no C compiler. So
> David is certainly right.


David Brown

Nov 24, 2022, 5:15:33 AM
Roughly, yes. Some things - such as equality comparisons - are defined
even if they are in different objects.

>
>>   No
>> mention anywhere of "incompatible memory regions", so I suspect that
>> you're making it up based on what you think C is like rather than how
>> it is defined in reality.
>
> This part of it is implied by those restrictions, when you think of the
> reasons why they might apply.

You can't read between the lines like that and guess about what the
standard does /not/ say. Standards documents are a bit special - they
are concerned solely about what is explicitly discussed in the document,
and do not imply anything at all about things that are not covered.

So the standards don't say C can be used on systems with disjoint memory
regions, or systems with different address spaces. Nor do they say that
it /can't/ be used on them. Nor do they imply that such systems exist,
or don't exist.

They simply say that the C language says what happens when you
order-compare (or subtract) pointers that are within the same object,
because that's the only case the C language cares about.

>
> Except C applies those whether or not pointers to those memory registers
> would be compatible or not.

The C language does not care about that. And certainly the C language
standards don't "apply" anything.

A given C compiler can choose to do whatever it likes if you try to
order compare two pointers that are not part of the same object -
including assuming that it does not happen, and including doing a simple
naïve comparison of the pointer values as though they were integers.
(And that comparison could be done signed or unsigned, which may result
in a different answer from what you might be expecting.)
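
To illustrate that last point with a hypothetical sketch:

#include <stdint.h>

int less_unsigned(void *a, void *b) { return (uintptr_t)a < (uintptr_t)b; }
int less_signed(void *a, void *b)   { return (intptr_t)a < (intptr_t)b; }

/* For two addresses on opposite sides of the midpoint of the address
   space, these give opposite answers - one reason the standard leaves
   cross-object ordering undefined. */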

>
> In fact, you /can/ have distinct kinds of memory, though more common on
> older hardware, or in microcontrollers.
>

Or newer hardware with NUMA, remote memory on PCI buses (yes, that's a
thing now), virtual memory, disjoint memory setups for access control or
memory access debugging, or...

Usually, this is still all viewed as one logical address space, despite
being different physical spaces.

> But this is all by the by; my quest was trying to figure what it was
> about how how C (and only C) does pointers, that made architecture
> designers decide they need more linear memory than segmented.
>

Linear memory in one address space is much more convenient for
toolchains to handle. It is especially efficient when you have a
low-level compiled language, because it means you can use a simple naïve
implementation in most cases.  It means you can implement a function like:

int get_data(const int *p) {
    return *p;
}

with just a simple read instruction.

If you have a language that has no programmer-visible pointers then you
cannot write such functions in the library - there is no need to have a
way to implement it. Or if you have advanced pointers/references that
handle access control and bounds checking, it's a minor matter to add
checking for different address spaces too. Then your same system and
same code will work for, say, remote objects accessed over a network.

Or if you have a language built around communicating actors, then there
is never a need to access memory outside your local data, so memory can
be as disjoint as you like.

There are many different computing models other than the von Neumann
setup that has become ubiquitous as a result of the popularity of von
Neumann programming languages. C was not the first such language, nor
is it the only one, but it is far and away the biggest, most popular and
most vital to the computing world we have now.

This places great limits on the efficiency, cost, size, and power of
computing - von Neumann architectures and programs for them do not scale
well in directions other than single-thread speed. Buying a 32-core cpu
does not make your C code run 32 times faster - but it /would/ make your
Occam code run 32 times faster, or your well-written Haskell code, or
your Erlang code, or code written in languages that were not targeting
this simple linear model.


> The stuff about C being solely responsible may just have been a wind-up.
>

You always seem to end up with such conclusions when people disagree
with you.

>
>>      May be worth noting that [eg] Algol defines only the relations
>> "is" and "isn't" between pointers;  C is at least more "helpful" than
>> that.  But that is largely driven by C's use of pointers in arrays.
>>
>>> This goes against the suggestion that C is more conducive to linear
>>> memory than any other languages.
>>
>>      Well, /I/ have made no such suggestion, and don't really even
>> know what that is claimed to mean.  Most HLLs specifically hide the
>> layout and structure of memory from ordinary programmers;  no doubt a
>> Good Thing.
>
> At least 3 people in the group were claiming all sorts of unlikely
> things of C.
>

Those three claims are, I suspect, based strongly on your
misinterpretation or your exaggerated interpretation of what people
actually wrote.  You have a long habit of assuming everything about C
(or rather, your skewed idea of C) is evil in every way, as well as
interpreting every comment other people make about C as some kind of
fan-boy obsessive love for the language.

>>>>> It is the only one that I recall which exposes the fact that these
>>>>> could all exist in different, non-compatible and /non-linear/ regions
>>>>> of memory.
>>>>      "Exposes"?  How?  Where?  Examples?  [In either K&R C or standard
>>>> C, of course, not in some dialect implementation.]
>>> What is being claimed is that it is largely C that has been
>>> responsible for linear memory layouts in hardware.
>>
>>      Not a claim I have ever made, nor even seen before.  There is
>> remarkably little in [standard] C that relates to memory layouts.  So
>> I take it that you have no examples of this claimed exposure?
>
> It's not me making the claims.
>

It is you making the claims about what others say.

David Brown

Nov 24, 2022, 10:03:13 AM
Certainly the distinction between A and D registers is a
non-orthogonality. But it is just /one/ case, and it really isn't so
big in practice since you have many identical registers in each class.
It's akin to the difference between GP registers and FP registers you
mention below.

(I am not disagreeing with the remark that the 68000 is not entirely
orthogonal - I am disagreeing with the claim that it is at a similar
level to the 8086. And I am jogging happy memories of old processor
architectures!)

>
> Current machines already have GP and float registers to make things more
> difficult, but here there are separate registers for integers - and
> integers that might be used as memory addresses.

Note that there are very good reasons for separating integer and FP
registers, in terms of hardware implementations. It might be nice to
have them merged from the programmer's viewpoint, but it is not worth
the hardware cost. (A similar logic is behind the separate A and D
registers on the m68k architecture.)

>
> So you would have instructions that operated on one set but not the
> other. You'd need to decide whether functions returned values in D0 or A0.
>
> Glancing at the instruction set now, you have ADD which adds to
> everything except A regs; ADDA which /only/ adds to AREGS.
>
> ADDI which adds immed values to everything except AREGS, and ADDQ which
> adds small values (1..8) to everything /including/ AREGS.
>
> Similarly with ANDI, which works for every dest except AREGS, but there
> is no version for AREGS (so if you were playing with tagged pointers and
> needed to clear the bottom bits then use them for an address, it gets
> awkward).
>
> With a compiler, you had to make decisions on whether it's best to start
> evaluating in DREGS or AREGS and then move across, if it involved mixed
> operations that were only available for one set.
>

Yes, there is no doubt that it is a non-orthogonality. But it is a
minor matter in practice. A simple compiler can decide "pointers go in
A registers, everything else goes in D registers". That's it - done.
(To get the maximum efficiency, you'll need more complex register
allocations.)

In comparison to the 8086, it is /nothing/.

> Note that the 80386 processor, which apparently first appeared in 1985,
> removed many of the restrictions of the 8086, also widening the
> registers but not adding any more. Further, these 32-bit additions and
> new address modes were available while running in 16-bit mode within a
> 16-bit application.
>

Yes, the 80386 helped and removed some of the specialisations of the
8086. There were still plenty left, and still plenty of cases where the
use of particular registers was more efficient than others. The x86
world improved gradually in this way, so that the current x86-64 ISA is
vastly better than the 8086.

>
>> You can also understand it by looking at the processor market.  Real
>> CISC with dedicated and specialised registers is dead.  In the
>> progress of x86 through 32-bit and then 64-bit, the architecture
>> became more and more orthogonal - the old registers A, B, C, D, SI,
>> DI, etc., are now no more than legacy alternative names for r0, r1,
>> etc., general purpose registers.
>
> What become completely unorthogonal on x86 is the register naming. It's
> a zoo of mismatched names of mixed lengths. The mapping is also bizarre,
> with the stack pointer somewhere below the middle.

Yes.

>
> (However that is easy to fix as I can use my own register names and
> ordering as well as the official names. My 64-bit registers are called
> D0 to D15, with D15 (aka Dstack) being the stack pointer.)
>

I think it is not uncommon to refer to the registers in x86-64 as r0 to
r15 - that is, the A, B, C, D, DI, SI, SP, and BP registers are renamed,
with the extra 8 registers of x86-64 having never had any other name.

>>>      i8 i16 i32 i64 i128
>>>      u8 u16 u32 u64 u128
>>>      f32 f64 f128
>>
>> Or they can be expressed in a form that everyone understands, like
>> "char", "int", etc., that are defined in the ABI, and that everybody
>> and every language /does/ use when integrating between different
>> languages.
>
> Sorry, but C typenames using C syntax are NOT good enough, not for
> cross-language use. You don't really want to see 'int long unsigned
> long'; you want 'uint64' or 'u64'.

Sorry, but they /are/ good enough for everyone else. The world can't be
expected to change to suit /you/ - it is you who must adapt. (But you
don't have to like it!)

>
> Even C decided that `int` `char` were not good enough by adding types
> like `int32_t` and ... sorry I can't even tell you what `char`
> corresponds to. That is how rubbish C type designations are.

These type names were /added/ to the language - they did not replace the
existing types. People use different type names for different purposes.
I write "int" when "int" is appropriate, and "int32_t" when "int32_t"
is appropriate - it's not a case of one set of names being "better" than
the other.

>
>> That document has no mention anywhere of your personal short names for
>> size-specific types.
>
> It uses names of its own like 'unsigned eightbyte' which unequivocally
> describes the type. However you will see `u64` all over forums; you will
> never see `unsigned eightbyte`, and never 'unsigned long long int'
> outside of C forums or actual C code.
>

Standards documents are not everyday language. (I think I've mentioned
that before.) In everyday use, people tend to use shorter and more
convenient names - though they vary how they balance shortness with
explicitness, and that varies by context. (Programs are not everyday
language either.)

>>   It has a table stating the type names and sizes.
>
> Yes, that helps too. What doesn't help is just using 'long'.
>

It works fine.  You read the table of definitions and see that in this
document the word "long" means "64-bit integer".

Standards documents define all kinds of terms and expressions in a
particular manner that applies only within the document (or other formal
texts that refer to the document).
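
From memory, the relevant corner of the x86-64 SysV table reads:

    C type         size (bytes)
    char           1
    short          2
    int            4
    long           8
    long long      8
    any pointer    8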


>> Think of it as just a definition of the technical terms used in the
>> document, no different from when one processor reference might define
>> "word" to mean 16 bits and another define "word" to mean 32 bits.
>
> So defining a dozen variations on 'unsigned long long int` is better
> than just using `u64` or `uint64`?
>

Are you confusing the flexible syntax of C with the technical terms in
the ABI document? It sounds a lot like it.

> That must be the reason why a dozen different languages have all adopted
> those C designations because they work so well and are so succinct and
> unambiguous. Oh, hang on...
>

As I said - it might have been better to have names with explicit sizes.
That does not mean that the C terms are not good enough for the job,
regardless of what language you use. And since in the solid majority of
cases where ABI's are used between two languages, at least one of the
languages is C, it seems sensible to use C terms. Why should Rust users
be forced to learn Go's type names in order to use a C library - when
they need to know the C names anyway? Why should Go users need to learn
the names used by Rust?

Think of C like English - the spelling in English is horrible and
inconsistent, and is different depending on which side of the pond you
live. Yet it works extremely well for international communication, and
lets Bulgarians talk to Koreans. Perhaps Esperanto would be a
hypothetically better language, but it's not going to happen in practice.

>
>
>> How does Google manage case-insensitive searches with text in Unicode
>> in many languages?  By being /very/ smart.  I didn't say it was
>> impossible to be case-insensitive beyond plain English alphabet, I
>> said it was an "absolute nightmare".  It is.
>
>
> No, it really isn't. Now you're making things up. You don't need to be
> very smart at all, it's actually very easy.
>

You can do Unicode case-folding based on a table from the Unicode
people. But I think you'll find Google's search engine is a touch more
advanced than that.
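
The table-driven part is simple enough - a sketch, with a toy three-entry
table standing in for the real one generated from Unicode's
CaseFolding.txt (which also has one-to-many folds that this ignores):

    #include <stddef.h>
    #include <stdint.h>

    typedef struct { uint32_t from, to; } Fold;

    static const Fold folds[] = {
        { 0x0041, 0x0061 },   /* LATIN A -> a */
        { 0x00C4, 0x00E4 },   /* LATIN A WITH DIAERESIS -> folded form */
        { 0x0410, 0x0430 },   /* CYRILLIC A -> a */
    };

    static uint32_t fold_codepoint(uint32_t cp) {
        for (size_t i = 0; i < sizeof folds / sizeof folds[0]; i++)
            if (folds[i].from == cp)
                return folds[i].to;
        return cp;            /* no entry: the codepoint folds to itself */
    }

The smartness in a search engine is everything layered on top of that.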

>
>
>> It is done where it has to be done - you'll find all major databases
>> have support for doing sorting, searching, and case translation for
>> large numbers of languages and alphabets.  It is a /huge/ undertaking
>> to handle it all.  You don't do it if it is not important.
>
> Think about the 'absolute nightmare' if /everything/ was case sensitive
> and a database has 1000 variations of people called 'David Brown'.
> (There could be 130,000 with my name.)
>
> Now imagine talking over the phone to someone, they create an account in
> the name you give them, but they use or omit capitalisation you weren't
> aware of. How would you log in?
>

I have no idea what you are going on about.

Some things in life need to be flexible and deal with variations such as
spelling differences, capitalisation differences, etc.

Other things can and should be precise and unambiguous.

So when programming, you say /exactly/ what you mean. You don't write
"call fooo a few times" and expect it to be obvious to the computer how
many is "a few" and that you really meant "foo". You write "for i = 1
to 5 do foo()", or whatever the language in question expects.

I expect a compiler to demand precision from the code I write.
Accepting muddled letter case is setting the standard too low IMHO - I
want a complaint if I write "foo" one place and "Foo" another. Of
course I can live with such weaknesses in a language, and set higher
standards for my own code than the language allows - I do that for all
coding, as I think most people do. But I see no advantage in having
weak identifier matching in a programming language - it adds nothing to
code readability, allows poor coders to make more of a mess, and
generally allows a totally unnecessary inconsistency.

I see /no/ advantages in being able to write "foo" when defining an
identifier and "Foo" or "FOO" when using it. It is utterly pointless.
(It is a different matter to say that if you have defined an identifier
"foo" then you may not define a separate one written "Foo", disallowing
identifiers that differ only in case. I could appreciate wanting that
as a feature.)


And I cannot see any contradiction between wanting case sensitivity when
writing code while having no cases chatting to a human on the phone.

>
>> Name just /one/ real programming language that supports
>> case-insensitive identifiers
>
> I'm not talking about Unicode identifiers. I wouldn't go there because
> there are too many issues. For a start, which of the 1.1 million
> characters should be allowed at the beginning, and which within an
> identifier?
>
>> but is not restricted to ASCII.  (Let's define "real programming
>> language" as a programming language that has its own Wikipedia entry.)
>>
>> There are countless languages that have case-sensitive Unicode
>> identifiers, because that's easy to implement and useful for programmers.
>
> And also a nightmare, since there are probably 20 distinct characters
> that share the same glyph as 'A'.
>
> Adding Unicode to identifiers is too easy to do badly.
>

It is another case of a feature that can be used or abused. You pick
the balance you want, accepting that either choice is a trade-off.

>
>>
>> Domain names are case insensitive if they are in ASCII.
>
> Because?

Who cares? They are domain names, not program code.

>
>> For other characters, it gets complicated.
>
> So, the same situation with language keywords and commands in CLIs.
>

No, these are case sensitive - except for systems that haven't grown up
since lower case letters were invented.

> But hey, think of the advantage of having Sort and sorT working in
> decreasing/increasing order; no need to specify that separately. Plus
> you have 14 more variations to apply meanings to. Isn't this the point
> of being case-sensitive?
>
> Because if it isn't, then I don't get it. On Windows, I can type 'sort'
> or `SORT`, it doesn't matter. I don't even need to look at the screen or
> have to double-check caps lock.
>
> BTW my languages (2 HLLs and one assembler) use case-insensitive
> identifiers and keywords, but allow case-sensitive names when they are
> sometimes needed, mainly to do with working with FFIs.
>
> It really isn't hard at all.

It really isn't hard to write "sort".

>
>> Programmers are not "most people".  Programs are not "most things in
>> everyday life".
>>
>> Most people are quite tolerant of spelling mistakes in everyday life -
>> do you think programming languages should be too?
>
> Using case is not a spelling mistake; it's a style. In my languages,
> someone can write 'int', 'Int' or 'INT' according to preference.
>

No, it is a mess.

But of course, it is not a problem in your language - personal
preferences are entirely consistent there.

And in serious languages that are case-insensitive, such as Ada, people
stick strongly to the conventions and write their identifiers with
consistent casing. Which leaves everyone wondering what the point is of
being case-insensitive - it's just a historical mistake that can't be
changed.

> Or that can use CamelCase if they like that, but someone importing such
> a function can just write camelcase if they hate the style.
>
> I use upper case when writing debug code so that I can instantly
> identify it.
>
>
>> They do exist, yes.  That does not make them a good idea.
>
> Yes, it does. How do you explain to somebody why using exact case is
> absolutely essential, when it clearly shouldn't matter?
>
>
> Look: I create my own languages, yes? And I could have chosen at any
> time to make them case sensitive, yes?
>
> So why do you think I would choose to make life an 'absolute nightmare'
> for myself?
>

You didn't use Unicode - which is where the implementation gets hard.
There's no difficulty in implementing case insensitive keywords and
identifiers in plain ASCII - there's just no advantage to it (unless you
call being able to write an inconsistent mess an advantage).
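
(The ASCII implementation really is trivial - fold identifiers while
lexing them, a sketch:

    #include <ctype.h>

    /* Normalise an ASCII identifier in place as the lexer reads it. */
    static void fold_ident(char *s) {
        for (; *s != '\0'; s++)
            *s = (char)tolower((unsigned char)*s);
    }

and every later symbol-table lookup sees one canonical spelling.)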

David Brown

Nov 24, 2022, 10:07:36 AM
That would be a Jaguar with half the engine cylinders removed. Still
very comfortable and stylish, but a good deal less power :-)

And current AMD and Intel chips are bathtubs with wheels and rocket engines!

(The great thing about car analogies is how much they can be abused...)




Dmitry A. Kazakov

Nov 24, 2022, 11:22:16 AM
On 2022-11-24 16:07, David Brown wrote:

> And current AMD and Intel chips are bathtubs with wheels and rocket
> engines!

Judging by how they screech ... there are no wheels. (:-))

> (The great thing about car analogies is how much they can be abused...)

OT. I remember the story of a guy who installed a rocket engine on, I
believe, a VW Beetle and honorably died riding his invention. Death by
Rock and Roll, as Pretty Reckless sang...
/OT

Bart

Nov 24, 2022, 11:55:07 AM
Allowing int Int INT to be three distinct types, or to represent types,
variables, functions etc, is perfectly fine?

int Int = INT;

You can make a case for case-sensitivity within the confines of a
programming language syntax which will have lots of other rules too.

But on the other side of the user interface where it applies to user
commands, user inputs and file systems, it makes a lot less sense and
becomes user-unfriendly. People unnecessarily need to remember the exact
capitalisation of that file, otherwise they might never find it again.


Here's what I don't like about case-sensitivity:

* Somebody else makes capitalisation style choices I don't like, but I
have to use exactly the same style

* I have to remember the exact capitalisation used, instead of just
remembering the sound of the identifier used, which very often I can't,
I have to keep referring back to see what it was

* With poor choices of capitalisation, source code can look chaotic (I
mentioned elsewhere that it can look like Mime64 encoded text)

* Often the same letters are used for distinct identifiers which differ
only in capitalisation, sometimes very subtly (I can give loads of
examples).

* Often identifiers are used that are the same as reserved words, but
again differ only in case

* I can't use my way of writing temporary and/or debug code in capitals.


If you don't like the idea that case-insensitivity allows people to use
inconsistent case to refer to the same identifier like abc, Abc, aBC
(which rarely happens except for all-caps), then a compiler could
enforce consistent case.

With the important difference from case-sensitivity that you can't write:

int Int = INT;

You have to be a bit more creative.

James Harris

Nov 24, 2022, 12:58:56 PM
On 24/11/2022 16:55, Bart wrote:
> On 24/11/2022 15:03, David Brown wrote:
>> On 23/11/2022 23:42, Bart wrote:
>
>>> Using case is not a spelling mistake; it's a style. In my languages,
>>> someone can write 'int', 'Int' or 'INT' according to preference.
>>>
>>
>> No, it is a mess.
>
> Allowing int Int INT to be three distinct types, or to represent types,
> variables, functions etc, is perfectly fine?
>
>     int Int = INT;

Contrast

MyVal := a
myVal := myval + b

Are you happy for a language to allow so much inconsistency?


--
James Harris


Dmitry A. Kazakov

Nov 24, 2022, 1:02:26 PM
Make it

MyVal := a
myVal := MyVal + b

better be case-sensitive?

James Harris

Nov 24, 2022, 1:07:40 PM
My point (to you and Bart) is that programmers can choose identifier
names so the latter example need not arise unless it is written
deliberately; but if the compiler folds case then programmers can
/mistype/ names accidentally, leading to the messy inconsistency
mentioned above.


--
James Harris


Dmitry A. Kazakov

Nov 24, 2022, 1:39:53 PM
On 2022-11-24 19:07, James Harris wrote:
> On 24/11/2022 18:02, Dmitry A. Kazakov wrote:
>> On 2022-11-24 18:56, James Harris wrote:
>>> On 24/11/2022 16:55, Bart wrote:
>>>> On 24/11/2022 15:03, David Brown wrote:
>>>>> On 23/11/2022 23:42, Bart wrote:
>>>>
>>>>>> Using case is not a spelling mistake; it's a style. In my
>>>>>> languages, someone can write 'int', 'Int' or 'INT' according to
>>>>>> preference.
>>>>>>
>>>>>
>>>>> No, it is a mess.
>>>>
>>>> Allowing int Int INT to be three distinct types, or to represent
>>>> types, variables, functions etc, is perfectly fine?
>>>>
>>>>      int Int = INT;
>>>
>>> Contrast
>>>
>>>    MyVal := a
>>>    myVal := myval + b
>>>
>>> Are you happy for a language to allow so much inconsistency?
>>
>> Make it
>>
>>      MyVal := a
>>      myVal := MyVal + b
>>
>> better be case-sensitive?
>
> My point (to you and Bart) is that programmers can choose identifier
> names so the latter example need not arise unless it is written
> deliberately;

Why did you suggest an error? The point is, you could not know. Nobody
could.

> but if the compiler folds case then programmers can
> /mistype/ names accidentally, leading to the messy inconsistency
> mentioned above.

Same question. Why do you think that the example I gave was mistyped?

In a case-insensitive language mistyping the case has no effect on the
program legality. Any decent IDE enforces preferred case style.

Moreover, tools for the case-sensitive languages like C++ do just the
same. You cannot have reasonable names in C++ anymore. There would be
lurking clang-format or SonarQube configured to force something a three
year old suffering dyslexia would pen... (:-))

Bart

Nov 24, 2022, 1:42:53 PM
This is inconsistency of style, something which doesn't affect the
meaning of the code. Languages already allow that:

MyVal:=a
myVal := myval +b

Maybe there can be a tool to warn about this or tidy it up for you, but
I don't believe it should be the job of the language, or compiler.

However I did suggest that a case-sensitive language /could/ enforce
consistency across identifiers intended to be identical.

And as DAK said, you can have inconsistencies in case-sensitive code
that are actually dangerous. It took me a few seconds to realise the
second `MyVal` had a small `m` so would be a different identifier.

In a language with declarations, perhaps that would be picked up (unless
it was Go, where := serves to declare a new variable). In dynamic
ones, `myVal` would be silently created as a fresh variable.

That can happen with case-insensitivity too, but you have to actually
misspell the name, not just use the wrong capitalisation.


Here are some examples from sqlite3.c of names which are identical
except for subtle differences of case:

(walCkptInfo,WalCkptInfo)
(walIndexHdr,WalIndexHdr)
(wrflag,wrFlag)
(writeFile,WriteFile)
(xHotSpot,xHotspot)
(yHotspot,yHotSpot)
(yymajor,yyMajor)
(yyminor,yyMinor)
(zErrMsg,zErrmsg)
(zSql,zSQL)

Try to spot the differences. Remember that in a real program, it will be
much busier, and these names haven't been pre-selected and helpfully
placed side by side! Usually you will see them in isolation.
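
A check like this is easy enough to knock up; a rough sketch in C, reading
one file from stdin (fixed-size tables, no keyword filtering - a toy, and
it will also pick "identifiers" out of comments and strings):

    #include <stdio.h>
    #include <string.h>
    #include <ctype.h>

    #define MAXIDS 100000
    static char *ids[MAXIDS];
    static int nids;

    static void record(const char *s) {        /* keep unique spellings */
        for (int i = 0; i < nids; i++)
            if (strcmp(ids[i], s) == 0) return;
        if (nids < MAXIDS) ids[nids++] = strdup(s);
    }

    int main(void) {
        char buf[256]; int len = 0, c;
        while ((c = getchar()) != EOF) {        /* crude identifier lexer */
            if (isalnum(c) || c == '_') {
                if (len < 255) buf[len++] = (char)c;
            } else if (len > 0) {
                buf[len] = '\0'; len = 0;
                if (!isdigit((unsigned char)buf[0])) record(buf);
            }
        }
        for (int i = 0; i < nids; i++)          /* O(n^2): fine for a toy */
            for (int j = i + 1; j < nids; j++) {
                const char *a = ids[i], *b = ids[j]; int k = 0;
                while (a[k] != '\0' &&
                       tolower((unsigned char)a[k]) ==
                       tolower((unsigned char)b[k]))
                    k++;
                if (a[k] == '\0' && b[k] == '\0')  /* differ only in case */
                    printf("(%s,%s)\n", a, b);
            }
        return 0;
    }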

James Harris

Nov 24, 2022, 1:55:26 PM
All of the above are examples of poor code - from Int, INT, int to
MyVal, myVal, myval. The difference is that in a case-sensitive language
(such as C) a programmer would have deliberately to choose daft names to
engineer the mess; whereas in a language which ignores case (such as
Ada) the mess can come about accidentally, via typos.

>
> In a case-insensitive language mistyping the case has no effect on the
> program legality. Any decent IDE enforces preferred case style.
>
> Moreover, tools for the case-sensitive languages like C++ do just the
> same. You cannot have reasonable names in C++ anymore. There would be
> lurking clang-format or SonarQube configured to force something a three
> year old suffering dyslexia would pen... (:-))

As you suggest, for languages which ignore case extra tools are needed
to help tidy up the code.


--
James Harris


Bart

Nov 24, 2022, 2:07:04 PM
On 24/11/2022 18:55, James Harris wrote:
> On 24/11/2022 18:39, Dmitry A. Kazakov wrote:
>> On 2022-11-24 19:07, James Harris wrote:

>>> My point (to you and Bart) is that programmers can choose identifier
>>> names so the latter example need not arise unless it is written
>>> deliberately;
>>
>> Why did you suggest an error? The point is, you could not know. Nobody
>> could.
>>
>>> but if the compiler folds case then programmers can /mistype/ names
>>> accidentally, leading to the messy inconsistency mentioned above.
>>
>> Same question. Why do you think that the example I gave was mistyped?
>
> All of the above are examples of poor code - from Int, INT, int to
> MyVal, myVal, myval. The difference is that in a case-sensitive language
> (such as C) a programmer would have deliberately to choose daft names to
> engineer the mess;

They do. I gave examples in my other post. But this kind of idiom I find
annoying:

Image image;
Colour colour; //(At least it's not colour color!)
Matrix matrix;

(Actual examples from the Raylib API. Which also cause grief when ported
to my case-insensitive syntax, yet another problem.)

> whereas in a language which ignores case (such as
> Ada) the mess can come about accidentally, via typos.

Using the wrong case isn't really a typo. A real typo would yield the
wrong letters

Using the wrong case is harmless. At some point, the discrepancy in
style, if not intentional, will be discovered and fixed.


>>
>> In a case-insensitive language mistyping the case has no effect on the
>> program legality. Any decent IDE enforces preferred case style.
>>
>> Moreover, tools for the case-sensitive languages like C++ do just the
>> same. You cannot have reasonable names in C++ anymore. There would be
>> lurking clang-format or SonarQube configured to force something a
>> three year old suffering dyslexia would pen... (:-))
>
> As you suggest, for languages which ignore case extra tools are needed
> to help tidy up the code.

Possibly; I've never actually needed to in 46 years of case-insensitive
coding. But I also use upper case for emphasis.


David Brown

Nov 24, 2022, 2:23:39 PM
Of course you could know, if the language requires variables to be
declared before usage. Using C syntax for consistency here:

int Myval = 1;
myval = 2;

In a case-sensitive language, that is clearly a typo by the programmer,
and it is a compile-time error. In a case-insensitive language, it's an
inconsistent mess that is perfectly acceptable to the compiler and no
one can tell if it is intentional or not because the language is quite
happy with different choices of cases.

int Myval = 1;
int myval = 2;

In a case-sensitive language, it is legal but written by an
intentionally bad programmer - and no matter how hard you try, bad
programmers will find a way to write bad code. In a case-insensitive
language, it is an error written intentionally by a bad programmer.

Give me the language that helps catch typos, not the language that is
happy with an inconsistent jumble.

>
>> but if the compiler folds case then programmers can /mistype/ names
>> accidentally, leading to the messy inconsistency mentioned above.
>
> Same question. Why do you think that the example I gave was mistyped?
>
> In a case-insensitive language mistyping the case has no effect on the
> program legality.

I prefer mistypes to be considered errors where possible.

> Any decent IDE enforces preferred case style.
>

A good IDE is nice - a good language choice is better.

> Moreover, tools for the case-sensitive languages like C++ do just the
> same. You cannot have reasonable names in C++ anymore. There would be
> lurking clang-format or SonarQube configured to force something a three
> year old suffering dyslexia would pen... (:-))
>

Some people know how to use tools properly.



David Brown

Nov 24, 2022, 2:28:35 PM
On 24/11/2022 20:07, Bart wrote:
> On 24/11/2022 18:55, James Harris wrote:
>> On 24/11/2022 18:39, Dmitry A. Kazakov wrote:
>>> On 2022-11-24 19:07, James Harris wrote:
>
>>>> My point (to you and Bart) is that programmers can choose identifier
>>>> names so the latter example need not arise unless it is written
>>>> deliberately;
>>>
>>> Why did you suggest an error? The point is, you could not know.
>>> Nobody could.
>>>
>>>> but if the compiler folds case then programmers can /mistype/ names
>>>> accidentally, leading to the messy inconsistency mentioned above.
>>>
>>> Same question. Why do you think that the example I gave was mistyped?
>>
>> All of the above are examples of poor code - from Int, INT, int to
>> MyVal, myVal, myval. The difference is that in a case-sensitive
>> language (such as C) a programmer would have deliberately to choose
>> daft names to engineer the mess;
>
> They do. I gave examples in my other post. But this kind of idiom I find
> annoying:
>
>     Image image;
>     Colour colour;     //(At least it's not colour color!)
>     Matrix matrix;
>

I have never come across any programmer for any language that does not
find some commonly-used idioms or coding styles annoying.

I bet that even you, who are the only programmer for the languages you
yourself designed, can look back at old code and think some of your own
idioms are annoying.

> (Actual examples from the Raylib API. Which also cause grief when ported
> to my case-insensitive syntax, yet another problem.)
>

Do not blame C for the deficiencies in your language or your use of it!

>> whereas in a language which ignores case (such as Ada) the mess can
>> come about accidentally, via typos.
>
> Using the wrong case isn't really a typo. A real typo would yield the
> wrong letters

The wrong case is either a typo, or appalling lack of attention to
detail and care of code quality.

>
> Using the wrong case is harmless. At some point, the discrepancy in
> style, if not intentional, will be discovered and fixed.
>

With a decent language it will be discovered as soon as you compile (if
not before, when you use a good editor).

David Brown

Nov 24, 2022, 2:33:29 PM
On 24/11/2022 17:22, Dmitry A. Kazakov wrote:
> On 2022-11-24 16:07, David Brown wrote:
>
>> And current AMD and Intel chips are bathtubs with wheels and rocket
>> engines!
>
> Judging by how they screech ... there is no wheels. (:-))
>
>> (The great thing about car analogies is how much they can be abused...)
>
> OT. I remember the story of a guy who installed the rocket engine on a,
> I believe, VW Beetle and honorably died riding his invention. Death by
> Rock and Roll, as Pretty Reckless sung...
> /OT
>

Died honourably, or died horribly? It's easy to mix these up!

<https://darwinawards.com/darwin/darwin1995-04.html>



Bart

Nov 24, 2022, 3:01:01 PM
On 24/11/2022 19:28, David Brown wrote:
> On 24/11/2022 20:07, Bart wrote:

>> They do. I gave examples in my other post. But this kind of idiom I
>> find annoying:
>>
>>      Image image;
>>      Colour colour;     //(At least it's not colour color!)
>>      Matrix matrix;
>>
>
> I have never come across any programmer for any language that does not
> find some commonly-used idioms or coding styles annoying.

I can port all my identifiers to a case-sensitive language with no
clashes. I can't guarantee no clashes when porting from case-sensitive
to case-insensitive.

Which would be less hassle?

I don't like the C idiom because if I read it in my head it sounds stupid.

>
> I bet that even you, who are the only programmer for the languages you
> yourself designed, can look back at old code and think some of your own
> idioms are annoying.

Some of my code layout styles (like 1-space indents) look dreadful, yes.
But I can fix that with two keypresses.


>> (Actual examples from the Raylib API. Which also cause grief when
>> ported to my case-insensitive syntax, yet another problem.)
>>
>
> Do not blame C for the deficiencies in your language or your use of it!
>
>>> whereas in a language which ignores case (such as Ada) the mess can
>>> come about accidentally, via typos.
>>
>> Using the wrong case isn't really a typo. A real typo would yield the
>> wrong letters
>
> The wrong case is either a typo, or appalling lack of attention to
> detail and care of code quality.
>
>>
>> Using the wrong case is harmless. At some point, the discrepancy in
>> style, if not intentional, will be discovered and fixed.
>>
>
> With a decent language it will be discovered as soon as you compile (if
> not before, when you use a good editor).

You're assuming it's an error. But when I call the Windows MessageBoxA
function from my language, I write it like this:

messageboxa(message:"Hello")


In C, it /must/ be written as:

MessageBoxA(0, "Hello", "Caption etc", 0);

So I use my choice of capitalisation (which is usually none; I just
don't care), and I've added keyword parameters to the declaration.

That means that, given a choice of what to do with lower and upper case
letters, I've selected different priorities, since I place little value
on writing code like this:

struct Foo Foo[FOO] = {foo};

Clearly, you have a different opinion.


James Harris

Nov 24, 2022, 3:13:05 PM
Well, some of those would happen regardless of the case sensitivity of
the language. For example, in the version of sqlite3.c I found online I
saw that some routines use wrFlag and others use wrflag. From a quick
look I cannot see any routine which uses both. Such a discrepancy
wouldn't be picked up whether the language was case sensitive or not.
Also, it looks as though zSQL is used only in comments. I don't know any
language which would check that names in comments match those in code.

Nevertheless, I take your point. A programmer /could/ unwisely choose to
use names which differed only by the case of one letter.

Here's a suggestion: make the language case sensitive and have the
compiler reject programs which give access to two names with no changes
other than case, such that Myvar and myvar could not be simultaneously
accessible.
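
In the compiler that would be just one extra check at declaration time.
A rough sketch (made-up names, one flat scope, ASCII identifiers only):

    #include <stdio.h>
    #include <ctype.h>

    static const char *names[1000];
    static int nnames;

    static int equal_folded(const char *a, const char *b) {
        for (; *a != '\0' && *b != '\0'; a++, b++)
            if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
                return 0;
        return *a == *b;          /* true only if both ended together */
    }

    /* Returns 0 on success, -1 if the new name matches an earlier one
       when case is ignored (an exact redeclaration trips it too). */
    int declare_name(const char *name) {
        for (int i = 0; i < nnames; i++)
            if (equal_folded(names[i], name)) {
                fprintf(stderr, "error: '%s' clashes with '%s'\n",
                        name, names[i]);
                return -1;
            }
        names[nnames++] = name;
        return 0;
    }

So declare_name("Myvar") followed by declare_name("myvar") is rejected,
while the accepted spelling stays case sensitive everywhere else.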


--
James Harris


Bart

Nov 24, 2022, 3:52:04 PM
On 24/11/2022 20:13, James Harris wrote:
> On 24/11/2022 18:42, Bart wrote:

>> Here are some examples from sqlite3.c of names which are identical
>> except for subtle differences of case:
>>
>> (walCkptInfo,WalCkptInfo)
>> (walIndexHdr,WalIndexHdr)
>> (wrflag,wrFlag)
>> (writeFile,WriteFile)
>> (xHotSpot,xHotspot)
>> (yHotspot,yHotSpot)
>> (yymajor,yyMajor)
>> (yyminor,yyMinor)
>> (zErrMsg,zErrmsg)
>> (zSql,zSQL)
>>
>> Try to spot the differences. Remember that in a real program, it will
>> be much busier, and these names haven't been pre-selected and
>> helpfully placed side by side! Usually you will see them in isolation.
>
> Well, some of those would happen regardless of the case sensitivity of
> the language. For example, in the version of sqlite3.c I found online  I
> saw that some routines use wrFlag and others use wrflag. From a quick
> look I cannot see any routine which uses both. Such a discrepancy
> wouldn't be picked up whether the language was case sensitive or not.
> Also, it looks as though zSQL is used only in comments.

zSQL occurs here:

SQLITE_API int sqlite3_declare_vtab(sqlite3*, const char *zSQL);

which is between two block comments but is not itself inside a comment.

I have done analysis in the past which tried to detect whether any of
those pairs occurred within the same function; I think one or two did,
but is too much effort to repeat now.

(The whole list is 200 entries; 3 of them have 3 variations on the same
name:

(hmenu,hMenu,HMENU)
(next,nExt,Next)
(short,Short,SHORT)
)

But this is just to show that such variances can occur, especially in
the longer names where the difference is subtle.

These are likely to create a lot of confusion, if you type the wrong
capitalisation because you assume xHotSpot style rather than xHotspot.
Or even if you're just browsing the code: was this the same name I saw a
minute ago? No; one has a small s the other a big S, they just sound the
same when you say them out loud (or in your head!).

Such confusion /has/ to be less when xHotSpot, xHotspot, plus the other
62 (I think) variations have to be the same identifier, 'xhotspot' when
normalised.

> Nevertheless, I take your point. A programmer /could/ unwisely choose to
> use names which differed only by the case of one letter.

In C this happens /all the time/. It's almost a requirement. When I
translated OpenGL headers, many macro names shared the same
identifiers with functions if you took away case.

>
> Here's a suggestion: make the language case sensitive and have the
> compiler reject programs which give access to two names with no changes
> other than case, such that Myvar and myvar could not be simultaneously
> accessible.

Apart from not being able to do this:

Colour colour;

what would be the point of case sensitivity in this case? Or would the
restriction not apply to types? What about a variable that clashes with
a reserved word when case is ignored?

(BTW my syntax can represent the above as:

`Colour colour

The backtick is case-preserving, and also allows names that clash with
reserved words. But I don't want to have to write that in code; this is
for automatic translation tools, or for a one-off.)

Dmitry A. Kazakov

Nov 24, 2022, 4:35:58 PM
On 2022-11-24 19:55, James Harris wrote:

> All of the above are examples of poor code - from Int, INT, int to
> MyVal, myVal, myval.

Which you want to make legal, right? Again,

If int and INT shall never mean two different entities, why do you let them?

> The difference is that in a case-sensitive language
> (such as C) a programmer would have deliberately to choose daft names to
> engineer the mess; whereas in a language which ignores case (such as
> Ada) the mess can come about accidentally, via typos.

That is evidently wrong. Why exactly

int INT;

must be legal?

>> Moreover, tools for the case-sensitive languages like C++ do just the
>> same. You cannot have reasonable names in C++ anymore. There would be
>> lurking clang-format or SonarQube configured to force something a
>> three year old suffering dyslexia would pen... (:-))
>
> As you suggest, for languages which ignore case extra tools are needed
> to help tidy up the code.

While 99% of all these tools were developed specifically for
case-sensitive languages? Come on!

James Harris

Nov 24, 2022, 4:47:07 PM
Yes, although it's a function declaration; the presumably incorrectly
typed identifier zSQL is ignored. It's simply not used. Two comments. IMO:

1. Forward declarations should not be needed.

2. Parameter names should be part of the interface.

>
> I have done analysis in the past which tried to detect whether any of
> those pairs occurred within the same function; I think one or two did,
> but is too much effort to repeat now.
>
> (The whole list is 200 entries; 3 of them have 3 variations on the same
> name:
>
>    (hmenu,hMenu,HMENU)
>    (next,nExt,Next)
>    (short,Short,SHORT)
> )
>
> But this is just to show that such variances can occur, especially in
> the longer names where the difference is subtle.

Similar could be said for any names which differed only slightly.

>
> These are likely to create a lot of confusion, if you type the wrong
> capitalisation because you assume xHotSpot style rather then xHotspot.
> Or even if you're just browsing the code: was this the same name I saw a
> minute ago? No; one has a small s the other a big S, they just sound the
> same when you say them out loud (or in your head!).
>
> Such confusion /has/ to be less when xHotSpot, xHotspot, plus the other
> 62 (I think) variations have to be the same identifier, 'xhotspot' when
> normalised.

I think I'd prefer the compiler to reject the code. Then one would know
that compilable code had no such problems - whether the language was
case sensitive or case insensitive.

>
>> Nevertheless, I take your point. A programmer /could/ unwisely choose
>> to use names which differed only by the case of one letter.
>
> In C this happens /all the time/. It's almost a requirement. When I
> translated OpenGL headers, many macro names shared the same
> identifiers with functions if you took away case.

One cannot stop programmers doing daft things. For example, a programmer
could declare names such as

CreateTableForwandReference

and

createtableforwardrefarence

The differences are not obvious. Nor would it be easy to get a compiler
to complain about the similarity.

IOW we can help but we cannot stop programmers doing unwise things.

>
>>
>> Here's a suggestion: make the language case sensitive and have the
>> compiler reject programs which give access to two names with no
>> changes other than case, such that Myvar and myvar could not be
>> simultaneously accessible.
>
> Apart from not being able to do this:
>
>     Colour colour;
>
> what would be the point of case sensitivity in this case? Or would the
> restriction not apply to types? What about a variable that clashes with
> a reserved word when case is ignored?

I'd better not comment on type names here. It is a big issue, and one on
which I may be coming round to a different point of view. Identifiers
which clashed with reserved words sounds like a good thing to prohibit
on one condition: there is some way to add reserved words to a later
version of the language without potentially breaking lots of existing code.

Maybe the best a language designer can do for cases such as this is to
help reduce the number of different names a programmer would have to
define in any given location.


--
James Harris


James Harris

Nov 24, 2022, 4:50:35 PM
On 24/11/2022 21:35, Dmitry A. Kazakov wrote:
> On 2022-11-24 19:55, James Harris wrote:
>
>> All of the above are examples of poor code - from Int, INT, int to
>> MyVal, myVal, myval.
>
> Which you want to make legal, right?

No.

...

> That is evidently wrong. Why exactly
>
>    int INT;
>
> must be legal?

I didn't say it should be.

>>> Moreover, tools for the case-sensitive languages like C++ do just the
>>> same. You cannot have reasonable names in C++ anymore. There would be
>>> lurking clang-format or SonarQube configured to force something a
>>> three year old suffering dyslexia would pen... (:-))
>>
>> As you suggest, for languages which ignore case extra tools are needed
>> to help tidy up the code.
>
> While 99% of all these tools were developed specifically for
> case-sensitive languages? Come on!

It's a personal view but IMO a language should be independent of, and
should not rely on, IDEs or special editors.


--
James Harris


Dmitry A. Kazakov

Nov 24, 2022, 4:58:15 PM
I meant some fancy language where no declarations are needed. But OK, take this:

int MyVal = a;
int myVal = MyVal + b;

How do you know?

>     int Myval = 1;
>     int myval = 2;
>
> In a case-sensitive language, it is legal but written by an
> intentionally bad programmer - and no matter how hard you try, bad
> programmers will find a way to write bad code.  In a case-insensitive
> language, it is an error written intentionally by a bad programmer.
>
> Give me the language that helps catch typos, not the language that is
> happy with an inconsistent jumble.

   declare
      Myval : Integer := 1;
      myval : Integer := 2;
   begin

This is illegal in Ada.

>>> but if the compiler folds case then programmers can /mistype/ names
>>> accidentally, leading to the messy inconsistency mentioned above.

A programmer cannot mistype names if the language is case-sensitive?

Purely statistically your argument makes no sense. Since the set of
unique identifiers in a case-insensitive language is an order of
magnitude narrower, any probability of mess/error etc. is also lower under
equivalent conditions.

The only reason to have case-sensitive identifiers is for having
homographs = for producing mess.

> I prefer mistypes to be considered errors where possible.

And I gave more or less formal proof why case-insensitive languages are
better here.

>> Moreover, tools for the case-sensitive languages like C++ do just the
>> same. You cannot have reasonable names in C++ anymore. There would be
>> lurking clang-format or SonarQube configured to force something a
>> three year old suffering dyslexia would pen... (:-))
>
> Some people know how to use tools properly.

These people don't buy them and thus do not count... (:-))

Dmitry A. Kazakov

Nov 24, 2022, 5:00:57 PM
On 2022-11-24 22:50, James Harris wrote:
> On 24/11/2022 21:35, Dmitry A. Kazakov wrote:
>> On 2022-11-24 19:55, James Harris wrote:
>>
>>> All of the above are examples of poor code - from Int, INT, int to
>>> MyVal, myVal, myval.
>>
>> Which you want to make legal, right?
>
> No.
>
> ...
>
>> That is evidently wrong. Why exactly
>>
>>     int INT;
>>
>> must be legal?
>
> I didn't say it should be.

But it is. q.e.d.

>>>> Moreover, tools for the case-sensitive languages like C++ do just
>>>> the same. You cannot have reasonable names in C++ anymore. There
>>>> would be lurking clang-format or SonarQube configured to force
>>>> something a three year old suffering dyslexia would pen... (:-))
>>>
>>> As you suggest, for languages which ignore case extra tools are
>>> needed to help tidy up the code.
>>
>> While 99% of all these tools were developed specifically for
>> case-sensitive languages? Come on!
>
> It's a personal view but IMO a language should be independent of, and
> should not rely on, IDEs or special editors.

Yet you must rely on them in order to prevent:

int INT;

Bart

Nov 24, 2022, 5:51:36 PM
On 24/11/2022 21:47, James Harris wrote:
> On 24/11/2022 20:52, Bart wrote:

> Yes, although it's a function declaration; the presumably incorrectly
> typed identifier zSQL is ignored.

We don't know the purpose of zSQL. But the point is it is there, a
slightly differently-cased version of the same name, which I can't
for the life of me recall right now. That is the problem.

(If I look back, it is zSql. But if now encounter these even in 10
minutes time, which one would be which? I would forget.)



> Similar could be said for any names which differed only slightly.

Say short, Short and SHORT out loud; any difference?

You're debugging some code and need to print out the value of hmenu. Or
is hMenu or Hmenu? Personally I am half-blind to case usage because it
so commonly irrelevant and ignored in English.

I would be constantly double-checking and constantly getting it wrong
too. And that's with just one of these three in place.

Differences in spelling are another matter; I'm a good speller.

You might notice when the cup you're handed in Starbucks has Janes
rather than James and would want to check it is yours; but you probably
wouldn't care if it's james or James or JAMES because that is just style.
You know they are all the same name.

But also, just because people can make typos by pressing the wrong
letter or getting the length wrong doesn't make allowing 2**N more
incorrect possibilities acceptable.

>> In C this happens /all the time/. It's almost a requirement. When I
>> translated OpenGL headers, then many macro names shared the same
>> identifers with functions if you took away case.
>
> One cannot stop programmers doing daft things. For example, a programmer
> could declare names such as
>
>   CreateTableForwandReference
>
> and
>
>   createtableforwardrefarence
>
> The differences are not obvious.

So to fix it, we allow

CreateTableForwardRefarence

createtableforwandreference

as synonyms? With case sensitivity, you have subtle differences in letters
/plus/ subtle differences in case!

> Maybe the best a language designer can do for cases such as this is to
> help reduce the number of different names a programmer would have to
> define in any given location.

Given the various restrictions you've mentioned that you'd want even
with case sensitive names, is there any point to having case
sensitivity? What would it be used for; what would it allow?

I had a scripting language that shipped with my applications. While case
insensitive, I would usually write keywords in lower case as if, then,
while.

But some users would write If, Then, While, and make more use in
identifiers of mixed case. And they would capitalise global variables
that I defined in lower case.

It provided a choice.


David Brown

Nov 25, 2022, 2:52:16 AM
On 24/11/2022 21:01, Bart wrote:
> On 24/11/2022 19:28, David Brown wrote:
>> On 24/11/2022 20:07, Bart wrote:
>
>>> They do. I gave examples in my other post. But this kind of idiom I
>>> find annoying:
>>>
>>>      Image image;
>>>      Colour colour;     //(At least it's not colour color!)
>>>      Matrix matrix;
>>>
>>
>> I have never come across any programmer for any language that does not
>> find some commonly-used idioms or coding styles annoying.
>
> I can port all my identifiers to a case-sensitive language with no
> clashes. I can't guarantee no clashes when porting from case-sensitive
> to case-insensitive.
>
> Which would be less hassle?
>

I realise this is not the answer you want, but here goes - nobody cares!

It is not the fault of /C/ that /you/ have made a language that does not
support direct literal translations from C.

Honestly, of all your arguments against C (some of which are valid and
reasonable), and all your arguments against case sensitivity, this is
the most pathetic. Get over yourself - the world does not revolve
around you or your language, and nobody gives a **** if you have to put
slightly more effort into your porting tasks.

> I don't like the C idiom because if I read it in my head it sounds stupid.
>

Even that ranks miles above whinging about porting.

> That means that, given a choice of what to do with lower and upper case
> letters, I've selected different priorities, since I place little value
> on writing code like this:
>
>     struct Foo Foo[FOO] = {foo};
>
> Clearly, you have a different opinion.

Clearly you prefer to form your own opinions on people rather than
bothering to read anything.

David Brown

Nov 25, 2022, 3:18:27 AM
On 24/11/2022 22:35, Dmitry A. Kazakov wrote:
> On 2022-11-24 19:55, James Harris wrote:
>
>> All of the above are examples of poor code - from Int, INT, int to
>> MyVal, myVal, myval.
>
> Which you want to make legal, right? Again,
>
> If int and INT shall never mean two different entities, why do you let
> them?

If int and INT are poor style when referring to the same entities, why
do you let them?


Either choice of case sensitivity or case insensitivity allows abuses.
But case sensitivity makes accidental misuse far more likely to be
caught by the compiler, and it allows more possibilities.  In comparison
to case insensitive languages, it is one step back and two steps forward
- a clear win.


Of course there are other options as well, which are arguably better
than either of these. One is to say you have to get the case right for
consistency, but disallow identifiers that differ only in case. (I
think that's what you get with Ada along with appropriate tools or
compiler warning flags. There are also C tools for spotting confusing
identifiers.)

Another is to say that the case is significant. This is often done in C
by convention - all-caps for macros is a very common convention, and in
C++ it is quite common to use initial caps for classes. Some languages
enforce such rules, making "Int int" perfectly clear as "Int" is
guaranteed to be a type, while "int" is guaranteed to be an object.


>
>> The difference is that in a case-sensitive language (such as C) a
>> programmer would have deliberately to choose daft names to engineer
>> the mess; whereas in a language which ignores case (such as Ada) the
>> mess can come about accidentally, via typos.
>
> That is evidently wrong.

What exactly is wrong about my statement? "int INT;" is an example of
deliberately daft names. Legislating against stupidity or malice is
/very/ difficult. Legislating against accidents and inconsistency is
easier, and a better choice.

> Why exactly
>
>    int INT;
>
> must be legal?

If a language can make such things illegal, great - but /not/ at the
cost of making "int a; INT b; inT c = A + B;" legal.


>
>>> Moreover, tools for the case-sensitive languages like C++ do just the
>>> same. You cannot have reasonable names in C++ anymore. There would be
>>> lurking clang-format or SonarQube configured to force something a
>>> three year old suffering dyslexia would pen... (:-))
>>
>> As you suggest, for languages which ignore case extra tools are needed
>> to help tidy up the code.
>
> While 99% of all these tools were developed specifically for
> case-sensitive languages? Come on!
>

I see no problem with using extra tools, or extra compiler warnings, to
improve code quality or catch errors. Indeed, I am a big fan of them.
As a fallible programmer I like all the help I can get, and I like it as
early in the process as possible (such as smart editors or IDEs).

However, I am aware that not all programmers are equally concerned with
writing good code, so the more a language enforces good quality, the better.

(And before Bart chimes in with examples of the nonsense gcc accepts as
"valid C" when no flags are given, I would prefer toolchains had
stringent extra checks by default and only allow technically legal but
poor style code if given explicit flags.)

David Brown

Nov 25, 2022, 3:21:05 AM
On 24/11/2022 23:00, Dmitry A. Kazakov wrote:
> On 2022-11-24 22:50, James Harris wrote:
>> On 24/11/2022 21:35, Dmitry A. Kazakov wrote:
>>> On 2022-11-24 19:55, James Harris wrote:
>>>
>>>> All of the above are examples of poor code - from Int, INT, int to
>>>> MyVal, myVal, myval.
>>>
>>> Which you want to make legal, right?
>>
>> No.
>>
>> ...
>>
>>> That is evidently wrong. Why exactly
>>>
>>>     int INT;
>>>
>>> must be legal?
>>
>> I didn't say it should be.
>
> But it is. q.e.d.

It is legal in C - but if James doesn't want it to be legal in /his/
language, then it won't be.

>
>>>>> Moreover, tools for the case-sensitive languages like C++ do just
>>>>> the same. You cannot have reasonable names in C++ anymore. There
>>>>> would be lurking clang-format or SonarQube configured to force
>>>>> something a three year old suffering dyslexia would pen... (:-))
>>>>
>>>> As you suggest, for languages which ignore case extra tools are
>>>> needed to help tidy up the code.
>>>
>>> While 99% of all these tools were developed specifically for
>>> case-sensitive languages? Come on!
>>
>> It's a personal view but IMO a language should be independent of, and
>> should not rely on, IDEs or special editors.
>
> Yet you must rely on them in order to prevent:
>
>    int INT;
>

No, you don't - not if a language or compiler is designed to prevent
them. (And it can still be case-sensitive.)

David Brown

Nov 25, 2022, 3:45:03 AM
It is unavoidable in any language, with any rules, that people will be
able to write confusing code, or that people will be able to make
mistakes that compilers and tools can't catch. No matter how smart you
make the language or the tools, that will /always/ be possible.

Thus there is no benefit in any discussion in stretching examples to
that point.

Given the code above, it is clear that it is not the language that is
flawed, or the tools, or the code - it is the programmer that is flawed.


>>      int Myval = 1;
>>      int myval = 2;
>>
>> In a case-sensitive language, it is legal but written by an
>> intentionally bad programmer - and no matter how hard you try, bad
>> programmers will find a way to write bad code.  In a case-insensitive
>> language, it is an error written intentionally by a bad programmer.
>>
>> Give me the language that helps catch typos, not the language that is
>> happy with an inconsistent jumble.
>
>    declare
>       Myval : Integer := 1;
>       myval : Integer := 2;
>    begin
>
> This is illegal in Ada.

Great. Ada catches some mistakes. It lets others through. That's life
in programming.

>
>>>> but if the compiler folds case then programmers can /mistype/ names
>>>> accidentally, leading to the messy inconsistency mentioned above.
>
> A programmer cannot mistype names if the language is case-sensitive?
>

Sure - but at least some typos are more likely to be caught.

> Purely statistically your argument makes no sense. Since the set of
> unique identifiers in a case-insensitive language is an order of
> magnitude narrower, any probability of mess/error etc. is also lower under
> equivalent conditions.

"Purely statistically" you are talking drivel and comparing one
countably infinite set with a different countably infinite set.

There are some programmers who appear to pick identifiers by letting
their cat walk at random over the keyboard. Most don't. Programmers
mostly pick the same identifiers regardless of case sensitivity, and
mostly pick identifiers that differ in more than just case. Baring
abusing programmers, the key exception is idioms such as "Point point"
where "Point" is a type and "point" is an object of that type.

>
> The only reason to have case-sensitive identifiers is for having
> homographs = for producing mess.
>

No, it /avoids/ mess. And case insensitivity does not avoid homographs
- HellO and He110 are homographs in simple fonts, despite being
different identifiers regardless of case sensitivity. "Int" and "int"
are not homographs in any font. "Ρο" and "Po" are homographs,
regardless of case sensitivity, despite being completely different
Unicode identifiers (the first uses Greek letters).

("Homograph" means they look much the same, but are actually different -
not that the cases are different.)


The key benefit of case sensitivity is disallowing inconsistent cases,
rather than because it allows identifiers that differ in case.


>> I prefer mistypes to be considered errors where possible.
>
> And I gave more or less formal proof why case-insensitive languages are
> better here.
>

Really? I must have missed that the "more or less formal" proof. I saw
some arguments, and I don't disagree that case sensitivity has a
disadvantage in allowing some new kinds of intentional abuse. But
that's all.

>>> Moreover, tools for the case-sensitive languages like C++ do just the
>>> same. You cannot have reasonable names in C++ anymore. There would be
>>> lurking clang-format or SonarQube configured to force something a
>>> three year old suffering dyslexia would pen... (:-))
>>
>> Some people know how to use tools properly.
>
> These people don't buy them and thus do not count... (:-))
>

I don't follow. I make a point of learning how to use my tools as best
I can, whether they are commercial paid-for tools or free ones.

But if you mean that the programmers who could most benefit from good
tools to check style and code quality are precisely the ones that don't
use them, I agree. Usually they don't even have to buy them or acquire
them - they already have tools they could use, but don't use them properly.

If I were making a compiler, all its warnings would be on by default,
and you'd have to use the flag "-old-bad-code" to disable them.

David Brown

Nov 25, 2022, 3:55:49 AM
On 24/11/2022 21:52, Bart wrote:
> On 24/11/2022 20:13, James Harris wrote:
>> On 24/11/2022 18:42, Bart wrote:

>> Nevertheless, I take your point. A programmer /could/ unwisely choose
>> to use names which differed only by the case of one letter.
>
> In C this happens /all the time/. It's almost a requirement. When I
> translated OpenGL headers, many macro names shared the same
> identifiers with functions if you took away case.
>

Please stop cherry-picking code you don't like and assuming all C code
is like that.

Oh, and examples like "Point point;" are idiomatic in C. That means
anyone who understands C will have no problem following the code. It is
not an issue or bad code. (This is unlike having both "myVar" and
"MyVar" as identifiers in the same scope - I think everyone agrees that
that /is/ bad code.)

>>
>> Here's a suggestion: make the language case sensitive and have the
>> compiler reject programs which give access to two names with no
>> changes other than case, such that Myvar and myvar could not be
>> simultaneously accessible.
>
> Apart from not being able to do this:
>
>     Colour colour;
>
> what would be the point of case sensitivity in this case? Or would the
> restriction not apply to types? What about a variable that clashes with
> a reserved word when case is ignored?
>
> (BTW my syntax can represent the above as:
>
>     `Colour colour
>
> The backtick is case-preserving, and also allows names that clash with
> reserved words. But I don't want to have to write that in code; this is
> for automatic translation tools, or for a one-off.)

(SQL has something similar. Table and column names inside quotation
marks are case sensitive and can contain spaces or match keywords.)


Case sensitivity is primarily about enforcing consistency, and only
secondarily about allowing identifiers that differ only in case.

As for enforcing rules that prevent identifiers that differ only in
case, there are many sub-options you could have. (James - these are
ideas and suggestions, not necessarily recommendations. You pick for
your language.)

You could have different namespaces for types, functions, and objects
(and maybe other entities). So you could have "Point" as a type and
"point" as an object, but not both "Point" and "point" as types or
objects. (It's not really any different from allowing an identifier for
a field in a structure to also be a function name - namespaces are vital.)

You could enforce a convention on capitalisation, such as types must
start with a capital and objects start with a lower case letter.
Whether you also allow "pointa" and "pointA" as separate objects is
another choice.

You could say capitals are only allowed at the start of an identifier,
or after an underscore - "pointA" is not allowed but "point_A" is.
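
That last rule would be a few lines in the lexer - a sketch (the function
name is made up):

    #include <ctype.h>

    /* Accept capitals only as the first character or directly after
       an underscore: "Point" and "point_A" pass, "pointA" fails. */
    int ident_ok(const char *s) {
        for (int i = 0; s[i] != '\0'; i++)
            if (isupper((unsigned char)s[i]) && i > 0 && s[i - 1] != '_')
                return 0;
        return 1;
    }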

Dmitry A. Kazakov

Nov 25, 2022, 4:13:19 AM
On 2022-11-25 09:18, David Brown wrote:
> On 24/11/2022 22:35, Dmitry A. Kazakov wrote:
>> On 2022-11-24 19:55, James Harris wrote:
>>
>>> All of the above are examples of poor code - from Int, INT, int to
>>> MyVal, myVal, myval.
>>
>> Which you want to make legal, right? Again,
>>
>> If int and INT shall never mean two different entities, why do you let
>> them?
>
> If int and INT are poor style when referring to the same entities, why
> do you let them?

And how exactly would case-sensitivity not let them? So far the outcome:

case-insensitive: illegal
case-sensitive: OK

> But case sensitive makes accidental misuse far more likely to be
> caught by the compiler,

Example please. Typing 'i' instead of 'I' is not a misuse to me.

> Of course there are other options as well, which are arguably better
> than either of these.  One is to say you have to get the case right for
> consistency, but disallow identifiers that differ only in case.

You could do that. You can even say that identifiers must be in italics
and keywords in bold Arial and then apply all your arguments to font
shapes, sizes, orientation etc. Why not?

One of the reasons Ada did not do this (and many filesystems as well)
is that one might wish to be able to convert names to some canonical
form without changing the meaning. After all, this is how the letter case
appeared in European languages in the first place - to beautify written
text.

If you do not misuse the concept that a program is a text, you should
have no problem with the idea that text appearance may vary. Never
changed IDE fonts? (:-))

>>> The difference is that in a case-sensitive language (such as C) a
>>> programmer would have deliberately to choose daft names to engineer
>>> the mess; whereas in a language which ignores case (such as Ada) the
>>> mess can come about accidentally, via typos.
>>
>> That is evidently wrong.
>
> What exactly is wrong about my statement?  "int INT;" is an example of
> deliberately daft names.

What's wrong with the name int? Let's take

integer Integer;

> Legislating against stupidity or malice is
> /very/ difficult.

There is no malice, it is quite common practice to do things like:

void Boo::Foo (Object * object) {
int This = this->idx;

etc.

> Legislating against accidents and inconsistency is
> easier, and a better choice.
>
>> Why exactly
>>
>>     int INT;
>>
>> must be legal?
>
> If a language can make such things illegal, great - but /not/ at the
> cost of making "int a; INT b; inT c = A + B;" legal.

I don't see any cost here, because int, INT, inT is the same word to me.
It boils down to how you choose identifiers. If an identifier is a
combination of dictionary words, case/font/size-insensitivity is the most
natural choice. If the idea is to obfuscate the meaning, then it quickly
becomes pointless since there is no way you could defeat ill intents.

>>>> Moreover, tools for the case-sensitive languages like C++ do just
>>>> the same. You cannot have reasonable names in C++ anymore. There
>>>> would be lurking clang-format or SonarQube configured to force
>>>> something a three year old suffering dyslexia would pen... (:-))
>>>
>>> As you suggest, for languages which ignore case extra tools are
>>> needed to help tidy up the code.
>>
>> While 99% of all these tools were developed specifically for
>> case-sensitive languages? Come on!
>
> I see no problem with using extra tools, or extra compiler warnings, to
> improve code quality or catch errors.  Indeed, I am a big fan of them.
> As a fallible programmer I like all the help I can get, and I like it as
> early in the process as possible (such as smart editors or IDEs).

That is OK. James argued that these tools exist partly because of Ada's
case-insensitivity! (:-))

(To me a tool is an indicator of a problem, but that is another story)

David Brown

Nov 25, 2022, 4:24:17 AM

On 24/11/2022 22:47, James Harris wrote:

> 1. Forward declarations should not be needed.
>

Usually not, for functions. But sometimes you will need them for
mutually recursive functions, and I think it makes sense to have some
kind of module interface definition with a list of declared functions
(and other entities). In other words, a function should not be exported
from a module just by writing "export" at the definition. You should
have an interface section with the declarations (like Pascal), or a
separate interface file (like Modula-2).
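
To illustrate the mutual-recursion case in C (a minimal sketch; the
function names are made up):

    #include <stdbool.h>

    static bool is_odd(unsigned n);     /* forward declaration needed */

    static bool is_even(unsigned n) {
        return n == 0 ? true : is_odd(n - 1);
    }

    static bool is_odd(unsigned n) {
        return n == 0 ? false : is_even(n - 1);
    }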

> 2. Parameter names should be part of the interface.
>
I agree - though not everyone does, so there are choices here too. Some
people like to write a declaration such as :

void rectangle(int top, int left, int width, int height);

and then the definition :

void rectangle(int t, int l, int w, int h) { ... }


(In C, there is a big complication. When you need to be most flexible,
such as for standard libraries, you can't declare a function like "void
* malloc(size_t size);", because some twat might have defined a macro
called "size" before including the header. This means declarations
often have "funny" parameter names with underscores, or no name at all.
Obviously you will avoid this possibility in your own language!)
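
To make that concrete (my_alloc is a made-up stand-in for a library
function):

    #define size 100    /* user's macro, defined before the include */

    /* A header line such as
           void *my_alloc(size_t size);
       now expands to
           void *my_alloc(size_t 100);
       which is a syntax error. Hence real headers use reserved names
       like __size, or omit the parameter name altogether. */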


Sometimes a function will have a parameter whose value is not used - it
can be good to leave it unnamed. That can happen for historical reasons
as code changes over time. It can also be useful for "tag types" that
carry no value but are useful for typing, particularly in connection
with overloaded functions.

So you might have (this is rough C++ rather than C, since C does not
have function overloads) :

struct Rect { ... }; // To save writing top, left, etc.

struct Fill {}; // A type with no content
constexpr Fill fill {}; // An object of that type

struct NoFill {}; // A type with no content
constexpr NoFill nofill {}; // An object of that type


void draw_rectangle(Rect rect, Fill);
void draw_rectangle(Rect rect, NoFill);


Now the user picks the actual function by calling :

draw_rectangle(r, fill);

or

draw_rectangle(r, nofill);

This is clearer and less error-prone than using "bool fill" as a
parameter as it conveys more information explicitly when the function is
called.

But as a tag type with no value, there is no benefit in naming the
parameter - all the information is carried in the type at compile-time,
not in a value at run-time.

Dmitry A. Kazakov

Nov 25, 2022, 4:48:32 AM

On 2022-11-25 09:43, David Brown wrote:
> On 24/11/2022 22:58, Dmitry A. Kazakov wrote:

>> I meant some fancy language where no declarations needed. But OK, take
>> this:
>>
>>     int MyVal = a;
>>     int myVal = MyVal + b;
>>
>> How do you know?
>
> It is unavoidable in any language, with any rules, that people will be
> able to write confusing code, or that people will be able to make
> mistakes that compilers and tools can't catch.  No matter how smart you
> make the language or the tools, that will /always/ be possible.

Still, the above is illegal in Ada and legal in C.

>>> Give me the language that helps catch typos, not the language that is
>>> happy with an inconsistent jumble.
>>
>>     declare
>>        Myval : Integer := 1;
>>        myval : Integer := 2;
>>     begin
>>
>> This is illegal in Ada.
>
> Great.  Ada catches some mistakes.  It lets others through.  That's life
> in programming.

No. James wanted to give an example of how case-insensitivity may
introduce bugs, and failed.

>>>>> but if the compiler folds case then programmers can /mistype/ names
>>>>> accidentally, leading to the messy inconsistency mentioned above.
>>
>> A programmer cannot mistype names if the language is case-sensitive?
>>
>
> Sure - but at least some typos are more likely to be caught.
>
>> Purely statistically your argument makes no sense. Since the set of
>> unique identifiers in a case-insensitive language is by order of
>> magnitude narrower, any probability of mess/error etc is also less
>> under equivalent conditions.
>
> "Purely statistically" you are talking drivel and comparing one
> countably infinite set with a different countably infinite set.

Probability theory deals with infinite sets. Sets must be
measurable, not countable.

But the set of identifiers is of course countable, since no human and no
FSM can deploy infinite identifiers.

> There are some programmers who appear to pick identifiers by letting
> their cat walk at random over the keyboard.  Most don't.  Programmers
> mostly pick the same identifiers regardless of case sensitivity, and
> mostly pick identifiers that differ in more than just case.  Barring
> abusive programmers, the key exception is idioms such as "Point point"
> where "Point" is a type and "point" is an object of that type.

It is a bad idiom. Spoken languages use articles and other grammatical
means to disambiguate classes and instances. A programming language
may also have different name spaces for different categories of entities
(hello, first-class types, functions etc (:-)). Writing "Point point"
specifically in C++ is laziness, stupidity and abuse.

>> The only reason to have case-sensitive identifiers is for having
>> homographs = for producing mess.
>
> No, it /avoids/ mess.  And case insensitivity does not avoid homographs
> - HellO and He110 are homographs in simple fonts, despite being
> different identifiers regardless of case sensitivity.  "Int" and "int"
> are not homographs in any font.  "Ρο" and "Po" are homographs,
> regardless of case sensitivity, despite being completely different
> Unicode identifiers (the first uses Greek letters).
>
> ("Homograph" means they look much the same, but are actually different -
> not that the cases are different.)

We cannot avoid some homographs, so let's introduce more?

> The key benefit of case sensitivity is disallowing inconsistent cases,
> rather than because it allows identifiers that differ in case.

How "point" is disallowed by being different from "Point"?

>>> Some people know how to use tools properly.
>>
>> These people don't buy them and thus do not count... (:-))
>>
>
> I don't follow.  I make a point of learning how to use my tools as best
> I can, whether they are commercial paid-for tools or zero cost price.
>
> But if you mean that the programmers who could most benefit from good
> tools to check style and code quality are precisely the ones that don't
> use them, I agree.  Usually they don't even have to buy them or acquire
> them - they already have tools they could use, but don't use them properly.
>
> If I were making a compiler, all its warnings would be on by default,
> and you'd have to use the flag "-old-bad-code" to disable them.

Ideally, you should not need a tool if your primary instrument (the
language) works well.

David Brown

Nov 25, 2022, 4:52:19 AM

On 25/11/2022 10:13, Dmitry A. Kazakov wrote:
> On 2022-11-25 09:18, David Brown wrote:
>> On 24/11/2022 22:35, Dmitry A. Kazakov wrote:
>>> On 2022-11-24 19:55, James Harris wrote:
>>>
>>>> All of the above are examples of poor code - from Int, INT, int to
>>>> MyVal, myVal, myval.
>>>
>>> Which you want to make legal, right? Again,
>>>
>>> If int and INT shall never mean two different entities, why do you
>>> let them?
>>
>> If int and INT are poor style when referring to the same entities, why
>> do you let them?
>
> And how exactly would case-sensitivity not let them? So far the outcome:
>
> case-insensitive: illegal
> case-sensitive:   OK

You misunderstood my question.

You dislike case sensitivity because it lets you have two different
identifiers written "int" and "INT". That is a fair point, and a clear
disadvantage of case sensitivity.

But if you have a case insensitive language, it lets you write "int" and
"INT" for the /same/ identifier, despite written differences. That is a
clear disadvantage of case /insensitivity/.


>
>>  But case sensitive makes accidental misuse far more likely to be
>> caught by the compiler,
>
> Example please. Typing 'i' instead of 'I' is not a misuse to me.

If I accidentally type "I" instead of "i", a C compiler will catch the
error. "for (int i = 0; I < 10; i++) ..." It's an error in C.

>
>> Of course there are other options as well, which are arguably better
>> than either of these.  One is to say you have to get the case right
>> for consistency, but disallow identifiers that differ only in case.
>
> You could do that. You can even say that identifiers must be in italics
> and keywords in bold Arial and then apply all your arguments to font
> shapes, sizes, orientation etc. Why not?

Sorry, I was only giving sensible suggestions.

>
> One of the reasons Ada did not do this and many filesystems as well,
> because one might wish to be able to convert names to some canonical
> form without changing the meaning. After all this is how the letter case
> appeared in European languages in the first place - to beautify written
> text.

There is a very simple canonical form for ASCII text - leave it alone.
For Unicode, there is a standard normalisation procedure (converting
combining diacriticals into single combination codes where applicable).

Ada has its roots in a time when many programming languages were
all-caps, at least for their keywords, and significant computer systems
were still using punched cards, 6-bit character sizes, and other
limitations. If you wanted a language that could be used widely (and
that was one of Ada's aim) without special requirements, you had to
accept that some people would be using all-caps. At the same time, it
was clear by then that all-caps was ugly and people preferred to use
small letters when possible. The obvious solution was to make the
language case-insensitive, like many other languages of that time (such
as Pascal, which was a big influence for Ada). It was a /practical/
decision, not made because someone thought being case-insensitive made
the language inherently better.

>
> If you do not misuse the concept that a program is a text, you should
> have no problem with the idea that text appearance may vary. Never
> changed IDE fonts? (:-))
>
>>>> The difference is that in a case-sensitive language (such as C) a
>>>> programmer would have deliberately to choose daft names to engineer
>>>> the mess; whereas in a language which ignores case (such as Ada) the
>>>> mess can come about accidentally, via typos.
>>>
>>> That is evidently wrong.
>>
>> What exactly is wrong about my statement?  "int INT;" is an example of
>> deliberately daft names.
>
> What's wrong with the name int? Let's take
>
>    integer Integer;

OK, so I assume you now agree there was nothing wrong with my statement,
since you can't say what you thought was wrong.

>
>> Legislating against stupidity or malice is /very/ difficult.
>
> There is no malice, it is quite common practice to do things like:
>
>    void Boo::Foo (Object * object) {
>       int This = this->idx;
>
> etc.


As has been said, again and again, writing something like "Object
object" is a common idiom and entirely clear to anyone experienced as a
C or C++ programmer. It is less common to see a pointer involved
(idiomatic C++ would likely have "object" as a reference or const
reference here). I can't remember ever seeing a capitalised keyword
used as an identifier - it is /far/ from common practice. It counts as
stupidity, not malice.

>
>> Legislating against accidents and inconsistency is
>> easier, and a better choice.
>>
>>> Why exactly
>>>
>>>     int INT;
>>>
>>> must be legal?
>>
>> If a language can make such things illegal, great - but /not/ at the
>> cost of making "int a; INT b; inT c = A + B;" legal.
>
> I don't see any cost here, because int, INT and inT are the same word to me.

They are so visually distinct that there is a higher cognitive cost in
reading them - that makes them bad, even when you know they mean the
same thing. (The same applies to confusingly similar but distinct
identifiers, regardless of case sensitivity - they require more brain
effort to comprehend.)  Higher cognitive costs translate to more
effort, lower productivity, and higher error rates - you make more
errors when you type, and you spot fewer errors when you read.

James Harris

Nov 25, 2022, 5:12:55 AM

On 24/11/2022 22:00, Dmitry A. Kazakov wrote:
> On 2022-11-24 22:50, James Harris wrote:
>> On 24/11/2022 21:35, Dmitry A. Kazakov wrote:
>>> On 2022-11-24 19:55, James Harris wrote:
>>>
>>>> All of the above are examples of poor code - from Int, INT, int to
>>>> MyVal, myVal, myval.
>>>
>>> Which you want to make legal, right?
>>
>> No.
>>
>> ...
>>
>>> That is evidently wrong. Why exactly
>>>
>>>     int INT;
>>>
>>> must be legal?
>>
>> I didn't say it should be.
>
> But it is. q.e.d.

Not necessarily. As I said before, within a scope names which vary only
by case could be prohibited.

>
>>>>> Moreover, tools for the case-sensitive languages like C++ do just
>>>>> the same. You cannot have reasonable names in C++ anymore. There
>>>>> would be lurking clang-format or SonarQube configured to force
>>>>> something a three year old suffering dyslexia would pen... (:-))
>>>>
>>>> As you suggest, for languages which ignore case extra tools are
>>>> needed to help tidy up the code.
>>>
>>> While 99% of all these tools were developed specifically for
>>> case-sensitive languages? Come on!
>>
>> It's a personal view but IMO a language should be independent of, and
>> should not rely on, IDEs or special editors.
>
> Yet you must rely on them in order to prevent:
>
>    int INT;

No, the compiler could detect it. No need for special tools.


--
James Harris


James Harris

Nov 25, 2022, 5:31:41 AM

On 25/11/2022 08:18, David Brown wrote:
> On 24/11/2022 22:35, Dmitry A. Kazakov wrote:

...

>> Why exactly
>>
>>     int INT;
>>
>> must be legal?
>
> If a language can make such things illegal, great - but /not/ at the
> cost of making "int a; INT b; inT c = A + B;" legal.

Well put! It's that kind of mess which makes me dislike the idea of a
language ignoring case. I don't understand how anyone can think that a
compiler actually allowing such a jumble and viewing it as legal is a
good idea.

Anyone reading or having to maintain code with such a mixture of cases
would be justified in thinking that either there was some unofficial
stropping scheme that he was supposed to adhere to or the programmer who
wrote it was sloppy.


--
James Harris


Dmitry A. Kazakov

Nov 25, 2022, 5:37:21 AM

On 2022-11-25 10:52, David Brown wrote:
> On 25/11/2022 10:13, Dmitry A. Kazakov wrote:
>> On 2022-11-25 09:18, David Brown wrote:
>>> On 24/11/2022 22:35, Dmitry A. Kazakov wrote:
>>>> On 2022-11-24 19:55, James Harris wrote:
>>>>
>>>>> All of the above are examples of poor code - from Int, INT, int to
>>>>> MyVal, myVal, myval.
>>>>
>>>> Which you want to make legal, right? Again,
>>>>
>>>> If int and INT shall never mean two different entities, why do you
>>>> let them?
>>>
>>> If int and INT are poor style when referring to the same entities,
>>> why do you let them?
>>
>> And how exactly would case-sensitivity not let them? So far the
>> outcome:
>>
>> case-insensitive: illegal
>> case-sensitive:   OK
>
> You misunderstood my question.
>
> You dislike case sensitivity because it lets you have two different
> identifiers written "int" and "INT".  That is a fair point, and a clear
> disadvantage of case sensitivity.
>
> But if you have a case insensitive language, it lets you write "int" and
> "INT" for the /same/ identifier, despite written differences.  That is a
> clear disadvantage of case /insensitivity/.

Only when identifiers are not supposed to mean anything, which is not
how I want programs to be. So to me, in the context of programming as
an activity that communicates ideas through programs, this disadvantage
does not exist.

>>>  But case sensitive makes accidental misuse far more likely to be
>>> caught by the compiler,
>>
>> Example please. Typing 'i' instead of 'I' is not a misuse to me.
>
> If I accidentally type "I" instead of "i", a C compiler will catch the
> error.  "for (int i = 0; I < 10; i++) ..."  It's an error in C.

But this is no error to me, because there cannot be two different
objects named i and I.

>>> Of course there are other options as well, which are arguably better
>>> than either of these.  One is to say you have to get the case right
>>> for consistency, but disallow identifiers that differ only in case.
>>
>> You could do that. You can even say that identifiers must be in
>> italics and keywords in bold Arial and then apply all your arguments
>> to font shapes, sizes, orientation etc. Why not?
>
> Sorry, I was only giving sensible suggestions.

Why is distinction of case sensible and distinction of fonts not?

>> One of the reasons Ada did not do this and many filesystems as well,
>> because one might wish to be able to convert names to some canonical
>> form without changing the meaning. After all this is how the letter
>> case appeared in European languages in the first place - to beautify
>> written text.
>
> There is a very simple canonical form for ASCII text - leave it alone.

No, regarding identifiers the alphabet is not ASCII, never was. At best
you can say let identifiers be Latin letters plus some digits, maybe
some binding signs. ASCII provides means to encode, in particular, Latin
letters. Letters can be encoded in a great number of ways.

> For Unicode, there is a standard normalisation procedure (converting
> combining diacriticals into single combination codes where applicable).
>
> Ada has its roots in a time when many programming languages were
> all-caps, at least for their keywords, and significant computer systems
> were still using punched cards, 6-bit character sizes, and other
> limitations.  If you wanted a language that could be used widely (and
> that was one of Ada's aim) without special requirements, you had to
> accept that some people would be using all-caps.  At the same time, it
> was clear by then that all-caps was ugly and people preferred to use
> small letters when possible.  The obvious solution was to make the
> language case-insensitive, like many other languages of that time (such
> as Pascal, which was a big influence for Ada).  It was a /practical/
> decision, not made because someone thought being case-insensitive made
> the language inherently better.

Ada 83 style used bold lower case letters for keywords and upper case
letters for identifiers.

>>> Legislating against stupidity or malice is /very/ difficult.
>>
>> There is no malice, it is quite common practice to do things like:
>>
>>     void Boo::Foo (Object * object) {
>>        int This = this->idx;
>>
>> etc.
>
> As has been said, again and again, writing something like "Object
> object" is a common idiom and entirely clear to anyone experienced as a
> C or C++ programmer.

"it should be entirely clear for anyone..." is no argument.

> I can't remember ever seeing a capitalised keyword
> used as an identifier - it is /far/ from common practice.  It counts as
> stupidity, not malice.

Why is using properly spelt words stupidity? (:-))

>>> Legislating against accidents and inconsistency is
>>> easier, and a better choice.
>>>
>>>> Why exactly
>>>>
>>>>     int INT;
>>>>
>>>> must be legal?
>>>
>>> If a language can make such things illegal, great - but /not/ at the
>>> cost of making "int a; INT b; inT c = A + B;" legal.
>>
>> I don't see any cost here, because int, INT and inT are the same word to me.
>
> They are so visually distinct that there is a higher cognitive cost in
> reading them - that makes them bad, even when you know they mean the
> same thing.

Come on, they are not visually distinct, just open any book and observe
capital letters at the beginning of every sentence!

If there is any cost then keeping in mind artificially introduced
differences.

Dmitry A. Kazakov

Nov 25, 2022, 5:41:05 AM

On 2022-11-25 11:12, James Harris wrote:
> On 24/11/2022 22:00, Dmitry A. Kazakov wrote:
>> On 2022-11-24 22:50, James Harris wrote:
>>> On 24/11/2022 21:35, Dmitry A. Kazakov wrote:
>>>> On 2022-11-24 19:55, James Harris wrote:
>>>>
>>>>> All of the above are examples of poor code - from Int, INT, int to
>>>>> MyVal, myVal, myval.
>>>>
>>>> Which you want to make legal, right?
>>>
>>> No.
>>>
>>> ...
>>>
>>>> That is evidently wrong. Why exactly
>>>>
>>>>     int INT;
>>>>
>>>> must be legal?
>>>
>>> I didn't say it should be.
>>
>> But it is. q.e.d.
>
> Not necessarily. As I said before, within a scope names which vary only
> by case could be prohibited.

You can introduce such rules, but so could a case-insensitive language
as well. The rule as such is agnostic to the choice.

>> Yet you must rely on them in order to prevent:
>>
>>     int INT;
>
> No, the compiler could detect it.

How? Without additional rules (see above) this is perfectly legal in a
case-sensitive language.

Bart

Nov 25, 2022, 6:51:03 AM

On 25/11/2022 09:52, David Brown wrote:
> On 25/11/2022 10:13, Dmitry A. Kazakov wrote:

>> And how exactly would case-sensitivity not let them? So far the
>> outcome:
>>
>> case-insensitive: illegal
>> case-sensitive:   OK
>
> You misunderstood my question.
>
> You dislike case sensitivity because it lets you have two different
> identifiers written "int" and "INT".  That is a fair point, and a clear
> disadvantage of case sensitivity.

But this happens in real code. For example `enum (INT, FLOAT, DOUBLE)`,
plus of course `Image image`.


> But if you have a case insensitive language, it lets you write "int" and
> "INT" for the /same/ identifier, despite written differences.  That is a
> clear disadvantage of case /insensitivity/.

This could happen in real code, but it very rarely does.

So one is a real disadvantage, the other only a perceived one. Here's
another issue:

zfail
zFar
zNear
zpass

These don't clash. But there are two patterns here: small z followed by
either a capitalised or a non-capitalised word. How do you remember which
is which? With case-sensitive, you /have/ to get it right.

With case-insensitive, if these identifiers were foisted on you, you can
choose to use more consistent capitalisation.

>>>  But case sensitive makes accidental misuse far more likely to be
>>> caught by the compiler,
>>
>> Example please. Typing 'i' instead of 'I' is not a misuse to me.
>
> If I accidentally type "I" instead of "i", a C compiler will catch the
> error.  "for (int i = 0; I < 10; i++) ..."  It's an error in C.

(This isn't caught:)

for (int i=0; i<M; ++i)
    for (int j=0; j<N; ++i)

I can't reproduce your exact example, because my loop headers only
mention the index once, but I can write this:

for i to n do
    a[I] := 0
end

It's not an error, so no harm done. At some point it will be noticed
that one of those has the wrong case, and it will be fixed.

It is a complete non-issue.

> There is a very simple canonical form for ASCII text - leave it alone.
> For Unicode, there is a standard normalisation procedure (converting
> combining diacriticals into single combination codes where applicable).
>
> Ada has its roots in a time when many programming languages were
> all-caps, at least for their keywords, and significant computer systems
> were still using punched cards, 6-bit character sizes, and other
> limitations.  If you wanted a language that could be used widely (and
> that was one of Ada's aim) without special requirements, you had to
> accept that some people would be using all-caps.  At the same time, it
> was clear by then that all-caps was ugly


Yes, it is, in the wrong font. Which I take advantage of by writing
debugging code in all-caps. Even commented out, it stands out so it is
clear which comments could be deleted, and which comments contain code
that is temporarily out of use or not ready.

I also tend to write such code unindented, which further highlights it
and saves effort. But I couldn't do that in Python: all code must be the
proper case, and properly indented. There is no redundancy at all.


> and people preferred to use
> small letters when possible.  The obvious solution was to make the
> language case-insensitive, like many other languages of that time (such
> as Pascal, which was a big influence for Ada).  It was a /practical/
> decision, not made because someone thought being case-insensitive made
> the language inherently better.

Where did the fad for case-sensitivity really come from? Was it someone
just being lazy, because in a lexer it is easier to process A-Z and a-z
as distinct letters rather than convert them to some canonical form
(eg. I store identifiers as all-lower-case)?

Not just in source, but for text everywhere in a computer system, even
user-facing code. But I guess in the early days, everyone involved would
be some technical member of staff. The trouble that eventually, these
systems would have non-technical users.

I'd imagine that not many men-in-the-street directly used Unix in the
70s and 80s, so I don't know what they would have made of
case-sensitivity everywhere.

But millions of ordinary people did use OSes like CP/M and DOS, which
thank god were case-insensitive.

I spent a year or two doing technical support on the phone for customers
using our computers; I dread to think what that would have been like on
a case-sensitive system!

Now ordinary users normally use GUIs, gestures etc, even voice, which
largely insulates them from the underlying case-sensitivity of the
machine (which, don't tell me, is based on Linux and written in C).

As I said elsewhere, normal people only really come across it with
passwords, or the latter part of URLs since those are generally part of
a Linux file path.

But I think it is generally understood that case-sensitivity is bad for
ordinary users.

>> There is no malice, it is quite common practice to do things like:
>>
>>     void Boo::Foo (Object * object) {
>>        int This = this->idx;
>>
>> etc.
>
>
> As has been said, again and again, writing something like "Object
> object" is a common idiom and entirely clear to anyone experienced as a
> C or C++ programmer.  It is less common to see a pointer involved
> (idiomatic C++ would likely have "object" as a reference or const
> reference here).  I can't remember ever seeing a capitalised keyword
> used as an identifier - it is /far/ from common practice.  It counts as
> stupidity, not malice.

This is from a project called "c4", a C compiler in 500 lines and 4
functions:

enum {
  Num = 128, Fun, Sys, Glo, Loc, Id,
  Char, Else, Enum, If, Int, Return, Sizeof, While,
  Assign, Cond, Lor, Lan, Or, Xor, And, Eq, Ne, Lt, Gt, Le, Ge,
  Shl, Shr, Add, Sub, Mul, Div, Mod, Inc, Dec, Brak
};

enum { CHAR, INT, PTR };

Both Int and INT are used.

James Harris

Nov 25, 2022, 6:55:51 AM

On 24/11/2022 22:51, Bart wrote:
> On 24/11/2022 21:47, James Harris wrote:
>> On 24/11/2022 20:52, Bart wrote:
>
>> Yes, although it's a function declaration; the presumably incorrectly
>> typed identifier zSQL is ignored.
>
> We don't know the purpose of zSQL. But the point is it is there, a
> slightly differently-cased version of the same name, which I can't
> for the life of me recall right now. That is the problem.

You brought up sqlite3.c but if it is over 100,000 lines of code in one
file (Andy would not care much for it...) I'm not sure that it's a valid
example for anything!

Nevertheless, you mentioned the purpose of the capitalised zSQL. It
appears only at the end of the declaration of sqlite3_declare_vtab:

SQLITE_API SQLITE_EXPERIMENTAL int sqlite3_declare_vtab(sqlite3*,
const char *zSQL);

SQLITE_EXPERIMENTAL is defined to be blank, so the declaration matches
the definition

SQLITE_API int sqlite3_declare_vtab(sqlite3 *db, const char
*zCreateTable){

The final parameter is still a const char *. AIUI the name in the
prototype, the zSQL you mentioned as anomalous, is ignored, which is
presumably why the programmer left it with a case mismatch.

So the problem here is not case but that in such circumstances C
requires forward declarations and, because the parameter names are
ignored, the compiler does not have to check for a mismatch.


>
> (If I look back, it is zSql. But if I encounter these even in 10
> minutes' time, which one would be which? I would forget.)
>
>
>
>> Similar could be said for any names which differed only slightly.
>
> Say short, Short and SHORT out loud; any difference?

Yes, they get louder. ;-)

>
> You're debugging some code and need to print out the value of hmenu. Or
> is it hMenu or Hmenu? Personally I am half-blind to case usage because it
> is so commonly irrelevant and ignored in English.

Do you not use a consistent scheme? Perhaps when a compiler ignores case
it encourages programmers to be inconsistent. I don't think inconsistent
capitalisation is a good thing but I accept that YMMV.

>
> I would be constantly double-checking and constantly getting it wrong
> too. And that's with just one of these three in place.
>
> Differences in spelling are another matter; I'm a good speller.
>
> You might notice when the cup you're handed in Starbucks has Janes
> rather than James and would want to check it is yours; but you probably
> wouldn't care if it's james or James or JAMES because that is just style.
> You know they are all the same name.

I wouldn't write all three variants in one program. Programmers should
be consistent, IMO, and the compiler should check.

>
> But also, just because people can make typos by pressing the wrong
> letter or being the wrong length doesn't make allowing 2**N more
> incorrect possibilities acceptable.

With case-sensitive languages programmers just need to follow sensible
conventions and to keep to them consistently.

>
>>> In C this happens /all the time/. It's almost a requirement. When I
>>> translated OpenGL headers, then many macro names shared the same
> identifiers with functions if you took away case.
>>
>> One cannot stop programmers doing daft things. For example, a
>> programmer could declare names such as
>>
>>    CreateTableForwandReference
>>
>> and
>>
>>    createtableforwardrefarence
>>
>> The differences are not obvious.
>
> So to fix it, we allow
>
>     CreateTableForwardRefarence
>
>     createtableforwandreference
>
> as synonyms? Case sensitive, you have subtle differences in letters
> /plus/ subtle differences in case!

You have not solved the problem. It's still impossible to prevent evil
programmers from writing confusing code if they are determined to do so
- or are administrators rather than programmers!

>
>> Maybe the best a language designer can do for cases such as this is to
>> help reduce the number of different names a programmer would have to
>> define in any given location.
>
> Given the various restrictions you've mentioned that you'd want even
> with case sensitive names, is there any point to having case
> sensitivity? What would it be used for; what would it allow?

Yes. The problem I have with case insensitivity is that it /allows/
inconsistent coding. I'd rather the compiler refused to compile code
which uses myVar, MyVar, myvar and MYVAR for the same thing.

Case can be indicative. For example, consider the identifier

barbend

Does it mean the bend in a bar or the end of a barb...? With
capitalisation or underscores (or hyphens in languages which support
them) the meaning can be made clear.

>
> I had a scripting language that shipped with my applications. While case
> insensitive, I would usually write keywords in lower case as if, then,
> while.
>
> But some users would write If, Then, While, and make more use in
> identifiers of mixed case. And they would capitalise global variables
> that I defined in lower case.

As I said before, a compiler can prohibit names which fold to the same
string. (Whether they should do or not is an open question, however.)


--
James Harris


Bart

Nov 25, 2022, 7:08:31 AM

On 25/11/2022 09:24, David Brown wrote:
> On 24/11/2022 22:47, James Harris wrote:
>
>> 1. Forward declarations should not be needed.
>>
>
> Usually not, for functions.  But sometimes you will need them for
> mutually recursive functions,

Not even then. Modern languages seem to deal with out-of-order
functions without needing special declarations.

> and I think it makes sense to have some
> kind of module interface definition with a list of declared functions
> (and other entities).  In other words, a function should not be exported
> from a module just by writing "export" at the definition.

Why not?

>  You should
> have an interface section with the declarations (like Pascal), or a
> separate interface file (like Modula-2).

Then you have the same information repeated in two places.

If you need a summary of the interface without exposing the
implementation, a compiler can generate one automatically from the
functions marked with 'export'.

(In my languages, which use whole-program compilers, such an exports
file is only needed to export names from the whole program, when it
forms a complete library.

Plus I need to create such a file to create bindings in my language to
FFI libraries. But there I don't have the sources of those libraries)

>> 2. Parameter names should be part of the interface.
>>
> I agree - though not everyone does, so there are choices here too.

Since in my languages you only ever specify the function header in one
place - where it's defined - parameter names are mandatory. And there is
only ever one set.

> Some
> people like to write a declaration such as :
>
>     void rectangle(int top, int left, int width, int height);
>
> and then the definition :
>
>     void rectangle(int t, int l, int w, int h) { ... }

That's how my systems language worked for 20 years. I find it
astonishing now that I tolerated it for so long.

Well, actually not quite: the declaration listed only types; the
definition only names. Names in the declaration had no use.


James Harris

Nov 25, 2022, 7:13:57 AM

On 25/11/2022 10:41, Dmitry A. Kazakov wrote:
> On 2022-11-25 11:12, James Harris wrote:
>> On 24/11/2022 22:00, Dmitry A. Kazakov wrote:
>>> On 2022-11-24 22:50, James Harris wrote:
>>>> On 24/11/2022 21:35, Dmitry A. Kazakov wrote:
>>>>> On 2022-11-24 19:55, James Harris wrote:
>>>>>
>>>>>> All of the above are examples of poor code - from Int, INT, int to
>>>>>> MyVal, myVal, myval.
>>>>>
>>>>> Which you want to make legal, right?
>>>>
>>>> No.
>>>>
>>>> ...
>>>>
>>>>> That is evidently wrong. Why exactly
>>>>>
>>>>>     int INT;
>>>>>
>>>>> must be legal?
>>>>
>>>> I didn't say it should be.
>>>
>>> But it is. q.e.d.
>>
>> Not necessarily. As I said before, within a scope names which vary
>> only by case could be prohibited.
>
> You can introduce such rules, but so could a case-insensitive language
> as well. The rule as it is agnostic to the choice.

No, I'm arguing for consistency - and consistency that can be enforced
by the compiler. The thing I dislike is the inconsistency allowed by
case insensitivity.

>
>>> Yet you must rely on them in order to prevent:
>>>
>>>     int INT;
>>
>> No, the compiler could detect it.
>
> How? Without additional rules (see above) this is perfectly legal in a
> case-sensitive language.

As I say, names which fold to the same string can be detected and
prohibited by the compiler.
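
Something along these lines (a rough sketch, not from any real
compiler; the names and the fixed-size table are purely illustrative):

    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    static char syms[1024][64];    /* identifiers as declared */
    static int nsyms;

    static void fold(const char *src, char *dst) {
        while (*src) *dst++ = (char)tolower((unsigned char)*src++);
        *dst = '\0';
    }

    /* Returns 0 if ok, -1 if name differs from an already declared
       identifier only in case. */
    int declare(const char *name) {
        char fnew[64], fknown[64];
        fold(name, fnew);
        for (int i = 0; i < nsyms; i++) {
            fold(syms[i], fknown);
            if (strcmp(fknown, fnew) == 0 && strcmp(syms[i], name) != 0) {
                fprintf(stderr, "error: '%s' differs only in case"
                                " from '%s'\n", name, syms[i]);
                return -1;
            }
        }
        strcpy(syms[nsyms++], name);
        return 0;
    }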


--
James Harris


Bart

Nov 25, 2022, 7:40:31 AM

On 25/11/2022 10:31, James Harris wrote:
> On 25/11/2022 08:18, David Brown wrote:
>> On 24/11/2022 22:35, Dmitry A. Kazakov wrote:
>
> ...
>
>>> Why exactly
>>>
>>>     int INT;
>>>
>>> must be legal?
>>
>> If a language can make such things illegal, great - but /not/ at the
>> cost of making "int a; INT b; inT c = A + B;" legal.
>
> Well put! It's that kind of mess which makes me dislike the idea of a
> language ignoring case.

But, that never happens! If it does, you can change it to how you like;
the program still works; that is the advantage.

C allows that line to be written like this:

i\
n\
t\

a\
;\

i\
n\
t\

b\
;\

i\
n\
t\

c
=
a
+
b;


In C, any *token* can be split across multiple lines using line
continuation, even ones like '//', and string literals (and in the
middle of string escape sequences).

But have you ever seen that? Should you disallow this feature?

Plus, you can make that original line legal in C too:

#define INT int
#define inT int
#define A a
#define B a

int a; INT b; inT c = A + B;

So, what do we ban here? C also allows this:

Point:; struct Point Point;

Here you don't even need to change case!

All sorts of nonsense can be written legally, some of it more dangerous
than being lax about letter case.

You might know that A[i] can be written as i[A]; did you know you can
also write i[A][A][A][A][A][A][A][A]?
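
That works because A[i] is defined as *(A + i), so i[A] names the same
element; each further [A] then indexes the resulting int by the pointer
A, which still type-checks (whether it stays in bounds is another
matter). A minimal demo of the first step:

    #include <stdio.h>

    int main(void) {
        int A[3] = {10, 20, 30};
        int i = 1;
        printf("%d %d\n", A[i], i[A]);   /* prints: 20 20 */
    }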

Did you know that you can write a simple function pointer as:

void(*)(void)

Oh, hang on, that's how C works anyway!


> I don't understand how anyone can think that a
> compiler actually allowing such a jumble and viewing it as legal is a
> good idea.

Because it is blind to case? In the same way it doesn't see extra or
misleading white space which can lead to even worse jumbles.

The solution is easy: just make your language case-sensitive if that is
your preference.

Others may make theirs case-insensitive. Because it doesn't look like
anyone is going to change their mind about this stuff.
