
Best term for a pointer which is null/nil/none etc

James Harris

Aug 16, 2021, 11:37:43 AM
What's the best term for what might be called a null or nil pointer? In
a recent thread it turned out that there were various preferences and
various names that people are familiar with.

I am thinking of using as a keyword something like one of these:

null
nil
nullptr
ptr_null
none
empty
void
nothing
nowhere

As context, imagine that you want to initialise a child node n with

n.left = X
n.right = X
n.data = 0

where X is one of the keywords above or some other that I've not listed.
The question is, which X would be best?


If it doesn't muddy the waters too much I should say that in addition to
'a pointer to no object' I /might/ also need a name for a pointer value
which has not been defined. If I do go that way then I'll need not one
but two names: one for a pointer which has been set to point to no
object and one for a pointer which has never been initialised. That's
why I added ptr_null into the above list - so that there would also be a
similar-looking ptr_undef or ptr_undefined. If you prefer, say, 'none'
for an explicitly set pointer to no object what name would you use for
undefined?



As a related matter, what capitalisation do you prefer for
language-defined constants such as the above and for 'true' and 'false'?
Do you prefer to see them have all lower case, all upper case, or to
capitalise just the first letter?



I know that a particular name is a minor matter but as I have to choose
I wondered what you guys find most intuitive or natural. Perhaps that
depends on whether one is thinking of the pointer or the referent. For
example, if one is thinking of the pointer then

nil

might be most natural as in "the pointer /is/ nil" whereas if one is
thinking of the referent then

nothing

may be better in the sense that "the pointer /is pointing at/ nothing".
Not sure.

Either way, how do the options look to you and what keyword or keywords
would you prefer to see in a piece of code? Also, are there any you
really dislike?!


--
James Harris

Bart

Aug 16, 2021, 12:41:15 PM
On 16/08/2021 16:37, James Harris wrote:
> What's the best term for what might be called a null or nil pointer? In
> a recent thread it turned out that there were various preferences and
> various names that people are familiar with.
>
> I am thinking of using as a keyword something like one of these:
>
>   null
>   nil
>   nullptr
>   ptr_null
>   none
>   empty
>   void
>   nothing
>   nowhere


In static code I use 'nil' as a built-in named constant of type 'ref
void' (void* in C), which is compatible with any pointer type.

There is also 'empty' and 'clear', which are interchangeable and can be
used as nouns or verbs, but only in the context of initialising a
variable or assigning to an expression; it is not a value:

int a:=empty
clear a # same as a:=empty
empty a # same as a:=empty

This was intended for array/record types, but works with any type
including pointers, where it will set them to nil.

It will clear the object to all-zeros, so nil must be all-zeros too.

In dynamic code, I also have 'void', but this simply means 'unassigned'.
All objects start off as void, but they can be set manually back to void
too:

a := 100
a := void

('void' is actually a type, but for convenience, it is treated as a
value - of type void - in source code. I have to use void.type for the
other meaning.)


>
> If it doesn't muddy the waters too much I should say that in addition to
> 'a pointer to no object' I /might/ also need a name for a pointer value
> which has not been defined.

See my 'void' above. However that only applies to dynamic code.
Elsewhere I would need to invent some suitable value:

int dummy # outside a function
ref void undefined = &dummy

ref byte p := undefined # inside a function

if p=undefined then ...

The undefined value should work for any pointer type.


> As a related matter, what capitalisation do you prefer for
> language-defined constants such as the above and for 'true' and 'false'?

That's up to my 'users'. My languages are case-insensitive, so they can
choose truE and falsE if they like.


> I know that a particular name is a minor matter but as I have to choose
> I wondered what you guys find most intuitive or natural. Perhaps that
> depends on whether one is thinking of the pointer or the referent. For
> example, if one is thinking of the pointer then
>
>   nil
>
> might be most natural as in "the pointer /is/ nil" whereas if one is
> thinking of the referent then
>
>   nothing


You can give a choice maybe? Allow both null and nil for example.


> Either way, how do the options look to you and what keyword or keywords
> would you prefer to see in a piece of code? Also, are there any you
> really dislike?!

ptr_null

Anything with embedded underscore (shifted on my keyboard) in general.


David Brown

Aug 17, 2021, 5:47:05 AM
It might depend on how you use the pointer in the language. For
languages that implicitly dereference pointers to objects, something
denoting "nothing", "none", or "empty" makes sense - by writing "p =
empty" you are saying that the object referred to by "p" is empty or
non-existent. (For prior art, Python uses "None".)

For languages where you really think of "p" as a pointer, and are
interested in the pointer rather than just the thing it points to,
something denoting "zero" is the popular choice - "null" and "nil" are
commonly used, with "null" being a little more common AFAICS. (C++ now
uses "nullptr", but that's because they needed a new name and "null" was
taken, and they didn't want something that was likely to be an existing
identifier.)

Some languages allow you to think in both ways - having both pointers
and references.

>
> As a related matter, what capitalisation do you prefer for
> language-defined constants such as the above and for 'true' and 'false'?
> Do you prefer to see them have all lower case, all upper case, or to
> capitalise just the first letter?
>

I personally dislike anything being in all-caps. I prefer keywords to
be small letters.

Sometimes a language has a system (either voluntary by convention, or
enforced by the language) in which identifiers fall into different
categories depending on whether they start with a capital or a small letter.

One thing I would advise against is making a language case insensitive -
that's just a license for programmers to be inconsistent and confusing.

James Harris

Aug 17, 2021, 10:29:01 AM
On 16/08/2021 17:41, Bart wrote:
> On 16/08/2021 16:37, James Harris wrote:

>> What's the best term for what might be called a null or nil pointer?

...

>> I know that a particular name is a minor matter but as I have to
>> choose I wondered what you guys find most intuitive or natural.
>> Perhaps that depends on whether one is thinking of the pointer or the
>> referent. For example, if one is thinking of the pointer then
>>
>>    nil
>>
>> might be most natural as in "the pointer /is/ nil" whereas if one is
>> thinking of the referent then
>>
>>    nothing
>
>
> You can give a choice maybe? Allow both null and nil for example.

Am not a fan of arbitrary choices. They can lead to friction if one
programmer has to maintain the code written by another and their
personal preferences differ. I think I would need to pick one word.

>
>
>> Either way, how do the options look to you and what keyword or
>> keywords would you prefer to see in a piece of code? Also, are there
>> any you really dislike?!
>
> ptr_null
>
> Anything with embedded underscore (shifted on my keyboard) in general.

OK. Is it the shift keying you don't like? I know you write expressions
with no spaces either side of operators but if

ptr-null

were a single permitted name (which didn't require shift) how would your
view change, if at all?


--
James Harris

James Harris

Aug 17, 2021, 10:43:46 AM
On 17/08/2021 10:47, David Brown wrote:
> On 16/08/2021 17:37, James Harris wrote:

>> What's the best term for what might be called a null or nil pointer?

...

> It might depend on how you use the pointer in the language. For
> languages that implicitly dereference pointers to objects, something
> denoting "nothing", "none", or "empty" makes sense - by writing "p =
> empty" you are saying that the object referred to by "p" is empty or
> non-existent. (For prior art, Python uses "None".)

Maybe that's true even without automatic dereferencing. For example, in
C one might write

n->left = newnode;

where newnode is really a pointer. Correspondingly, in

n->right = Nothing

In that, Nothing would also be a pointer even though the form is
ostensibly saying that there's nothing on the right rather than that the
right-side pointer holds a certain value.

>
> For languages where you really think of "p" as a pointer, and are
> interested in the pointer rather than just the thing it points to,
> something denoting "zero" is the popular choice - "null" and "nil" are
> commonly used, with "null" being a little more common AFAICS. (C++ now
> uses "nullptr", but that's because they needed a new name and "null" was
> taken, and they didn't want something that was likely to be an existing
> identifier.)

OK.

>
> Some languages allow you to think in both ways - having both pointers
> and references.

Something to come back to!

>
>>
>> As a related matter, what capitalisation do you prefer for
>> language-defined constants such as the above and for 'true' and 'false'?
>> Do you prefer to see them have all lower case, all upper case, or to
>> capitalise just the first letter?
>>
>
> I personally dislike anything being in all-caps. I prefer keywords to
> be small letters.

OK.

It occurs to me that one area where an initial cap can be useful is when
including keywords in written text. For example, if I write that
something is False with an initial capital letter then it more clearly
shows that I am referring to a keyword rather than to a constant or a
concept. That would make the keywords

True
False
None (or Nothing or Null or Nil etc)

ISTM that (as I think you suggested) they look better than all caps.


--
James Harris

Bart

Aug 17, 2021, 11:51:32 AM
On 17/08/2021 15:28, James Harris wrote:
> On 16/08/2021 17:41, Bart wrote:
>> On 16/08/2021 16:37, James Harris wrote:
>
>>> What's the best term for what might be called a null or nil pointer?
>
> ...
>
>>> I know that a particular name is a minor matter but as I have to
>>> choose I wondered what you guys find most intuitive or natural.
>>> Perhaps that depends on whether one is thinking of the pointer or the
>>> referent. For example, if one is thinking of the pointer then
>>>
>>>    nil
>>>
>>> might be most natural as in "the pointer /is/ nil" whereas if one is
>>> thinking of the referent then
>>>
>>>    nothing
>>
>>
>> You can give a choice maybe? Allow both null and nil for example.
>
> Am not a fan of arbitrary choices. They can lead to friction if one
> programmer has to maintain the code written by another and their
> personal preferences differ. I think I would need to pick one word.

C allows both NULL and 0. Plus any expression that yields 0.

>>
>>
>>> Either way, how do the options look to you and what keyword or
>>> keywords would you prefer to see in a piece of code? Also, are there
>>> any you really dislike?!
>>
>> ptr_null
>>
>> Anything with embedded underscore (shifted on my keyboard) in general.
>
> OK. Is it the shift keying you don't like? I know you write expressions
> with no spaces either side of operators but if
>
>   ptr-null
>
> were a single permitted name (which didn't require shift) how would your
> view change, if at all?

ptr-null is better. Although I'd start wondering why you need the 'ptr'
part, if 'null' is not used in other contexts.


James Harris

Aug 17, 2021, 12:18:12 PM
On 17/08/2021 16:51, Bart wrote:
> On 17/08/2021 15:28, James Harris wrote:
>> On 16/08/2021 17:41, Bart wrote:
>>> On 16/08/2021 16:37, James Harris wrote:

>>>> What's the best term for what might be called a null or nil pointer?

...

>>> You can give a choice maybe? Allow both null and nil for example.
>>
>> Am not a fan of arbitrary choices. They can lead to friction if one
>> programmer has to maintain the code written by another and their
>> personal preferences differ. I think I would need to pick one word.
>
> C allows both NULL and 0. Plus any expression that yields 0.

Indeed, though I'm not planning to copy that approach. I'd probably
prohibit comparisons against zero. Something like the following.

if p eq 0 [prohibited]
if p eq Undef
if p eq Nil
if p [true if p is valid (i.e. neither Undef nor Nil)]

In those, Undef would be all-bits-zero but would be of a type which
could be compared against a pointer whereas an integer could not.
Further, if p were to be converted to False/True as in the last line
then False would mean "either Undef or Nil".

This is all speculation at the moment. Am just throwing around some ideas.


>
>>>
>>>> Either way, how do the options look to you and what keyword or
>>>> keywords would you prefer to see in a piece of code? Also, are there
>>>> any you really dislike?!
>>>
>>> ptr_null
>>>
>>> Anything with embedded underscore (shifted on my keyboard) in general.
>>
>> OK. Is it the shift keying you don't like? I know you write
>> expressions with no spaces either side of operators but if
>>
>>    ptr-null
>>
>> were a single permitted name (which didn't require shift) how would
>> your view change, if at all?
>
> ptr-null is better.

OK.

>
> Although I'd start wondering why you need the 'ptr'
> part, if 'null' is not used in other contexts.

I was thinking that if I had more than one reserved pointer value that
it may be better to give them a common form. Instead of, for example,

Undef
Nil

there would be

ptr-undef
ptr-nil

That would conceptually group similar constants together and take fewer
words away from those the programmer could define.

If there ends up being just one reserved pointer value then it would not
be a good idea.

If there end up being just two reserved pointer values then it may or
may not be worth it.

But if I were to end up adding a number of other reserved pointer values
then it might be better for the language overall if they were to have a
common appearance.


--
James Harris

David Brown

Aug 17, 2021, 12:56:11 PM
On 17/08/2021 17:51, Bart wrote:
> On 17/08/2021 15:28, James Harris wrote:
>> On 16/08/2021 17:41, Bart wrote:
>>> On 16/08/2021 16:37, James Harris wrote:
>>
>>>> What's the best term for what might be called a null or nil pointer?
>>
>> ...
>>
>>>> I know that a particular name is a minor matter but as I have to
>>>> choose I wondered what you guys find most intuitive or natural.
>>>> Perhaps that depends on whether one is thinking of the pointer or
>>>> the referent. For example, if one is thinking of the pointer then
>>>>
>>>>    nil
>>>>
>>>> might be most natural as in "the pointer /is/ nil" whereas if one is
>>>> thinking of the referent then
>>>>
>>>>    nothing
>>>
>>>
>>> You can give a choice maybe? Allow both null and nil for example.
>>
>> Am not a fan of arbitrary choices. They can lead to friction if one
>> programmer has to maintain the code written by another and their
>> personal preferences differ. I think I would need to pick one word.
>
> C allows both NULL and 0. Plus any expression that yields 0.
>

/Every/ language allows arbitrary choices in all sorts of places. That
does not mean you have to encourage it from the outset. James is right
here - it matters little whether he picks "null" or "nil", but either is
far better than having both.

>>>
>>>
>>>> Either way, how do the options look to you and what keyword or
>>>> keywords would you prefer to see in a piece of code? Also, are there
>>>> any you really dislike?!
>>>
>>> ptr_null
>>>
>>> Anything with embedded underscore (shifted on my keyboard) in general.
>>
>> OK. Is it the shift keying you don't like? I know you write
>> expressions with no spaces either side of operators but if
>>
>>    ptr-null
>>
>> were a single permitted name (which didn't require shift) how would
>> your view change, if at all?
>
> ptr-null is better. Although I'd start wondering why you need the 'ptr'
> part, if 'null' is not used in other contexts.
>

Programmers read code far more than they type it. If the programming
language designer here thinks "ptr_null" is the clearest way for a null
pointer to be expressed in the language, then that preference totally
dominates over one single person's complaints about the hardships of
using the shift key.

And Bart, if underscore is so difficult for you (perhaps you have
arthritis or other challenges), I'd recommend looking at different
keyboards, or enabling "sticky shift keys" or similar aids supported by
your OS of choice.

Certainly the use of "shift" is not relevant to language design.



Bart

Aug 17, 2021, 4:00:45 PM
On 17/08/2021 17:56, David Brown wrote:
> On 17/08/2021 17:51, Bart wrote:
>> On 17/08/2021 15:28, James Harris wrote:
>>> On 16/08/2021 17:41, Bart wrote:
>>>> On 16/08/2021 16:37, James Harris wrote:
>>>
>>>>> What's the best term for what might be called a null or nil pointer?
>>>
>>> ...
>>>
>>>>> I know that a particular name is a minor matter but as I have to
>>>>> choose I wondered what you guys find most intuitive or natural.
>>>>> Perhaps that depends on whether one is thinking of the pointer or
>>>>> the referent. For example, if one is thinking of the pointer then
>>>>>
>>>>>    nil
>>>>>
>>>>> might be most natural as in "the pointer /is/ nil" whereas if one is
>>>>> thinking of the referent then
>>>>>
>>>>>    nothing
>>>>
>>>>
>>>> You can give a choice maybe? Allow both null and nil for example.
>>>
>>> Am not a fan of arbitrary choices. They can lead to friction if one
>>> programmer has to maintain the code written by another and their
>>> personal preferences differ. I think I would need to pick one word.
>>
>> C allows both NULL and 0. Plus any expression that yields 0.
>>
>
> /Every/ language allows arbitrary choices in all sorts of places. That
> does not mean you have to encourage it from the outset. James is right
> here - it matters little whether he picks "null" or "nil", but either is
> far better than having both.

I sometimes allow a choice if I can't make up my mind about a feature or
a keyword. Then I can try it out and see which one feels better or looks
better, or which is used more often. Or I might use one form privately,
and another for shared code.

Here however, both null and nil are commonly used in programming
languages for the same thing. So why not allow both? If someone uses two
languages, one using NULL and the other nil, it would be really
convenient not to have to keep thinking about which one you should be using.

(Although having said that, mine don't allow null! But then I am the
only user.)

>
> And Bart, if underscore is so difficult for you (perhaps you have
> arthritis or other challenges), I'd recommend looking at different
> keyboards, or enabling "sticky shift keys" or similar aids supported by
> your OS of choice.
>
> Certainly the use of "shift" is not relevant to language design.

Simplest of all is having alphanumeric identifiers not requiring you to
pause in the middle to deal with case or shift changes.

This especially applies to keywords that you can't do anything about.

So I'd say it's very relevant to not having a language that is a pita to
use.

David Brown

Aug 17, 2021, 4:59:34 PM
I do think "True" is better than "TRUE". But I think "true" is best :-)

Initial capitals won't mark a keyword unless you use capitals for /all/
keywords, and that will quickly get tedious and ugly. It's better to
use syntax highlighting in an editor that will mark the keywords in some
way (such as bold, or a particular colour). When writing code by hand,
I usually underline the keywords for clarity - but I wouldn't want to
use initial capitals.

David Brown

Aug 17, 2021, 5:08:45 PM
Keeping the choice open while prototyping, developing and testing makes
sense - that's fair enough. But once your language has solidified
somewhat, then it's good to fix these things. (Though there is always a
trade-off between keeping consistency between versions and being able to
correct mistakes or sub-optimal decisions with later versions. A good
period of trial and testing helps here.)

>
> Here however, both null and nil are commonly used in programming
> languages for the same thing. So why not allow both? If someone uses two
> languages, one using NULL and the other nil, it would be really
> convenient not to have to keep thinking about which one you should be using.
>
> (Although having said that, mine don't allow null! But then I am the
> only user.)
>
>>
>> And Bart, if underscore is so difficult for you (perhaps you have
>> arthritis or other challenges), I'd recommend looking at different
>> keyboards, or enabling "sticky shift keys" or similar aids supported by
>> your OS of choice.
>>
>> Certainly the use of "shift" is not relevant to language design.
>
> Simplest of all is having alphanumeric identifiers not requiring you to
> pause in the middle to deal with case or shift changes.
>

Some people like camelCase for multi-word identifiers, some prefer
underscore_separation. I can't imagine many people dislike underscore
purely because of using the shift key (though some /do/ dislike it
because they find the underscore hard to see in some circumstances).

There are a few languages that allow multi-word identifiers separated by
spaces, or allow hyphens as "letters", but those are rare, and likely to
cause confusion.

> This especially applies to keywords that you can't do anything about.
>

Certainly it makes sense to have shorter and simpler keywords, at least
for those that are commonly used. And there is no point in having extra
underscores for no purpose. I might not object to underscores as much
as you do, but I see no benefit of "null_ptr" over "nullptr".

> So I'd say it's very relevant to not having a language that is a pita to
> use.

Well, I guess the OP will collect opinions, and use that to help make
his decisions.

Bart

Aug 17, 2021, 6:52:18 PM
On 17/08/2021 22:08, David Brown wrote:
> On 17/08/2021 22:00, Bart wrote:

>> I sometimes allow a choice if I can't make up my mind about a feature or
>> a keyword. Then I can try it out and see which one feels better or looks
>> better, or which is used more often. Or I might use one form privately,
>> and another for shared code.
>
> Keeping the choice open while prototyping, developing and testing makes
> sense - that's fair enough. But once your language has solidified
> somewhat, then it's good to fix these things. (Though there is always a
> trade-off between keeping consistency between versions and being able to
> correct mistakes or sub-optimal decisions with later versions. A good
> period of trial and testing helps here.)

Another area where I like to have alternatives is basic types; the
choices on each line all refer to the same type:

byte   word8   u8
word   word64  u64
int    int64   i64
real   real64  r64  float64
       int16   i16

The ones in the third column are universally understood, and I use them
for generated or shared code, but in normal source I use ones from the
first column, if they exist, or second if the width is significant or
there is no colloquial form.

The 'float64' I'd forgotten about; I guess that'll be coming out soon.

C of course famously has dozens of ways of writing some types (partly
due to them requiring multiple tokens, some optional, and which can be
in any order).

It's not surprising that so many applications define their own sets of
types. That's a worse problem than the language allowing a choice of 2 or 3.


>
> Some people like camelCase for multi-word identifiers, some prefer
> underscore_separation. I can't imagine many people dislike underscore
> purely because of using the shift key

I dislike them also because I can never remember if there is a
underscore or not.

David Brown

Aug 18, 2021, 4:40:09 AM
On 18/08/2021 00:52, Bart wrote:
> On 17/08/2021 22:08, David Brown wrote:
>> On 17/08/2021 22:00, Bart wrote:
>
>>> I sometimes allow a choice if I can't make up my mind about a feature or
>>> a keyword. Then I can try it out and see which one feels better or looks
>>> better, or which is used more often. Or I might use one form privately,
>>> and another for shared code.
>>
>> Keeping the choice open while prototyping, developing and testing makes
>> sense - that's fair enough.  But once your language has solidified
>> somewhat, then it's good to fix these things.  (Though there is always a
>> trade-off between keeping consistency between versions and being able to
>> correct mistakes or sub-optimal decisions with later versions.  A good
>> period of trial and testing helps here.)
>
> Another area where I like to have alternatives is basic types; the
> choices on each line all refer to the same type:
>
> byte   word8   u8
> word   word64  u64
> int    int64   i64
> real   real64  r64 float64
>        int16   i16
>
> The ones in the third column are universally understood, and I use them
> for generated or shared code, but in normal source I use ones from the
> first column, if they exist, or second if the width is significant or
> there is no colloquial form.

When people say "universally understood", they usually mean "I like them
and don't much care about the rest of the universe". As you know, I
think these extremely short names are horrible in many ways, and I
totally disagree that they are "universally understood". With enough
context I expect people can figure out what they are, but that's another
matter entirely - and it applies equally to any naming scheme that
includes bit sizes explicitly.

You spend a significant amount of time posting on c.l.c. about how
terrible it is when people use different names for the same type,
regardless of how vital type names are to program clarity and code
flexibility. And now you are recommending multiple names for the same
type that give absolutely /no/ advantages or benefits. You also
regularly complain that in C, fundamental types like "int" and "long
int" are poorly defined and unclear, and how it is better to have
explicitly sized types (and then you won't use C's explicitly sized
types, because that would mean you couldn't whine about them). And now
you want to tell us that it's great for a language to have "word"
meaning the same thing as "word64" and "u64"!

>
> The 'float64' I'd forgotten about; I guess that'll be coming out soon.

So one of the great things about having lots of different ways to write
exactly the same thing in a language is that the language's designer,
implementer and /single/ user can't remember them all.

When planning a new language, it's good to learn about existing
languages so that you can be inspired by parts that work well, and avoid
ideas that work badly. From that second viewpoint, I think you are
helping the OP significantly.

>
> C of course famously has dozens of ways of writing some types (partly
> due to them requiring multiple tokens, some optional, and which can be
> in any order).

I would not advise copying C's system for fundamental types any more
than I would recommend copying your multiple different names. But
unless you are trying to make a programming language less user-friendly
than Forth, and less portable than assembly, a language needs a way to
name types for use in particular cases.

>
> It's not surprising that so many applications define their own sets of
> types. That's a worse problem than the language allowing a choice of 2
> or 3.

C is far from perfect here (and no one claims otherwise). What /is/
surprising is that you would suggest that the answer is to have the
language start off with multiple names so that programmers get mixed up
and inconsistent before they even start writing their own code.


>
>>
>> Some people like camelCase for multi-word identifiers, some prefer
>> underscore_separation.  I can't imagine many people dislike underscore
>> purely because of using the shift key
>
> I dislike them also because I can never remember if there is a
> underscore or not.
>

You can't even remember how your own languages work, despite having
written and implemented them and apparently written lots of code in them.

Bart

Aug 18, 2021, 6:59:31 AM
No, I mean that they are used everywhere and widely understood. They're
even used in the Linux kernel; see here under Typedefs:

https://www.kernel.org/doc/html/v4.10/process/coding-style.html

They are the primary types in Rust.

I believe they are used in the MSVC C compiler as suffixes for integer
literals (1234i32 and 1234ui32).

This all means they are useful as language-independent ways of referring
to such types in forums like this, because either everyone will already
know what they mean, or they can make a pretty good guess.

They are after all just contractions of C's int32_t-style family of
types, with extraneous letters (and underscores!) elided.

In my case they started off as being internal representations, then were
used in generated code, then, in order to be able to read that code back
in, they were made part of the language.


> And now
> you want to tell us that it's great for a language to have "word"
> meaning the same thing as "word64" and "u64"!

'word' is specific to my languages to mean 'unsigned integer'. The
formal, consistently named set of unsigned types goes from word1 to
word128, which have the parallel naming scheme u1 to u128.

word64 also has the informal, "don't care" synonym 'word' (like 'int' is
a synonym for 'int64').

And 'byte' is a synonym for 'word8'.

>
>>
>> The 'float64' I'd forgotten about; I guess that'll be coming out soon.
>
> So one of the great things about having lots of different ways to write
> exactly the same thing in a language is that the language's designer,
> implementer and /single/ user can't remember them all.

I wanted something more mainstream since 'real' is not well known these
days. Probably the idea was to be able to use it when sharing code,
so as not to have to explain what 'real' was. But that didn't happen.

David Brown

Aug 18, 2021, 11:16:21 AM
They are most certainly /not/ used everywhere. They are used in a
number of programs and a number of languages, but that is not
/everywhere/ or even remotely close to a measurable percentage of
"everywhere" in programming.

In context, if it is clear you are talking about a type, then I agree
that the short names are easily understood even if they are not names
you commonly use. But they are always less clear than something like
"int32".

> They're
> even used in the Linux kernel; see here under Typedefs:
>
>   https://www.kernel.org/doc/html/v4.10/process/coding-style.html
>

No sane programmer has ever used the Linux kernel as an example of good
style for general purpose coding. It is a highly specialised program,
with a unique background (and an old background - it comes from a time
before the standardised C types like int32_t). And while Linus Torvalds
has many fine qualities, there are many of his preferred styles and
other opinions on programming and languages that are, to put it mildly,
controversial.

> They are the primary types in Rust.

IMHO that was a silly decision. I haven't used Rust for anything more
than very simple testing, but to me it seems mostly a wasted
opportunity. It has several nice ideas, but I can't see it having any
significant benefit over C++ for my kind of use. For people who can't
get their pointers right in C, and refuse to use smart pointers in C++,
then perhaps Rust's system is safer. And it has some nice things, like
pattern matching. But its various types of macros and generics are way
weaker than C++'s templates. Maybe it will gain features to compete
well with C++ - maybe C++ will gain features to compete with Rust.

>
> I believe they are used in the MSVC C compiler as suffixes for integer
> literals (1234i32 and 1234ui32).

In that use-case, I can see some advantages. (I could also see them
being the basis for printf format specifiers, rather than C's rather
ugly <inttype.h> macros. But in a new language, I'd rather see a better
system than printf anyway.)

>
> This all means they are useful as language-independent ways of referring
> to such types in forums like this, because either everyone will already
> know what they mean, or they can make a pretty good guess.
>
> They are after all just contractions of C's int32_t-style family of
> types, with extraneous letters (and underscores!) elided.
>

int32 and uint32 are simple, clear, unambiguous, easy to type, and
cannot seriously be mistaken for anything else. I personally like the
_t suffixes, but I'm quite happy to accept that others have different
opinions.

> In my case they started off as being internal representations, then were
> used in generated code, then, in order to be able to read that code back
> in, they were made part of the language.
>
>
>>  And now
>> you want to tell us that it's great for a language to have "word"
>> meaning the same thing as "word64" and "u64"!
>
> 'word' is specific to my languages to mean 'unsigned integer'. The
> formal, consistently named set of unsigned types go from word1 to
> word128, which have the parallel naming scheme u1 to u128.
>
> word64 also has the informal, "don't care" synonym 'word' (like 'int' is
> a synonym for 'int64').
>

The trouble with "word" is the size is seriously ambiguous. I'd say it
is worse than "int" in that respect.

> And 'byte' is a synonym for 'word8'.

"byte" is fair enough - I think it's reasonable to say that the meaning
of "smallest addressable unit of memory" is outdated. But I would not
use that for an 8-bit number, I'd use it for raw memory access that does
not have any semantic information.

>
>>
>>>
>>> The 'float64' I'd forgotten about; I guess that'll be coming out soon.
>>
>> So one of the great things about having lots of different ways to write
>> exactly the same thing in a language is that the language's designer,
>> implementer and /single/ user can't remember them all.
>
> I wanted something more mainstream since 'real' is not well known these
> days. Probably the idea was to be able to use it when sharing code, so
> as not to have to explain what real was. But that didn't happen.
>

I've nothing against "float32", "float64", etc., as type names for
floating point data. (I'd add a "_t", of course!)

I'd be okay with "real32", "real64", etc., as well - but I think "float"
is more accurate (floating point numbers do not exactly represent real
numbers). And like "word", the name "real" is a long outdated name
without clear rules on its size.

Bart
Aug 18, 2021, 2:02:32 PM
On 18/08/2021 16:16, David Brown wrote:
> On 18/08/2021 12:59, Bart wrote:

> int32 and uint32 are simple, clear, unambiguous, easy to type, and
> cannot seriously be mistaken for anything else.

int32 is OK, that's what I use mostly when I need that specific,
narrower type.

But I need a bigger difference between signed and unsigned integers.
Just sticking a 'u' at the front doesn't cut it, unless it replaces 'i'
with 'u' as happens with i32 and u32. With int32/uint32, the difference
is too subtle, and uint is unpleasant to type.

>> word64 also has the informal, "don't care" synonym 'word' (like 'int' is
>> a synonym for 'int64').
>>
>
> The trouble with "word" is the size is seriously ambiguous. I'd say it
> is worse than "int" in that respect.

Denotations like 'int' and 'word' are supposed to be /slightly/
ambiguous. They are used when you don't care about the width, but expect
the default to be sufficient.

The default on my current languages is to make them 64 bits wide, so
it's unlikely to be insufficient.

40 years ago they were 16 bits, and some 20 years ago they became 32
bits. I don't really see a need for default 128-bit integers even in 20
years from now.

Most languages appear to be stuck with a default 32-bit int type, which
is now too small for memory sizes, large object sizes, file sizes and
many other things.

As for 'word', I've used that to mean an unsigned version of 'int' since
the 80s, although it was then 16 bits. (It still is in my x64 assembler
in the form of DW directives and register names like W3.)


> I'd be okay with "real32", "real64", etc., as well - but I think "float"
> is more accurate (floating point numbers do not exactly represent real
> numbers). And like "word", the name "real" is a long outdated name
> without clear rules on its size.

I used 'real' in Algol, Fortran and Pascal, and I've used it in my own
languages since 1981 (when it was implemented as an f24 type). So I
don't care that it's outdated. Just that I might need to keep explaining
what it means!


David Brown
Aug 19, 2021, 7:13:23 AM
On 18/08/2021 20:02, Bart wrote:
> On 18/08/2021 16:16, David Brown wrote:
>> On 18/08/2021 12:59, Bart wrote:
>
>> int32 and uint32 are simple, clear, unambiguous, easy to type, and
>> cannot seriously be mistaken for anything else.
>
> int32 is OK, that's what I use mostly when I need that specific,
> narrower type.
>
> But I need a bigger difference between signed and unsigned integers.
> Just sticking a 'u' at the front doesn't cut it, unless it replaces 'i'
> with 'u' as happens with i32 and u32. With int32/uint32, the difference
> is too subtle, and uint is unpleasant to type.

So it is obvious to you (so obvious that you think it is universally
obvious) that "u32" is "unsigned integer 32-bit", equally obvious that
"int32" is "signed integer 32-bit", and yet "uint32" is too subtle? I
can't help feeling there is an inconsistency here somewhere...


Let's just say I think that a programming language should have one
single standard method for naming these fixed-size types. Multiple
small variations for a type whose name means basically the same thing,
and which will be used in the same circumstances, does not help anyone.
Different names with different meanings, and used in different
circumstances, are another matter - even if they happen to have the same
size in a particular use-case.

These names don't have to be keywords or fundamental types in the
language. A language could have a completely flexible system for
integer types, so that "int32" is defined in the language standard
library as "type_alias int32 = builtin::integer<4, signed>", or whatever
syntax or features you pick. But thereafter, programmers should stick
to the standard names unless they have need of a specific name for the
type. (Thus in C, the fundamental boolean type is "_Bool" - but the
standard name is "bool". And for size-specific types, you should use
"int32_t" and friends, as those are the standard names. It doesn't
matter that some people use other names, for good or bad reasons - those
are still the names you should use.)


As for the details of the names - the language designer should pick
names that he/she likes, consulting with others in the project. Then
during alpha testing they should collect feedback from other users and
interested parties who are looking at the language.


>
>>> word64 also has the informal, "don't care" synonym 'word' (like 'int' is
>>> a synonym for 'int64').
>>>
>>
>> The trouble with "word" is the size is seriously ambiguous.  I'd say it
>> is worse than "int" in that respect.
>
> Denotations like 'int' and 'word' are supposed to be /slightly/
> ambiguous. They are used when you don't care about the width, but expect
> the default to be sufficient.

I think it's fair to expect "int" to mean "a type meant to hold
integers". If you are used to C, or if the language is fairly low
level, you could assume it also means a small and efficient type. If
you are used to high level languages, you might take it to mean
unlimited range. But amongst anyone that has worked with low-level
programming, or who knows what processor they are targeting, "word"
means "machine word" and is tightly connected to the processor - with the
definition and size varying hugely. To someone without low-level
experience or knowledge, it might make no sense at all.

Thus I think "word" is a particularly bad choice of names - it has been
used and abused too much and has no real meaning left. I'd put it as a
lot worse than "int" in that respect.


I think there are times where a generic "number" type could be very
convenient. In simple languages with few types, it makes a lot of sense
(perhaps even more so in interpreted or bytecode-compiled languages).
Just have one type "num" that is a signed integer of the biggest size
that works efficiently for the processor. There would be no point in
having signed and unsigned versions here. This could be simple and
convenient for local variables, but I would not want to allow it for
types that are used in interfaces - you'd want it for limited scope use
where the compiler can see the ranges needed. ("int" in C is a little
like this in its original intention, but that has got lost somewhere
along the line as "int" has been used inappropriately when more tightly
specified types would be better, and as implementations have failed to
make "int" 64-bit on 64-bit systems.)


>
> The default on my current languages is to make them 64 bits wide, so
> it's unlikely to be insufficient.
>
> 40 years ago they were 16 bits, and some 20 years ago they became 32
> bits. I don't really see a need for default 128-bit integers even in 20
> years from now.
>
> Most languages appear to be stuck with a default 32-bit int type, which
> is now too small for memory sizes, large object sizes, file sizes and
> many other things.
>

32-bit is big enough for almost every situation for memory sizes, file
sizes, etc. Not /every/ situation - but almost all. But if you want a
type that can handle everything and be efficient on PCs, then 64-bit is
the choice.


> As for 'word', I've used that to mean an unsigned version of 'int' since
> the 80s, although it was then 16 bits. (It still is in my x64 assembler
> in the form of DW directives and register names like W3.)
>

Yes, I understand that. But programming languages should not be
designed around the experiences and preferences of one single
programmer. The name "word" should not be used in a language that has
ambitions beyond a small hobby language, precisely because it means so
many different things to different people or in different contexts, and
is thus meaningless and confusing.

>
>> I'd be okay with "real32", "real64", etc., as well - but I think "float"
>> is more accurate (floating point numbers do not exactly represent real
>> numbers).  And like "word", the name "real" is a long outdated name
>> without clear rules on its size.
>
> I used 'real' in Algol, Fortran and Pascal, and I've used it in my own
> languages since 1981 (when it was implemented as an f24 type). So I
> don't care that it's outdated. Just that I might need to keep explaining
> what it means!
>

Sometimes it is hard to be objective about things that have been
familiar for so long. I've been familiar with "real numbers" as a named
mathematical concept since I was perhaps 10 years old. So it is hard to
imagine that someone might not know what "real" means.

But it is certainly easy to imagine that the size of a type "real" is
not clearly defined - unlike "float" and "double", it has never been
standardised and different sizes of "real" have been in common use.

James Harris
Aug 19, 2021, 8:51:48 AM
On 17/08/2021 22:08, David Brown wrote:

...

> Certainly it makes sense to have shorter and simpler keywords, at least
> for those that are commonly used. And there is no point in having extra
> underscores for no purpose. I might not object to underscores as much
> as you do, but I see no benefit of "null_ptr" over "nullptr".

To be clear, I don't think I personally suggested null_ptr, for the
following reasons. What I had in mind was that if there were going to be
two or more reserved pointer values then it might make sense for them to
be clearly related. And so I suggested ptr as a prefix - something like

ptr_undef
ptr_null

By contrast, if I were to use the nullptr that you mentioned then the
corresponding keywords would be

undefptr
nullptr

which is perhaps getting to be a bit hard to read. The situation would
be worse if there were many such reserved names as in

undefptr
nullptr
debugptr
signalptr
chainendptr

I know that discussion of what name to use for a reserved pointer value
is somewhat about minutiae but as I think the list shows even choices
such as this can make a difference to the readability of the program.

Not to be overlooked are the words which get reserved by the language and
so become unavailable for the programmer to use as identifier names. For
that reason, I think a better list than the above would be

ptr_undef
ptr_null
ptr_debug
ptr_signal
ptr_chainend

Although they are arguably more ugly, such names would, perhaps, be
easier to read and recognise, and that would keep the reserved words as
having a common prefix. A programmer scanning the text of the program
and seeing the prefix would be able to immediately recognise it as a
special pointer value and then would only really need to notice the
specific name if it was relevant to why he was looking at the code.

As I say, minor points.

...

> Well, I guess the OP will collect opinions, and use that to help make
> his decisions.
>

Yes. Opinions are always welcome.


--
James Harris

Bart
Aug 19, 2021, 8:58:00 AM
On 19/08/2021 12:13, David Brown wrote:
> On 18/08/2021 20:02, Bart wrote:

>> But I need a bigger difference between signed and unsigned integers.
>> Just sticking a 'u' at the front doesn't cut it, unless it replaces 'i'
>> with 'u' as happens with i32 and u32. With int32/uint32, the difference
>> is too subtle, and uint is unpleasant to type.
>
> So it is obvious to you (so obvious that you think it is universally
> obvious) that "u32" is "unsigned integer 32-bit", equally obvious that
> "int32" is "signed integer 32-bit", and yet "uint32" is too subtle? I
> can't help feeling there is an inconsistency here somewhere...

uint just looks like a typo to me. (On UK keyboards, u and i are
adjacent so typing 'ui' with one press is not uncommon.)

If you actually make such a typo when writing i32 or u32, then the
difference is more apparent.

> These names don't have to be keywords or fundamental types in the
> language. A language could have a completely flexible system for
> integer types, so that "int32" is defined in the language standard
> library as "type_alias int32 = builtin::integer<4, signed>", or whatever
> syntax or features you pick. But thereafter, programmers should stick
> to the standard names unless they have need of a specific name for the
> type. (Thus in C, the fundamental boolean type is "_Bool" - but the
> standard name is "bool".

C is not the language to set examples from. A C implementation is quite
likely to define 'int32_t' on top of 'int', and 'uint32_t' on top of
'unsigned'! Somewhat circular definitions...

> And for size-specific types, you should use
> "int32_t" and friends, as those are the standard names. It doesn't
> matter that some people use other names, for good or bad reasons - those
> are still the names you should use.)

I used to have such schemes; my earlier languages used the following
(inspired by Fortran) on top of which the colloquial aliases were defined:

int*N for signed integers
byte*N for unsigned integers
real*N for floats

as fundamental types, where N is a byte-size. I also played with int:32
and byte:64. Then I realised I was never going to use int*7 or byte:23,
and just used hardcoded names (and saved typing those shifted "*" and ":"!)

It's not really a problem what a language uses; people will write
whatever the language requires [except in C where people are more apt to
use typedefs for basic types].

Outside of a specific language, I might use int32 or i32 or u64. Nobody
has ever asked me what they mean (except in c.l.c.)


> Thus I think "word" is a particularly bad choice of names - it has been
> used and abused too much and has no real meaning left. I'd put it as a
> lot worse than "int" in that respect.

OK. You don't need to use my language, or if you do, you can choose to
use 'u64' for a specific size, or create your own aliases, like I do in C.

Personally I prefer to type 'word' over 'unsigned long long int' or even
'uint64_t'.


>> Most languages appear to be stuck with a default 32-bit int type, which
>> is now too small for memory sizes, large object sizes, file sizes and
>> many other things.
>>
>
> 32-bit is big enough for almost every situation for memory sizes, file
> sizes, etc. Not /every/ situation - but almost all.

Not enough for you to forget completely about its limitations. Have a
simple loop summing filesizes, and you are likely to overflow an int32
range; certainly you have to keep it very much in mind.

This is the whole reason for the ugly size_t in C. If 'int' was 64 bits,
you could forget about size_t, off_t, time_t, clock_t, and all the rest
of that zoo.

(If you are implementing 64-bit compilers, assemblers, linkers,
interpreters and runtimes, then you /need/ a 64-bit int!)

James Harris
Aug 19, 2021, 9:03:46 AM
On 18/08/2021 19:02, Bart wrote:
> On 18/08/2021 16:16, David Brown wrote:


>> int32 and uint32 are simple, clear, unambiguous, easy to type, and
>> cannot seriously be mistaken for anything else.
>
> int32 is OK, that's what I use mostly when I need that specific,
> narrower type.
>
> But I need a bigger difference between signed and unsigned integers.
> Just sticking a 'u' at the front doesn't cut it, unless it replaces 'i'
> with 'u' as happens with i32 and u32. With int32/uint32, the difference
> is too subtle, and uint is unpleasant to type.

Did you really say you don't like uint because it's "unpleasant to type"?

:-o

...

> Denotations like 'int' and 'word' are supposed to be /slightly/
> ambiguous. They are used when you don't care about the width, but expect
> the default to be sufficient.

That's fine but only if (1) you are restricted to certain hardware or
(2) your language allows you to specify either the default or a range in
which the default will be required to be.


--
James Harris

Bart
Aug 19, 2021, 9:32:32 AM
On 19/08/2021 14:03, James Harris wrote:
> On 18/08/2021 19:02, Bart wrote:
>> On 18/08/2021 16:16, David Brown wrote:
>
>
>>> int32 and uint32 are simple, clear, unambiguous, easy to type, and
>>> cannot seriously be mistaken for anything else.
>>
>> int32 is OK, that's what I use mostly when I need that specific,
>> narrower type.
>>
>> But I need a bigger difference between signed and unsigned integers.
>> Just sticking a 'u' at the front doesn't cut it, unless it replaces
>> 'i' with 'u' as happens with i32 and u32. With int32/uint32, the
>> difference is too subtle, and uint is unpleasant to type.
>
> Did you really say you don't like uint because it's "unpleasant to type"?
>
> :-o

Yeah. I just don't like it.


> ...
>
>> Denotations like 'int' and 'word' are supposed to be /slightly/
>> ambiguous. They are used when you don't care about the width, but
>> expect the default to be sufficient.
>
> That's fine but only if you are (1) restricted to certain hardware or
> (2) your language allows you to specify either the default or a range in
> which the default will be required to be.

I usually set the default to the target machine word size.

This works for me since, once my languages target 64 bits for example,
they're unlikely to still target 32 bits, which mainstream ones still
have to support.

I can still run on 32 bits (eg. via a C target), but it will be less
efficient as many operations will be unnecessarily 64 bits.

There could be a dedicated language version where int might be 32 or 16
bits, but I can't guarantee the same programs still working, as they may
assume the wider int type:

int worldpop = 7500 million

David Brown
Aug 19, 2021, 10:09:32 AM
On 19/08/2021 14:57, Bart wrote:
> On 19/08/2021 12:13, David Brown wrote:
>> On 18/08/2021 20:02, Bart wrote:
>
>>> But I need a bigger difference between signed and unsigned integers.
>>> Just sticking a 'u' at the front doesn't cut it, unless it replaces 'i'
>>> with 'u' as happens with i32 and u32. With int32/uint32, the difference
>>> is too subtle, and uint is unpleasant to type.
>>
>> So it is obvious to you (so obvious that you think it is universally
>> obvious) that "u32" is "unsigned integer 32-bit", equally obvious that
>> "int32" is "signed integer 32-bit", and yet "uint32" is too subtle?  I
>> can't help feeling there is an inconsistency here somewhere...
>
> uint just looks like a typo to me. (On UK keyboards, u and i are
> adjacent so typing 'ui' with one press is not uncommon.)

They are adjacent on most Latin alphabet keyboard layouts, I think.

>
> If you actually make such a typo when writing i32 or u32, then the
> difference is more apparent.

I find it very hard to believe that "uint32" is a common typo for
"int32" and commonly goes unnoticed, and that "u32" vs "i32" is
significantly less likely to happen or go unnoticed. But unless someone has
statistics on such errors, we'll never know for sure.

>
>> These names don't have to be keywords or fundamental types in the
>> language.  A language could have a completely flexible system for
>> integer types, so that "int32" is defined in the language standard
>> library as "type_alias int32 = builtin::integer<4, signed>", or whatever
>> syntax or features you pick.  But thereafter, programmers should stick
>> to the standard names unless they have need of a specific name for the
>> type.  (Thus in C, the fundamental boolean type is "_Bool" - but the
>> standard name is "bool".
>
> C is not the language to set examples from. A C implementation is quite
> likely to define 'int32_t' on top of 'int', and 'uint32_t' on top of
> 'unsigned'! Somewhat circular definitions...

I used C as an example here, not because I think the details of its
types should be copied. C does things the way it does because they made
sense at the time, and history has passed since its conception. Despite
all your moanings and groanings, C's system has worked well for the last
50 years and continues to work well now - at least for those willing to
accept it and work with it instead of fighting it. But I would not copy
the same system for a /new/ language, nor did I suggest it.

>
>   And for size-specific types, you should use
>> "int32_t" and friends, as those are the standard names.  It doesn't
>> matter that some people use other names, for good or bad reasons - those
>> are still the names you should use.)
>
> I used to have such schemes; my earlier languages used the following
> (inspired by Fortran) on top of which the colloquial aliases were defined:
>
>   int*N               for signed integers
>   byte*N              for unsigned integers
>   real*N              for floats
>
> as fundamental types, where N is a byte-size. I also played with int:32
> and byte:64. Then I realised I was never going to use int*7 or byte:23,
> and just used hardcoded names (and saved typing those shifted "*" and ":"!)
>
> It's not really a problem what a language uses; people will write
> whatever the language requires [except in C where people are more apt to
> use typedefs for basic types].

Anybody doing serious programming in a real language is going to make
extensive use of named types - including named scalar types. There is,
of course, a need for simple little languages - not everything has to be
suitable for large-scale coding projects. But if you are going to write
large-scale software, and are interested in writing clear, maintainable
code that minimises the risk of error, then you want good typing.
Ideally there should be support for strong types here, not just aliases.

>
> Outside of a specific language, I might use int32 or i32 or u64. Nobody
> has ever asked me what they mean (except in c.l.c.)

No one in c.l.c. has asked you what they mean, to the best of my
recollection. But many have asked you why you use them, or asked you to
use standard C types when writing C instead of silly, petty
abbreviations. If you had talked about your "prgmming langage", people
would know what you meant - and question your spelling. If you
continued to insist that that's how /you/ prefer to write it, and that
it is superior to the standard spelling, people would think you are
rude, arrogant and ignorant. Unsurprisingly, you evoke similar
reactions when you post your silliness in c.l.c. (And I'd expect the
same results if I posted to comp.lang.rust with code using type "int32_t".)

>
>
>> Thus I think "word" is a particularly bad choice of names - it has been
>> used and abused too much and has no real meaning left.  I'd put it as a
>> lot worse than "int" in that respect.
>
> OK. You don't need to use my language, or if you do, you can choose to
> use 'u64' for a specific size, or create your own aliases, like I do in C.
>
> Personally I prefer to type 'word' over 'unsigned long long int' or even
> 'uint64_t'.
>

Well, in your language you can of course use whatever you want (as long
as you can remember what it means) - no one else is looking at your code
or using the language. But here we are not talking about /your/
personal languages - we are giving opinions and ideas to help some one
else with their language. And I assume he has ambitions that it might
be of interest to people other than himself.

>
>>> Most languages appear to be stuck with a default 32-bit int type, which
>>> is now too small for memory sizes, large object sizes, file sizes and
>>> many other things.
>>>
>>
>> 32-bit is big enough for almost every situation for memory sizes, file
>> sizes, etc.  Not /every/ situation - but almost all.
>
> Not enough for you to forget completely about its limitations. Have a
> simple loop summing filesizes, and you are likely to overflow an int32
> range; certainly you have to keep it very much in mind.
>

Sure - if that's the kind of program you are writing.

> This is the whole reason for the ugly size_t in C. If 'int' was 64 bits,
> you could forget about size_t, off_t, time_t, clock_t, and all the rest
> of that zoo.
>

No, you could not. And I assume you are just being your usual perverse
argumentative self, rather than actually wanting to learn anything.
(You have, after all, had this stuff explained patiently and repeatedly
many times.)

> (If you are implementing 64-bit compilers, assemblers, linkers,
> interpreters and runtimes, then you /need/ a 64-bit int!)
>

Certainly a 64-bit integer type is convenient - having it as "int" is
very far from necessary.

David Brown
Aug 19, 2021, 10:23:20 AM
On 19/08/2021 14:51, James Harris wrote:
> On 17/08/2021 22:08, David Brown wrote:
>
> ...
>
>> Certainly it makes sense to have shorter and simpler keywords, at least
>> for those that are commonly used.  And there is no point in having extra
>> underscores for no purpose.  I might not object to underscores as much
>> as you do, but I see no benefit of "null_ptr" over "nullptr".
>
> To be clear, I don't think I personally suggested null_ptr for the
> following reasons. What I had in mind was that if there were going to be
> two or more reserved pointer values then it might make sense for them to
> be clearly related. And so I suggested ptr as a prefix - something like
>
>   ptr_undef
>   ptr_null
>
> By contrast, if I were to use the nullptr that you mentioned then the
> corresponding keywords would be
>
>   undefptr
>   nullptr
>
> which is perhaps getting to be a bit hard to read.

I'd question the usefulness of having these as distinct names or values
in the first place (especially when balanced against the run-time cost
of manual or automatic checking of pointer validity - a comparison to 0
is cheap, a comparison to something else is not). And I'd question the
usefulness of having "ptr" as part of the name here at all. Remember,
C++ only has the name "nullptr" because it could not use "null".

You'd perhaps be better having "undefined" as a keyword and allowing it
for value types as well as pointers. Perhaps it would be a meta-value -
generating no code, but being useful for the compiler to check that the
programmer has put a real value in the variable.

> The situation would
> be worse if there were many such reserved names as in
>
>   undefptr
>   nullptr
>   debugptr
>   signalptr
>   chainendptr
>

These are definitely getting bad. Whatever you are thinking of here,
it's unlikely that making these reserved names is a good idea. A
well-designed language should aim to minimise the reserved name count,
not maximise it.

> I know that discussion of what name to use for a reserved pointer value
> is somewhat about minutiae but as I think the list shows even choices
> such as this can make a difference to the readability of the program.
>
> Not to be overlooked is the words which get reserved by the language and
> so become unavailable for the programmer to use as identifier names. For
> that reason, I think a better list than the above would be
>
>   ptr_undef
>   ptr_null
>   ptr_debug
>   ptr_signal
>   ptr_chainend
>
> Although they are arguably more ugly such names would, perhaps, be
> easier to read and recognise, and that would keep the reserved words as
> having a common prefix. A programmer scanning the text of the program
> and seeing the prefix would be able to immediately recognise it as a
> special pointer value and then would only really need to notice the
> specific if it was relevant to why he was looking at the code.
>
> As I say, minor points.
>

These are important points that are at the heart of how your language is
read and written.

Things that are used often, should be easy to read and write. Things
that are used rarely, can be hard. I don't know what you want to do
with your language, but for the sake of argument let's guess it should
be useable where C is used today. How often are null pointers used in C
code? Very often - so make it short, simple, and a keyword in your
language (such as "null"). How often are pointers to signals used?
Almost never - so it's fine if the type is pulled in from system
libraries as "system::signals::signal_pointer", and it most certainly
should /not/ be a reserved keyword.

(As a guide rule, never put anything into the language itself if it can
equally well be put in a system library.)

Bart
Aug 19, 2021, 11:28:38 AM
On 19/08/2021 15:09, David Brown wrote:
> On 19/08/2021 14:57, Bart wrote:

> I find it very hard to believe that "uint32" is a common typo for
> "int32" and commonly goes unnoticed, and that "u32" vs "i32" is less
> significantly likely to happen or go unnoticed. But unless someone has
> statistics on such errors, we'll never know for sure.

It probably takes me 50% longer to distinguish int32_t from uint32_t
than i32 from u32. Between int32 and word32 it is a bit quicker.

BTW those compact types are also used by Odin and Zig languages, not
just Rust.

> I used C as an example here, not because I think the details of its
> types should be copied. C does things the way it does because they made
> sense at the time, and history has passed since its conception. Despite
> all your moanings and groanings, C's system was worked well for the last
> 50 years and continues to work well now

No, it is still failing now. /You/ might want to adopt the [u]intN_t
types, but you still need to interact with other software that uses char
(especially char*) with its indeterminate signedness; or with int, long
and long long where long may match one of the other two in size but is
compatible with neither.

And then you have those off_t types mentioned below.

>> This is the whole reason for the ugly size_t in C. If 'int' was 64 bits,
>> you could forget about size_t, off_t, time_t, clock_t, and all the rest
>> of that zoo.
>>
>
> No, you could not. And I assume you are just being your usual perverse
> argumentative self, rather than actually wanting to learn anything.

I think you can since you just don't see such types anywhere else.

At best there might be a special type such as usize, but that's likely
because many languages are still dominated by a 32-bit int type which is
too small for current data and memory and file sizes.

James Harris

Aug 19, 2021, 3:25:27 PM
On 19/08/2021 15:23, David Brown wrote:
> On 19/08/2021 14:51, James Harris wrote:
>> On 17/08/2021 22:08, David Brown wrote:

...

>>   ptr_undef
>>   ptr_null
>>
>> By contrast, if I were to use the nullptr that you mentioned then the
>> corresponding keywords would be
>>
>>   undefptr
>>   nullptr
>>
>> which is perhaps getting to be a bit hard to read.
>
> I'd question the usefulness of having these as distinct names or values
> in the first place

So would I. This is just an idea, as yet.

>
> (especially when balanced against the run-time cost
> of manual or automatic checking of pointer validity - a comparison to 0
> is cheap, a comparison to something else is not).

Performance should not be a problem. It will be largely unaffected even
if there are quite a few such constants. For example, say that there
were many (more than two) values of the pointer constants starting with
these

0 = undefined
1 = null
2 = debug
etc

In a paging environment all of them would be in the lowest page. It
would be unmapped. So attempts to dereference any of them would
automatically lead to an exception - at no cost.

Where a bad pointer would have to be detected programmatically (e.g. in
the absence of paging) then instead of the nominal

cmp eax, 0
je badpointer

the generated code could have something like

cmp eax, 16
jb badpointer

Further, many of those tests could be either hoisted to be outside the
inner loop or omitted altogether where it can be proven that the
pointer's value must be in a certain range.

>
> And I'd question the
> usefulness of having "ptr" as part of the name here at all. Remember,
> C++ only has the name "nullptr" because it could not use "null".

I wondered why C++ added nullptr. From what I've found, it seems that
NULL can be automatically converted to an integer and that can cause
problems for C++'s overloading whereas nullptr cannot be so converted. I
expect there's more to it than that but it suggests that a new language
would not have to have both.

>
> You'd perhaps be better having "undefined" as a keyword and allowing it
> for value types as well as pointers. Perhaps it would be a meta-value -
> generating no code, but being useful for the compiler to check that the
> programmer has put a real value in the variable.

I am considering a course which would implement something you suggested
earlier where pointers are declared as either of these:

(pointer to T)
(pointer to T) or (null)

For an identifier declared as the former, setting the pointer to null
would be prohibited. If declared as the latter, however, then
dereferences would essentially need to be wrapped in case statements.

However, that would be part of variant typing where an object is
declared as

(T0) or (T1) or (T2) or (T3) ....

for arbitrary types Tn. Again, uses would need to be wrapped in case
statements. There would only be the one mechanism for variants but it
could be applied to pointers which could potentially be null.

But ATM that's a long way off as it would be complex to implement and I
am at a much earlier stage.

...

>> As I say, minor points.
>>
>
> These are important points that are at the heart of how your language is
> read and written.

Thanks for saying that. It's true that language (and standard library)
design is filled with a myriad of small decisions that each have a
bearing - some large, some small.

...

> (As a guide rule, never put anything into the language itself if it can
> equally well be put in a system library.)

Agreed.


--
James Harris

James Harris

Aug 19, 2021, 5:06:28 PM
On 19/08/2021 13:51, James Harris wrote:

...

>   ptr_undef
>   ptr_null
>   ptr_debug
>   ptr_signal
>   ptr_chainend

On the topic of having more 'bad pointer' identifications than just NULL
I came across this:

https://lwn.net/Articles/236920/

It suggests a distinction over kmalloc returns such as

not initialized
allocation failed due to lack of space
allocation failed due to depletion of slots
allocated OK but was of zero bytes so do not try to dereference

One suggestion "causes kmalloc(0) to return a special ZERO_SIZE_PTR
value. It is a non-NULL value which looks like a legitimate pointer, but
which causes a fault on any attempt at dereferencing it. Any attempt to
call kfree() with this special value will do the right thing".

Bottom line: there /may/ (and I'd put it no higher than that) be good
reason to have more than one pointer value which cannot be dereferenced.
As written in a reply earlier today it looks as though the different
values could be implemented virtually for free.


--
James Harris

Bart

Aug 19, 2021, 6:42:12 PM
'equally well'. Things implemented with a library are usually inferior.
Either in how they can be used, or by incurring extra overheads (eg.
slower compilation).

It can also mean users replacing standard features with their own.

David Brown

Aug 20, 2021, 3:05:32 AM
On 19/08/2021 17:28, Bart wrote:
> On 19/08/2021 15:09, David Brown wrote:
>> On 19/08/2021 14:57, Bart wrote:
>
>> I find it very hard to believe that "uint32" is a common typo for
>> "int32" and commonly goes unnoticed, and that "u32" vs "i32" is less
>> significantly likely to happen or go unnoticed.  But unless someone has
>> statistics on such errors, we'll never know for sure.
>
> It probably takes me 50% longer to distinguish int32_t from uint32_t,
> than i32 and u32. Between int32 and word32, is a bit quicker.
>

Um, okay. I guess I'll take your word for it, rather than asking for
the timing measurements you've made to back up the figures.

> BTW those compact types are also used by Odin and Zig languages, not
> just Rust.

The world is full of obscure and minor programming languages, most of
which almost no one has ever heard of. Some have useful niche
application areas, others are "general purpose" and therefore never used
in practice. Occasionally, one will break out and become popular -
perhaps for good technical reasons, but usually for non-technical
reasons. So who cares how many languages there are on Wikipedia's lists
or Rosetta Stone's language comparisons that happen to use a particular
name for their types? It's like telling us there is this guy in
Russia who drives a tank to work - therefore a tank is a perfectly
reasonable choice of commuter car.

>
>> I used C as an example here, not because I think the details of its
>> types should be copied.  C does things the way it does because they made
>> sense at the time, and history has passed since its conception.  Despite
>> all your moanings and groanings, C's system has worked well for the last
>> 50 years and continues to work well now
>
> No, it is still failing now. /You/ might want to adopt the [u]intN_t
> types, but you still need to interact with other software that uses char
> (especially char*) with its indeterminate signedness; or with int, long
> and long long where long may match one of the other two in size but is
> compatible with neither.

These are all perfectly good types in their place, and are often exactly
what you want. /Sometimes/ you want size-specific types, but often they
are not necessary and a type that adapts to the target for efficiency is
better for portability. When you are dealing with strings and
characters, signedness doesn't matter - it's only in quite niche
situations that you'd want to do arithmetic with 8-bit types.

The problems come when people misunderstand what a type is, or how it
works, or use the wrong type in the wrong circumstance. This may come
as a revelation to you, but C is not unique here - people who don't
understand the language they are using or the code they are writing, or
who write bad code, get poor results.

The only thing special about C is that there is a /vast/ amount of code
written in the language, and vast amounts of it are available for
scrutiny. If there were any measurable amount of code written in your
languages, and any users other than yourself, you'd find approximately
the same percentage of programmers writing the same percentage of poor
code as with any other language.


>
> And then you have those off_t types mentioned below.

"off_t" is not part of C. But don't let that get in the way of another
rant.

>
>>> This is the whole reason for the ugly size_t in C. If 'int' was 64 bits,
>>> you could forget about size_t, off_t, time_t, clock_t, and all the rest
>>> of that zoo.
>>>
>>
>> No, you could not.  And I assume you are just being your usual perverse
>> argumentative self, rather than actually wanting to learn anything.
>
> I think you can since you just don't see such types anywhere else.
>

Ah, ignorance is bliss!

Dmitry A. Kazakov

Aug 20, 2021, 3:27:10 AM
On 2021-08-20 09:05, David Brown wrote:

> It's like telling us there is this guy in
> Russia who drives a tank to work - therefore a tank is a perfectly
> reasonable choice of commuter car.

Depends on the line of work, how many checkpoints are on the road, if
there is a road at all. Sorry, could not resist.

P.S. As in any non-free state, gun laws in Russia are very strict. In
Germany, I believe, you could buy a decommissioned disarmed tank, but
you could not certify it for the public road. If Elon Musk built a Tesla
tank...

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

James Harris

Aug 20, 2021, 3:29:05 AM
On 18/08/2021 19:02, Bart wrote:
> On 18/08/2021 16:16, David Brown wrote:


>> int32 and uint32 are simple, clear, unambiguous, easy to type, and
>> cannot seriously be mistaken for anything else.
>
> int32 is OK, that's what I use mostly when I need that specific,
> narrower type.
>
> But I need a bigger difference between signed and unsigned integers.
> Just sticking a 'u' at the front doesn't cut it, unless it replaces 'i'
> with 'u' as happens with i32 and u32.

If you like

i32
u32

are you reserving all identifiers of the form iN and uN where N is an
integer?

What if the programmer wants to use, say, i2 and i3 as identifiers?

On the other hand, if the programmer wants to define a 128-bit integer
and a 21-bit unsigned integer would he write

i128
u21

?

If he wants a 1024-bit unsigned integer would he write

u1024

?

IOW the iN and uN forms are tempting but they seem to be rather limiting.


--
James Harris

David Brown

Aug 20, 2021, 3:33:09 AM
On 19/08/2021 21:25, James Harris wrote:
> On 19/08/2021 15:23, David Brown wrote:
>> On 19/08/2021 14:51, James Harris wrote:
>>> On 17/08/2021 22:08, David Brown wrote:
>
> ...
>
>>>    ptr_undef
>>>    ptr_null
>>>
>>> By contrast, if I were to use the nullptr that you mentioned then the
>>> corresponding keywords would be
>>>
>>>    undefptr
>>>    nullptr
>>>
>>> which is perhaps getting to be a bit hard to read.
>>
>> I'd question the usefulness of having these as distinct names or values
>> in the first place
>
> So would I. This is just an idea, as yet.
>

Fair enough.

>>
>> (especially when balanced against the run-time cost
>> of manual or automatic checking of pointer validity - a comparison to 0
>> is cheap, a comparison to something else is not).
>
> Performance should not be a problem. It will be largely unaffected even
> if there are quite a few such constants. For example, say that there
> were many (more than two) values of the pointer constants starting with
> these
>
>   0 = undefined
>   1 = null
>   2 = debug
>   etc
>
> In a paging environment all of them would be in the lowest page. It
> would be unmapped. So attempts to dereference any of them would
> automatically lead to an exception - at no cost.
>

That has several costs. One is that it requires a paging environment.
That might be fine - I don't know what your targets are here. And it
might need cooperation with the OS if you want to handle these in
different ways than just a program crash. (You've covered the other big
cost yourself below.)

> Where a bad pointer would have to be detected programmatically (e.g. in
> the absence of paging) then instead of the nominal
>
>   cmp eax, 0
>   je badpointer
>
> the generated code could have something like
>
>   cmp eax, 16
>   jb badpointer
>
> Further, many of those tests could be either hoisted to be outside the
> inner loop or omitted altogether where it can be proven that the
> pointer's value must be in a certain range.
>

On many processors, the difference is bigger. (And on x86, "test eax,
eax" is more likely to be used than "cmp eax, 0", because it is smaller
and faster IIRC.) For many RISC processors, you would have to load a
constant value 16 into a register before doing the comparison. Some
processors have a "branch if register is 0" instruction, but no flag
register at all. Even the x86 has this kind of thing internally - the
x86 instruction stream may look similar for both comparisons, but the
way they are handled in modern x86 processors can be very different,
with highly optimised paths for the extremely popular "compare to 0"
pattern.

>>
>> And I'd question the
>> usefulness of having "ptr" as part of the name here at all.  Remember,
>> C++ only has the name "nullptr" because it could not use "null".
>
> I wondered why C++ added nullptr. From what I've found, it seems that
> NULL can be automatically converted to an integer and that can cause
> problems for C++'s overloading whereas nullptr cannot be so converted. I
> expect there's more to it than that but it suggests that a new language
> would not have to have both.

That is basically it. It means you can have :

void foo(int);
void foo(char *);

and call "foo(nullptr)" rather than "foo(0)".

Equally, it means you can have :

void bar(int);

and "bar(nullptr)" is a compile-time error, unlike "bar(NULL)".

It also means programmers distinguish their null pointers more clearly
in code, reducing the risk of mistakes and increasing the static
checking that can be done (such as by using gcc's
"-Wzero-as-null-pointer-constant" warning). The use of "0" to mean
either the integer 0 or a null pointer comes from C's history - I
believe in BCPL there was no distinction between integers and pointers
at all. "nullptr" is a step towards improving the language here, though
the historical baggage from C and earlier C++ standards cannot be removed.

>
>>
>> You'd perhaps be better having "undefined" as a keyword and allowing it
>> for value types as well as pointers.  Perhaps it would be a meta-value -
>> generating no code, but being useful for the compiler to check that the
>> programmer has put a real value in the variable.
>
> I am considering a course which would implement something you suggested
> earlier where pointers are declared as either of these:
>
>   (pointer to T)
>   (pointer to T) or (null)
>
> For an identifier declared as the former, setting the pointer to null
> would be prohibited. If declared as the latter, however, then
> dereferences would essentially need to be wrapped in case statements.

I wouldn't use quite that syntax, but I agree with the principle.

>
> However, that would be part of variant typing where an object is
> declared as
>
>   (T0) or (T1) or (T2) or (T3) ....
>
> for arbitrary types Tn. Again, uses would need to be wrapped in case
> statements. There would only be the one mechanism for variants but it
> could be applied to pointers which could potentially be null.
>
> But ATM that's a long way off as it would be complex to implement and I
> am at a much earlier stage.
>

Summation types and pattern matching are a very nice feature in a
language, IMHO.

David Brown

Aug 20, 2021, 3:39:08 AM
On 20/08/2021 09:27, Dmitry A. Kazakov wrote:
> On 2021-08-20 09:05, David Brown wrote:
>
>> It's like telling us there is this guy in
>> Russia who drives a tank to work - therefore a tank is a perfectly
>> reasonable choice of commuter car.
>
> Depends on the line of work, how many checkpoints are on the road, if
> there is a road at all. Sorry, could not resist.
>
> P.S. As in any non-free state, gun laws in Russia are very strict.

In any /free/ state as well, gun laws are very strict. The only
countries where gun laws are not strict are those without working laws,
and those that misunderstand "free" and think the freedom to shoot
people trumps the freedom to not be shot.

> In
> Germany, I believe, you could buy a decommissioned disarmed tank, but
> you could not certify it for the public road. If Elon Musk built a Tesla
> tank...
>

I should, of course, have used Lithuania as an example for the tank.
After all, even the mayor drives a tank there...

<http://www.baltic-course.com/eng/transport/?doc=46548>

Dmitry A. Kazakov

Aug 20, 2021, 3:56:23 AM
On 2021-08-20 09:39, David Brown wrote:
> On 20/08/2021 09:27, Dmitry A. Kazakov wrote:

>> P.S. As in any non-free state, gun laws in Russia are very strict.
>
> In any /free/ state as well, gun laws are very strict. The only
> countries where gun laws are not strict are those without working laws,
> and those that misunderstand "free" and think the freedom to shoot
> people trumps the freedom to not be shot.

You confuse a free state with a benign one that bestows permissions,
allowances and licenses on its subjects. A free state is where free
citizens decide what the state is allowed to do, not the other way around.

But it becomes off-topic...

Bart

Aug 20, 2021, 6:47:14 AM
On 20/08/2021 08:29, James Harris wrote:
> On 18/08/2021 19:02, Bart wrote:
>> On 18/08/2021 16:16, David Brown wrote:
>
>
>>> int32 and uint32 are simple, clear, unambiguous, easy to type, and
>>> cannot seriously be mistaken for anything else.
>>
>> int32 is OK, that's what I use mostly when I need that specific,
>> narrower type.
>>
>> But I need a bigger difference between signed and unsigned integers.
>> Just sticking a 'u' at the front doesn't cut it, unless it replaces
>> 'i' with 'u' as happens with i32 and u32.
>
> If you like
>
>   i32
>   u32
>
> are you reserving all identifiers of the form iN and uN where N is an
> integer?
>
> What if the programmer wants to use, say, i2 and i3 as identifiers?

I only decided on this scheme (which was int32 and word32) when I
realised I was never going to use anything other than this small set of
power-of-two sizes.

Neither do most other languages of this kind.

Before that I was using general forms like int*4 (i32) or int:64 (i64)
which allowed the possibility of arbitrary sizes. However the language
then needs to decide what to do about int:53 or int:5. Or int*100000.

The compact forms came about later; as I mentioned (though David Brown
doesn't believe it), they are commonly used either in actual languages
or as colloquial ways of referring to such types.

In my case however I also have bittypes which I call u1, u2 and u4
(which then continue as u8, u16 etc).

Then it sometimes happens that I want variables called t1 and u1, but I
can't!

> On the other hand, if the programmer wants to define a 128-bit integer
> and a 21-bit unsigned integer would he write
>
>   i128
>   u21
>
> ?
>
> If he wants a 1024-bit unsigned integer would he write
>
>   u1024
>
> ?

That's not going to happen. Not in any language of mine. Fixed size
numeric types like that go up to i128/u128 (conceivably one or two
levels further) but that's it. (It's enough of a nightmare just
implementing 128 bits!)

It would anyway need a more general syntax than reserving millions of
identifiers of the form i846464 and u345! Probably int:846464, with the
number part most likely a constant expression.

I do use such syntax for bitfields inside records, for example:

int32 pos : (lineno:23, fileno:9)

Here bitfields are unsigned values.

Anyway, arbitrary-sized integers have been discussed here before.

> IOW the iN and uN forms are tempting but they seem to be rather limiting.

Why, what are you planning? Some languages just give you 'integer' (or
even 'number') and that's it. Specifying 8/16/32/64/128-bit sizes is
usually sufficient.

Anything else sounds like a great idea but is probably not practical and
likely not worthwhile. It's a specialist area that is better covered
with bitfields for 1-127 bits, or arbitrary precision for larger sizes,
each with their own syntax.

Bart

Aug 20, 2021, 7:18:19 AM
On 20/08/2021 08:05, David Brown wrote:
> On 19/08/2021 17:28, Bart wrote:

> The world is full of obscure and minor programming languages, most of
> which almost no one has ever heard of.

Rust is well known. Zig comes up frequently in forums (even this one in
past threads). Odin less often, but it is mentioned. You can download
all of them and try them out.

There are some really obscure languages on rosettacode, like kapab or
wisp, but the ones above aren't really that obscure.

> The only thing special about C is that

Its type system is not fit for purpose.

>> And then you have those off_t types mentioned below.
>
> "off_t" is not part of C. But don't let that get in the way of another
> rant.

So what is it a part of? Since it came up extensively in a recent clc
thread.

It's a file-offset type used inside struct stat (declared in
sys/stat.h), which is to do with the stat() functions.

If I want to use such functions via a FFI, then I might need to find out
what it actually is. But you say it doesn't exist, so that's OK then!

The funny thing is, if I compile this C program:

#include <stdio.h>
#include <sys/types.h>

int main() {
    printf("%d\n", (int)sizeof(off_t));
}

I don't get 'unknown identifier' for off_t or some such message; it
seems to know what it is!

(Here, off_t has a concrete type which is i32, but internally is long
int which is distinct from both int (i32 here) and int32_t (also i32).
That's why no one in their right mind would take inspiration from C.)


>> I think you can since you just don't see such types anywhere else.
>>
>
> Ah, ignorance is bliss!

So, enlighten us. It will most likely be languages where you tie
yourself up in knots having a special type for everything.

James Harris

Aug 20, 2021, 7:55:24 AM
On 20/08/2021 11:47, Bart wrote:
> On 20/08/2021 08:29, James Harris wrote:

...

>> are you reserving all identifiers of the form iN and uN where N is an
>> integer?
>>
>> What if the programmer wants to use, say, i2 and i3 as identifiers?
>
> I only decided on this scheme (which was int32 and word32) when I
> realised I was never going to use anything other than this small set of
> power-of-two sizes.
>
> Neither do most other languages of this kind.
>
> Before that I was using general forms like int*4 (i32) or int:64 (i64)
> which allowed the possibility of arbitrary sizes. However the language
> then needs to decide what to do about int:53 or int:5. Or int*100000.
>
> The compact forms came about later; as I mentioned (though David Brown
> doesn't believe it), they are commonly used either in actual languages
> or as colloquial ways of referring to such types.
>
> In my case however I also have bittypes which I call u1, u2 and u4
> (which then continue as u8, u16 etc).
>
> Then it sometimes happens that I want variables called t1 and u1, but I
> can't!

So under that scheme

i1 is a type name
i2 is a type name
i3 could be an identifier
i4 is a type name
i5 could be an identifier
i6 could be an identifier
etc

?

Naming integers iN was tempting but I felt that it either took away too
much of the namespace or, as illustrated, would be irregular and fiddly.

...

>> IOW the iN and uN forms are tempting but they seem to be rather limiting.
>
> Why, what are you planning?

If possible (and I haven't implemented it yet) I'd rather have the
number of bits as a qualifier which goes after the type name as follows

int 8 a
int 16 b
int 32 c

etc. I think there's little difference in readability compared with

i8 a
i16 b
i32 c

while the former are flexible, regular and consistent, and they consume
less of the namespace which could otherwise be used for identifiers.

I may need to add some punctuation but I'd avoid it if I could so as to
keep the appearance clean.

>
> Some languages just give you 'integer' (or
> even 'number') and that's it. Specifying 8/16/32/64/128-bit sizes is
> usually sufficient.

Agreed, but such sizes could still be specified in a more regular way.


>
> Anything else sounds like a great idea but is probably not practical and
> likely not worthwhile. It's a specialist area that is better covered
> with bitfields for 1-127 bits, or arbitrary precision for larger sizes,
> each with their own syntax.
>

In another comment you said you didn't like uint but I've tried other
names such as

uns - for unsigned
nat - for natural
nneg - for non-negative

and I have to say that after looking at those

uint

doesn't seem quite so bad! That's especially the case if there's a space
between uint and the size so we are not talking about

uint64

but

uint 64

Having said that, what do you make of uns when compared with uint?


--
James Harris

Bart

Aug 20, 2021, 9:19:18 AM
I don't use i1 i2 i4, only i8/i16/i32/i64/i128.

Some languages, like Rust, allow i32 to be used as a variable name as
well, since types only appear in certain contexts.

I don't allow that. (I'd need to write `i32 for the variable name. Not
practical for normal coding, but for generated code, it provides a
workaround.)

Many languages now which allow size-specific types will have them as one of:

i32
int32
Int32
int32_t etc

You could say that all these are irregular since int31/int33 are legal
user identifiers, but int32 isn't (well apart from Rust).

This applies to 'int' too:

hnt int jnt ...
ins int inu ...

And actually to most keywords unless the language has a peculiar enough
syntax to allow keywords as identifiers (I think PL/I allowed if if=if ...)

> ...
>
>>> IOW the iN and uN forms are tempting but they seem to be rather
>>> limiting.
>>
>> Why, what are you planning?
>
> If possible (and I haven't implemented it yet) I'd rather have the
> number of bits as a qualifier which goes after the type name as follows
>
>   int 8 a
>   int 16 b
>   int 32 c

This is more flexible (I'd prefer some punctuation or other way of
connecting the number with the type) but as I said, you then have to
deal with extra possibilities:

* Could the number be an expression?

* Could it be the name of a macro or constant that expands to a number?

* If the number is a name, then int a ... becomes ambiguous; are you
defining an int called 'a',or is 'a' a name that expands to '32', and
the actual variable name follows?

* What to do about invalid sizes?

* Could such a number appear also after a user-defined type; for example
if an alias 'T' for 'int' was created, would 'T 8 a' be allowed?

This is where I decided to just define a handful of fixed names and be
done with it.

>

>> Some languages just give you 'integer' (or even 'number') and that's
>> it. Specifying 8/16/32/64/128-bit sizes is usually sufficient.
>
> Agreed, but such sizes could still be specified in a more regular way.

I don't really agree because the possibilties are so small, and
generally hard-coded at each place where they are used. It's not like an
array:

int A[N]

where N can be literally anything, or might not be known until runtime.

However, you only have to look at other languages:

Java Odin Zig Rust C# D Nim Go ...

and they all follow the same pattern: a set of fixed names, either
like byte/short/int/long, or with size suffixes: int8/int16/int32/int64.


>
>> Anything else sounds like a great idea but is probably not practical
>> and likely not worthwhile. It's a specialist area that is better
>> covered with bitfields for 1-127 bits, or arbitrary precision for
>> larger sizes, each with their own syntax.
>>
>
> In another comment you said you didn't like uint but I've tried other
> names such as
>
>   uns - for unsigned
>   nat - for natural
>   nneg - for non-negative
>
> and I have to say that after looking at those
>
>   uint
>
> doesn't seem quite so bad! That's especially the case if there's a space
> between uint and the size so we are not talking about
>
>   uint64
>
> but
>
>   uint 64
>
> Having said that, what do you make of uns when compared with uint?

Here I agree, uint is better than uns, nat, and nneg! Uint or variations
is also commonly used so that wouldn't be a bad choice.

But in my case, I've always had distinct type names for unsigned integers.

For example, 'byte', 'word', 'long' were all unsigned (these were u8,
u16, u32). Now 'word' is u64, except in my assembler where a 'w' in some
reserved words indicates u16. I no longer use 'long'.

Signed integers have always either included 'int', or 'i' for the
compact form.

Notice however how I'm using u16, u64 etc to clarify exactly what a type
is? So if those forms are that unambiguous in discussions, why not have
them in a language too?

Bart

Aug 20, 2021, 9:42:50 AM
On 20/08/2021 08:33, David Brown wrote:
> On 19/08/2021 21:25, James Harris wrote:

>> I wondered why C++ added nullptr. From what I've found, it seems that
>> NULL can be automatically converted to an integer and that can cause
>> problems for C++'s overloading whereas nullptr cannot be so converted. I
>> expect there's more to it than that but it suggests that a new language
>> would not have to have both.
>
> That is basically it. It means you can have :
>
> void foo(int);
> void foo(char *);
>
> and call "foo(nullptr)" rather than "foo(0)".
>
> Equally, it means you can have :
>
> void bar(int);
>
> and "bar(nullptr)" is a compile-time error, unlike "bar(NULL)".
>
> It also means programmers distinguish their null pointers more clearly
> in code, reducing the risk of mistakes and increasing the static
> checking that can be done (such as by using gcc's
> "-Wzero-as-null-pointer-constant" warning). The use of "0" to mean
> either the integer 0 or a null pointer comes from C's history - I
> believe in BCPL there was no distinction between integers and pointers
> at all. "nullptr" is a step towards improving the language here, though
> the historical baggage from C and earlier C++ standards cannot be removed.

This is what I did recently. While I've always had 'nil' for a null
pointer (null is better as an adjective than nil!), integers could be
used freely with pointers, at least on my older implementations.

Now the null pointer value can only be denoted as 'nil', not 0.

David Brown

Aug 20, 2021, 9:49:29 AM
On 20/08/2021 13:18, Bart wrote:
> On 20/08/2021 08:05, David Brown wrote:
>> On 19/08/2021 17:28, Bart wrote:
>
>> The world is full of obscure and minor programming languages, most of
>> which almost no one has ever heard of.
>
> Rust is well known. Zig comes up frequently in forums (even this one in
> past threads). Odin less often, but it is mentioned. You can download
> all of them and try them out.

Rust has a lot of hype, and is more known than used. Zig is just
another "safer than Ada, faster than C" wannabe. Odin likewise (though
it hasn't even reached the stage of having a Wikipedia entry).

Maybe one of these will become popular and reach the mainstream. Almost
certainly they all have technical advantages compared to C or existing
mainstream languages - almost certainly they also have disadvantages,
and absolutely certainly they have features that are subjective and
controversial.

None are big enough that compatibility or familiarity with them is
remotely relevant in defining a new language. (But by all means look at
them and copy any good ideas you see.)

>
> There are some really obscure languages on rosettacode, like kapab or
> wisp, but the ones above aren't really that obscure.
>
>> The only thing special about C is that
>
> Its type system is not fit for purpose.
>

I'm sure some of the millions of C programmers would have noticed during
the last 50 years if C's type system did not work.

>>> And then you have those off_t types mentioned below.
>>
>> "off_t" is not part of C.  But don't let that get in the way of another
>> rant.
>
> So what is it a part of? Since it came up extensively in a recent clc
> thread.

It is part of POSIX. It is used to hold offsets within files, for some
kinds of file functions. (I don't know the details.) On some systems,
it will be 32-bit, and on others it will be 64-bit. It may even be
64-bit on a 32-bit system with C90 and no C99 support - in which case
there are no portable 64-bit types. But because it is a named type
defined in the headers for your POSIX system, it means you can use it to
write portable code that can be compiled on a range of POSIX targets and
work correctly. That would not be possible if you used a fundamental C
type (like "long int"), or a fixed-size type.

>
> It's a file-offset type used inside struct stat, declared in
> sys/stat.h, which is to do with the stat() functions.
>
> If I want to use such functions via a FFI, then I might need to find out
> what it actually is. But you say it doesn't exist, so that's OK then!
>
> The funny thing is, if I compile this C program:
>
>   #include <stdio.h>
>   #include <sys/types.h>
>
>   int main() {
>       printf("%d\n",(int)sizeof(off_t));
>   }
>
> I don't get 'unknown identifier' for off_t or some such message; it
> seems to know what it is!

What part of "#include <sys/types.h>" is confusing you?

>
> (Here, off_t has a concrete type which is i32, but internally is long
> int which is distinct from both int (i32 here) and int32_t (also i32).
> That's why no one in their right mind would take inspiration from C.)
>
>
>>> I think you can since you just don't see such types anywhere else.
>>>
>>
>> Ah, ignorance is bliss!
>
> So, enlighten us. It will most likely be languages where you tie
> yourself up in knots having a special type for everything.
>

Every portable language that has decent wrappings for OS calls will use
OS-specific named types at the lowest level (though they might translate
them to language-specific types in the wrappings). Of course, lots of
languages just assume "all the world is 64-bit *nix", or "all the world
is 32-bit windows".

But of course if you think types are a bad idea, and type-safety is for
wimps who are scared of a little program crash, then you will consider
any use of these types as "tying yourself up in knots".

Bart

unread,
Aug 20, 2021, 10:25:28 AM8/20/21
to
On 20/08/2021 14:49, David Brown wrote:
> On 20/08/2021 13:18, Bart wrote:
>> On 20/08/2021 08:05, David Brown wrote:
>>> On 19/08/2021 17:28, Bart wrote:
>>
>>> The world is full of obscure and minor programming languages, most of
>>> which almost no one has ever heard of.
>>
>> Rust is well known. Zig comes up frequently in forums (even this one in
>> past threads). Odin less often, but it is mentioned. You can download
>> all of them and try them out.
>
> Rust has a lot of hype, and is more known than used. Zig is just

The point is, does everyone know or can make an excellent guess as to
what is meant by 'i64' in the context of programming languages and
primitive types?


> It is part of POSIX. It is used to hold offsets within files, for some
> kinds of file functions. (I don't know the details.) On some systems,
> it will be 32-bit, and on others it will be 64-bit. It may even be
> 64-bit on a 32-bit system with C90 and no C99 support - in which case

So is likely to be one of i32 u32 i64 u64.

If I had to write a function right now to return a file offset (that is,
the +ve or -ve difference between two locations in one file), I'd use an
i64 type which covers offsets of +/- 9 billion billion bytes approx.

> What part of "#include <sys/types.h>" is confusing you?

What parts of those angle brackets is confusing /you/?

On Windows systems such a header file is only located within the system
headers of a C implementation.

But if your point is that off_t doesn't appear within the C standard,
then take your pick from size_t, time_t, clock_t, fpos_t, ptrdiff_t,
max_align_t, wint_t, rsize_t, errno_t, ...

There are actually 100s of such types. I'd say the vast majority are
merely one of the u8-u64 and i8-i64 types.



>>>> I think you can since you just don't see such types anywhere else.
>>>>
>>>
>>> Ah, ignorance is bliss!
>>
>> So, enlighten us. It will most likely be languages where you tie
>> yourself up in knots having a special type for everything.
>>
>
> Every portable language that has decent wrappings for OS calls will use
> OS-specific named types at the lowest level (though they might translate
> them to language-specific types in the wrappings). Of course, lots of
> language just assume "all the world is 64-bit *nix", or "all the world
> is 32-bit windows".

Pretty much every processor including most microcontrollers will be
using power-of-two types. So lots of languages will assume that too;
why not?


> But of course if you think types are a bad idea, and type-safety is for
> wimps who are scared of a little program crash, then you will consider
> any use of these types as "tying yourself up in knots".

There might be some point to it if a language stopped me from doing
time_t + size_t + errno_t, but in the case of C, it doesn't.

It just makes it harder for people using libraries via FFIs when the API
makes use of those highly specific C types.

I have to use elaborate tools to do the conversion. Then that struct
stat which typically uses one custom typedef per member ends up looking
like this:

record stat = $caligned
    u32 st_dev
    u16 st_ino
    u16 st_mode
    i16 st_nlink
    i16 st_uid
    i16 st_gid
    u32 st_rdev
    u32 st_size
    u64 st_atime
    u64 st_mtime
    u64 st_ctime
end

No other input is now needed. Now go at look at some actual declarations
of 'struct stat', if you can even pinpoint the one used for your
platform of interest, amongst all the multiple declarations and
conditional code blocks, once you've located the header file from the
tower of include statements.

James Harris

unread,
Aug 20, 2021, 10:45:52 AM8/20/21
to
On 20/08/2021 14:19, Bart wrote:
> On 20/08/2021 12:55, James Harris wrote:

...

>>    int 8 a
>>    int 16 b
>>    int 32 c
>
> This is more flexible (I'd prefer some punctuation or other way of
> connecting the number with the type) but as I said, you then have to
> deal with extra possibilities:

...

> * Could such a number appear also after a user-defined type; for example
> if an alias 'T' for 'int' was created, would 'T 8 a' be allowed?

That's a good question and it leads to a bigger topic. As we are already
OT on this one I'll start a new thread.


--
James Harris

James Harris

unread,
Aug 20, 2021, 11:27:54 AM8/20/21
to
On 20/08/2021 14:42, Bart wrote:
> On 20/08/2021 08:33, David Brown wrote:
>> On 19/08/2021 21:25, James Harris wrote:

...

>> The use of "0" to mean
>> either the integer 0 or a null pointer comes from C's history - I
>> believe in BCPL there was no distinction between integers and pointers
>> at all.  "nullptr" is a step towards improving the language here, though
>> the historical baggage from C and earlier C++ standards cannot be
>> removed.
>
> This is what I did recently. While I've always had 'nil' for a null
> pointer


> (null is better as an adjective than nil!),

Most apropos!

And "nil points" can sound very French. ;-)


> integers could be
> used freely with pointers, at least on my older implementations.
>
> Now the null pointer value can only be denoted as 'nil', not 0.
>

What if you want to set a pointer to a specific address or you want to
compare addresses? Do you now have conversion functions?


--
James Harris

James Harris

unread,
Aug 20, 2021, 11:49:18 AM8/20/21
to
On 20/08/2021 08:33, David Brown wrote:

...

> The use of "0" to mean
> either the integer 0 or a null pointer comes from C's history - I
> believe in BCPL there was no distinction between integers and pointers
> at all.

Just checked. BCPL treated memory as an array of words. "Pointers to
consecutive words of memory are consecutive integers" and "words of
memory have consecutive integer addresses".

https://www.cl.cam.ac.uk/~mr10/bcplman.pdf

I guess that means that addresses would be scaled by the number of bytes
per word.


--
James Harris

James Harris

unread,
Aug 20, 2021, 11:54:09 AM8/20/21
to
On 20/08/2021 15:25, Bart wrote:
> On 20/08/2021 14:49, David Brown wrote:
>> On 20/08/2021 13:18, Bart wrote:
>>> On 20/08/2021 08:05, David Brown wrote:
>>>> On 19/08/2021 17:28, Bart wrote:
>>>
>>>> The world is full of obscure and minor programming languages, most of
>>>> which almost no one has ever heard of.
>>>
>>> Rust is well known. Zig comes up frequently in forums (even this one in
>>> past threads). Odin less often, but it is mentioned. You can download
>>> all of them and try them out.
>>
>> Rust has a lot of hype, and is more known than used.  Zig is just
>
> The point is, does everyone know or can make an excellent guess as to
> what is meant by 'i64' in the context of programming languages and
> primitive types?

Isn't there more to it than that? If the person guessed correctly for
i64, would he also guess correctly for i48 or i8 or i3? I would
suggest that under your scheme a reasonable person would be likely to
guess some of them wrong! :-)

So as I say it's not as simple as getting someone to guess at an easy name.


--
James Harris

Bart

unread,
Aug 20, 2021, 12:49:22 PM8/20/21
to
Probably an educated guess would be a better term, as they may have come
across i64 etc before, and recognised those power-of-two values as also
used in similar type names.

Not so for i3. i8 is a funny one; I'm a little uneasy with single-digit
widths like this, but there are only u8 and i8.

(In the past I used more byte widths than bit widths, so such names may
have been i1 i2 i4 i8, and actually were, internally. Now I mainly use
bit widths, partly to avoid the crossover point at 'i8', which might
mean 64 bits under the old system, or 8 bits under the new.)

Andy Walker

unread,
Aug 20, 2021, 12:49:41 PM8/20/21
to
On 20/08/2021 16:27, James Harris wrote:
> What if you want to set a pointer to a specific address or you want
> to compare addresses? Do you now have conversion functions?

I've asked this before, and don't recall getting a definitive
answer. Is your language intended to be high-level and portable, which
is what I understood, or not? If so, then "specific address" should
not be part of it, nor should comparing addresses [exc, as in C, for
(in)equality or within one array/structure]. If you need to address,
[eg] the hardware clock, that would need to be done via a library
facility, not [eg] by "clockptr := 123456;". You don't need to ask
other posters about "life imitating Bart".

As for questions about whether "int16" is "reserved", isn't
that best dealt with by a notional outer layer in which standard
library facilities [inc types] are declared? Then programmers who
want to can declare their own versions of [eg] "sqrt" and "int",
without trampling on the library facilities. [Eg, you could want
an own-version of these for debugging purposes while leaving the
underlying program unchanged.]

--
Andy Walker, Nottingham.
Andy's music pages: www.cuboid.me.uk/andy/Music
Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Soler

James Harris

unread,
Aug 20, 2021, 1:19:39 PM8/20/21
to
On 20/08/2021 17:49, Andy Walker wrote:
> On 20/08/2021 16:27, James Harris wrote:


>> What if you want to set a pointer to a specific address or you want
>> to compare addresses? Do you now have conversion functions?
>
>     I've asked this before, and don't recall getting a definitive
> answer.  Is your language intended to be high-level and portable, which
> is what I understood, or not?

If you are asking that of me rather than Bart then Yes, it's meant to be
/very/ portable :-) but that doesn't prevent it from being used to write
hardware drivers or prevent it from reading in addresses as parameters.


> If so, then "specific address" should
> not be part of it, nor should comparing addresses [exc, as in C, for
> (in)equality or within one array/structure].

Well, take as an example a VGA driver. On a common-or-garden PC which is
running its video in VGA text mode there's a standard address for the
frame buffer which is, specifically, 0xb8000. The address might be different
on different hardware but it can be provided as a parameter to the VGA
driver, e.g. from a configuration file. The language would then need to
allow a program to use that address to form a pointer. Hence the need
for a conversion function.

I was asking Bart if he had conversion functions in both directions:
address-to-pointer and pointer-to-address.


> If you need to address,
> [eg] the hardware clock, that would need to be done via a library
> facility, not [eg] by "clockptr := 123456;".

Indeed. But the library facility also needs to be written!


> You don't need to ask
> other posters about "life imitating Bart".

:-)


>
>     As for questions about whether "int16" is "reserved", isn't
> that best dealt with by a notional outer layer in which standard
> library facilities [inc types] are declared?  Then programmers who
> want to can declare their own versions of [eg] "sqrt" and "int",
> without trampling on the library facilities.  [Eg, you could want
> an own-version of these for debugging purposes while leaving the
> underlying program unchanged.]
>

Again, if that's a question for me rather than for Bart I have a plan to
define all types (possibly except boolean) in the standard library
rather than making them part of the language. The language would define
keywords, control structures, operators (without defining their
meanings), and namespace management rules. The standard library (or
libraries) would supply standard types along with their behaviour, and
various data structures. And programmers could supply their own types in
exactly the same way. At least that's the theory!


--
James Harris

James Harris

unread,
Aug 20, 2021, 1:26:09 PM8/20/21
to
On 20/08/2021 17:49, Bart wrote:
> On 20/08/2021 16:54, James Harris wrote:
>> On 20/08/2021 15:25, Bart wrote:

...

>>> The point is, does everyone know or can make an excellent guess as to
>>> what is meant by 'i64' in the context of programming languages and
>>> primitive types?
>>
>> Isn't there more to it than that in that if the person guessed
>> correctly for i64 would he also guess correctly for i48 or i8 or i3. I
>> would suggest that under your scheme a reasonable person would be
>> likely to guess some of them wrong! :-)
>>
>> So as I say it's not as simple as getting someone to guess at an easy
>> name.
>
> Probably an educated guess would be a better term, as they may have come
> across i64 etc before, and recognised those power-of-two values as also
> used in similar type names.
>
> Not so for i3. i8 is a funny one; I'm a little uneasy with single-digit
> widths like this, but there is only u8 and i8.
>
> (In the past I used more byte widths than bit widths, so such names may
> have been i1 i2 i4 i8, and actually were, internally. Now I mainly use
> bit widths, partly to avoid the crossover point at 'i8', which might
> mean 64 bits under the old system, or 8 bits under the new.)

Sounds as though your educated guesser may have to do an awful lot of
educated guessing! ;-)

But I agree with you that bit widths rule the day. I'd even use bits for
allocation of storage. For digital computing bits are fundamental.
Eight-bit bytes, by contrast, may one day turn out to be just what we
used in the 20th and 21st centuries.


--
James Harris

David Brown

unread,
Aug 20, 2021, 2:34:56 PM8/20/21
to
On 20/08/2021 16:25, Bart wrote:
> On 20/08/2021 14:49, David Brown wrote:
>> On 20/08/2021 13:18, Bart wrote:
>>> On 20/08/2021 08:05, David Brown wrote:
>>>> On 19/08/2021 17:28, Bart wrote:
>>>
>>>> The world is full of obscure and minor programming languages, most of
>>>> which almost no one has ever heard of.
>>>
>>> Rust is well known. Zig comes up frequently in forums (even this one in
>>> past threads). Odin less often, but it is mentioned. You can download
>>> all of them and try them out.
>>
>> Rust has a lot of hype, and is more known than used.  Zig is just
>
> The point is, does everyone know or can make an excellent guess as to
> what is meant by 'i64' in the context of programming languages and
> primitive types?
>
>
>> It is part of POSIX.  It is used to hold offsets within files, for some
>> kinds of file functions.  (I don't know the details.)  On some systems,
>> it will be 32-bit, and on others it will be 64-bit.  It may even be
>> 64-bit on a 32-bit system with C90 and no C99 support - in which case
>
> So is likely to be one of i32 u32 i64 u64.

It is going to be signed (since the POSIX standard says so). But it is
not necessarily the same type as int32_t or int64_t (we are talking
about C here - whether you like them or not, those are the standard
names) even if it is the same size. (Or have you forgotten how type
compatibility works in C?) It is quite possible for it to be an extra
implementation-specific type.

Remember, there can be different integer types with the same size and
signedness which are not compatible with each other. I am quite sure
you don't like that, and will tell us that in /your/ language things are
different. I'm not sure I like it myself, though it is not something
that bothers me in my programming. (I'd prefer them either to be fully
compatible, or strong types that are not compatible at all without
explicit type changes.)

To be fair, however, I agree that it is quite likely that off_t is the
same type as either int32_t or int64_t. I just can't imagine writing
decent code that would rely on that. (I have used systems where int32_t
and uint32_t, etc., are completely independent from the fundamental C
types despite having the same size.)

>
> If I had to write a function right now to return a file offset (that is,
> the +ve or -ve difference between two locations in one file), I'd use an
> i64 type which covers offsets of +/- 9 billion billion bytes approx.
>

Some people, on the other hand, write code that uses the specified types
for the tasks and thus have code that is more portable. You write code
for yourself in your own languages for your own computers, and don't
have to be concerned about the world outside.

>> What part of "#include <sys/types.h>" is confusing you?
>
> What parts of those angle brackets is confusing /you/?
>

Nothing - but then, I know what they mean in C. They don't mean the
file is part of the C implementation. I could give you a reference in
the C standards, but I know that would only annoy you since you want C
to be how /you/ define it, not how anyone else defines it.

> On Windows systems such a header file is only located within the system
> headers of a C implementation.

That may be true on your Windows system. It is not true in general for C.

And even more relevant, the standard headers that are part of the C
standard are all documented in the C standard (oddly enough) - and
"sys/types.h" is not there.

>
> But if your point is that off_t doesn't appear within the C standard,
> then take your pick from size_t, time_t, clock_t, fpos-t, ptrdiff_t,
> maxalign_t, wint_t, rsize_t, errno_t, ...
>

A few of these are actually in the C standard - well done!

> There are actually 100s of such types. I'd say the vast majority are
> merely one of the u8-u64 and i8-i64 types.
>

There are quite a lot of types defined in the C standard - I doubt if it
as many as 100, but I haven't counted them. The solid majority are not
the same as any of the intNN_t or uintNN_t types, even if you get the
naming right. (clock_t and time_t are floating point types, all the
atomic types are different from the non-atomic types, the div_t types
are structures, etc.)

As for the scalar integer POSIX types, these are most likely to be one
of the intNN_t or uintNN_t types. But they are given names because they
may be of different sizes on different systems. Some will be 32-bit on
one system and 64-bit on another one. Some are different for different
versions of the same OS on the same target. Some are different for
different configuration options on the same version of the same OS on
the same target. So you use the named POSIX types and get the right
size at compile time.

The same applies to other OS's, and other portable libraries. The only
exception is in the DOS and Windows world, where programmers have a long
tradition of writing non-portable shite code that makes unwarranted
assumptions about how /they/ don't need to follow rules about type names
or other portability issues. It's because of cowboy programmers with
their "i32 should be good enough for any use" attitude that mean the
world is stuck with polished turds for computers, because you can't get
their crap code to run on anything better.

>
>
>>>>> I think you can since you just don't see such types anywhere else.
>>>>>
>>>>
>>>> Ah, ignorance is bliss!
>>>
>>> So, enlighten us. It will most likely be languages where you tie
>>> yourself up in knots having a special type for everything.
>>>
>>
>> Every portable language that has decent wrappings for OS calls will use
>> OS-specific named types at the lowest level (though they might translate
>> them to language-specific types in the wrappings).  Of course, lots of
>> languages just assume "all the world is 64-bit *nix", or "all the world
>> is 32-bit windows".
>
> Pretty much every processor including most microcontrollers will be
> using power-of-two types. So lots of languages will assume that too;
> why not?
>

I didn't say they shouldn't assume power-of-two types.

>
>> But of course if you think types are a bad idea, and type-safety is for
>> wimps who are scared of a little program crash, then you will consider
>> any use of these types as "tying yourself up in knots".
>
> There might be some point to it if a language stopped me from doing
> time_t + size_t + errno_t, but in the case of C, it doesn't.
>

I'd be happier with stronger typing - I use strong types in C++.

But that's not the point of these types.

> It just makes it harder for people using libraries via FFIs when the API
> makes use of those highly specific C types.
>
> I have to use elaborate tools to do the conversion. Then that struct
> stat which typically uses one custom typedef per member ends up looking
> like this:
>
>     record stat = $caligned
>         u32 st_dev
>         u16 st_ino
>         u16 st_mode
>         i16 st_nlink
>         i16 st_uid
>         i16 st_gid
>         u32 st_rdev
>         u32 st_size
>         u64 st_atime
>         u64 st_mtime
>         u64 st_ctime
>     end
>
> No other input is now needed. Now go at look at some actual declarations
> of 'struct stat', if you can even pinpoint the one used for your
> platform of interest, amongst all the multiple declarations and
> conditional code blocks, once you've located the header file from the
> tower of include statements.

If you use a tool to generate system-specific and target-specific
interface wrappers (and that's a perfectly good solution), then what's
your complaint? Is it just that when the C and Unix founding fathers
designed their language and OS, that they didn't have enough
consideration for how those design decisions would cost Bart several
hours of extra effort?

Bart

unread,
Aug 20, 2021, 2:39:43 PM8/20/21
to
It turns out the i32-style is also used by LLVM for the source format of
its intermediate language:

define dso_local void @Main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  %3 = alloca i32, align 4
  %4 = alloca i32, align 4

  store i32 2, i32* %2, align 4
  ....

(I remember reading that LLVM allows integer types up to 2**23 or
possibly 2**24; I don't know how those would be represented.)

I think even David Brown must have heard of LLVM!

I use them also in my own intermediate language; a new project with a
discrete textual format, where an instruction looks like this:

add i64

This language directly supports arithmetic on i8-i128 and u8-u128 types.
Anything else has to be programmed on top. All this IL does is allow you
to specify and manipulate (push/pop/pass/return) block data types in
general, eg. block:72 for an int573 type (types must be byte multiples).

I think if you have in mind completely arbitrary integer widths, and
intend to define an independent IL, then you need to consider what
capabilities /that/ will have, and how much effort it would require.

Bart

unread,
Aug 20, 2021, 2:49:26 PM8/20/21
to
On 20/08/2021 18:19, James Harris wrote:
> On 20/08/2021 17:49, Andy Walker wrote:

>
>> If so, then "specific address" should
>> not be part of it, nor should comparing addresses [exc, as in C, for
>> (in)equality or within one array/structure].
>
> Well, take as an example a VGA driver. On a common-or-garden PC which is
> running its video in VGA mode there's a standard address for the frame
> buffer which is, specifically, 0xb8000. The address might be different
> on different hardware but it can be provided as a parameter to the VGA
> driver, e.g. from a configuration file. The language would then need to
> allow a program to use that address to form a pointer. Hence the need
> for a conversion function.
>
> I was asking Bart if he had conversion functions in both directions:
> address-to-pointer and pointer-to-address.

I don't have anything as fancy as separate pointer and address types.

Just explicit conversions between pointers and integers.

Since everything is 64 bits, mostly they are no-ops.

> Again, if that's a question for me rather than for Bart I have a plan to
> define all types (possibly except boolean) in the standard library
> rather than making them part of the language.

I think Julia does something like that:

primitive type Int32 <: Signed 32 end

(https://docs.julialang.org/en/v1/manual/types/)

As I said, I'm not a fan of that approach, as I think it's simpler,
tidier and more efficient just to build these in.


Dmitry A. Kazakov

unread,
Aug 20, 2021, 2:53:15 PM8/20/21
to
On 2021-08-20 19:19, James Harris wrote:

> Again, if that's a question for me rather than for Bart I have a plan to
> define all types (possibly except boolean) in the standard library
> rather than making them part of the language.

Your language lacks necessary abstraction level for that. Higher level
languages have type-algebraic operations to produce types. Like "long"
in Algol 68, or type range <>, type mod <> in Ada. The syntax of such
operations is less relevant, if you like unreadable languages it could
well be

u(<number>)

where <number> is a static positive-valued expression. Note: an
*expression*. If you have that in the language you do not need any
library; if you don't, you will have to write your library in C.

Bart

unread,
Aug 20, 2021, 5:01:22 PM8/20/21
to
On 20/08/2021 19:34, David Brown wrote:
> On 20/08/2021 16:25, Bart wrote:

>> So is likely to be one of i32 u32 i64 u64.
>
> It is going to be signed (since the POSIX standard says so). But it is
> not necessarily the same type as int32_t or int64_t (we are talking
> about C here - whether you like them or not, those are the standard
> names) even if it is the same size. (Or have you forgotten how type
> compatibility works in C?) It is quite possible for it to be an extra
> implementation-specific type.
>
> Remember, there can be different integer types with the same size and
> signedness which are not compatible with each other. I am quite sure
> you don't like that, and will tell us that in /your/ language things are
> different. I'm not sure I like it myself, though it is not something
> that bothers me in my programming. (I'd prefer them either to be fully
> compatible, or strong types that are not compatible at all without
> explicit type changes.)
>
> To be fair, however, I agree that it is quite likely that off_t is the
> same type as either int32_t or int64_t. I just can't imagine writing
> decent code that would rely on that.

There will be 100s of 1000s of functions that might be callable in
innumerable libraries. Imagine if each decided to use custom types like
off_t, which was all they exposed in the APIs, and required you to go
hunting for the concrete types.


> That may be true on your Windows system. It is not true in general for C.

I work on Windows.

> And even more relevant, the standard headers that are part of the C
> standard are all documented in the C standard (oddly enough) - and
> "sys/types.h" is not there.

So I can remove that completely from my C compiler, and every program
will still work?


>> There are actually 100s of such types.> I'd say the vast majority are
>> merely one of the u8-u64 and i8-i64 types.
>>
>
> There are quite a lot of types defined in the C standard - I doubt if it
> as many as 100, but I haven't counted them.

There are about 60 types ending with _t in N1570.pdf. Plus loads of
associated macros.

> (clock_t and time_t are floating point types ...

They're usually integer types.

> The same applies to other OS's, and other portable libraries. The only
> exception is in the DOS and Windows world, where programmers have a long
> tradition of writing non-portable shite code that makes unwarranted
> assumptions about how /they/ don't need to follow rules about type names
> or other portability issues.

All the types used by WinAPI are listed here:

https://docs.microsoft.com/en-us/windows/win32/winprog/windows-data-types

This actually provides all the info needed for someone creating bindings
to work across an FFI.

Unlike typical C headers where you have to go down endless rabbit holes
trying to find out all this stuff, and then you find it's all conditional.

>> I have to use elaborate tools to do the conversion. Then that struct
>> stat which typically uses one custom typedef per member ends up looking
>> like this:
>>
>>     record stat = $caligned
>>         u32 st_dev
>>         u16 st_ino
>>         u16 st_mode
>>         i16 st_nlink
>>         i16 st_uid
>>         i16 st_gid
>>         u32 st_rdev
>>         u32 st_size
>>         u64 st_atime
>>         u64 st_mtime
>>         u64 st_ctime
>>     end
>>
>> No other input is now needed. Now go at look at some actual declarations
>> of 'struct stat', if you can even pinpoint the one used for your
>> platform of interest, amongst all the multiple declarations and
>> conditional code blocks, once you've located the header file from the
>> tower of include statements.
>
> If you use a tool to generate system-specific and target-specific
> interface wrappers (and that's a perfectly good solution), then what's
> your complaint?

It's a complicated way of doing it that usually doesn't work, because it
cannot translate C code which comes up in macros. Also it cannot
reliably translate conditional code that depends on the C compiler in
use or on -D macros or other inputs.

And, my tool only works from my C compiler. And that compiler has
already done the hard work of constructing a suitable struct stat,
painstakingly working from half a dozen sources all defining it a
different way.

> Is it just that when the C and Unix founding fathers
> designed their language and OS, that they didn't have enough
> consideration for how those design decisions would cost Bart several
> hours of extra effort?

If you wanted to use one of my libraries from your non-C language, then
the information you'd be working with might look like the above.

What /I/ have to work with looks more like the following, which I
tracked down in _mingw_stat64.h for gcc/tdm; this gives you FIVE
struct stats to choose from:


#ifndef _STAT_DEFINED

#ifdef _USE_32BIT_TIME_T
#define _fstat _fstat32
#define _stat _stat32
#define _wstat _wstat32
#ifdef _UCRT
#define _fstati64 _fstat32i64
#define _stati64 _stat32i64
#define _wstati64 _wstat32i64
#else
#define _fstat32i64 _fstati64
#define _stat32i64 _stati64
#define _wstat32i64 _wstati64
#endif
#else
#define _fstat _fstat64i32
#define _fstati64 _fstat64
#define _stat _stat64i32
#define _stati64 _stat64
#define _wstat _wstat64i32
#define _wstati64 _wstat64
#endif /* _USE_32BIT_TIME_T */

struct _stat32 {
_dev_t st_dev;
_ino_t st_ino;
unsigned short st_mode;
short st_nlink;
short st_uid;
short st_gid;
_dev_t st_rdev;
_off_t st_size;
__time32_t st_atime;
__time32_t st_mtime;
__time32_t st_ctime;
};

#ifndef NO_OLDNAMES
struct stat {
_dev_t st_dev;
_ino_t st_ino;
unsigned short st_mode;
short st_nlink;
short st_uid;
short st_gid;
_dev_t st_rdev;
_off_t st_size;
time_t st_atime;
time_t st_mtime;
time_t st_ctime;
};
#endif /* NO_OLDNAMES */

struct _stat32i64 {
_dev_t st_dev;
_ino_t st_ino;
unsigned short st_mode;
short st_nlink;
short st_uid;
short st_gid;
_dev_t st_rdev;
__MINGW_EXTENSION __int64 st_size;
__time32_t st_atime;
__time32_t st_mtime;
__time32_t st_ctime;
};

struct _stat64i32 {
_dev_t st_dev;
_ino_t st_ino;
unsigned short st_mode;
short st_nlink;
short st_uid;
short st_gid;
_dev_t st_rdev;
_off_t st_size;
__time64_t st_atime;
__time64_t st_mtime;
__time64_t st_ctime;
};

struct _stat64 {
_dev_t st_dev;
_ino_t st_ino;
unsigned short st_mode;
short st_nlink;
short st_uid;
short st_gid;
_dev_t st_rdev;
__MINGW_EXTENSION __int64 st_size;
__time64_t st_atime;
__time64_t st_mtime;
__time64_t st_ctime;
};

#define __stat64 _stat64
#define stat64 _stat64 /* for POSIX */
#define fstat64 _fstat64 /* for POSIX */

#define _STAT_DEFINED
#endif /* _STAT_DEFINED */


And this is just /one/ possible source.

David Brown

unread,
Aug 21, 2021, 5:02:15 AM8/21/21
to
IIRC they now support any bit size up to 512, targeting things like
programmable-logic uses.

> I think even David Brown must have heard of LLVM!


Yes - and their intermediate language is an /intermediate/ language! It
is not a /human/ language. Experts will look at it, just as experts
look at generated assembly, and compiler developers work with it. You
really are scraping the barrel in attempting to justify silly short
names. We understand that you like them. We understand you are not
alone. That doesn't make them a good idea. Now move along, and stop
wasting /your/ time and everyone else's. There are more than enough
/interesting/ things with James' language ideas to give more fruitful
discussion. And there are more than enough /good/ ideas from your
programming languages that could be of help and interest to him -
obsessing about how many people write "i32" detracts greatly from that.


David Brown

unread,
Aug 21, 2021, 5:20:25 AM8/21/21
to
On 20/08/2021 23:01, Bart wrote:
> On 20/08/2021 19:34, David Brown wrote:
>> On 20/08/2021 16:25, Bart wrote:
>
>>> So is likely to be one of i32 u32 i64 u64.
>>
>> It is going to be signed (since the POSIX standard says so).  But it is
>> not necessarily the same type as int32_t or int64_t (we are talking
>> about C here - whether you like them or not, those are the standard
>> names) even if it is the same size.  (Or have you forgotten how type
>> compatibility works in C?)  It is quite possible for it to be an extra
>> implementation-specific type.
>>
>> Remember, there can be different integer types with the same size and
>> signedness which are not compatible with each other.  I am quite sure
>> you don't like that, and will tell us that in /your/ language things are
>> different.  I'm not sure I like it myself, though it is not something
>> that bothers me in my programming.  (I'd prefer them either to be fully
>> compatible, or strong types that are not compatible at all without
>> explicit type changes.)
>>
>> To be fair, however, I agree that it is quite likely that off_t is the
>> same type as either int32_t or int64_t.  I just can't imagine writing
>> decent code that would rely on that.
>
> There will be 100s of 1000s of functions that might be callable in
> innumerable libraries. Imagine if each decided to use custom types like
> off_t, which was all they exposed in the APIs, and required you to go
> hunting for the concrete types.

That's why we have headers. And that's why anyone sane uses tools to
generate wrappers and interfaces for foreign functions.

Custom named types are a /good/ thing for people writing the libraries,
and for people using the libraries. It really doesn't matter if they
are a bit of an inconvenience for the tiny, tiny percentage of
programmers who are working on wrappers or foreign function interfaces -
they are irrelevant in the grand scheme of things. (Just as in language
design, it is important how the language works for the programmer - the
effort needed for implementers is irrelevant.)

The world does not revolve around /you/ and the effort /you/ have to
make because you want to use your own personal language.

>
>
>> That may be true on your Windows system.  It is not true in general
>> for C.
>
> I work on Windows.

Yes. It shows clearly in your myopic outlook.

>
>> And even more relevant, the standard headers that are part of the C
>> standard are all documented in the C standard (oddly enough) - and
>> "sys/types.h" is not there.
>
> So I can remove that completely from my C compiler, and every program
> will still work?

They are not part of your compiler!

Sometimes trying to explain things to you is like a scene from Monty
Python. You are so utterly bone-headed, so convinced in your own
misconceptions that /nothing/ gets through.

<https://www.youtube.com/watch?v=OdKa9bXVinE>

>
>
>>> There are actually 100s of such types.> I'd say the vast majority are
>>> merely one of the u8-u64 and i8-i64 types.
>>>
>>
>> There are quite a lot of types defined in the C standard - I doubt if it
>> as many as 100, but I haven't counted them.
>
> There are about 60 types ending with _t in N1570.pdf. Plus loads of
> associated macros.
>
>> (clock_t and time_t are floating point types)
>
> They're usually integer types.

Sorry, yes - I misread. I don't use these in my own coding.

>
>> The same applies to other OS's, and other portable libraries.  The only
>> exception is in the DOS and Windows world, where programmers have a long
>> tradition of writing non-portable shite code that makes unwarranted
>> assumptions about how /they/ don't need to follow rules about type names
>> or other portability issues.
>
> All the types used by WinAPI are listed here:
>
> https://docs.microsoft.com/en-us/windows/win32/winprog/windows-data-types
>
> This actually provides all the info needed for someone creating bindings
> to work across an FFI.

These are OS-specific types, and types for libraries provided on the OS
or by the vendor. They are nothing to do with the C language.

>
> Unlike typical C headers where you have to go down endless rabbit holes
> trying to find out all this stuff, and then you find it's all conditional.

It is amazing how you seem to have such endless problems, yet other
people manage fine. Did you ever wonder where the problem /really/ lies?

Bart

unread,
Aug 21, 2021, 6:57:48 AM8/21/21
to
On 21/08/2021 10:02, David Brown wrote:
> On 20/08/2021 20:39, Bart wrote:

>> I think even David Brown must have heard of LLVM!
>
>
> Yes - and their intermediate language is an /intermediate/ language! It
> is not a /human/ language.

It IS human readable, otherwise they'd stick with binary format:

"The llvm-dis command is the LLVM disassembler. It takes an LLVM bitcode
file and converts it into human-readable LLVM assembly language."

That assembly language has file extension .ll, which can also be
directly generated by clang, which is where I got my example. Programs
generating LLVM can choose to directly write .ll source files.

Why didn't they just use int32_t instead of i32? I don't know. Some more
information here:

https://llvm.org/docs/LangRef.html#type-system


> Experts will look at it, just as experts
> look at generated assembly, and compiler developers work with it.

I also used i32 etc. in generated code (specifically, in automatically
generated exports files [like header files] in an early module system).

But in order to be able to read in those exports files, the language
needs to allow i32 etc in its syntax.

I don't really know what point you're trying to make. You don't like
forms like i32? We get that.

I'm not too keen either, that's why I normally use longer forms in my
own code. But for talking about types, I happen to think that 'u64' is
more universal and less language-specific than 'uint64'.

You think most people will have no idea what they mean? I think you're
wrong. I've given examples of Odin, Zig, Rust, LLVM which have CHOSEN to
use them. You're trying to make out that these are obscure? Not the last
two!


> There are more than enough
> /interesting/ things with James' language ideas to give more fruitful
> discussion.

One of the topics is the syntax for size-specific integer types.

Here are the only type designations for the source format of my
/intermediate/ language:

void

u8
u16
u32
u64
u128

i8
i16
i32
i64
i128

r32
r64

block:N

My regular languages also allow these (except Block) plus the longer and
colloquial forms. They also have extra types such as pointers and
records and arrays, but in the IL, they reduce to the above, and any
differences are taken care of with IL operations.

Bits and bitfields are supported via operations too, not types.

(I wonder what type denotations you would choose for an intermediate
language of /your/ design?)

David Brown

unread,
Aug 21, 2021, 8:55:38 AM8/21/21
to
On 21/08/2021 12:57, Bart wrote:
> On 21/08/2021 10:02, David Brown wrote:
>> On 20/08/2021 20:39, Bart wrote:
>
>>> I think even David Brown must have heard of LLVM!
>>
>>
>> Yes - and their intermediate language is an /intermediate/ language!  It
>> is not a /human/ language.
>
> It IS human readable, otherwise they'd stick with binary format:
>

Do you hear that whooshing noise? It's the sound of the point flying
/way/ over your head. I believe that is because you keep ducking
intentionally, not because you could not understand it.

>
> (I wonder what type denotations you would choose for an intermediate
> language of /your/ design?)

I would make the choice almost totally independently of the choice of
type names for a programming language. Programming languages and
intermediary languages serve different purposes - you would expect
things to look different.

Bart

unread,
Aug 21, 2021, 9:08:51 AM8/21/21
to
I no longer support targets other than 64-bit ones, but when I did also
support 32-bit targets (mainly via a C intermediate step), then I had
these 4 conditional types:

intm     signed int      dependent on machine word size
wordm    unsigned int    dependent on machine word size
intp     signed int      dependent on machine pointer size
wordp    unsigned int    dependent on machine pointer size

I would have had only 2, but I've had experience of 32-bit pointers
within a 64-bit system, so thought I'd keep them separate.

These were typically used in FFIs where an API might specify size_t
(that I'd write wordm or wordp) or other conditional types.

However there is no direct equivalent for C's 'long', which is
apparently 64 bits on 64-bit Linux, and 32 bits everywhere else (on my
x86 and arm targets).

Such a type doesn't really belong in a public API. The only reason it
still exists, is because it still exists.

But you can see that my solution to the problem of non-concrete types is
a neat set of four types, which could probably be reduced to just two.

However types that need to represent file offsets for example, would
probably still need to be signed 64 bits; a 32-bit machine will not
necessarily be full of small files!

Bart

unread,
Aug 21, 2021, 9:23:50 AM8/21/21
to
On 21/08/2021 13:55, David Brown wrote:
> On 21/08/2021 12:57, Bart wrote:
>> On 21/08/2021 10:02, David Brown wrote:
>>> On 20/08/2021 20:39, Bart wrote:
>>
>>>> I think even David Brown must have heard of LLVM!
>>>
>>>
>>> Yes - and their intermediate language is an /intermediate/ language!  It
>>> is not a /human/ language.
>>
>> It IS human readable, otherwise they'd stick with binary format:
>>
>
> Do you hear that whooshing noise? It's the sound of the point flying
> /way/ over your head.

A point I still don't get, probably because you don't have one. You just
seem to have an irrational hatred of anything I've ever devised or proposed.

When I point out how the same ideas are used elsewhere, you come up
with excuse after excuse for why they don't count.


>> (I wonder what type denotations you would choose for an intermediate
>> language of /your/ design?)
>
> I would make the choice almost totally independently of the choice of
> type names for a programming language.

So there would be no correlation at all? OK ...

> Programming languages and
> intermediary languages serve different purposes - you would expect
> things to look different.

If they have a text format, then they all count as programming languages
(otherwise Lisp wouldn't be a language).

And actually, any programming language can be used as an intermediate
language too:

/* Generated C */
#include "pclhdr.h"

extern i32 puts(u64);
extern i32 printf(u64,...);
static i64 f_fib(i64 f_fib_n);
void f_start(void);

static i64 f_fib(i64 f_fib_n) {
u64 S1, S2, S3, S4, S5, S6;
S1 = f_fib_n;
S2 = 3;
if ((i64)S1 >= (i64)S2) goto L2;
S1 = 1;
goto L3;
L2:
S1 = f_fib_n;
S2 = 1;
*(i64*)&S1 -= (i64)S2;
S1 = f_fib(S1);
S2 = f_fib_n;
S3 = 2;
*(i64*)&S2 -= (i64)S3;
S2 = f_fib(S2);
*(i64*)&S1 += (i64)S2;
L3:
goto L4;
L4:
return S1;
}

void f_start(void) {
u64 S1, S2, S3, S4, S5, S6;
i64 f_start_i;
L6:
S1 = 1;
f_start_i = S1;
L7:
S1 = f_start_i;
S1 = f_fib(S1);
S2 = f_start_i;
S3 = (u64)"%d %d\n";
printf(S3,S2,S1);
L8:
f_start_i += 1;
if(f_start_i <= 40) goto L7;
L9:
S1 = (u64)"";
puts(S1);
L10:
S1 = 0;
exit(S1);
return;
}

void start(void) {f_start();}


James Harris

unread,
Aug 21, 2021, 9:39:47 AM8/21/21
to
On 20/08/2021 19:53, Dmitry A. Kazakov wrote:
> On 2021-08-20 19:19, James Harris wrote:
>
>> Again, if that's a question for me rather than for Bart I have a plan
>> to define all types (possibly except boolean) in the standard library
>> rather than making them part of the language.
>
> Your language lacks necessary abstraction level for that. Higher level
> languages have type-algebraic operations to produce types.

That made me laugh: I doubt you know my language well enough to make the
comment you made; the language /does have/ algebraic construction of
types; and it has that based on something you said many years ago.

Whether the language has enough of those things, I don't know, but I
doubt you do, either.


> Like "long"
> in Algol 68, or type range <>, type mod <> in Ada. The syntax of such
> operations is less relevant, if you like unreadable languages it could
> well be
>
>    u(<number>)
>
> where <number> is a static positive-valued expression. Note an
> *expression*. If you have that in the language you do not need any
> library,

That's not correct, Dmitry. The part of my design which is relevant here
is that the language defines the form of an expression and libraries
define the semantics - essentially because they carry out the type
checking and construct the IR. It's a wonderful system. I think(!) It
will be a fair bit of work to get it to work. I think it will be worth
it. But time will tell.


> if you don't you will have to write your library in C.
>

No, there's nothing to stop the libraries being written in my own
language. Don't forget that I have already bootstrapped it and that it
includes operations at least as low-level as those of C.
So there's no reason I cannot use it to do what's necessary, even though
the bootstrapped version is a little primitive and awkward to work with.


--
James Harris

Dmitry A. Kazakov

unread,
Aug 21, 2021, 9:52:21 AM8/21/21
to
On 2021-08-21 15:39, James Harris wrote:
> On 20/08/2021 19:53, Dmitry A. Kazakov wrote:
>> On 2021-08-20 19:19, James Harris wrote:
>>
>>> Again, if that's a question for me rather than for Bart I have a plan
>>> to define all types (possibly except boolean) in the standard library
>>> rather than making them part of the language.
>>
>> Your language lacks necessary abstraction level for that. Higher level
>> languages have type-algebraic operations to produce types.
>
> That made me laugh: I doubt you know my language well enough to make the
> comment you made; the language /does have/ algebraic construction of
> types; and it has that based on something you said many years ago.

Yes, you claimed that before.

>> Like "long" in Algol 68, or type range <>, type mod <> in Ada. The
>> syntax of such operations is less relevant, if you like unreadable
>> languages it could well be
>>
>>     u(<number>)
>>
>> where <number> is a static positive-valued expression. Note an
>> *expression*. If you have that in the language you do not need any
>> library,
>
> That's not correct, Dmitry. The part of my design which is relevant here
> is that the language defines the form of an expression and libraries
> define the semantics - essentially because they carry out the type
> checking and construct the IR.

A collection of nonsensical words [*]

Either a type like a modular 64-bit integer can be declared using the
language's own means, or it cannot and must be built-in. No need to
mince words about that.

-------------------
https://en.wikipedia.org/wiki/Semantics_(computer_science)

James Harris

unread,
Aug 21, 2021, 10:29:05 AM8/21/21
to
On 21/08/2021 14:52, Dmitry A. Kazakov wrote:
> On 2021-08-21 15:39, James Harris wrote:
>> On 20/08/2021 19:53, Dmitry A. Kazakov wrote:
>>> On 2021-08-20 19:19, James Harris wrote:

...

>> That's not correct, Dmitry. The part of my design which is relevant
>> here is that the language defines the form of an expression and
>> libraries define the semantics - essentially because they carry out
>> the type checking and construct the IR.
>
> A collection of nonsensical words [*]

Where [*] means "is to follow", presumably, given what follows. ;-)

>
> Either a type like modular 64-bit integer can be declared using the
> language means or it cannot and must be built-in. No need to mount words
> about that.
>
> -------------------
> https://en.wikipedia.org/wiki/Semantics_(computer_science)
>


--
James Harris

David Brown

unread,
Aug 21, 2021, 2:18:04 PM8/21/21
to
On 21/08/2021 15:23, Bart wrote:
> On 21/08/2021 13:55, David Brown wrote:
>> On 21/08/2021 12:57, Bart wrote:
>>> On 21/08/2021 10:02, David Brown wrote:
>>>> On 20/08/2021 20:39, Bart wrote:
>>>
>>>>> I think even David Brown must have heard of LLVM!
>>>>
>>>>
>>>> Yes - and their intermediate language is an /intermediate/
>>>> language!  It
>>>> is not a /human/ language.
>>>
>>> It IS human readable, otherwise they'd stick with binary format:
>>>
>>
>> Do you hear that whooshing noise?  It's the sound of the point flying
>> /way/ over your head.
>
> A point I still don't get, probably because you don't have one. You just
> seem to have an irrational hatred of anything I've ever devised or
> proposed.

I don't have an irrational hatred of anything here. I have a rational
dislike. And I have regularly noted things about your language(s) that
I /do/ like. However, you always fixate on the things that I strongly
believe are poor design decisions in your languages. This is the place
where you should be espousing other ideas and aspects of your languages,
for giving inspiration and ideas - not beating your dead donkeys yet again.

>
> When I point other how the same ideas are used elsewhere, you come up
> with excuse after excuse for why they don't count.
>
>
>>> (I wonder what type denotations you would choose for an intermediate
>>> language of /your/ design?)
>>
>> I would make the choice almost totally independently of the choice of
>> type names for a programming language.
>
> So there would be no correlation at all? OK ...
>

Exactly. They are for different tasks, used by different people,
dealing with different things at a different level of programming. So
they are different.

>> Programming languages and
>> intermediary languages serve different purposes - you would expect
>> things to look different.
>
> If they have a text format, then they all count as programming languages
> (otherwise Lisp wouldn't be a language).
>
> And actually, any programming language can be used as an intermediate
> language too:
>
> /* Generated C */

<snip>

Yes, indeed - and when you are using C as an intermediate language it is
/fine/ to write it in a way that would be completely unacceptable as
source code. Short names, lots of "goto", no structure, repetitive code
that would be highly error-prone to write by hand, etc. - that's all
fine for an intermediate language. But you wouldn't want it in a source
language.

David Brown

unread,
Aug 21, 2021, 2:24:51 PM8/21/21
to
"long" is 64-bit on every 64-bit architecture except Windows. It is
/always/ Windows that is the odd one out in these things, not Linux.

> Such a type doesn't really belong in a public API. The only reason it
> still exists, is because it still exists.

I agree. Types like "off_t", "gid_t", etc., are much better because
they say what they mean, and can be adapted to suit. (And if fixed
sizes are what are needed, then fixed sizes should be used.)

>
> But you can see that my solution to the problem of non-concrete types is
> a neat set of four types, which could probably be reduced to just two.
>
> However types that need to represent file offsets for example, would
> probably still need to be signed 64 bits; a 32-bit machine will not
> necessarily be full of small files!
>

Equally, it will not necessarily have any need of /large/ files - and
not want the waste of using 64-bit types when 32-bit will do.

And remember that while /you/ have the choice of what systems you want
to support with your languages, C supports almost /everything/. Some
programs written in C are written to support a wide range of systems -
including systems that come from an era where 32-bit was more than
enough for file sizes because disk size was measured in hundreds of
megabytes. Use "off_t" instead of "i64" (or whatever you want to call
it), and the program works on systems written in the 90's as well as
systems written in 2020 - whether the program was written in 1990 or in
2020.

Bart

unread,
Aug 21, 2021, 2:57:32 PM8/21/21
to
On 21/08/2021 19:24, David Brown wrote:
> On 21/08/2021 15:08, Bart wrote:

>> However there is no direct equivalent for C's 'long', which is
>> apparently 64 bits on 64-bit Linux, and 32 bits everywhere else (on my
>> x86 and arm targets).
>>
>
> "long" is 64-bit on every 64-bit architecture except Windows. It is
> /always/ Windows that is the odd one out in these things, not Linux.

You can equally say that "long" is 32-bit on every 32/64-bit
architecture except Linux.

And that "long long" is 64-bit on every 32/64-bit architecture.

From that it follows that "long long" is twice the width of "long" on
every architecture, 32 /and/ 64-bit, except Linux.

It seems to me that Linux is the odd one out!

Actually neither get it right. Probably 'int' should have been 32 bits
on every 32/64-bit system, and 'long' should have been 64 bits on those
same systems.

"long long" would either not be necessary, or could be reserved for 128
bits, and you have int/long types that are guaranteed sizes on those
architectures. (And, in C, could be confident about continuing to use
the standard printf formats and literal suffixes for those types.)

At the moment you have programs where developers have either used "long"
interchangeably with "int" on 32-bit systems with little thought - code
that will be inefficient or go subtly wrong on Linux64.

Or developers who have assumed 64-bit 'long' on Linux64 without
realising that on Linux32 or on Windows, it will be 32-bit and their
programs might not work.

The actual situation for me is that I never use 'long' because it is too
badly defined; I need types that are a known size across platforms.

>> However types that need to represent file offsets for example, would
>> probably still need to be signed 64 bits; a 32-bit machine will not
>> necessarily be full of small files!
>>
>
> Equally, it will not necessarily have any need of /large/ files - and
> not want the waste of using 64-bit types when 32-bit will do.
>
> And remember that while /you/ have the choice of what systems you want
> to support with your languages, C supports almost /everything/. Some
> programs written in C are written to support a wide range of systems -
> including systems that come from an era where 32-bit was more than
> enough for file sizes because disk size was measured in hundreds of
> megabytes. Use "off_t" instead of "i64" (or whatever you want to call
> it), and the program works on systems written in the 90's as well as
> systems written in 2020 - whether the program was written in 1990 or in
> 2020.

Except that in 2020 it is likely to be used for much bigger data than in
1990.

In any case, there is always a choice: choose one of 32 bits or 64 bits,
or provide both functions or both sets of data structures. But tell
people what the choice was.

If you need to provide one choice, don't hide it behind an opaque
typedef and then be cagey about what it actually is. APIs need to be
open and transparent, just like a datasheet.

James Harris

unread,
Aug 21, 2021, 3:31:31 PM8/21/21
to
On 20/08/2021 14:49, David Brown wrote:
> On 20/08/2021 13:18, Bart wrote:
>> On 20/08/2021 08:05, David Brown wrote:
>>> On 19/08/2021 17:28, Bart wrote:


...


>>>> And then you have those off_t types mentioned below.
>>>
>>> "off_t" is not part of C.  But don't let that get in the way of another
>>> rant.
>>
>> So what is it a part of? Since it came up extensively in a recent clc
>> thread.
>
> It is part of POSIX. It is used to hold offsets within files, for some
> kinds of file functions.

Isn't it odd to have the maximum size of a file decided by a language's
implementation (or an OS's implementation)? An 8Gbyte file is an 8Gbyte
file. It doesn't magically change its size if it is transferred to a
different machine.

Surely a file should be as long as is required /for its contents/ and a
pointer to offsets should be as long as required by the medium on which
the file is being stored.

I remember thinking the same about the standard fseek function. Its
signature is

int fseek(FILE *stream, long offset, int whence);

It seems absurd to have the offset as a long - because a long on a
particular implementation may not be long enough. Similar can be said of
lseek which uses off_t.

Put another way, the C/Posix or whatever concept of off_t seems to me to
be broken.


--
James Harris

Dmitry A. Kazakov

unread,
Aug 21, 2021, 3:59:11 PM8/21/21
to
On 2021-08-21 21:31, James Harris wrote:

> Isn't it odd to have the maximum size of a file decided by a language's
> implementation (or an OS's implementation)?

And foremost by the filesystem implementation.

> An 8Gbyte file is an 8Gbyte
> file. It doesn't magically change its size if it is transferred to a
> different machine.

It certainly does due to the difference in filesystems and file formats
(e.g. record-oriented, journaling, conpressed and so on and so forth)

> Put another way, the C/Posix or whatever concept of off_t seems to me to
> be broken.

POSIX was always broken. The very concept of a file offset is, because
it is not implementable on certain filesystems and file types, like
tapes and pipes.

Before the Dark Age of computing, filesystems had various types of
files. If you needed random access, the role of which now is played by
the databases, you would have an interface of block number + offset
inside the block.

UNIX and DOS struck down all the progress made up to that time. POSIX
just parrots the UNIX mess.

Ike Naar

unread,
Aug 21, 2021, 4:11:45 PM8/21/21
to
On 2021-08-21, Bart <b...@freeuk.com> wrote:
> On 21/08/2021 19:24, David Brown wrote:
>> "long" is 64-bit on every 64-bit architecture except Windows. It is
>> /always/ Windows that is the odd one out in these things, not Linux.
>
> You can equally say that "long" is 32-bit on every 32/64-bit
> architecture except Linux.

Hmm; long is 64 bits on the amd64 system on my desk,
running NetBSD (not Windows, nor Linux).

James Harris

unread,
Aug 21, 2021, 4:23:48 PM8/21/21
to
Is Ada any better? It has direct IO with Set_Index but its range is up
to Count'Last ... which is also implementation defined.

https://www.adaic.org/resources/add_content/standards/05rm/html/RM-A-12-1.html


--
James Harris

James Harris

unread,
Aug 21, 2021, 4:27:45 PM8/21/21
to
Perhaps 'long' made sense in the 1970s or perhaps not but I am not sure
there is any justification for using it now. It is just too imprecise
and there are better designations.


--
James Harris

anti...@math.uni.wroc.pl

unread,
Aug 21, 2021, 5:16:26 PM8/21/21
to
Bart <b...@freeuk.com> wrote:
> On 21/08/2021 19:24, David Brown wrote:
> > On 21/08/2021 15:08, Bart wrote:
>
> >> However there is no direct equivalent for C's 'long', which is
> >> apparently 64 bits on 64-bit Linux, and 32 bits everywhere else (on my
> >> x86 and arm targets).
> >>
> >
> > "long" is 64-bit on every 64-bit architecture except Windows. It is
> > /always/ Windows that is the odd one out in these things, not Linux.
>
> You can equally say that "long" is 32-bit on every 32/64-bit
> architecture except Linux.

Do you claim that anything 64-bit which is not Windows is Linux?

The normal rule is simple:

bit size of long = max(32, bit size of machine word)

Unfortunately, there is one odd case where this rule does
not work...

> And that "long long" is 64-bit on every 32/64-bit architecture.
>
> From that it follows that "long long" is twice the width of "long" on
> every architecture, 32 /and/ 64-bit, except Linux.

The normal rule here is simple too:

bit size of long long = min(64, 2*bit size of long)

In the future it may get more complicated, probably something
like

min(max(64, bit size of long), 2*bit size of long)

but fortunately, for the next few years such an extension is pure theory.

> It seems to me that Linux is the odd one out!

Blame the GNU folks for the oddity. "long long" is a GNU invention and
was supposed to be twice the size of "long". But when the first
64-bit port of gcc appeared (the Alpha port), the GNU folks made
"long long" a 64-bit type. After a bug report that "long long"
violated its spec, the GNU folks changed the spec instead of
implementing a 128-bit type (which would have been the right thing
to do). Later, "you know who" decided that 64-bit systems would be
desirable in the future and forced an amendment to C to allow them
a 32-bit "long" on 64-bit systems.

--
Waldek Hebisch

Bart

unread,
Aug 21, 2021, 5:19:18 PM8/21/21
to
On Windows, there is fseek() and fseeki64().

On Linux, fseek/ftell are limited to long as you say. But the man-page
suggests using fgetpos/fsetpos which use 'fpos_t'; I'm guessing that's a
64-bit type! But I'm only guessing.

With fseeki64, not only is the i64 in the name a clue, but the offset
parameter has type __int64.

It is understandable when an OS API has to use the data-types of a
specific language, especially one which didn't have precise types until
C99, which are not going to get magically bigger as memory, disk and
file sizes get larger.

Using a choice of fseek/fseeki64 seems a reasonable solution, but
ultimately you want to be able to use just fseek in place of fseeki64.

An opportunity to do that comes with switching platform from 32 to 64
bits, as happens with Linux64 but not Win64 if using 'long', but then
you also want the same program that calls fseek() to work on any OS on
the same 5GB file.

My own approach would be to just switch everything, all APIs, to 64
bits, and recompile everything. But that won't work in the real world,
where there is so much existing code, headers, docs and binaries in
which fseek takes off_t, whatever that is.
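
A portable wrapper along these lines is easy to sketch. This is illustrative only: the Windows branch assumes MSVC/MinGW's _fseeki64, and the POSIX branch assumes fseeko, with _FILE_OFFSET_BITS forcing a 64-bit off_t even on 32-bit Linux:

```c
/* Sketch of a 64-bit-clean seek, hiding the per-OS choice discussed
   above.  The feature macros must appear before any #include. */
#define _FILE_OFFSET_BITS 64
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdint.h>

static int seek64(FILE *f, int64_t offset, int whence)
{
#ifdef _WIN32
    return _fseeki64(f, offset, whence);      /* MSVC/MinGW extension */
#else
    return fseeko(f, (off_t)offset, whence);  /* POSIX, off_t offset  */
#endif
}
```

Calling code then just writes `seek64(f, (int64_t)5 << 30, SEEK_SET)` and works identically on a 5GB file on either OS.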

Dmitry A. Kazakov

Aug 21, 2021, 5:31:17 PM
On 2021-08-21 22:23, James Harris wrote:

> Is Ada any better?

Ada has separate integer types, so you would not care about the range
unless you convert values to another integer type, which, ideally, you
should never do. In practice, you are forced to convert types when you
use alien libraries and/or communication protocols.

The problem is that practically all library interfaces are low-level.
You cannot pass an integer value as an instance of the class, you must
convert it to a specific type and that includes a specific range.

[Templates/generics do not resolve the problem because they cannot live
in a library]

This is a real programming issue, not the silly things you are chatting
about with Bart.

> It has direct IO with Set_Index but its range is up
> to Count'Last ... which is also implementation defined.

It will fit the target platform and this is the block number, 9 bits
shorter. A closer case is Stream_Element_Offset, that is an equivalent
of fseek's offset. On a 32-bit system it could well be 32-bit. An
equivalent of fseek is Set_Index from Ada.Streams.Stream_IO.

Bart

Aug 21, 2021, 5:41:15 PM
On 21/08/2021 22:16, anti...@math.uni.wroc.pl wrote:
> Bart <b...@freeuk.com> wrote:
>> On 21/08/2021 19:24, David Brown wrote:
>>> On 21/08/2021 15:08, Bart wrote:
>>
>>>> However there is no direct equivalent for C's 'long', which is
>>>> apparently 64 bits on 64-bit Linux, and 32 bits everywhere else (on my
>>>> x86 and arm targets).
>>>>
>>>
>>> "long" is 64-bit on every 64-bit architecture except Windows. It is
>>> /always/ Windows that is the odd one out in these things, not Linux.
>>
>> You can equally say that "long" is 32-bit on every 32/64-bit
>> architecture except Linux.
>
> Do you claim that anything 64-bit which is not Windows is Linux?
>
> Normal rule is simple
>
> bit size of long = max(32, bit size of machine word)


That would be the machine word size of whatever processor is being
emulated. A 32-bit OS these days will likely be running on a 64-bit
processor.


> Unfortunately, there is one odd case where this rule does
> not work...
>
>> And that "long long" is 64-bit on every 32/64-bit architecture.
>>
>> From that it follows that "long long" is twice the width of "long" on
>> every architecture, 32 /and/ 64-bit, except Linux.
>
> Normal rule here is simple too:
>
> bit size of long long = min(64, 2*bit size of long)


I'm not sure that's all that helpful. I either don't much care how wide
a type is (except that it should suit the characteristics of the
machine), or I need it to be exact.

I don't need two or more vaguely sized types! Because if they can be the
same, then what's the point of using two versions?

Sometimes you may not care about the width, but will need a second type
that is exactly double that width, whatever it is. Choosing int/long, or
long/long long, will not guarantee that.



> In the future it may get more complicated, probably something
> like
>
> min(max(64, bit size of long), 2*bit size of long)

I've no idea what that might mean.

Fortunately, most languages now have fixed-width types, with those
sizes pinned down enough that you can rely on them.

The version of that in C (int64_t etc) is a bolted-on hack which is not
properly supported by other parts of the language.

size_t, off_t, etc belong in that zoo.

And, unfortunately, that language's type system is the one chosen by
most APIs!

> but fortunately, for the next few years such an extension is pure theory.
>
>> It seems to me that Linux is the odd one out!
>
> Blame the GNU folks for the oddity. "long long" is a GNU invention
> and was supposed to be twice the size of "long". But when the first
> 64-bit port of gcc appeared (the Alpha port), the GNU folks made
> "long long" a 64-bit type. After a bug report that "long long"
> violated its spec, the GNU folks changed the spec instead of
> implementing a 128-bit type (which would have been the right thing
> to do). Later "you know who" decided that 64-bit systems would be
> desirable in the future and forced an amendment to C to allow them
> a 32-bit "long" on 64-bit systems.

Actually, the solution is simple, at least for versions of C intended to
run on 32-bit/64-bit consumer machines. For example, define/redefine
these classic C types to be exactly:

char 8 bits (and make it a concrete signed or unsigned type,
preferably the latter)
short 16 bits
int 32 bits
long 64 bits
long long 128 bits

This in theory shouldn't be a problem since they would still be within
the rules for minimum widths and how they relate to each other.

In practice, most C programs would probably stop working.




anti...@math.uni.wroc.pl

Aug 21, 2021, 8:45:48 PM
Bart <b...@freeuk.com> wrote:
> On 21/08/2021 22:16, anti...@math.uni.wroc.pl wrote:
> > Bart <b...@freeuk.com> wrote:
> >> On 21/08/2021 19:24, David Brown wrote:
> >>> On 21/08/2021 15:08, Bart wrote:
> >>
> >>>> However there is no direct equivalent for C's 'long', which is
> >>>> apparently 64 bits on 64-bit Linux, and 32 bits everywhere else (on my
> >>>> x86 and arm targets).
> >>>>
> >>>
> >>> "long" is 64-bit on every 64-bit architecture except Windows. It is
> >>> /always/ Windows that is the odd one out in these things, not Linux.
> >>
> >> You can equally say that "long" is 32-bit on every 32/64-bit
> >> architecture except Linux.
> >
> > Do you claim that anything 64-bit which is not Windows is Linux?
> >
> > Normal rule is simple
> >
> > bit size of long = max(32, bit size of machine word)
>
>
> That would be the machine word size of whatever processor is being
> emulated. A 32-bit OS these days will likely be running on a 64-bit
> processor.

If there is emulation, then of course you take the emulated size.
However, you ignore the fact that C is used to program 8-bit
machines and that small processors (up to 32-bit) are the large
majority of all processors. Your PC probably has a single (multicore)
64-bit processor, but also several small processors. I regularly
use 4 64-bit machines (2 of which are shared with others), but have
tens of 8-bit processors, several 16-bit ones and again tens of
small 32-bit ones (most having less than 64k of RAM).

Of course, you may have a successful language supporting only
64-bit machines. But it is unfair to consider support for a
wide spectrum of machines a disadvantage.

> > Unfortunately, there is one odd case where this rule does
> > not work...
> >
> >> And that "long long" is 64-bit on every 32/64-bit architecture.
> >>
> >> From that it follows that "long long" is twice the width of "long" on
> >> every architecture, 32 /and/ 64-bit, except Linux.
> >
> > Normal rule here is simple too:
> >
> > bit size of long long = min(64, 2*bit size of long)
>
>
> I'm not sure that's all that helpful. I either don't much care how wide
> a type is (except it will suit the characteristics of the machine), or
> will need it to be exact.

ATM the formula is exact.

> I don't need two or more vaguely sized types! Because if they can be the
> same, then what's the point of using two versions?

Because they may be different! Of course, simply having a size
twice as big as long would be better, but unfortunately the current
state was written by the C prophets on stone tablets and is
practically impossible to change now.

> Sometimes you may not care about the width, but will need a second type
> that is exactly double that width, whatever it is. Choosing int/long, or
> long/long long, will not guarantee that.

Yes.

> > In the future it may get more complicated, probably something
> > like
> >
> > min(max(64, bit size of long), 2*bit size of long)
>
> I've no idea what that might mean.

This is a simple formula. If the world still exists in 2050 and there
has been the expected progress, then "long" will be bigger than 64 bits
(most likely 128 bits) and you will need this formula.

> Fortunately the type systems of most languages now with fixed-width
> types, have those sizes pinned down enough that you can rely on them.
>
> The version of that in C (int64_t etc) is a bolted-on hack which is not
> properly supported by other parts of the language.
>
> size_t, off_t, etc belong in that zoo.

You still do not want to acknowledge that some folks want to write
programs for machines of different sizes, and that something
like the C types is needed. Of course, _now_ one can easily give a
simpler system than the traditional C types. In particular
"short", "int", "long" and "long long" are redundant;
they exist mainly for backward compatibility. However,
size_t and off_t are an important part of the solution. And if
one wants to cover machines that are essentially obsolete but
still find some use, then most of the C types are needed.

> And, unfortunately, that language's type system is the one chosen by
> most APIs!

That is not an accident: C works for APIs. And if you want a
different language, then do not expect anything significantly
simpler. I looked at an OS API in Pascal, and it looked at least
as complicated as the C API.

> > but fortunately, for the next few years such an extension is pure theory.
> >
> >> It seems to me that Linux is the odd one out!
> >
> > Blame the GNU folks for the oddity. "long long" is a GNU invention
> > and was supposed to be twice the size of "long". But when the first
> > 64-bit port of gcc appeared (the Alpha port), the GNU folks made
> > "long long" a 64-bit type. After a bug report that "long long"
> > violated its spec, the GNU folks changed the spec instead of
> > implementing a 128-bit type (which would have been the right thing
> > to do). Later "you know who" decided that 64-bit systems would be
> > desirable in the future and forced an amendment to C to allow them
> > a 32-bit "long" on 64-bit systems.
>
> Actually, the solution is simple, at least for versions of C intended to
> run on 32-bit/64-bit consumer machines. For example, define/redefine
> these classic C types to be exactly:
>
> char 8 bits (and make it a concrete signed or unsigned type,
> preferably the latter)
> short 16 bits
> int 32 bits
> long 64 bits
> long long 128 bits
>
> This in theory shouldn't be a problem since they would still be within
> the rules for minimum widths and how they relate to each other.
>
> In practice, most C programs would probably stop working.

Yes, in practice such a change would cause serious trouble. And it
solves almost nothing: if a machine reasonably supports the above,
then you have the standard fixed-size types:

int8_t, int16_t, int32_t, int64_t

(and unsigned variants). The only worthwhile addition is a 128-bit
type, but most 32-bit implementations seem to have no support
for a 128-bit type. And some 64-bit implementations do not
support a 128-bit type either. Which C compiler that you use
supports such a type?

If the standard types are not available, then it is unreasonable to
expect that your convention for type sizes will be followed.

And we are still left with types like 'size_t'. Do you want
to mandate that the argument to 'malloc' has 128 bits? Note that
on a true 8-bitter there is no reason to use more than 16 bits
for array indices and the like.

Concerning fixed-size types: a reasonable solution already appeared
in PL/I, IIRC "FIXED BINARY(n)", where n is a compile-time constant,
which gave you the smallest type big enough for n-bit integers. Lisp
has '(signed-byte n)' where (despite the name) n gives the number of
bits. For better control, Pascal (and Ada) have range types. In Lisp
the range type is '(integer low high)'. Arguably, having a choice
between ranges and bit lengths gives the possibility of precise
specification when needed and simpler specification for
common cases. Anyway, once there is a choice of sizes, the natural
solution is to make the size part of the type, either as a parameter
(nicer) or as part of the name.

A more significant issue is whether the size is exact or serves as a
lower bound. For range types the situation is reasonably clear: with
the natural representation there will be some bit patterns that
do not correspond to valid values. With bit specifications
some folks may insist that a 27-bit type produce the wraparound
expected of 27-bit two's-complement arithmetic. But one can
expect that the need for wraparound is relatively rare, so
it is better that normal types either signal an error on
overflow or behave in some unspecified way (say, using 32-bit
arithmetic). There is still the question of storage; one
possibility is that the type is big enough to contain 27 bits
but may be bigger. This is consistent with range types
and leads to an efficient implementation. But some folks
insist on using an exactly specified amount of storage.
IMO using unnatural storage sizes may be supported,
but while it may save a few bits, it may also slow down
access quite a lot. And it may complicate the implementation.
In one case (GNU Pascal) the compiler generated inline code,
and in the general case this led to serious inefficiency.
In another case all odd accesses are handled by a
"bit extract" subroutine; this is inefficient, but
given the complexity of inline code in the general case
the overhead of a subroutine call is not so big. Of course,
there are cases where more efficient access is possible
(in particular C bitfields exclude the worst problems),
and the language/compiler designer must decide what will
be supported and what will be optimized (have an efficient
implementation).

Anyway, the traditional names for C integer types should be
considered "legacy": something which cannot be removed from C
(at least in a reasonable timeframe), but no new language should
copy the traditional C names.

--
Waldek Hebisch

Bart

Aug 22, 2021, 7:24:42 AM
On 22/08/2021 01:45, anti...@math.uni.wroc.pl wrote:
> Bart <b...@freeuk.com> wrote:

> Yes, in practice such change will cause serious trouble. And it
> solves almost nothing: if machine resonably supports above, then you
> have standard fixed size types:
>
> int8_t, unt16_t, int32_t, int64_t
>
> (and unsigned variants). The only worthwile addition is 128 bit
> type, but most 32-bit implementations seem to have no support
> for 128 bit type. And some 64-bit implementations do not
> support 128 bit type. Which C compiler that you use support such
> type?

I have 128-bit support (though not perfect) in my systems language.

C support for it is much poorer; even when available, there appears to
be no standard type denotation for it; it doesn't allow 128-bit
literals, and it is not supported by printf.


> If standard types are not available, then it is unresonable to
> expect that your convention for type sizes will be followed.
>
> And we are still left with types like 'size_t'. Do you want
> to mandate that argument to 'malloc' has 128 bits?

Why? Every machine these days has a 64-bit processor. 64 bits is plenty
for most things, but every processor has generally had software support
for a type that is double the word size. (Eg. 16-bit 8086 will have had
an emulated 32-bit type. I know there was a 32-bit float type because I
had to emulate one!)

128-bit support is no different, but you only use it as needed.

(One reason I have it is because the underlying infrastructure needed
for 128-bit types is also needed for non-arithmetic types that use two
machine words, notably slices.)


> Note that
> on true 8-bitter there is no reason to use more than 16-bits
> for array indices and similar.

I remember prototyping a Z80 board with bank-switched 256KB memory. If
supported by a language, it would have needed more than 16 bits to
address.

You're making things a lot more complicated than they need be. APIs tend
not to use Ada-style types and ranges; they use simple types which are
1, 2, 4 or 8 bytes wide, and usually only 4 or 8 bytes on desktop-class
machines.

But C doesn't want to make it that simple. It always has to drag in the
fact that it might have to work on weird microcontrollers to keep things
medieval. However C-style APIs are used in desktop libraries such as SDL
or GTK where those microcontrollers are irrelevant.

Many APIs /love/ making up their own types on top of the C one, as
though they can't quite trust the C types to be stable, and I don't
blame them (I do the same!).

So SDL includes Uint32, khronos_uint32_t, uint32 and GLUint, all
intended to be u32. STB_IMAGE includes stbi__uint32, and also __m128i,
which I guess is an i128 type.

The regular C 'int' is still used everywhere; presumably this is going
to be i32.

What I'd love to see in API is, in order of preference:

* Concrete types like u64

* If some types need to be conditional [on platform], then a small,
fixed set with straightforward rules

* If the API wants to expose only type aliases, then it must also
define exactly what they are within the same resource, with one
level of indirection only (so not defined on top of another alias)

* Aggregate types whose layout does not depend on arcane C alignment
rules.

Ideally, the API info should be presented in, or available as, a format
that can be accessed by a simple tool rather than needing a full-blown C
compiler.
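
As an illustration only (every name here is invented), a header following the preference list above might look like:

```c
/* mylib.h -- hypothetical API header with concrete, transparent types */
#include <stdint.h>

/* One level of aliasing only, with the width stated at the point of
   definition rather than buried behind further typedefs. */
typedef uint32_t mylib_handle;   /* always exactly 32 bits */

/* Fixed 16-byte layout with no hidden padding on common ABIs. */
struct mylib_event {
    uint32_t kind;
    uint32_t code;
    int64_t  timestamp_us;
};
```

A simple tool can parse declarations like these without a full C compiler, since every width is spelled out.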

David Brown

Aug 22, 2021, 9:34:03 AM
On 21/08/2021 20:57, Bart wrote:
> On 21/08/2021 19:24, David Brown wrote:
>> On 21/08/2021 15:08, Bart wrote:
>
>>> However there is no direct equivalent for C's 'long', which is
>>> apparently 64 bits on 64-bit Linux, and 32 bits everywhere else (on my
>>> x86 and arm targets).
>>>
>>
>> "long" is 64-bit on every 64-bit architecture except Windows.  It is
>> /always/ Windows that is the odd one out in these things, not Linux.
>
> You can equally say that "long" is 32-bit on every 32/64-bit
> architecture except Linux.

You could say that, if you wanted to be wrong, to the extent that you
were actually making any sense (Linux is not an architecture). Perhaps
you are not aware that the computer world does not revolve around
Windows, and that Linux is not the only non-Windows system around.

Long always has to be at least 32 bits. So there is no option for its
size (on "normal" power-of-two systems) until you have 64-bit as a
native size. And then you find that on /every/ system, except Windows,
"long" is 64-bit. This applies to mainframes, supercomputers and
workstations from before Windows was even properly 32-bit and before
Linux was a twinkle in Linus' eye. It applies to embedded systems, *nix
systems, Macs, and anything else.

>
> And that "long long" is 64-bit on every 32/64-bit architecture.

You could say that if you wanted to be a bit pointless, given that by
definition it must be at least 64-bit, and there are no produced systems
with native sizes greater than 64-bit.

>
> From that it follows that "long long" is twice the width of "long" on
> every architecture, 32 /and/ 64-bit, except Linux.
>

Except it doesn't. It's bollocks.

> It seems to me that Linux is the odd one out!

It seems to the rest of the world that you don't know what you are
talking about.

>
> Actually neither get it right. Probably 'int' should have been 32 bits
> on every 32/64-bit system, and 'long' should have been 64 bits on those
> same systems.

On that point, I agree to some extent. I don't think the way C defines
its fundamental types is ideal, and I don't think C implementations have
always made the best choices.

The mistake, as I see it, is that the emphasis has been wrong. In
practice in the C world, you the implementation gives you type "int" as
your most convenient type, but its size depends on the implementation.
The way it /should/ work is that the /programmer/ should ask for a fast
type that works up to at least N, and the type provided by the
implementation should be able to vary to suit.

Now, in theory, C has this, for particular values of N. The signed
types signed char, short, int, long, and long long are precisely such
types for N = 2^7 - 1, 2^15 - 1, 2^15 - 1, 2^31 - 1 and 2^63 - 1
respectively. But in
practice, that's not how people use them.

>
> "long long" would either not be necessary, or could be reserved for 128
> bits, and you have int/long types that are guaranteed sizes on those
> architectures. (And, in C, could be confident about continuing to use
> the standard printf formats and literal suffixes for those types.)
>
> At the moment you have programs where developers have either used "long"
> interchangeably with "int" on 32-bit systems with little thought, that
> will be inefficient or go subtly wrong on Linux64.

Yes, programmers get things wrong, and make incorrect assumptions. You
do it all the time, no matter how often people tell you your assumptions
don't apply in a wider context.

This is much more of a problem in the Windows world than outside it, as
in the wider world programmers have been using a variety of different
systems and sizes all their lives.

Perhaps you don't understand /why/ every 64-bit system has 64-bit long,
except Windows? So here is a little history lesson.

In the world at large, 64-bit has been mainstream since the 1990's, with
supercomputers using 64-bit long before that. It preceded C99, and
certainly preceded the large-scale adoption of C99 (which was quite
slow). Programmers needed a 64-bit type - it had to be "long" because
there was no other standard type available (excluding weird Crays,
where everything from "short" upwards was 64-bit).

And programmers for non-Windows systems got used to this quickly, and
rarely made mistakes, mixups or assumptions about type sizes. *nix
programmers in particular have always been used to writing code that
would work on a variety of different *nix systems and a variety of
different processors, with different sizes and endianness. They were
not perfect, of course, but pretty good - code written for one cpu size
would usually work fine on the other size. One thing that did crop up
was using "long" when they needed an integer type for pointers, as
"uintptr_t" did not exist prior to C99. However, that still worked
across 32/64 bit changes.


Back in the DOS/Windows world, programmers were still using 16-bit while
others were at 32-bit or more. There people were used to "long" for
32-bit, and to programming with the assumption that their code would
never have to be portable to anything else, as 16-bit DOS/Windows would
last forever. People gradually moved to 32-bit Windows, but again
assumed that was the whole world. And MS didn't support C99 types at
all. So when you needed 32-bit types, you used "long" just as you
always had done, and you assumed it would always be 32-bit - it was
easier than trying to understand the myriad of windows-specific type
names, and you had nothing to gain from using them (unlike in the *nix
world) because your code would never be used on anything different.

So when MS finally started getting 64-bit Windows underway, they had a
choice. Should they make "long" 32-bit, so that code that assumed that
size would still work? Should they make it 64-bit, so that code that
assumed it matched a pointer would still work? They picked 32-bit,
different from everyone else before them (they also picked a different,
and less efficient, ABI for x86-64 than everyone else uses). Whichever
they picked, it would mean a great deal of C code written for Windows
could not be compiled as 64-bit because of invalid assumptions and poor
typing.

I suppose they picked the size that they thought would involve the least
work to fix. If you like conspiracy theories (and MS has a proven track
record of being willing to make life hard for third-party developers and
end-users if it also annoys their competition), you might note that
because they did not support C99, using 32-bit "long" means that C
programmers for 64-bit Windows have no standard way of dealing with
64-bit integers. Was this to further discourage people from programming
in C, writing code that might work on other systems, and move them over
to C# that was windows-only and controlled by MS ?


>
> Or developers who have assumed 64-bit 'long' on Linux64 without
> realising that on Linux32 or on Windows, it will be 32-bit and their
> programs might not work.
>
> The actual situation for me is that I never use 'long' because it is too
> badly defined; I need types that are a known size across platforms.
>
>>> However types that need to represent file offsets for example, would
>>> probably still need to be signed 64 bits; a 32-bit machine will not
>>> necessarily be full of small files!
>>>
>>
>> Equally, it will not necessarily have any need of /large/ files - and
>> not want the waste of using 64-bit types when 32-bit will do.
>>
>> And remember that while /you/ have the choice of what systems you want
>> to support with your languages, C supports almost /everything/.  Some
>> programs written in C are written to support a wide range of systems -
>> including systems that come from an era where 32-bit was more than
>> enough for file sizes because disk size was measured in hundreds of
>> megabytes.  Use "off_t" instead of "i64" (or whatever you want to call
>> it), and the program works on systems written in the 90's as well as
>> systems written in 2020 - whether the program was written in 1990 or in
>> 2020.
>
> Except that in 2020 it is likely to be used for much bigger data than in
> 1990.
>

The vast majority of programs don't need big data. And while it is
obviously true that most PC code now written will mostly be used on
64-bit systems, /some/ people still write code that is usable on older
or more limited systems. For C, and for any libraries or headers
useable from C, that flexibility is vital. For /new/ languages, you can
of course decide that you don't care anything except 64-bit systems.

> In any case, there is always a choice: choose one of 32 bits or 64 bits,
> or provide both functions or both sets of data structure. But tell
> people what the choice was.

That's fine for some structures - but certainly not all. For some of
the ones you complain about, such as uid_t or pid_t, these have changed
size over time /without/ corresponding changes to the size of "long" or
other aspects of the hardware. And they don't all change at the same
time. There is no problem for programs that use the types correctly,
only a problem for people who want to oversimplify things to the point
that they no longer work.

>
> If you need to provide one choice, don't hide it behind an opaque
> typedef and then be cagey about what it actually is. APIs need to be
> open and transparent, just like a datasheet.
>

They are.

Have you ever looked at datasheets for electronics? They are full of
details that refer to other data. The entry that shows the "high"
voltage level on an input doesn't say "minimum 3.6 V", it says "minimum
0.7 VDD" - referring to the supply voltage. The timings are often not
in MHz or ns, but in units of the maximum frequency for the chip, or its
system frequency. And so on.

When you want an input to be "high", you don't connect the input to 3.3V
or to 2.5V. You connect it to VDD - and know it is correct for the
chip, regardless of the voltage being used.

Treat your types in C in the same way. The API says "type pid_t". Use
that type. Don't say "pid_t should be 32-bit, so I'll use my own u32".

David Brown

Aug 22, 2021, 9:50:39 AM
On 21/08/2021 21:31, James Harris wrote:
> On 20/08/2021 14:49, David Brown wrote:
>> On 20/08/2021 13:18, Bart wrote:
>>> On 20/08/2021 08:05, David Brown wrote:
>>>> On 19/08/2021 17:28, Bart wrote:
>
>
> ...
>
>
>>>>> And then you have those off_t types mentioned below.
>>>>
>>>> "off_t" is not part of C.  But don't let that get in the way of another
>>>> rant.
>>>
>>> So what is it a part of? Since it came up extensively in a recent clc
>>> thread.
>>
>> It is part of POSIX.  It is used to hold offsets within files, for some
>> kinds of file functions.
>
> Isn't it odd to have the maximum size of a file decided by a language's
> implementation (or an OS's implementation)?

It is the OS's choice, not the language's.

> An 8Gbyte file is an 8Gbyte
> file. It doesn't magically change its size if it is transferred to a
> different machine.

It's quite simple. Some OS's don't support files bigger than a
particular size. Look at this list:

<https://en.wikipedia.org/wiki/Comparison_of_file_systems>

If an OS does not support file systems that can handle files bigger than
4 GB, why should it make every file operation bigger and slower with a
pointlessly large "off_t" size? The move from 32-bit being big enough
for all (mainstream) file sizes to needing 64-bit for such types came
/before/ the move to 64-bit for common processor sizes, size_t size,
etc. That is one of the reasons it is so nice that these types are
independent of the language and the processor, but defined by the OS
and system in use.

>
> Surely a file should be as long as is required /for its contents/ and a
> pointer to offsets should be as long as required by the medium on which
> the file is being stored.
>
> I remember thinking the same about the standard fseek function. Its
> signature is
>
>   int fseek(FILE *stream, long offset, int whence);
>
> It seems absurd to have the offset as a long - because a long on a
> particular implementation may not be long enough. Similar can be said of
> lseek which uses off_t.

fseek (and ftell) were defined in the C standards at a time where "long
int" was the biggest integer type available. The fgetpos and fsetpos
functions use a transparent type - the standard says nothing about
"fpos_t" except that you can use it with these functions. Those let you
deal with bigger files. It's not perfect, but it is the best that could
be defined in the language.

POSIX functions using off_t solve this problem, precisely by having
"off_t" being an integer type big enough to handle any file supported by
the OS. If you use an old *nix system with 32-bit off_t, it won't
handle 8 GB files (or at least won't handle them well) - not because of
limitations of "off_t", but because of limitations of the OS.

>
> Put another way, the C/Posix or whatever concept of off_t seems to me to
> be broken.
>

It only seems that way because you don't understand it.

Bart

Aug 22, 2021, 10:36:26 AM
On 22/08/2021 14:34, David Brown wrote:
> On 21/08/2021 20:57, Bart wrote:

>> If you need to provide one choice, don't hide it behind an opaque
>> typedef and then be cagey about what it actually is. APIs need to be
>> open and transparent, just like a datasheet.
>>
>
> They are.
>
> Have you ever looked at datasheets for electronics?

Yes of course. (Used to be my job for a few years.)

> They are full of
> details that refer to other data. The entry that shows the "high"
> voltage level on an input doesn't say "minimum 3.6 V", it says "minimum
> 0.7 VDD" - referring to the supply voltage.

I'm looking at a datasheet for Z80 that says +5V and GND on the pin
info. Not VCC and VSS, although those were common too.

But on the datasheet, it will say what the limits for VCC are.

> The timings are often not
> in MHz or ns, but in units of the maximum frequency for the chip, or its
> system frequency. And so on.
>
> When you want an input to be "high", you don't connect the input to 3.3V
> or to 2.5V. You connect it to VDD - and know it is correct for the
> chip, regardless of the voltage being used.

I wouldn't connect it to VDD (usually +12V in my day), not with +5V
logic! And actually I /would/ connect it to +5V (maybe via a pullup),
since I will know perfectly well how my circuit is constructed. (I used
to make the PSUs too!)

(Devices with 3 power rails: VDD, VCC, VBB, were disappearing when I
came into this. Intel's 8080 used those three, but everything they did
was much more complicated than need be; it probably still is. Eg. 67/33%
duty cycles on clocks instead of 50/50. Zilog did everything right.)

> Treat your types in C in the same way. The API says "type pid_t". Use
> that type. Don't say "pid_t should be 32-bit, so I'll use my own u32".

I can't use that type from outside C; what do I write instead in another
language?

I think you've used C and C++ too much, where there is a huge industry
dedicated to making life cushy for you, to have a valid perspective on
what it's like not only to /use/ a non-mainstream language, but to
implement one.

And actually, it's not that easy if you want to /implement/ C either!
You need to provide header files that define the functions that people
need to use; where does that info come from? Every other compiler, every
online resource, either is cagey or says something different. I've
already posted a header with 5 versions of struct stat.

A few years ago I posted an example (IIRC from header files from a Linux
system) where clock_t was defined on top of SIX levels of typedefs and
macros! An extreme example, but it shows what happens when it is
nobody's job to stay on top of this stuff, or to simplify it. They can
only add to the mess; they can't reduce it.

So to get back to that pid_t: for an API for a /specific platform/, it
should say u32 or whatever it is. If it uses a typedef like pid_t, then
it should list those typedefs in the same document or the same file,
together with their concrete types.

(Will reply separately to other points.)

Dmitry A. Kazakov

Aug 22, 2021, 11:02:12 AM8/22/21
to
On 2021-08-22 16:36, Bart wrote:
> On 22/08/2021 14:34, David Brown wrote:

>> Treat your types in C in the same way.  The API says "type pid_t".  Use
>> that type.  Don't say "pid_t should be 32-bit, so I'll use my own u32".
>
> I can't use that type from outside C; what do I write instead in another
> language?

You write the corresponding integer type. It is easy in a normal language.

Ada provides a standard library package for interfacing C with all C
types defined.

You can also easily declare a custom type compatible with C. There are
language pragmas for doing this which handle all of C's idiosyncrasies.

> I think you've used C and C++ too much, where there is a huge industry
> dedicated to making life cushy for you, to have a valid perspective on what
> it's like to not only /use/ a non-mainstream language, but to implement
> them.

It is OK to have a niche language; a bad niche language is not.

> So to get back to that pid_t, for an API for /specific platform/, it
> should say u32 or whatever is.

Nope. It should declare a numeric type for a POSIX target in a
POSIX-specific library package. That is, provided your language does not
borrow C's stupidity of structural type equivalence, and allows type
declarations with the required machine representation, which need *not*
be exposed in the visible part of the package!

Bart

Aug 22, 2021, 11:25:37 AM8/22/21
to
OK, so help me out here. I want to use getpid() for example. Online info
tells me its signature is:

pid_t getpid(void)

and that I need to use #include <unistd.h>.

However, in either of my languages, I have to provide a declaration in
my syntax that needs to look like this:

clang function getpid => XXX


XXX is the return type. It's necessary to know the type to properly be
able to call the function and deal with the result.

The question is, what do I write in place of XXX? My language only knows
about the basic types such as u8-u128 and i8-i128.

I can't use unistd.h, since that is a C file (and also Unix). It is
suggested elsewhere that pid_t might be defined in sys/types.h. If I
look inside my mingw/gcc headers, its types.h defines 3 versions of
pid_t: as int, as _int64_t, or as _pid_t.

Whatever, am I really expected to go trawling through the internet or
through reams of system header files to track down every piddly type?
How will I know which of multiple definitions to use?

Should I write test programs which include unistd.h and print out the
characteristics of pid_t? What if different C compilers give different
answers?

So, since you're the expert, how would you do it?

David Brown

Aug 22, 2021, 11:31:15 AM8/22/21
to
On 22/08/2021 16:36, Bart wrote:
> On 22/08/2021 14:34, David Brown wrote:
>> On 21/08/2021 20:57, Bart wrote:
>

>
>> Treat your types in C in the same way.  The API says "type pid_t".  Use
>> that type.  Don't say "pid_t should be 32-bit, so I'll use my own u32".
>
> I can't use that type from outside C; what do I write instead in another
> language?

Your language, your problem. It's not the fault of C.

>
> I think you've used C and C++ too much, where there is a huge industry
> dedicated to making life cushy for you, to have a valid perspective on what
> it's like to not only /use/ a non-mainstream language, but to implement
> them.

Ah, so the reason /you/ create so many problems for yourself is that /I/
use good tools that help me be a better and more productive programmer?

>
> And actually, it's not that easy if you want to /implement/ C either!
> You need to provide header files that define the functions that people
> need to use; where does that info come from? Every other compiler, every
> online resource, either is cagey or says something different. I've
> already posted a header with 5 versions of struct stat.
>

The information you need to define the functions required for a C
implementation is in the C standards.

> A few years ago I posted an example (IIRC from header files from a Linux
> system) where clock_t was defined on top of SIX levels of typedefs and
> macros! An extreme example, but it shows what happens when it is
> nobody's job to stay on top of this stuff, or to simplify it. They can
> only add to the mess; they can't reduce it.
>

That is how that C library implementation chooses to handle it. Another
library might do something completely different. Libraries that are
made for one simple fixed system can do so simply - libraries that are
designed to be used with a huge range of OS's, targets, compilers (not
just gcc), versions, C language standards, etc., need to be more
complicated. And it's likely that libraries like glibc are more
complicated than they need to be because they have evolved over time
through the work of large numbers of people as a collection of many
small changes. There are lots of advantages for development in doing it that
way, and no one (but you) cares how many layers of typedefs you have in
the end as long as everything works correctly and efficiently.

> So to get back to that pid_t, for an API for /specific platform/, it
> should say u32 or whatever it is. If it uses a typedef like pid_t, then it
> should list those typedefs in the same document or the same file
> together with their concrete types.
>

"pid_t" is the type used to hold process identifiers. That's all you
need in order to use it.

Dmitry A. Kazakov

Aug 22, 2021, 11:48:23 AM8/22/21
to
In Ada you write in the public part of the POSIX bindings:

type pid_t is private;
function getpid return pid_t;

Note that this is 100% platform-independent.

> The question is, what do I write in place of XXX? My language only knows
> about the basic types such as u8-u128 and i8-i128.

Make it better.

> Whatever, am I really expected to go trawling to internet or through
> reams of system headers files to track down every piddly type? How will
> I know which of multiple definitions to use?

You must know the target you are going to support.

> Should I write test programs which include unistd.h and print out the
> characteristics of pid_t? What if different C compilers give different
> answers?

Yes, when designing bindings to alien libraries we usually write tests
checking sizes of C structures, because the effective layouts might be
quite unpredictable and people writing headers often do not know C well
and use various ugly hacks.

> So, since you're the expert, how would you do it?

There exist Ada binding generators that convert C header files to Ada
declarations. Though complex cases require manual attention.

In order to be able to write such tools you need a language that
supports type declarations. E.g. provided pid_t is a 24-bit signed
integer, the private part of the package would contain:

type pid_t is range -2**23..2**23-1 with Convention => C;

Bart

Aug 22, 2021, 11:52:57 AM8/22/21
to
On 22/08/2021 16:31, David Brown wrote:
> On 22/08/2021 16:36, Bart wrote:
>> On 22/08/2021 14:34, David Brown wrote:
>>> On 21/08/2021 20:57, Bart wrote:
>>
>
>>
>>> Treat your types in C in the same way.  The API says "type pid_t".  Use
>>> that type.  Don't say "pid_t should be 32-bit, so I'll use my own u32".
>>
>> I can't use that type from outside C; what do I write instead in another
>> language?
>
> Your language, your problem. It's not the fault of C.
>
>>
>> I think you've used C and C++ too much, where there is a huge industry
>> dedicated to making life cushy for you, to have a valid perspective on what
>> it's like to not only /use/ a non-mainstream language, but to implement
>> them.
>
> Ah, so the reason /you/ create so many problems for yourself is that /I/
> use good tools that help me be a better and more productive programmer?

So, everyone should just give up and use C? Or another mainstream
language where /someone else/ has done the hard work of creating library
bindings?

I said you didn't have the perspective or experience to appreciate the
problems people like me face.

You say that a type like pid_t aids portability. I say it only helps
portability between different C implementations, and does nothing for
other languages.

Universal types, minimising aliases and avoiding language-specific
features in APIs such as C macros, /would/ aid portability.


>>
>> And actually, it's not that easy if you want to /implement/ C either!
>> You need to provide header files that define the functions that people
>> need to use; where does that info come from? Every other compiler, every
>> online resource, either is cagey or says something different. I've
>> already posted a header with 5 versions of struct stat.
>>
>
> The information you need to define functions needed for a C
> implementation are in the C standards.

I'm using an external C runtime library and the problem is devising
suitable declarations for that library, including functions like
fstat(), which for some reason are bundled as part of the C runtime even
though they have nothing to do with C.


>
> "pid_t" is the type used to hold process identifiers. That's all you
> need in order to use it.

From C.

But for the umpteenth time, NOT EVERYBODY WANTS TO USE C!

So, I pose the same question I did to Dmitry; how would you solve the
problem of calling getpid(), say, from a script language, say.

Assume the language is fine with any fixed-width integer type; you just
have to know the concrete type to use.

Assume the function is ready to use in a binary library such as libc.so.6.

Don't assume you have a C implementation. Don't assume you have a
collection of C header files.

The information you have available is this:

#include <unistd.h>
pid_t getpid(void);




Bart

Aug 22, 2021, 12:08:14 PM8/22/21
to
On 22/08/2021 16:48, Dmitry A. Kazakov wrote:
> On 2021-08-22 17:25, Bart wrote:

>> So, since you're the expert, how would you do it?
>
> There exist Ada binding generators that convert C header files to Ada
> declarations. Though complex cases require manual attention.

Yes, I had a similar tool. It usually involves implementing the best
part of a C compiler.

You will need to work from C header files, but they often contain
conditional code that depends on the detected C compiler; you're not
using a normal compiler known to them, so what will they do?

They also very often contain macros that usually expand to an
expression in C syntax; now you have to convert not just declarations
but actual C code, when you don't want code, you want bindings in your
language.

(The GTK headers expose 3000 C macros.)

C APIs especially exposed as C headers are not actually very portable at
all.


> In order to be able to write such tools you need a language that
> supports type declarations. E.g. provided pid_t is a 24-bit signed
> integer, the private part of the package would contain:
>
>    type pid_t is range -2**23..2**23-1 with Convention => C;
>

I'm not interested. At least, I just need to know how many bits that
type comprises (so I can pass it straight to another function that takes
pid_t, or compare the bit patterns for equivalence, test whether it is
zero or non-zero, etc etc).

I'm not planning to re-implement Ada.

Here's how I declare fopen:

clang function fopen(cstring, cstring)int64

fopen in C headers returns a FILE* type which is a pointer to some
internal data structure.

But all I need the return type for is:

* Seeing if it is nil, which means it has failed

* Passing it to subsequent functions like fread

So, I don't care about its exact type. However, because I know it's a
pointer, I know it will be 64 bits wide on my targets.

Dmitry A. Kazakov

Aug 22, 2021, 12:37:41 PM8/22/21
to
On 2021-08-22 18:08, Bart wrote:
> On 22/08/2021 16:48, Dmitry A. Kazakov wrote:

>> In order to be able to write such tools you need a language that
>> supports type declarations. E.g. provided pid_t is a 24-bit signed
>> integer, the private part of the package would contain:
>>
>>     type pid_t is range -2**23..2**23-1 with Convention => C;
>
> I'm not interested.

Surely you are not interested in close-to-hardware programming.

> At least, I just need to know how many bits that
> type comprises (so I can pass it straight to another function that takes
> pid_t, or compare the bit patterns for equivalence, test whether it is
> zero or non-zero, etc etc).

Convention C is just an abbreviation of various settings like alignment,
size, bit order the C compiler takes for the target. In Ada you can
precisely describe all that, which neither you nor C can. For C it is no
problem because all OSes are built around C. For you it makes your
language unusable for systems programming on most platforms. So why do
you care about pid_t again?

> I'm not planning to re-implement Ada.
>
> Here's how I declare fopen:
>
>     clang function fopen(cstring, cstring)int64
>
> fopen in C headers returns a FILE* type which is a pointer to some
> internal data structure.

Not even able to express pointer to char or pointer to an opaque structure?

> So, I don't care about its exact type. However, because I know it's a
> pointer, I know it will be 64 bits wide on my targets.

Not supporting armhf?

James Harris

Aug 22, 2021, 1:49:06 PM8/22/21
to
On 21/08/2021 22:31, Dmitry A. Kazakov wrote:
> On 2021-08-21 22:23, James Harris wrote:
>
>> Is Ada any better?
>
> Ada has separate integer types, so you would not care about the range
> unless you convert values to another integer type, which, ideally, you
> should never do. In practice, you are forced to convert types when you
> use alien libraries and/or communication protocols.
>
> The problem is that practically all library interfaces are low-level.
> You cannot pass an integer value as an instance of the class, you must
> convert it to a specific type and that includes a specific range.
>
> [Templates/generics do not resolve the problem because they cannot live
> in a library]

I'll suggest a solution below. It requires nothing special.

>
> This is a real programming issue, not the silly things you are chatting
> about with Bart.

LOL! You'd be surprised at how many 'silly things' become important when
you design a language from scratch.

A programming language is a type of machine. As with any machine it's one
thing to operate it but quite another to design it. With a car the
driver just gets behind the wheel and operates it by interacting with
its controls and indicators. That's easy. The car's designers, however,
will have had to make decisions on thousands of minutiae to make the car
operate as well as it does.

I'd suggest to you that it's only when you design a language - as Bart
and I have - that you get any idea of the myriad of decisions involved.

>
>> It has direct IO with Set_Index but its range is up to Count'Last ...
>> which is also implementation defined.
>
> It will fit the target platform and this is the block number, 9 bits
> shorter. A closer case is Stream_Element_Offset, that is an equivalent
> of fseek's offset. On a 32-bit system it could well be 32-bit. An
> equivalent of fseek is Set_Index from Ada.Streams.Stream_IO.
>

Here's my offering. I'll illustrate it with lseek but it could apply to
any other system. The form of lseek is

off_t lseek(int fd, off_t offset, int whence);

ISTM that that's immediately a failure because a file's offsets have
nothing to do with whatever a certain language implementation thinks
should be the size of off_t. And I really do mean /nothing/. A file's
maximum offset and a machine's words may both be integers but that's as
far as the similarity goes; the useful ranges of those integers are
completely unrelated.

Basically, a file's offset requires between 1 and N unsigned integers to
represent it, so I'd suggest that a file offset should be maintained in
memory as a small array of such integers and pointed at.

If a program needs to update the offset it would know where the array is
stored and how long it is. For convenience it could probably manipulate
the offset using library calls but that would not be essential.

Then, rather than having the offset as a parameter the seek call would
have a /pointer/ to the array which held the offset.

In such a way, different files would be able to have different maximum
sizes and, crucially, the program could never be presented with a file
which it could not access by offset.

I think that's pretty simple. Unless I've missed something that's it;
there is not and never was any need to have the plethora of seek calls
with their different-but-still-broken 'upgrades' from one another.

I think we have

fseek
fseeki64
fseeko
lseek
lseek64

and probably others. All broken!!!?


--
James Harris

James Harris

Aug 22, 2021, 1:57:47 PM8/22/21
to
On 21/08/2021 22:19, Bart wrote:
Perhaps you'd agree that the very presence of both of those shows that
the original idea was a failure. What's worse, someone noticed that the
original idea was too limiting so created something else that was still
limiting, only less so.

>
> On Linux, fseek/ftell are limited to long as you say. But the man-page
> suggests using fgetpos/fsetpos which use 'fpos_t'; I'm guessing that's a
> 64-bit type! But I'm only guessing.
>
> With fseeki64, not only is the i64 in the name a clue, but the offset
> parameter has type __int64.
>
> It is understandable when an OS API has to use the data-types of a
> specific language, especially one which didn't have precise types until
> C99, which are not going to get magically bigger as memory, disk and
> file sizes get larger.

I think that's the key. There are different limits involved. A
programming language will be aware of the size of memory because it will
have pointers. But that will be unrelated to a /file's/ potential max
size. The file max size will be outside the control of both the language
and even the OS.

By contrast, the storage system will know about the maximum possible
size of file because that will be part of how the medium was formatted.

I suggested a solution in a reply just now to Dmitry. (qv)

...

> My own approach would be to just switch everything, all APIs, to 64
> bits, and recompile everything. But that won't work in the real world
> where there so much existing code, headers, docs and binaries where
> fseek take off_t, whatever that is.

I'd suggest to you that that doesn't fix the problem. In fact, it makes
things worse. As well as being potentially too small for future files, a
small system would have to work with and manipulate integers which, to
it, would be many words in size, even though most files it would access
would not need such large limits.


--
James Harris

James Harris

Aug 22, 2021, 2:10:21 PM8/22/21
to
On 22/08/2021 14:50, David Brown wrote:
> On 21/08/2021 21:31, James Harris wrote:

...

>> Isn't it odd to have the maximum size of a file decided by a language's
>> implementation (or an OS's implementation)?
>
> It is the OS's choice, not the language.

It should be neither's!

The max size of a file should be determined by the formatting of the
filesystem on which it is stored. That's all.

>
>> An 8Gbyte file is an 8Gbyte
>> file. It doesn't magically change its size if it is transferred to a
>> different machine.
>
> It's quite simple. Some OS's don't support files bigger than a
> particular size. Look at this list:
>
> <https://en.wikipedia.org/wiki/Comparison_of_file_systems>

Those maxima are about file systems rather than OSes.

>
> If an OS does not support file systems that can handle files bigger than
> 4 GB, why should it make every file operation bigger and slower with a
> pointlessly large "off_t" size?

My solution wouldn't require that. Please see the details in the reply I
made to Dmitry about this a little while ago.


...

>> Put another way, the C/Posix or whatever concept of off_t seems to me to
>> be broken.
>>
>
> It only seems that way because you don't understand it.
>

Oh? What specifically do you think I don't understand?


--
James Harris

Dmitry A. Kazakov

Aug 22, 2021, 2:15:58 PM8/22/21
to
On 2021-08-22 19:49, James Harris wrote:

> I think that's pretty simple. Unless I've missed something that's it;
> there is not and never was any need to have the plethora of seek calls
> with their different-but-still-broken 'upgrades' from one another.

You missed everything. The language must produce close to optimal
machine code. Making a container out of each datatype makes no sense;
after all, a pointer should then be a container too, and a pointer to
pointer, etc. Not even able to pass a container by value, in a register, huh?

Yes it is possible to use the most universal and the most inefficient
representation for all integers. The dynamic languages go this path.
Bart does the same using 64, 128, 256; how many bits next? This resolves
nothing and is utterly uninteresting and useless from the language
design point of view. Such discussions were appropriate in 70s. Silly to
have them a half-century later.

The real problem with the type system is to have a class of integers (or
some other scalar/small types) with the most efficient
machine-accelerated representation of each specific instance in the
class while being able to deal with each instance in a general way.

Bart

Aug 22, 2021, 2:43:13 PM8/22/21
to
A decent-sized /disk/ drive might be about 1TB or 2**40 bytes. 64 bits
allows you to represent sizes equivalent to the total of 16M such drives
(16 exabytes I think).

So a 64-bit signed type will work for a /single/ file of 16 exabytes.

I suggest the vast majority of files on consumer machines will be
somewhat smaller than that for some time yet.

And when you do have files that are larger, then 64 bits will likely be
too small for everything else too.

But at the moment a 32-bit value, even unsigned, is too small to even
represent the size of a DVD.

> small system would have to work with and manipulate integers which, to
> it, would be many words in size even though most files it would access
> would not need such large limits.

The ideal such function would use a type like Python's 'int' which
seamlessly grows to any size.

My own dynamic language would also work, and from the user's
perspective, would also represent any file size, but internally it would
switch between two types (i64 and bignum).

A static language is not as flexible. But an i64 type is an easy choice
which can cover any conceivable file size on any machine either of us is
going to get access to for the foreseeable future.

It will work (it has to) on any 32/64-bit machine. Working with
64-bits for /file/ operations (which are always a bit slow) on 32-bit
machines is not a significant overhead.

If someone needs to store large numbers of file sizes, which are known
never to exceed 2GB, then they can choose to use an i32 type to save
space. But the functions used to pass and return such sizes can still be
i64.

Your solution, which as I understand it involves arrays (so arbitrary
precision?), I think is overkill, and awkward to deal with in a
lower-level language.

As I suggested at the start, the user just wants to call fgetpos() or
some such function, and it will return a suitably sized integer which,
in my 64-bit languages, they don't even need to think about.

Bart

Aug 22, 2021, 2:52:25 PM8/22/21
to
On 22/08/2021 17:37, Dmitry A. Kazakov wrote:
> On 2021-08-22 18:08, Bart wrote:
>> On 22/08/2021 16:48, Dmitry A. Kazakov wrote:
>
>>> In order to be able to write such tools you need a language that
>>> supports type declarations. E.g. provided pid_t is a 24-bit signed
>>> integer, the private part of the package would contain:
>>>
>>>     type pid_t is range -2**23..2**23-1 with Convention => C;
>>
>> I'm not interested.
>
> Surely you are not interested in close-to-hardware programming.

I've only done close-to-hardware programming. (My first machine was
coded in binary machine code by poking wire terminals with an earth lead
to set each bit.)

>
>> At least, I just need to know how many bits that type comprises (so I
>> can pass it straight to another function that takes pid_t, or compare
>> the bit patterns for equivalence, test whether it is zero or non-zero,
>> etc etc).
>
> Convention C is just an abbreviation of various settings like alignment,
> size, bit order the C compiler takes for the target. In Ada you can
> precisely describe all that, what neither you, nor C can. For C it is no
> problem because all OSes are built around C. For you it makes your
> language unusable for systems programming on most platforms. So why do
> you care about pid_t again?

I need to use other people's libraries. Up to about 1995, I never had to
bother (except for some DOS calls to read and write files). Now I do
because I no longer control 95% of the machine.

And those libraries are a pain to use because too many people who write
them, especially those libraries associated with the OS, just assume
that everyone coding for the machine will use C.


>> I'm not planning to re-implement Ada.
>>
>> Here's how I declare fopen:
>>
>>      clang function fopen(cstring, cstring)int64
>>
>> fopen in C headers returns a FILE* type which is a pointer to some
>> internal data structure.
>
> Not even able to express pointer to char or pointer to an opaque structure?

It doesn't matter. Sometimes I will use ref void or ref byte for this,
sometimes u64 because it's quick to type. The DLL FFI only cares about
passing some 64-bit non-float value, or some 32/64-bit float value,
since those use different registers.

>> So, I don't care about its exact type. However, because I know it's a
>> pointer, I know it will be 64 bits wide on my targets.
>
> Not supporting armhf?

I don't see how that is relevant. If you intended to highlight some
target with 32-bit pointers on a 64-bit-data processor, then my first
64-bit compiler did exactly that.

I later dropped it as it was too complicated. I don't know whether I'll
ever encounter something like that; if I do, I'll cross that bridge.

But right now I see no need.




David Brown

Aug 22, 2021, 5:44:29 PM8/22/21
to
On 22/08/2021 17:52, Bart wrote:
> On 22/08/2021 16:31, David Brown wrote:
>> On 22/08/2021 16:36, Bart wrote:
>>> On 22/08/2021 14:34, David Brown wrote:
>>>> On 21/08/2021 20:57, Bart wrote:
>>>
>>
>>>
>>>> Treat your types in C in the same way.  The API says "type pid_t".  Use
>>>> that type.  Don't say "pid_t should be 32-bit, so I'll use my own u32".
>>>
>>> I can't use that type from outside C; what do I write instead in another
>>> language?
>>
>> Your language, your problem.  It's not the fault of C.
>>
>>>
>>> I think you've used C and C++ too much, where there is a huge industry
>>> dedicated to making life cushy for you, to have a valid perspective on what
>>> it's like to not only /use/ a non-mainstream language, but to implement
>>> them.
>>
>> Ah, so the reason /you/ create so many problems for yourself is that /I/
>> use good tools that help me be a better and more productive programmer?
>
> So, everyone should just give up and use C? Or another mainstream
> language where /someone else/ has done the hard work of creating library
> bindings?

You use what is best for the job in hand. And if you want to create a
new language or new way to handle bindings for a language, expect it to
be hard work - and stop blaming other people and other people's choices
or tools.

If /your/ language and tools are limited, that's /your/ problem. It is
not the job of the OS developers, or library writers, or C standards
authors, or anyone else to make life easy for /you/ !

As I see it, you have three choices. You can decide that your tools are
simple and limited, and can only be used for some purposes - at other
times, you need to switch to bigger tools and mainstream languages. You
can decide that you want these features in your tools and languages,
roll up your sleeves and do the work. Or you can decide that you have
some great ideas in your language and can make something good and
useful, and collect together a group of people to share the load.

Bizarrely, you have decided that the way forward is to claim you have a
great language that is better than everything else available, yet do
nothing to spread it or work with other people. You decide you want to
make use of other libraries, API's, etc., but feel it is more productive
to blame /everyone/ else and whine and complain about how hard and
unfair it all is, rather than actually solving the problems you have
made for yourself.

>
> I said you didn't have the perspective or experience to appreciate the
> problems people like me face.

And you don't have the perspective or experience to appreciate that the
computing world is not designed to suit /you/, and it is not going to
change to suit /you/. It is made by other people, for other people.

David Brown

Aug 22, 2021, 5:46:12 PM8/22/21
to
On 22/08/2021 20:10, James Harris wrote:
> Those maxima are about file systems rather than OSes.
>

And what determines the file systems supported by an OS? The OS.

Bart

Aug 22, 2021, 6:44:53 PM8/22/21
to
On 22/08/2021 22:44, David Brown wrote:
> On 22/08/2021 17:52, Bart wrote:

>> So, everyone should just give up and use C? Or another mainstream
>> language where /someone else/ has done the hard work of creating library
>> bindings?
>
> You use what is best for the job in hand. And if you want to create a
> new language or new way to handle bindings for a language, expect it to
> be hard work - and stop blaming other people and other people's choices
> or tools.
>
> If /your/ language and tools are limited, that's /your/ problem. It is
> not the job of the OS developers, or library writers, or C standards
> authors, or anyone else to make life easy for /you/ !

But it seems to be the job of the whole world to make things easy for
YOU! Namely:

* Providing optimising compilers and analysers for C

* Providing any linkers, make systems and any other utility you could
need, all based around the needs of C

* Providing massive IDEs with extensive support for C including syntax
highlighting and code completion

* Providing any library you desire, with ready-made headers /for C/;
all you have to do is type #include "gtk.h" and you've got 10,000
functions ready to use while barely lifting a finger

* Providing symbolic debuggers that know C

* Providing whole operating systems built around C and with interfaces
that use C

* Devising ABIs that, magically, exactly match the requirements of C

C must be the most highly polished turd in the history of computing!

I don't believe that you understand my perspective at all, yet you
constantly berate me for putting across my view, constantly put me down,
and constantly try to twist things around to make out it is all my fault.

Now, I don't care about most of those things I listed. What I'm saying
is that if someone creates a library, or an interface to an OS, that is
supposed to be cross-language, that it should not be as C-centric as it is.

That is a perfectly reasonable request.


> As I see it, you have three choices. You can decide that your tools are
> simple and limited,

Sure; my tools supported a product that generated $1000000 a year of
business, so clearly very limited.

> And you don't have the perspective or experience to appreciate that the
> computing world is not designed to suit /you/, and it is not going to
> change to suit /you/. It is made by other people, for other people.

No, it isn't. It seems to be made largely for people writing C, or for
people writing mega-languages that have the considerable resources
required to interact with C. (Eg. Zig comes with a complete clang
implementation, a 100MB program to convert C headers into a form
suitable for Zig.)

I've got a great idea for a library. I want it to be usable from any
language, and to that end I will:

* Use only the bizarre, inconsistent set of data types of one quirky
language, where 'int' is 32 bits, or maybe 64, except on Tuesdays when
it's anyone's guess

* Hide most of those under aliases whose definitions are not included
in the interface provided

* Write half the interface in the form of macros specific to that
language, which expand to code - both executable and non-executable - in
that same language

 * Make the rest conditional upon which of compilers X, Y and Z for this
language might be used, even though they are utterly irrelevant to
users of any other language

Sounds like a good plan, yes? Of course if anyone claims that it makes it
difficult to use from their favoured language, then I can just say, how
dare they be so selfish as to expect anyone else to consider their needs.



anti...@math.uni.wroc.pl

unread,
Aug 22, 2021, 8:51:07 PM8/22/21
to
Bart <b...@freeuk.com> wrote:
> On 22/08/2021 01:45, anti...@math.uni.wroc.pl wrote:
> > Bart <b...@freeuk.com> wrote:
>
> > Yes, in practice such change will cause serious trouble. And it
> > solves almost nothing: if machine resonably supports above, then you
> > have standard fixed size types:
> >
> > int8_t, uint16_t, int32_t, int64_t
> >
> > (and unsigned variants). The only worthwhile addition is 128 bit
> > type, but most 32-bit implementations seem to have no support
> > for 128 bit type. And some 64-bit implementations do not
> > support 128 bit type. Which C compiler that you use support such
> > type?
>
> I have 128-bit support (though not perfect) in my systems language.
>
> C support for it is much poorer; even when available, there appears to
> be no standard type denotation for it; it doesn't allow 128-bit
> literals, and it is not supported by printf.
>
>
> > If standard types are not available, then it is unreasonable to
> > expect that your convention for type sizes will be followed.
> >
> > And we are still left with types like 'size_t'. Do you want
> > to mandate that argument to 'malloc' has 128 bits?
>
> Why? Every machine these days has a 64-bit processor. 64 bits is plenty
> for most things,

You want to provide fixed-size types, independent of machine, so you
need to decide how big 'size_t' should be. C is almost 50 years old;
if you propose a change it should be good for the next 50 years. Do you
say that after 50 years 64 bits will still be enough?

> > Note that
> > on true 8-bitter there is no reason to use more than 16-bits
> > for array indices and similar.
>
> I remember prototyping a Z80 board with bank-switched 256KB memory. If
> supported by a language, it would have needed more than 16 bits to
> address.

If 256KB memory is supported by the language, then it is no longer a
true 8-bitter...
Well, the thread was about integer types; nothing was said about APIs.
And bit fields frequently appear in hardware interfaces, so you
cannot fully ignore them.

> But C doesn't want to make it that simple. It always has to drag in the
> fact that it might have to work on weird microcontrollers to keep things
> medieval. However C-style APIs are used in desktop libraries such as SDL
> or GTK where those microcontrollers are irrelevant.

A medieval computer would be a hybrid mechanical-biological system,
probably working with Roman numerals. While some folks consider
biological stuff the correct solution for the future, nobody
is going to copy the other aspects.

And, while we are doing name calling: you try to throw out the
experience of software engineering and go back to the practice
of 1970. The rest of the world learned to appreciate types
and the benefits of abstraction.

> Many APIs /love/ making up their own types on top of the C one, as
> though they can't quite trust the C types to be stable, and I don't
> blame them (I do the same!).
>
> So SDL includes Uint32, khronos_uint32_t, uint32 and GLUint, all
> intended to be u32. STB_IMAGE includes stbi__uint32, and also __m128i,
> which I guess is an i128 type.
>
> The regular C 'int' is still used everywhere; presumably this is going
> to be i32.
>
> What I'd love to see in API is, in order of preference:
>
> * Concrete types like u64
>
> * If some types need to be conditional [on platform], then a small,
> fixed set with straightforward rules
>
> * If the API wants to expose only type aliases, then it must also
> define exactly what they are within the same resource, with one
> level of indirection only (so not defined on top of another alias)
>
> * Aggregate types whose layout does not depend on arcane C alignment
> rules.
>
> Ideally, the API info should be presented in, or available as, a format
> that can be accessed by a simple tool rather than needing a full-blown C
> compiler.

If you want to use an API you have two choices:

1) understand the API and based on understanding create what
fits to your language
2) Ape (match exactly) C API

1) is frequently superior, but requires some effort. 2) has
disadvantages, but once you create proper tools it can be done
mostly automatically. AFAICS you do not want to spend effort
on 1) and complain that automating 2) is too much work.

FYI, I needed to use GMP. First, GMP has hundreds of functions,
but I established that I needed only 4 of them. They used 3 types,
and while the sizes depend on the platform, it was no problem to handle.

Another case (in another language): a binding to a GUI interface
mostly copies text from C headers. That is, C typedefs, struct
declarations, etc. Macros need translation by hand, but the
language in question has more powerful macros than C, so a
limited number of macros is not a problem (and most of the macros
were trivial). But this approach worked because the language had
extensible syntax and the developers implemented a parser for C
declarations. And the whole thing has about 0.5M lines of
code and was developed by a team on the order of 10 people over
several years. To get comfortable FFI you could probably
strip it to 100 kloc, but not much smaller (the rest was
libraries and extensions).

--
Waldek Hebisch

James Harris

unread,
Aug 23, 2021, 5:04:53 AM8/23/21
to
No, filesystem drivers are loadable. Each FS will have its own maximum
file size.


--
James Harris

James Harris

unread,
Aug 23, 2021, 5:25:01 AM8/23/21
to
On 22/08/2021 19:43, Bart wrote:
> On 22/08/2021 18:57, James Harris wrote:
>> On 21/08/2021 22:19, Bart wrote:
>
>>> My own approach would be to just switch everything, all APIs, to 64
>>> bits, and recompile everything. But that won't work in the real world
>>> where there so much existing code, headers, docs and binaries where
>>> fseek take off_t, whatever that is.
>>
>> I'd suggest to you that that doesn't fix the problem. In fact, it
>> makes things worse. As well as being potentially too small for future
>> files
>
> A decent-sized /disk/ drive might be about 1TB or 2**40 bytes. 64 bits
> allows you to represent sizes equivalent to the total of 16M such drives
> (16 exabytes I think).
>
> So a 64-bit signed type will work for a /single/ file of 16 exabytes.

Yes, I think that's the correct limit for 64-bit octet addressing
because 32 bits (2**32) is approx 4x1000^3 so 64 bits would be that
squared, i.e. 16x1000^6.
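A quick sanity check on the arithmetic (Python, just as a scratch pad): the "16 exabytes" figure is exact in binary units (16 EiB = 2**64 bytes) and about 18.4 in decimal exabytes, which is what the "approx" above glosses over.

```python
# 64-bit unsigned addressing reaches 2**64 bytes; a signed 64-bit file
# offset tops out just below 2**63.
max_unsigned = 2 ** 64

assert 2 ** 32 == 4_294_967_296        # "approx 4 x 1000^3", as stated
assert max_unsigned == (2 ** 32) ** 2  # 64 bits = 32 bits, squared

print(max_unsigned / 1000 ** 6)  # ~18.4 decimal exabytes
print(max_unsigned / 2 ** 60)    # exactly 16.0 binary exabytes (EiB)
```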

>
> I suggest the vast majority of files on consumer machines will be
> somewhat smaller than that for some time yet.

Having lived through the last few decades, how many times have we had
such expectations only to see them exceeded!

...

>> small system would have to work with and manipulate integers which, to
>> it, would be many words in size even though most files it would access
>> would not need such large limits.
>
> The ideal such function would use a type like Python's 'int' which
> seamlessly grows to any size.

Yes, that would certainly be better than anything fixed.

>
> My own dynamic language would also work, and from the user's
> perspective, would also represent any file size, but internally it would
> switch between two types (i64 and bignum).
>
> A static language is not as flexible. But an i64 type is an easy choice
> which can cover any conceivable file size of any machine either of us is
> going to get access to for the foreseeable future.

That cannot be the case. There is at least one FS, ZFS, which /already/
uses 128-bit offsets. This is happening now. In 2021. 64-bit file
offsets are ALREADY too small.

This is the trouble with engineers projecting what they think will be
'enough for years to come'. The real world doesn't work that way. Things
happen that the engineers didn't anticipate.

Either my array idea or your bigint idea would solve the problem.
By contrast, the move to a bigger N in 'N bits' doesn't really address
what needs to be addressed.

...

> Your solution which I have involves arrays (so arbitrary precision?) I
> think is overkill, and awkward to deal with in a lower level language.

Your bigint idea is basically the same, as it would store the words of
the integer in an array.


--
James Harris

James Harris

unread,
Aug 23, 2021, 5:27:42 AM8/23/21
to
On 22/08/2021 19:15, Dmitry A. Kazakov wrote:
> On 2021-08-22 19:49, James Harris wrote:
>
>> I think that's pretty simple. Unless I've missed something that's it;
>> there is not and never was any need to have the plethora of seek calls
>> with their different-but-still-broken 'upgrades' from one another.
>
> You missed everything. The language must produce close to optimal
> machine code. Making a container out of each datatype makes no sense and
> after all, then pointer should be a container too, and a pointer to
> pointer etc. Not even able to pass a container by-value, in a register,
> huh?

I don't know what you are talking about but you don't seem to understand
the proposal.

>
> Yes it is possible to use the most universal and the most inefficient
> representation for all integers. The dynamic languages go this path.
> Bart does the same using 64, 128, 256 how many bits next? This resolves
> nothing and is utterly uninteresting and useless from the language
> design point of view. Such discussions were appropriate in 70s. Silly to
> have them a half-century later.
>
> The real problem with the type system is to have a class of integers (or
> some other scalar/small types) with the most efficient
> machine-accelerated representation of each specific instance in the
> class while being able to deal with each instance in a general way.
>

Different filesystems have different limits. The limit for a particular
file will not be known to a compiler.


--
James Harris

David Brown

unread,
Aug 23, 2021, 5:36:44 AM8/23/21
to
On 23/08/2021 00:44, Bart wrote:
> On 22/08/2021 22:44, David Brown wrote:
>> On 22/08/2021 17:52, Bart wrote:
>
>>> So, everyone should just give up and use C? Or another mainstream
>>> language where /someone else/ has done the hard work of creating library
>>> bindings?
>>
>> You use what is best for the job in hand.  And if you want to create a
>> new language or new way to handle bindings for a language, expect it to
>> be hard work - and stop blaming other people and other people's choices
>> or tools.
>>
>> If /your/ language and tools are limited, that's /your/ problem.  It is
>> not the job of the OS developers, or library writers, or C standards
>> authors, or anyone else to make life easy for /you/ !
>
> But it seems to be the job of the whole world to make things easy for
> YOU! Namely:

No, not for /me/ - for millions of developers. The solid majority of
the tools I personally use cover different host systems, different
programming languages, different target devices. Some tools have been
around for decades, and stick around because they do a good job (evolved
over time). Others are newer, as new tools replace or complement old ones.

>
>  * Providing optimising compilers and analysers for C

Used by millions, not just me, and not just for C.

>
>  * Providing any linkers, make systems and any other utility you could
> need, all based around the needs of C

Used by millions, not just me, and not just for C.

>
>  * Providing massive IDEs with extensive support for C including syntax
> highlighting and code completion

Used by millions, not just me, and not just for C.

>
>  * Providing any library you desire, with ready-made headers /for C/,
> all you have to do is type #include "gtk.h" and you've got 10,000
> functions ready to use with barely lifting a finger
>

Used by millions, not just me, and not just for C.

>  * Providing symbolic debuggers that know C
>

Used by millions, not just me, and not just for C.

>  * Providing whole operating systems built around C and with interfaces
> that use C

Used by millions, not just me, and not just for C.

>
>  * Devising ABIs that, magically, exactly match the requirements of C

Used by millions, not just me, and not just for C.

>
> C must be the most highly polished turd in the history of computing!

Used by millions, not just me, and not just for C. (And no, that is not
a copy-and-paste error.)


Are you spotting a pattern here? For /you/, everything involving
programming has to revolve around /you/. You judge languages, tools,
operating systems, libraries, documentation, /everything/ purely by how
easy or hard it makes things for /you/ and your extremely niche needs
and wants.

It is not the job of the world to make things easier for /me/, it is the
job of the world to make things easier for all programmers in general.
Not you, not me, not any individual.

Most other people, including me, understand that there is a whole world
out there, with wildly divergent needs and wants. Every tool, every OS,
every language is a compromise to fit many people and many applications.
They also fit over time - often spanning decades. They are never
"perfect" for any given use, but good enough for a great many uses.

Things that work for many people and many uses, usually succeed and
become popular, evolving and adapting as they go along. Things that are
only made for one person mostly fail and quickly fade from memory.
That's how life works - programming languages and tools are not really
any different.

>
> I don't believe that you understand my perspective at all, yet you
> constantly berate me for putting across my view, constantly put me down,
> and constantly try to twist things around and try to make out it is all
> my fault.
>

Of course I disapprove of your selfish, arrogant and myopic viewpoint
that C and everything around it is bad because it is not designed to
make life easier for /you/ alone! Of course I disapprove of your
constantly blaming everyone else, complaining and berating the work done
by vast numbers of people over decades of development. Of course I
disapprove of your wilful ignorance and determined refusal to learn or
understand when people - including me - try to help you or explain
things to you.

I /do/ understand that the way many C libraries and headers are made can
cause extra challenges for someone implementing their own language and
wrappers. I /do/ understand that the way headers are put together in a
typical *nix system seems extraordinarily complicated. I /do/
understand that the interaction between parts of a toolchain, and which
part does what, is not easy to grasp. I /do/ understand that simpler
and more efficient alternatives would be possible by starting from
scratch and ignoring backwards compatibility.

And I have many times told you how impressed I am that you have been
able to make your own tools and your own languages, despite questioning
the appropriateness of using them in some circumstances. I have also
many times said that I liked some features of your language or its
syntax. And I have many, many times given help and advice - to the best
of my ability - on how to make your languages and tools better, or how
to make the implementations easier. These regularly seem to get lost in
disagreements, however.


What I do /not/ understand is how you can be so determined to fight
everyone and everything instead of trying to cooperate and use what is
out there. You'd rather obsess about your own home-made square wheel,
and complain that shops don't stock square tires, than ask why others
use round wheels. You'd rather make up your own ideas about what you
think C should have been, then complain things don't work, rather than
learning how C /actually/ works (for good or for bad). When given
solutions to your problems, you'd rather complain about them and make up
excuses for why you won't use them (and you certainly don't thank anyone
for their help).


Really, what is your aim here? What do you want to get out of
discussions here or in c.l.c. ? Your attitudes and posting would make
sense if you were selling your language tools, but you are not doing that.


> Now, I don't care about most of those things I listed. What I'm saying
> is that if someone creates a library, or an interface to an OS, that is
> supposed to be cross-language, that it should not be as C-centric as it is.
>
> That is a perfectly reasonable request.
>

C is the lingua franca of the computing world. It is not a
particularly big, advanced or complex language (though it has plenty of
subtle points, and "small" does not mean easy to master).

No, it is not really a reasonable request - as there is not really any
better way to describe API's and interfaces. There needs to be
abstraction of sizes, because they vary across different systems. There
needs to be platform-defined details. You can't sensibly use something
that has everything fixed size without making things non-portable -
people would need to re-write their source code for different platforms,
instead of just re-compiling.

Of course the APIs don't actually have to be given as C headers - that's
just the most convenient for most people as they work out the box for
many major languages (such as C and C++, as obvious examples), and it is
relatively easy to automate perhaps 95% of the work of making wrappers
and interfaces for many other languages. (That's the advantage of
starting with C - as everyone uses it for interfaces, people make tools
for it.)

And no one suggests that C is /perfect/ - merely that it is good enough,
works well in practice, and no one has come up with an alternative that
improves enough upon C to make it worth changing. (I'm talking about
general API's and libraries here - there are plenty of specialised
interface description languages.)


I fully understand that from /your/ point of view, API's and interfaces
specified in terms of "i32" and "u16" would be much more convenient.
But for the rest of the world, it would be a big step backwards. Even
in cases where sizes won't change, it would still be a big step sideways
from something that already works, where tools exist, and everyone is used to it.

Go for it. The millions of other programmers out there will continue to
use systems that work fine in practice, and can do so without absurdly
exaggerating the weaknesses of the system.

