Best term for a pointer which is null/nil/none etc

James Harris

Aug 16, 2021, 11:37:43 AM
What's the best term for what might be called a null or nil pointer? In
a recent thread it turned out that there were various preferences and
various names that people are familiar with.

I am thinking to use as a keyword something like one of these:

null
nil
nullptr
ptr_null
none
empty
void
nothing
nowhere

As context imagine that you wanted to initialise a child node n with

n.left = X
n.right = X
n.data = 0

where X is one of the keywords above or some other that I've not listed.
The question is, which X would be best?


If it doesn't muddy the waters too much I should say that in addition to
'a pointer to no object' I /might/ also need a name for a pointer value
which has not been defined. If I do go that way then I'll need not one
but two names: one for a pointer which has been set to point to no
object and one for a pointer which has never been initialised. That's
why I added ptr_null into the above list - so that there would also be a
similar-looking ptr_undef or ptr_undefined. If you prefer, say, 'none'
for an explicitly set pointer to no object what name would you use for
undefined?



As a related matter, what capitalisation do you prefer for
language-defined constants such as the above and for 'true' and 'false'?
Do you prefer to see them have all lower case, all upper case, or to
capitalise just the first letter?



I know that a particular name is a minor matter but as I have to choose
I wondered what you guys find most intuitive or natural. Perhaps that
depends on whether one is thinking of the pointer or the referent. For
example, if one is thinking of the pointer then

nil

might be most natural as in "the pointer /is/ nil" whereas if one is
thinking of the referent then

nothing

may be better in the sense that "the pointer /is pointing at/ nothing".
Not sure.

Either way, how do the options look to you and what keyword or keywords
would you prefer to see in a piece of code? Also, are there any you
really dislike?!


--
James Harris

Bart

Aug 16, 2021, 12:41:15 PM
On 16/08/2021 16:37, James Harris wrote:
> What's the best term for what might be called a null or nil pointer? In
> a recent thread it turned out that there were various preferences and
> various names that people are familiar with.
>
> I am thinking to use as a keyword something like one of these:
>
>   null
>   nil
>   nullptr
>   ptr_null
>   none
>   empty
>   void
>   nothing
>   nowhere


In static code I use 'nil' as a built-in named constant of type 'ref
void' (void* in C), which is compatible with any pointer type.

There is also 'empty' and 'clear', which are interchangeable and can be
used as nouns or verbs, but only in the context of initialising a
variable or assigning to an expression; it is not a value:

int a:=empty
clear a # same as a:=empty
empty a # same as a:=empty

This was intended for array/record types, but works with any type
including pointers, where it will set them to nil.

It will clear the object to all-zeros, so nil must be all-zeros too.

In dynamic code, I also have 'void', but this simply means 'unassigned'.
All objects start off as void, but they can be set manually back to void
too:

a := 100
a := void

('void' is actually a type, but for convenience, it is treated as a
value - of type void - in source code. I have to use void.type for the
other meaning.)


>
> If it doesn't muddy the waters too much I should say that in addition to
> 'a pointer to no object' I /might/ also need a name for a pointer value
> which has not been defined.

See my 'void' above. However that only applies to dynamic code.
Elsewhere I would need to invent some suitable value:

int dummy # outside a function
ref void undefined = &dummy

ref byte p := undefined # inside a function

if p=undefined then ...

The undefined value should work for any pointer type.
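
The same trick (one dummy object whose identity serves as the "undefined" marker) can be sketched in Python, used here purely for illustration. `UNDEFINED` and `describe` are hypothetical names, not part of Bart's language; `None` stands in for nil, and identity comparison (`is`) stands in for pointer comparison.

```python
class _Undefined:
    """Type of the unique 'undefined' marker; only one instance is created."""

    def __repr__(self):
        return "UNDEFINED"


# The single sentinel object; its identity plays the role of &dummy.
UNDEFINED = _Undefined()


def describe(p):
    """Classify a reference as undefined, nil, or referring to a real object."""
    if p is UNDEFINED:
        return "undefined"
    if p is None:
        return "nil"
    return "valid"


if __name__ == "__main__":
    p = UNDEFINED          # never given a real target
    print(describe(p))     # undefined
    p = None               # explicitly set to refer to no object
    print(describe(p))     # nil
    p = [1, 2, 3]          # now refers to a real object
    print(describe(p))     # valid
```

Because the sentinel is an ordinary object, it is distinct from nil and from every real target, which is the property the `&dummy` approach relies on.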


> As a related matter, what capitalisation do you prefer for
> language-defined constants such as the above and for 'true' and 'false'?

That's up to my 'users'. My languages are case-insensitive, so they can
choose truE and falsE if they like.


> I know that a particular name is a minor matter but as I have to choose
> I wondered what you guys find most intuitive or natural. Perhaps that
> depends on whether one is thinking of the pointer or the referent. For
> example, if one is thinking of the pointer then
>
>   nil
>
> might be most natural as in "the pointer /is/ nil" whereas if one is
> thinking of the referent then
>
>   nothing


You can give a choice maybe? Allow both null and nil for example.


> Either way, how do the options look to you and what keyword or keywords
> would you prefer to see in a piece of code? Also, are there any you
> really dislike?!

ptr_null

Anything with embedded underscore (shifted on my keyboard) in general.


David Brown

Aug 17, 2021, 5:47:05 AM
It might depend on how you use the pointer in the language. For
languages that implicitly dereference pointers to objects, something
denoting "nothing", "none", or "empty" makes sense - by writing "p =
empty" you are saying that the object referred to by "p" is empty or
non-existent. (For prior art, Python uses "None".)
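
As a small illustration of this "referent" reading, here is roughly how the node initialisation from the original post looks in Python, where `None` means "no object here". The `Node` class and `depth` function are hypothetical, written only for this sketch.

```python
class Node:
    """Binary tree node; a child of None means 'no subtree on that side'."""

    def __init__(self, data=0):
        self.left = None
        self.right = None
        self.data = data


def depth(n):
    """Return the number of levels in the tree rooted at n (0 for None)."""
    if n is None:
        return 0
    return 1 + max(depth(n.left), depth(n.right))


if __name__ == "__main__":
    root = Node(1)
    root.left = Node(2)
    print(depth(root))  # 2
```

The test `n is None` reads as "there is nothing here" rather than "the pointer has a particular value", which is the distinction being drawn.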

For languages where you really think of "p" as a pointer, and are
interested in the pointer rather than just the thing it points to,
something denoting "zero" is the popular choice - "null" and "nil" are
commonly used, with "null" being a little more common AFAICS. (C++ now
uses "nullptr", but that's because they needed a new name and "null" was
taken, and they didn't want something that was likely to be an existing
identifier.)

Some languages allow you to think in both ways - having both pointers
and references.

>
> As a related matter, what capitalisation do you prefer for
> language-defined constants such as the above and for 'true' and 'false'?
> Do you prefer to see them have all lower case, all upper case, or to
> capitalise just the first letter?
>

I personally dislike anything being in all-caps. I prefer keywords to
be small letters.

Sometimes a language has a system (either voluntarily by convention, or
enforced by the language) with identifiers being different categories
depending on whether they start with a capital or with a small letter.

One thing I would advise against is making a language case insensitive -
that's just a license for programmers to be inconsistent and confusing.

James Harris

Aug 17, 2021, 10:29:01 AM
On 16/08/2021 17:41, Bart wrote:
> On 16/08/2021 16:37, James Harris wrote:

>> What's the best term for what might be called a null or nil pointer?

...

>> I know that a particular name is a minor matter but as I have to
>> choose I wondered what you guys find most intuitive or natural.
>> Perhaps that depends on whether one is thinking of the pointer or the
>> referent. For example, if one is thinking of the pointer then
>>
>>    nil
>>
>> might be most natural as in "the pointer /is/ nil" whereas if one is
>> thinking of the referent then
>>
>>    nothing
>
>
> You can give a choice maybe? Allow both null and nil for example.

Am not a fan of arbitrary choices. They can lead to friction if one
programmer has to maintain the code written by another and their
personal preferences differ. I think I would need to pick one word.

>
>
>> Either way, how do the options look to you and what keyword or
>> keywords would you prefer to see in a piece of code? Also, are there
>> any you really dislike?!
>
> ptr_null
>
> Anything with embedded underscore (shifted on my keyboard) in general.

OK. Is it the shift keying you don't like? I know you write expressions
with no spaces either side of operators but if

ptr-null

were a single permitted name (which didn't require shift) how would your
view change, if at all?


--
James Harris

James Harris

Aug 17, 2021, 10:43:46 AM
On 17/08/2021 10:47, David Brown wrote:
> On 16/08/2021 17:37, James Harris wrote:

>> What's the best term for what might be called a null or nil pointer?

...

> It might depend on how you use the pointer in the language. For
> languages that implicitly dereference pointers to objects, something
> denoting "nothing", "none", or "empty" makes sense - by writing "p =
> empty" you are saying that the object referred to by "p" is empty or
> non-existent. (For prior art, Python uses "None".)

Maybe that's true even without automatic dereferencing. For example, in
C one might write

n->left = newnode;

where newnode is really a pointer. Correspondingly, in

n->right = Nothing

Nothing would also be a pointer even though the form is
ostensibly saying that there's nothing on the right rather than that the
right-side pointer is a certain value.

>
> For languages where you really think of "p" as a pointer, and are
> interested in the pointer rather than just the thing it points to,
> something denoting "zero" is the popular choice - "null" and "nil" are
> commonly used, with "null" being a little more common AFAICS. (C++ now
> uses "nullptr", but that's because they needed a new name and "null" was
> taken, and they didn't want something that was likely to be an existing
> identifier.)

OK.

>
> Some languages allow you to think in both ways - having both pointers
> and references.

Something to come back to!

>
>>
>> As a related matter, what capitalisation do you prefer for
>> language-defined constants such as the above and for 'true' and 'false'?
>> Do you prefer to see them have all lower case, all upper case, or to
>> capitalise just the first letter?
>>
>
> I personally dislike anything being in all-caps. I prefer keywords to
> be small letters.

OK.

It occurs to me that one area where an initial cap can be useful is when
including keywords in written text. For example, if I write that
something is False with an initial capital letter then it more clearly
shows that I am referring to a keyword rather than to a constant or a
concept. That would make the keywords

True
False
None (or Nothing or Null or Nil etc)

ISTM that (as I think you suggested) they look better than all caps.


--
James Harris

Bart

Aug 17, 2021, 11:51:32 AM
On 17/08/2021 15:28, James Harris wrote:
> On 16/08/2021 17:41, Bart wrote:
>> On 16/08/2021 16:37, James Harris wrote:
>
>>> What's the best term for what might be called a null or nil pointer?
>
> ...
>
>>> I know that a particular name is a minor matter but as I have to
>>> choose I wondered what you guys find most intuitive or natural.
>>> Perhaps that depends on whether one is thinking of the pointer or the
>>> referent. For example, if one is thinking of the pointer then
>>>
>>>    nil
>>>
>>> might be most natural as in "the pointer /is/ nil" whereas if one is
>>> thinking of the referent then
>>>
>>>    nothing
>>
>>
>> You can give a choice maybe? Allow both null and nil for example.
>
> Am not a fan of arbitrary choices. They can lead to friction if one
> programmer has to maintain the code written by another and their
> personal preferences differ. I think I would need to pick one word.

C allows both NULL and 0. Plus any expression that yields 0.

>>
>>
>>> Either way, how do the options look to you and what keyword or
>>> keywords would you prefer to see in a piece of code? Also, are there
>>> any you really dislike?!
>>
>> ptr_null
>>
>> Anything with embedded underscore (shifted on my keyboard) in general.
>
> OK. Is it the shift keying you don't like? I know you write expressions
> with no spaces either side of operators but if
>
>   ptr-null
>
> were a single permitted name (which didn't require shift) how would your
> view change, if at all?

ptr-null is better. Although I'd start wondering why you need the 'ptr'
part, if 'null' is not used in other contexts.


James Harris

Aug 17, 2021, 12:18:12 PM
On 17/08/2021 16:51, Bart wrote:
> On 17/08/2021 15:28, James Harris wrote:
>> On 16/08/2021 17:41, Bart wrote:
>>> On 16/08/2021 16:37, James Harris wrote:

>>>> What's the best term for what might be called a null or nil pointer?

...

>>> You can give a choice maybe? Allow both null and nil for example.
>>
>> Am not a fan of arbitrary choices. They can lead to friction if one
>> programmer has to maintain the code written by another and their
>> personal preferences differ. I think I would need to pick one word.
>
> C allows both NULL and 0. Plus any expression that yields 0.

Indeed, though I'm not planning to copy that approach. I'd probably
prohibit comparisons against zero. Something like the following.

if p eq 0 [prohibited]
if p eq Undef
if p eq Nil
if p [true if p is valid (i.e. neither Undef nor Nil)]

In those, Undef would be all-bits-zero but would be of a type which
could be compared against a pointer whereas an integer could not.
Further, if p were to be converted to False/True as in the last line
then False would mean "either Undef or Nil".
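
A rough Python model of these rules, purely to make the intended behaviour concrete. The `Ptr` class and the names `NIL` and `UNDEF` are assumptions made for this sketch; the all-bits-zero representation is not modelled, and nothing here should be read as the actual design.

```python
class Ptr:
    """Toy pointer: holds a target object, or one of two reserved states."""

    def __init__(self, target=None, state="valid"):
        self.target = target
        self.state = state  # "valid", "nil" or "undef"

    def __eq__(self, other):
        # Pointers may only be compared with pointers, never with integers.
        if not isinstance(other, Ptr):
            raise TypeError("cannot compare a pointer with a non-pointer")
        if self.state != "valid" or other.state != "valid":
            return self.state == other.state
        return self.target is other.target

    # Defining __eq__ disables the default hash; restore identity hashing.
    __hash__ = object.__hash__

    def __bool__(self):
        # 'if p' is true only when p is neither Undef nor Nil.
        return self.state == "valid"


NIL = Ptr(state="nil")      # explicitly points at no object
UNDEF = Ptr(state="undef")  # has never been given a value

if __name__ == "__main__":
    p = UNDEF
    print(p == UNDEF, p == NIL, bool(p))   # True False False
    p = Ptr(target=object())
    print(p == NIL, bool(p))               # False True
    try:
        p == 0                             # the prohibited comparison
    except TypeError as exc:
        print("prohibited:", exc)
```

In a statically typed language the comparison against 0 would presumably be rejected at compile time; the runtime `TypeError` here is only a stand-in for that.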

This is all speculation at the moment. Am just throwing around some ideas.


>
>>>
>>>> Either way, how do the options look to you and what keyword or
>>>> keywords would you prefer to see in a piece of code? Also, are there
>>>> any you really dislike?!
>>>
>>> ptr_null
>>>
>>> Anything with embedded underscore (shifted on my keyboard) in general.
>>
>> OK. Is it the shift keying you don't like? I know you write
>> expressions with no spaces either side of operators but if
>>
>>    ptr-null
>>
>> were a single permitted name (which didn't require shift) how would
>> your view change, if at all?
>
> ptr-null is better.

OK.

>
> Although I'd start wondering why you need the 'ptr'
> part, if 'null' is not used in other contexts.

I was thinking that if I had more than one reserved pointer value that
it may be better to give them a common form. Instead of, for example,

Undef
Nil

there would be

ptr-undef
ptr-nil

That would conceptually group similar constants together and take fewer
words away from those the programmer could define.

If there ends up being just one reserved pointer value then it would not
be a good idea.

If there end up being just two reserved pointer values then it may or
may not be worth it.

But if I were to end up adding a number of other reserved pointer values
then it might be better for the language overall if they were to have a
common appearance.


--
James Harris

David Brown

Aug 17, 2021, 12:56:11 PM
On 17/08/2021 17:51, Bart wrote:
> On 17/08/2021 15:28, James Harris wrote:
>> On 16/08/2021 17:41, Bart wrote:
>>> On 16/08/2021 16:37, James Harris wrote:
>>
>>>> What's the best term for what might be called a null or nil pointer?
>>
>> ...
>>
>>>> I know that a particular name is a minor matter but as I have to
>>>> choose I wondered what you guys find most intuitive or natural.
>>>> Perhaps that depends on whether one is thinking of the pointer or
>>>> the referent. For example, if one is thinking of the pointer then
>>>>
>>>>    nil
>>>>
>>>> might be most natural as in "the pointer /is/ nil" whereas if one is
>>>> thinking of the referent then
>>>>
>>>>    nothing
>>>
>>>
>>> You can give a choice maybe? Allow both null and nil for example.
>>
>> Am not a fan of arbitrary choices. They can lead to friction if one
>> programmer has to maintain the code written by another and their
>> personal preferences differ. I think I would need to pick one word.
>
> C allows both NULL and 0. Plus any expression that yields 0.
>

/Every/ language allows arbitrary choices in all sorts of places. That
does not mean you have to encourage it from the outset. James is right
here - it matters little whether he picks "null" or "nil", but either is
far better than having both.

>>>
>>>
>>>> Either way, how do the options look to you and what keyword or
>>>> keywords would you prefer to see in a piece of code? Also, are there
>>>> any you really dislike?!
>>>
>>> ptr_null
>>>
>>> Anything with embedded underscore (shifted on my keyboard) in general.
>>
>> OK. Is it the shift keying you don't like? I know you write
>> expressions with no spaces either side of operators but if
>>
>>    ptr-null
>>
>> were a single permitted name (which didn't require shift) how would
>> your view change, if at all?
>
> ptr-null is better. Although I'd start wondering why you need the 'ptr'
> part, if 'null' is not used in other contexts.
>

Programmers read code far more than they type it. If the programming
language designer here thinks "ptr_null" is the clearest way for a null
pointer to be expressed in the language, then that preference totally
dominates over one single person's complaints about the hardships of
using the shift key.

And Bart, if underscore is so difficult for you (perhaps you have
arthritis or other challenges), I'd recommend looking at different
keyboards, or enabling "sticky shift keys" or similar aids supported by
your OS of choice.

Certainly the use of "shift" is not relevant to language design.



Bart

Aug 17, 2021, 4:00:45 PM
On 17/08/2021 17:56, David Brown wrote:
> On 17/08/2021 17:51, Bart wrote:
>> On 17/08/2021 15:28, James Harris wrote:
>>> On 16/08/2021 17:41, Bart wrote:
>>>> On 16/08/2021 16:37, James Harris wrote:
>>>
>>>>> What's the best term for what might be called a null or nil pointer?
>>>
>>> ...
>>>
>>>>> I know that a particular name is a minor matter but as I have to
>>>>> choose I wondered what you guys find most intuitive or natural.
>>>>> Perhaps that depends on whether one is thinking of the pointer or
>>>>> the referent. For example, if one is thinking of the pointer then
>>>>>
>>>>>    nil
>>>>>
>>>>> might be most natural as in "the pointer /is/ nil" whereas if one is
>>>>> thinking of the referent then
>>>>>
>>>>>    nothing
>>>>
>>>>
>>>> You can give a choice maybe? Allow both null and nil for example.
>>>
>>> Am not a fan of arbitrary choices. They can lead to friction if one
>>> programmer has to maintain the code written by another and their
>>> personal preferences differ. I think I would need to pick one word.
>>
>> C allows both NULL and 0. Plus any expression that yields 0.
>>
>
> /Every/ language allows arbitrary choices in all sorts of places. That
> does not mean you have to encourage it from the outset. James is right
> here - it matters little whether he picks "null" or "nil", but either is
> far better than having both.

I sometimes allow a choice if I can't make up my mind about a feature or
a keyword. Then I can try it out and see which one feels better or looks
better, or which is used more often. Or I might use one form privately,
and another for shared code.

Here however, both null and nil are commonly used in programming
languages for the same thing. So why not allow both? If someone uses two
languages, one uses NULL, the other nil, it would be really convenient
to not have to keep thinking about which one you should be using.

(Although having said that, mine don't allow null! But then I am the
only user.)

>
> And Bart, if underscore is so difficult for you (perhaps you have
> arthritis or other challenges), I'd recommend looking at different
> keyboards, or enabling "sticky shift keys" or similar aids supported by
> your OS of choice.
>
> Certainly the use of "shift" is not relevant to language design.

Simplest of all is having alphanumeric identifiers not requiring you to
pause in the middle to deal with case or shift changes.

This especially applies to keywords that you can't do anything about.

So I'd say it's very relevant to not having a language that is a pita to
use.

David Brown

Aug 17, 2021, 4:59:34 PM
I do think "True" is better than "TRUE". But I think "true" is best :-)

Initial capitals won't mark a keyword unless you use capitals for /all/
keywords, and that will quickly get tedious and ugly. It's better to
use syntax highlighting in an editor that will mark the keywords in some
way (such as bold, or a particular colour). When writing code by hand,
I usually underline the keywords for clarity - but I wouldn't want to
use initial capitals.

David Brown

Aug 17, 2021, 5:08:45 PM
Keeping the choice open while prototyping, developing and testing makes
sense - that's fair enough. But once your language has solidified
somewhat, then it's good to fix these things. (Though there is always a
trade-off between keeping consistency between versions and being able to
correct mistakes or sub-optimal decisions with later versions. A good
period of trial and testing helps here.)

>
> Here however, both null and nil are commonly used in programming
> languages for the same thing. So why not allow both? If someone uses two
> languages, one uses NULL, the other nil, it would be really convenient
> to not have to keep thinking about which one you should be using.
>
> (Although having said that, mine don't allow null! But then I am the
> only user.)
>
>>
>> And Bart, if underscore is so difficult for you (perhaps you have
>> arthritis or other challenges), I'd recommend looking at different
>> keyboards, or enabling "sticky shift keys" or similar aids supported by
>> your OS of choice.
>>
>> Certainly the use of "shift" is not relevant to language design.
>
> Simplest of all is having alphanumeric identifiers not requiring you to
> pause in the middle to deal with case or shift changes.
>

Some people like camelCase for multi-word identifiers, some prefer
underscore_separation. I can't imagine many people dislike underscore
purely because of using the shift key (though some /do/ dislike it
because they find the underscore hard to see in some circumstances).

There are a few languages that allow multi-word identifiers separated by
spaces, or allow hyphens as "letters", but those are rare, and likely to
cause confusion.

> This especially applies to keywords that you can't do anything about.
>

Certainly it makes sense to have shorter and simpler keywords, at least
for those that are commonly used. And there is no point in having extra
underscores for no purpose. I might not object to underscores as much
as you do, but I see no benefit of "null_ptr" over "nullptr".

> So I'd say it's very relevant to not having a language that is a pita to
> use.

Well, I guess the OP will collect opinions, and use that to help make
his decisions.

Bart

Aug 17, 2021, 6:52:18 PM
On 17/08/2021 22:08, David Brown wrote:
> On 17/08/2021 22:00, Bart wrote:

>> I sometimes allow a choice if I can't make up my mind about a feature or
>> a keyword. Then I can try it out and see which one feels better or looks
>> better, or which is used more often. Or I might use one form privately,
>> and another for shared code.
>
> Keeping the choice open while prototyping, developing and testing makes
> sense - that's fair enough. But once your language has solidified
> somewhat, then it's good to fix these things. (Though there is always a
> trade-off between keeping consistency between versions and being able to
> correct mistakes or sub-optimal decisions with later versions. A good
> period of trial and testing helps here.)

Another area where I like to have alternatives is basic types; the
choices on each line all refer to the same type:

byte word8 u8
word word64 u64
int int64 i64
real real64 r64 float64
int16 i16

The ones in the third column are universally understood, and I use them
for generated or shared code, but in normal source I use ones from the
first column, if they exist, or second if the width is significant or
there is no colloquial form.
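
One way an implementation could support such synonyms is a lookup table that maps every spelling to a single canonical type name before type checking. The sketch below is in Python and is only an assumption about how this might be done; the table contents come from the list above, while `ALIASES`, `canonical` and `same_type` are hypothetical names.

```python
# Each accepted spelling maps to one canonical name (the explicit-width form).
ALIASES = {
    "byte": "word8", "word8": "word8", "u8": "word8",
    "word": "word64", "word64": "word64", "u64": "word64",
    "int": "int64", "int64": "int64", "i64": "int64",
    "real": "real64", "real64": "real64", "r64": "real64", "float64": "real64",
    "int16": "int16", "i16": "int16",
}


def canonical(name):
    """Return the canonical type name for any accepted spelling."""
    try:
        return ALIASES[name]
    except KeyError:
        raise ValueError(f"unknown type name: {name!r}") from None


def same_type(a, b):
    """True if two spellings denote the same underlying type."""
    return canonical(a) == canonical(b)
```

With a table like this, `same_type("word", "u64")` holds while `same_type("int", "i16")` does not, so the choice of spelling never affects type checking.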

The 'float64' I'd forgotten about; I guess that'll be coming out soon.

C of course famously has dozens of ways of writing some types (partly
due to them requiring multiple tokens, some optional, and which can be
in any order).

It's not surprising that so many applications define their own sets of
types. That's a worse problem than the language allowing a choice of 2 or 3.


>
> Some people like camelCase for multi-word identifiers, some prefer
> underscore_separation. I can't imagine many people dislike underscore
> purely because of using the shift key

I dislike them also because I can never remember if there is an
underscore or not.

David Brown

Aug 18, 2021, 4:40:09 AM
On 18/08/2021 00:52, Bart wrote:
> On 17/08/2021 22:08, David Brown wrote:
>> On 17/08/2021 22:00, Bart wrote:
>
>>> I sometimes allow a choice if I can't make up my mind about a feature or
>>> a keyword. Then I can try it out and see which one feels better or looks
>>> better, or which is used more often. Or I might use one form privately,
>>> and another for shared code.
>>
>> Keeping the choice open while prototyping, developing and testing makes
>> sense - that's fair enough.  But once your language has solidified
>> somewhat, then it's good to fix these things.  (Though there is always a
>> trade-off between keeping consistency between versions and being able to
>> correct mistakes or sub-optimal decisions with later versions.  A good
>> period of trial and testing helps here.)
>
> Another area where I like to have alternatives is basic types; the
> choices on each line all refer to the same type:
>
> byte   word8   u8
> word   word64  u64
> int    int64   i64
> real   real64  r64 float64
>        int16   i16
>
> The ones in the third column are universally understood, and I use them
> for generated or shared code, but in normal source I use ones from the
> first column, if they exist, or second if the width is significant or
> there is no colloquial form.

When people say "universally understood", they usually mean "I like them
and don't much care about the rest of the universe". As you know, I
think these extremely short names are horrible in many ways, and I
totally disagree that they are "universally understood". With enough
context I expect people can figure out what they are, but that's another
matter entirely - and it applies equally to any naming scheme that
includes bit sizes explicitly.

You spend a significant amount of time posting on c.l.c. about how
terrible it is when people use different names for the same type,
regardless of how vital type names are to program clarity and code
flexibility. And now you are recommending multiple names for the same
type that give absolutely /no/ advantages or benefits. You also
regularly complain that in C, fundamental types like "int" and "long
int" are poorly defined and unclear, and how it is better to have
explicitly sized types (and then you won't use C's explicitly sized
types, because that would mean you couldn't whine about them). And now
you want to tell us that it's great for a language to have "word"
meaning the same thing as "word64" and "u64"!

>
> The 'float64' I'd forgotten about; I guess that'll be coming out soon.

So one of the great things about having lots of different ways to write
exactly the same thing in a language is that the language's designer,
implementer and /single/ user can't remember them all.

When planning a new language, it's good to learn about existing
languages so that you can be inspired by parts that work well, and avoid
ideas that work badly. From that second viewpoint, I think you are
helping the OP significantly.

>
> C of course famously has dozens of ways of writing some types (partly
> due to them requiring multiple tokens, some optional, and which can be
> in any order).

I would not advise copying C's system for fundamental types any more
than I would recommend copying your multiple different names. But
unless you are trying to make a programming language less user-friendly
than Forth, and less portable than assembly, a language needs a way to
name types for use in particular cases.

>
> It's not surprising that so many applications define their own sets of
> types. That's a worse problem than the language allowing a choice of 2
> or 3.

C is far from perfect here (and no one claims otherwise). What /is/
surprising is that you would suggest that the answer is to have the
language start off with multiple names so that programmers get mixed up
and inconsistent before they even start writing their own code.


>
>>
>> Some people like camelCase for multi-word identifiers, some prefer
>> underscore_separation.  I can't imagine many people dislike underscore
>> purely because of using the shift key
>
> I dislike them also because I can never remember if there is an
> underscore or not.
>

You can't even remember how your own languages work, despite having
written and implemented them and apparently written lots of code in them.

Bart

Aug 18, 2021, 6:59:31 AM
No, I mean that they are used everywhere and widely understood. They're
even used in the Linux kernel; see here under Typedefs:

https://www.kernel.org/doc/html/v4.10/process/coding-style.html

They are the primary types in Rust.

I believe they are used in the MSVC C compiler as suffixes for integer
literals (1234i32 and 1234ui32).

This all means they are useful as language-independent ways of referring
to such types in forums like this, because either everyone will already
know what they mean, or they can make a pretty good guess.

They are after all just contractions of C's int32_t-style family of
types, with extraneous letters (and underscores!) elided.

In my case they started off as being internal representations, then were
used in generated code, then, in order to be able to read that code back
in, they were made part of the language.


> And now
> you want to tell us that it's great for a language to have "word"
> meaning the same thing as "word64" and "u64"!

'word' is specific to my languages to mean 'unsigned integer'. The
formal, consistently named set of unsigned types go from word1 to
word128, which have the parallel naming scheme u1 to u128.

word64 also has the informal, "don't care" synonym 'word' (like 'int' is
a synonym for 'int64').

And 'byte' is a synonym for 'word8'.

>
>>
>> The 'float64' I'd forgotten about; I guess that'll be coming out soon.
>
> So one of the great things about having lots of different ways to write
> exactly the same thing in a language is that the language's designer,
> implementer and /single/ user can't remember them all.

I wanted something more mainstream since 'real' is not well known these
days. Probably the idea was to be able to use it when sharing code,
so as not to have to explain what real was. But that didn't happen.

David Brown

Aug 18, 2021, 11:16:21 AM
They are most certainly /not/ used everywhere. They are used in a
number of programs and a number of languages, but that is not
/everywhere/ or even remotely close to a measurable percentage of
"everywhere" in programming.

In context, if it is clear you are talking about a type, then I agree
that the short names are easily understood even if they are not names
you commonly use. But they are always less clear than something like
"int32".

> They're
> even used in the Linux kernel; see here under Typedefs:
>
>   https://www.kernel.org/doc/html/v4.10/process/coding-style.html
>

No sane programmer has ever used the Linux kernel as an example of good
style for general purpose coding. It is a highly specialised program,
with a unique background (and an old background - it comes from a time
before the standardised C types like int32_t). And while Linus Torvalds
has many fine qualities, there are many of his preferred styles and
other opinions on programming and languages that are, to put it mildly,
controversial.

> They are the primary types in Rust.

IMHO that was a silly decision. I haven't used Rust for anything more
than very simple testing, but to me it seems mostly a wasted
opportunity. It has several nice ideas, but I can't see it having any
significant benefit over C++ for my kind of use. For people who can't
get their pointers right in C, and refuse to use smart pointers in C++,
then perhaps Rust's system is safer. And it has some nice things, like
pattern matching. But its various types of macros and generics are way
weaker than C++'s templates. Maybe it will gain features to compete
well with C++ - maybe C++ will gain features to compete with Rust.

>
> I believe they are used in the MSVC C compiler as suffixes for integer
> literals (1234i32 and 1234ui32).

In that use-case, I can see some advantages. (I could also see them
being the basis for printf format specifiers, rather than C's rather
ugly <inttypes.h> macros. But in a new language, I'd rather see a better
system than printf anyway.)
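For readers who have not met the macros being referred to, here is a minimal sketch of how fixed-width values are printed portably in standard C. The function name format_fixed_width is made up for illustration; PRId32 and PRIu64 are the real macros from <inttypes.h>.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats one signed 32-bit and one unsigned 64-bit
   value into buf. Returns what snprintf returns, i.e. the number of
   characters the full output needs (excluding the terminating NUL). */
int format_fixed_width(char *buf, size_t len, int32_t a, uint64_t b)
{
    assert(buf != NULL && len > 0);
    /* PRId32 and PRIu64 expand to string literals (for example "d" or
       "lu"), which the compiler pastes together with the neighbouring
       literals to build the final format string. */
    return snprintf(buf, len, "a=%" PRId32 " b=%" PRIu64, a, b);
}
```

The broken-up format string is the ugliness in question; a suffix-style specifier along the lines of i32 or u64 would read more directly, though no such specifier exists in standard printf.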

>
> This all means they are useful as language-independent ways of referring
> to such types in forums like this, because either everyone will already
> know what they mean, or they can make a pretty good guess.
>
> They are after all just contractions of C's int32_t-style family of
> types, with extraneous letters (and underscores!) elided.
>

int32 and uint32 are simple, clear, unambiguous, easy to type, and
cannot seriously be mistaken for anything else. I personally like the
_t suffixes, but I'm quite happy to accept that others have different
opinions.

> In my case they started off as being internal representations, then were
> used in generated code, then, in order to be able to read that code back
> in, they were made part of the language.
>
>
>>  And now
>> you want to tell us that it's great for a language to have "word"
>> meaning the same thing as "word64" and "u64"!
>
> 'word' is specific to my languages to mean 'unsigned integer'. The
> formal, consistently named set of unsigned types go from word1 to
> word128, which have the parallel naming scheme u1 to u128.
>
> word64 also has the informal, "don't care" synonym 'word' (like 'int' is
> a synonym for 'int64').
>

The trouble with "word" is the size is seriously ambiguous. I'd say it
is worse than "int" in that respect.

> And 'byte' is a synonym for 'word8'.

"byte" is fair enough - I think it's reasonable to say that the meaning
of "smallest addressable unit of memory" is outdated. But I would not
use that for an 8-bit number, I'd use it for raw memory access that does
not have any semantic information.

>
>>
>>>
>>> The 'float64' I'd forgotten about; I guess that'll be coming out soon.
>>
>> So one of the great things about having lots of different ways to write
>> exactly the same thing in a language is that the language's designer,
>> implementer and /single/ user can't remember them all.
>
> I wanted something more mainstream since 'real' is not well known these
> days. Probably the idea was to be able to use it when sharing code,
> so as not to have to explain what real was. But that didn't happen.
>

I've nothing against "float32", "float64", etc., as type names for
floating point data. (I'd add a "_t", of course!)

I'd be okay with "real32", "real64", etc., as well - but I think "float"
is more accurate (floating point numbers do not exactly represent real
numbers). And like "word", the name "real" is a long outdated name
without clear rules on its size.

Bart

Aug 18, 2021, 2:02:32 PM
On 18/08/2021 16:16, David Brown wrote:
> On 18/08/2021 12:59, Bart wrote:

> int32 and uint32 are simple, clear, unambiguous, easy to type, and
> cannot seriously be mistaken for anything else.

int32 is OK, that's what I use mostly when I need that specific,
narrower type.

But I need a bigger difference between signed and unsigned integers.
Just sticking a 'u' at the front doesn't cut it, unless it replaces 'i'
with 'u' as happens with i32 and u32. With int32/uint32, the difference
is too subtle, and uint is unpleasant to type.

>> word64 also has the informal, "don't care" synonym 'word' (like 'int' is
>> a synonym for 'int64').
>>
>
> The trouble with "word" is the size is seriously ambiguous. I'd say it
> is worse than "int" in that respect.

Denotations like 'int' and 'word' are supposed to be /slightly/
ambiguous. They are used when you don't care about the width, but expect
the default to be sufficient.

The default on my current languages is to make them 64 bits wide, so
it's unlikely to be insufficient.

40 years ago they were 16 bits, and some 20 years ago they became 32
bits. I don't really see a need for default 128-bit integers even in 20
years from now.

Most languages appear to be stuck with a default 32-bit int type, which
is now too small for memory sizes, large object sizes, file sizes and
many other things.

As for 'word', I've used that to mean an unsigned version of 'int' since
the 80s, although it was then 16 bits. (It still is in my x64 assembler
in the form of DW directives and register names like W3.)


> I'd be okay with "real32", "real64", etc., as well - but I think "float"
> is more accurate (floating point numbers do not exactly represent real
> numbers). And like "word", the name "real" is a long outdated name
> without clear rules on its size.

I used 'real' in Algol, Fortran and Pascal, and I've used it in my own
languages since 1981 (when it was implemented as an f24 type). So I
don't care that it's outdated. Just that I might need to keep explaining
what it means!


David Brown

Aug 19, 2021, 7:13:23 AM
On 18/08/2021 20:02, Bart wrote:
> On 18/08/2021 16:16, David Brown wrote:
>> On 18/08/2021 12:59, Bart wrote:
>
>> int32 and uint32 are simple, clear, unambiguous, easy to type, and
>> cannot seriously be mistaken for anything else.
>
> int32 is OK, that's what I use mostly when I need that specific,
> narrower type.
>
> But I need a bigger difference between signed and unsigned integers.
> Just sticking a 'u' at the front doesn't cut it, unless it replaces 'i'
> with 'u' as happens with i32 and u32. With int32/uint32, the difference
> is too subtle, and uint is unpleasant to type.

So it is obvious to you (so obvious that you think it is universally
obvious) that "u32" is "unsigned integer 32-bit", equally obvious that
"int32" is "signed integer 32-bit", and yet "uint32" is too subtle? I
can't help feeling there is an inconsistency here somewhere...


Let's just say I think that a programming language should have one
single standard method for naming these fixed-size types. Multiple
small variations for a type whose name means basically the same thing,
and which will be used in the same circumstances, does not help anyone.
Different names with different meanings, and used in different
circumstances, are another matter - even if they happen to have the same
size in a particular use-case.

These names don't have to be keywords or fundamental types in the
language. A language could have a completely flexible system for
integer types, so that "int32" is defined in the language standard
library as "type_alias int32 = builtin::integer<4, signed>", or whatever
syntax or features you pick. But thereafter, programmers should stick
to the standard names unless they have need of a specific name for the
type. (Thus in C, the fundamental boolean type is "_Bool" - but the
standard name is "bool". And for size-specific types, you should use
"int32_t" and friends, as those are the standard names. It doesn't
matter that some people use other names, for good or bad reasons - those
are still the names you should use.)


As for the details of the names - the language designer should pick
names that he/she likes, consulting with others in the project. Then
during alpha testing they should collect feedback from other users and
interested parties who are looking at the language.


>
>>> word64 also has the informal, "don't care" synonym 'word' (like 'int' is
>>> a synonym for 'int64').
>>>
>>
>> The trouble with "word" is the size is seriously ambiguous.  I'd say it
>> is worse than "int" in that respect.
>
> Denotations like 'int' and 'word' are supposed to be /slightly/
> ambiguous. They are used when you don't care about the width, but expect
> the default to be sufficient.

I think it's fair to expect "int" to mean "a type meant to hold
integers". If you are used to C, or if the language is fairly low
level, you could assume it also means a small and efficient type. If
you are used to high level languages, you might take it to mean
unlimited range. But amongst anyone that has worked with low-level
programming, or who knows what processor they are targeting, "word"
means "machine word" and is tightly connected to the processor - with the
definition and size varying hugely. To someone without low-level
experience or knowledge, it might make no sense at all.

Thus I think "word" is a particularly bad choice of names - it has been
used and abused too much and has no real meaning left. I'd put it as a
lot worse than "int" in that respect.


I think there are times where a generic "number" type could be very
convenient. In simple languages with few types, it makes a lot of sense
(perhaps even more so in interpreted or bytecode-compiled languages).
Just have one type "num" that is a signed integer of the biggest size
that works efficiently for the processor. There would be no point in
having signed and unsigned versions here. This could be simple and
convenient for local variables, but I would not want to allow it for
types that are used in interfaces - you'd want it for limited scope use
where the compiler can see the ranges needed. ("int" in C is a little
like this in its original intention, but that has got lost somewhere
along the line as "int" has been used inappropriately when more tightly
specified types would be better, and as implementations have failed to
make "int" 64-bit on 64-bit systems.)


>
> The default on my current languages is to make them 64 bits wide, so
> it's unlikely to be insufficient.
>
> 40 years ago they were 16 bits, and some 20 years ago they became 32
> bits. I don't really see a need for default 128-bit integers even in 20
> years from now.
>
> Most languages appear to be stuck with a default 32-bit int type, which
> is now too small for memory sizes, large object sizes, file sizes and
> many other things.
>

32-bit is big enough for almost every situation for memory sizes, file
sizes, etc. Not /every/ situation - but almost all. But if you want a
type that can handle everything and be efficient on PC's, then 64-bit is
the choice.


> As for 'word', I've used that to mean an unsigned version of 'int' since
> the 80s, although it was then 16 bits. (It still is in my x64 assembler
> in the form of DW directives and register names like W3.)
>

Yes, I understand that. But programming languages should not be
designed around the experiences and preferences of one single
programmer. The name "word" should not be used in a language that has
ambitions beyond a small hobby language, precisely because it means so
many different things to different people or in different contexts, and
is thus meaningless and confusing.

>
>> I'd be okay with "real32", "real64", etc., as well - but I think "float"
>> is more accurate (floating point numbers do not exactly represent real
>> numbers).  And like "word", the name "real" is a long outdated name
>> without clear rules on its size.
>
> I used 'real' in Algol, Fortran and Pascal, and I've used it in my own
> languages since 1981 (when it was implemented as an f24 type). So I
> don't care that it's outdated. Just that I might need to keep explaining
> what it means!
>

Sometimes it is hard to be objective about things that have been
familiar for so long. I've been familiar with "real numbers" as a named
mathematical concept since I was perhaps 10 years old. So it is hard to
imagine that someone might not know what "real" means.

But it is certainly easy to imagine that the size of a type "real" is
not clearly defined - unlike "float" and "double", it has never been
standardised and different sizes of "real" have been in common use.

James Harris

Aug 19, 2021, 8:51:48 AM
On 17/08/2021 22:08, David Brown wrote:

...

> Certainly it makes sense to have shorter and simpler keywords, at least
> for those that are commonly used. And there is no point in having extra
> underscores for no purpose. I might not object to underscores as much
> as you do, but I see no benefit of "null_ptr" over "nullptr".

To be clear, I don't think I personally suggested null_ptr for the
following reasons. What I had in mind was that if there were going to be
two or more reserved pointer values then it might make sense for them to
be clearly related. And so I suggested ptr as a prefix - something like

ptr_undef
ptr_null

By contrast, if I were to use the nullptr that you mentioned then the
corresponding keywords would be

undefptr
nullptr

which is perhaps getting to be a bit hard to read. The situation would
be worse if there were many such reserved names as in

undefptr
nullptr
debugptr
signalptr
chainendptr

I know that discussion of what name to use for a reserved pointer value
is somewhat about minutiae but as I think the list shows even choices
such as this can make a difference to the readability of the program.

Not to be overlooked are the words which get reserved by the language and
so become unavailable for the programmer to use as identifier names. For
that reason, I think a better list than the above would be

ptr_undef
ptr_null
ptr_debug
ptr_signal
ptr_chainend

Although they are arguably more ugly such names would, perhaps, be
easier to read and recognise, and that would keep the reserved words as
having a common prefix. A programmer scanning the text of the program
and seeing the prefix would be able to immediately recognise it as a
special pointer value and then would only really need to notice the
specific if it was relevant to why he was looking at the code.

As I say, minor points.

...

> Well, I guess the OP will collect opinions, and use that to help make
> his decisions.
>

Yes. Opinions are always welcome.


--
James Harris

Bart

Aug 19, 2021, 8:58:00 AM
On 19/08/2021 12:13, David Brown wrote:
> On 18/08/2021 20:02, Bart wrote:

>> But I need a bigger difference between signed and unsigned integers.
>> Just sticking a 'u' at the front doesn't cut it, unless it replaces 'i'
>> with 'u' as happens with i32 and u32. With int32/uint32, the difference
>> is too subtle, and uint is unpleasant to type.
>
> So it is obvious to you (so obvious that you think it is universally
> obvious) that "u32" is "unsigned integer 32-bit", equally obvious that
> "int32" is "signed integer 32-bit", and yet "uint32" is too subtle? I
> can't help feeling there is an inconsistency here somewhere...

uint just looks like a typo to me. (On UK keyboards, u and i are
adjacent so typing 'ui' with one press is not uncommon.)

If you actually make such a typo when writing i32 or u32, then the
difference is more apparent.

> These names don't have to be keywords or fundamental types in the
> language. A language could have a completely flexible system for
> integer types, so that "int32" is defined in the language standard
> library as "type_alias int32 = builtin::integer<4, signed>", or whatever
> syntax or features you pick. But thereafter, programmers should stick
> to the standard names unless they have need of a specific name for the
> type. (Thus in C, the fundamental boolean type is "_Bool" - but the
> standard name is "bool".

C is not the language to set examples from. A C implementation is quite
likely to define 'int32_t' on top of 'int', and 'uint32_t' on top of
'unsigned'! Somewhat circular definitions...
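As an aside, the layering can be observed from inside a C11 program. The sketch below uses _Generic to report which fundamental type int32_t is built on for the current implementation; the macro and function names are invented for this illustration, and the answer is implementation-specific.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Whatever it is built on, the standard name promises exactly 32 bits. */
static_assert(sizeof(int32_t) * CHAR_BIT == 32, "int32_t must be 32 bits");

/* Hypothetical helper macro: maps an expression's type to a readable
   name. _Generic selects on the type without applying integer promotion. */
#define FUNDAMENTAL_NAME(x) _Generic((x), \
    signed char: "signed char",           \
    short: "short",                       \
    int: "int",                           \
    long: "long",                         \
    long long: "long long",               \
    default: "other")

/* Returns the name of the fundamental type underlying int32_t here.
   On many current desktop compilers this is "int", but that is not
   required by the standard. */
const char *int32_fundamental_name(void)
{
    return FUNDAMENTAL_NAME((int32_t)0);
}
```

Code written against int32_t does not need to know the answer, which is the usual argument for using the standard name rather than the underlying type.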

> And for size-specific types, you should use
> "int32_t" and friends, as those are the standard names. It doesn't
> matter that some people use other names, for good or bad reasons - those
> are still the names you should use.)

I used to have such schemes; my earlier languages used the following
(inspired by Fortran) on top of which the colloquial aliases were defined:

int*N for signed integers
byte*N for unsigned integers
real*N for floats

as fundamental types, where N is a byte-size. I also played with int:32
and byte:64. Then I realised I was never going to use int*7 or byte:23,
and just used hardcoded names (and saved typing those shifted "*" and ":"!)

It's not really a problem what a language uses; people will write
whatever the language requires [except in C where people are more apt to
use typedefs for basic types].

Outside of a specific language, I might use int32 or i32 or u64. Nobody
has ever asked me what they mean (except in c.l.c.)


> Thus I think "word" is a particularly bad choice of names - it has been
> used and abused too much and has no real meaning left. I'd put it as a
> lot worse than "int" in that respect.

OK. You don't need to use my language, or if you do, you can choose to
use 'u64' for a specific size, or create your own aliases, like I do in C.

Personally I prefer to type 'word' over 'unsigned long long int' or even
'uint64_t'.


>> Most languages appear to be stuck with a default 32-bit int type, which
>> is now too small for memory sizes, large object sizes, file sizes and
>> many other things.
>>
>
> 32-bit is big enough for almost every situation for memory sizes, file
> sizes, etc. Not /every/ situation - but almost all.

Not enough for you to forget completely about its limitations. Have a
simple loop summing filesizes, and you are likely to overflow an int32
range; certainly you have to keep it very much in mind.
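To make the file-size case concrete, here is a small sketch in C. The function name sum_sizes and its interface are made up for illustration; it accumulates in 64 bits and reports whether the result would still have fitted in an int32.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums non-negative sizes into a 64-bit total and
   sets *fits_int32 to say whether a 32-bit signed accumulator would
   have been enough. Overflow of the 64-bit total itself is not checked
   in this sketch. */
int64_t sum_sizes(const int64_t *sizes, size_t count, bool *fits_int32)
{
    int64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        assert(sizes[i] >= 0);
        total += sizes[i];
    }
    *fits_int32 = (total <= INT32_MAX);
    return total;
}
```

Two files of 1.5 GB each already give a total of 3,000,000,000 bytes, which is above INT32_MAX (2,147,483,647), so the flag comes back false even though each individual size fits.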

This is the whole reason for the ugly size_t in C. If 'int' was 64 bits,
you could forget about size_t, off_t, time_t, clock_t, and all the rest
of that zoo.

(If you are implementing 64-bit compilers, assemblers, linkers,
interpreters and runtimes, then you /need/ a 64-bit int!)

James Harris

Aug 19, 2021, 9:03:46 AM
On 18/08/2021 19:02, Bart wrote:
> On 18/08/2021 16:16, David Brown wrote:


>> int32 and uint32 are simple, clear, unambiguous, easy to type, and
>> cannot seriously be mistaken for anything else.
>
> int32 is OK, that's what I use mostly when I need that specific,
> narrower type.
>
> But I need a bigger difference between signed and unsigned integers.
> Just sticking a 'u' at the front doesn't cut it, unless it replaces 'i'
> with 'u' as happens with i32 and u32. With int32/uint32, the difference
> is too subtle, and uint is unpleasant to type.

Did you really say you don't like uint because it's "unpleasant to type"?

:-o

...

> Denotations like 'int' and 'word' are supposed to be /slightly/
> ambiguous. They are used when you don't care about the width, but expect
> the default to be sufficient.

That's fine but only if you are (1) restricted to certain hardware or
(2) your language allows you to specify either the default or a range in
which the default will be required to be.


--
James Harris

Bart

Aug 19, 2021, 9:32:32 AM
On 19/08/2021 14:03, James Harris wrote:
> On 18/08/2021 19:02, Bart wrote:
>> On 18/08/2021 16:16, David Brown wrote:
>
>
>>> int32 and uint32 are simple, clear, unambiguous, easy to type, and
>>> cannot seriously be mistaken for anything else.
>>
>> int32 is OK, that's what I use mostly when I need that specific,
>> narrower type.
>>
>> But I need a bigger difference between signed and unsigned integers.
>> Just sticking a 'u' at the front doesn't cut it, unless it replaces
>> 'i' with 'u' as happens with i32 and u32. With int32/uint32, the
>> difference is too subtle, and uint is unpleasant to type.
>
> Did you really say you don't like uint because it's "unpleasant to type"?
>
> :-o

Yeah. I just don't like it.


> ...
>
>> Denotations like 'int' and 'word' are supposed to be /slightly/
>> ambiguous. They are used when you don't care about the width, but
>> expect the default to be sufficient.
>
> That's fine but only if you are (1) restricted to certain hardware or
> (2) your language allows you to specify either the default or a range in
> which the default will be required to be.

I usually set the default to the target machine word size.

This works for me since, once my languages target 64 bits for example,
they're unlikely to still target 32 bits, which mainstream ones still
have to support.

I can still run on 32 bits (eg. via a C target), but it will be less
efficient as many operations will be unnecessarily 64 bits.

There could be a dedicated language version where int might be 32 or 16
bits, but I can't guarantee the same programs still working, as they may
assume the wider int type:

int worldpop = 7500 million

David Brown

Aug 19, 2021, 10:09:32 AM
On 19/08/2021 14:57, Bart wrote:
> On 19/08/2021 12:13, David Brown wrote:
>> On 18/08/2021 20:02, Bart wrote:
>
>>> But I need a bigger difference between signed and unsigned integers.
>>> Just sticking a 'u' at the front doesn't cut it, unless it replaces 'i'
>>> with 'u' as happens with i32 and u32. With int32/uint32, the difference
>>> is too subtle, and uint is unpleasant to type.
>>
>> So it is obvious to you (so obvious that you think it is universally
>> obvious) that "u32" is "unsigned integer 32-bit", equally obvious that
>> "int32" is "signed integer 32-bit", and yet "uint32" is too subtle?  I
>> can't help feeling there is an inconsistency here somewhere...
>
> uint just looks like a typo to me. (On UK keyboards, u and i are
> adjacent so typing 'ui' with one press is not uncommon.)

They are adjacent on most Latin alphabet keyboard layouts, I think.

>
> If you actually make such a typo when writing i32 or u32, then the
> difference is more apparent.

I find it very hard to believe that "uint32" is a common typo for
"int32" and commonly goes unnoticed, and that "u32" vs "i32" is
significantly less likely to happen or go unnoticed. But unless someone has
statistics on such errors, we'll never know for sure.

>
>> These names don't have to be keywords or fundamental types in the
>> language.  A language could have a completely flexible system for
>> integer types, so that "int32" is defined in the language standard
>> library as "type_alias int32 = builtin::integer<4, signed>", or whatever
>> syntax or features you pick.  But thereafter, programmers should stick
>> to the standard names unless they have need of a specific name for the
>> type.  (Thus in C, the fundamental boolean type is "_Bool" - but the
>> standard name is "bool".
>
> C is not the language to set examples from. A C implementation is quite
> likely to define 'int32_t' on top of 'int', and 'uint32_t' on top of
> 'unsigned'! Somewhat circular definitions...

I used C as an example here, not because I think the details of its
types should be copied. C does things the way it does because they made
sense at the time, and history has passed since its conception. Despite
all your moanings and groanings, C's system has worked well for the last
50 years and continues to work well now - at least for those willing to
accept it and work with it instead of fighting it. But I would not copy
the same system for a /new/ language, nor did I suggest it.

>
>   And for size-specific types, you should use
>> "int32_t" and friends, as those are the standard names.  It doesn't
>> matter that some people use other names, for good or bad reasons - those
>> are still the names you should use.)
>
> I used to have such schemes; my earlier languages used the following
> (inspired by Fortran) on top of which the colloquial aliases were defined:
>
>   int*N               for signed integers
>   byte*N              for unsigned integers
>   real*N              for floats
>
> as fundamental types, where N is a byte-size. I also played with int:32
> and byte:64. Then I realised I was never going to use int*7 or byte:23,
> and just used hardcoded names (and saved typing those shifted "*" and ":"!)
>
> It's not really a problem what a language uses; people will write
> whatever the language requires [except in C where people are more apt to
> use typedefs for basic types].

Anybody doing serious programming in a real language is going to make
extensive use of named types - including named scalar types. There is,
of course, a need for simple little languages - not everything has to be
suitable for large-scale coding projects. But if you are going to write
large-scale software, and are interested in writing clear, maintainable
code that minimises the risk of error, then you want good typing.
Ideally there should be support for strong types here, not just aliases.
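The difference between an alias and a strong type can be sketched in C as follows. All the names here (plain_metres, metres_t and so on) are invented for the example; C has no dedicated strong-typedef feature, so a single-member struct is used as a stand-in.

```c
#include <assert.h>
#include <stdint.h>

/* Aliases: both names mean int32_t, so mixing them compiles silently. */
typedef int32_t plain_metres;
typedef int32_t plain_seconds;

/* Strong types: distinct struct types that cannot be mixed by accident. */
typedef struct { int32_t value; } metres_t;
typedef struct { int32_t value; } seconds_t;

metres_t metres(int32_t v)   { return (metres_t){ v }; }
seconds_t seconds(int32_t v) { return (seconds_t){ v }; }

/* Adds two lengths. Passing a seconds_t here is a compile-time error,
   whereas plain_metres + plain_seconds would be accepted. */
metres_t metres_add(metres_t a, metres_t b)
{
    /* Guard against signed overflow, which is undefined in C. */
    assert(b.value >= 0 ? a.value <= INT32_MAX - b.value
                        : a.value >= INT32_MIN - b.value);
    return metres(a.value + b.value);
}
```

With the aliases, an expression adding a plain_metres to a plain_seconds is accepted without complaint; with the struct versions, metres_add(metres(2), seconds(3)) is rejected by the compiler.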

>
> Outside of a specific language, I might use int32 or i32 or u64. Nobody
> has ever asked me what they mean (except in c.l.c.)

No one in c.l.c. has asked you what they mean, to the best of my
recollection. But many have asked you why you use them, or asked you to
use standard C types when writing C instead of silly, petty
abbreviations. If you had talked about your "prgmming langage", people
would know what you meant - and question your spelling. If you
continued to insist that that's how /you/ prefer to write it, and that
it is superior to the standard spelling, people would think you are
rude, arrogant and ignorant. Unsurprisingly, you evoke similar
reactions when you post your silliness in c.l.c. (And I'd expect the
same results if I posted to comp.lang.rust with code using type "int32_t".)

>
>
>> Thus I think "word" is a particularly bad choice of names - it has been
>> used and abused too much and has no real meaning left.  I'd put it as a
>> lot worse than "int" in that respect.
>
> OK. You don't need to use my language, or if you do, you can choose to
> use 'u64' for a specific size, or create your own aliases, like I do in C.
>
> Personally I prefer to type 'word' over 'unsigned long long int' or even
> 'uint64_t'.
>

Well, in your language you can of course use whatever you want (as long
as you can remember what it means) - no one else is looking at your code
or using the language. But here we are not talking about /your/
personal languages - we are giving opinions and ideas to help someone
else with their language. And I assume he has ambitions that it might
be of interest to people other than himself.

>
>>> Most languages appear to be stuck with a default 32-bit int type, which
>>> is now too small for memory sizes, large object sizes, file sizes and
>>> many other things.
>>>
>>
>> 32-bit is big enough for almost every situation for memory sizes, file
>> sizes, etc.  Not /every/ situation - but almost all.
>
> Not enough for you to forget completely about its limitations. Have a
> simple loop summing filesizes, and you are likely to overflow an int32
> range; certainly you have to keep it very much in mind.
>

Sure - if that's the kind of program you are writing.

> This is the whole reason for the ugly size_t in C. If 'int' was 64 bits,
> you could forget about size_t, off_t, time_t, clock_t, and all the rest
> of that zoo.
>

No, you could not. And I assume you are just being your usual perverse
argumentative self, rather than actually wanting to learn anything.
(You have, after all, had this stuff explained patiently and repeatedly
many times.)

> (If you are implementing 64-bit compilers, assemblers, linkers,
> interpreters and runtimes, then you /need/ a 64-bit int!)
>

Certainly a 64-bit integer type is convenient - having it as "int" is
very far from necessary.

David Brown

Aug 19, 2021, 10:23:20 AM
On 19/08/2021 14:51, James Harris wrote:
> On 17/08/2021 22:08, David Brown wrote:
>
> ...
>
>> Certainly it makes sense to have shorter and simpler keywords, at least
>> for those that are commonly used.  And there is no point in having extra
>> underscores for no purpose.  I might not object to underscores as much
>> as you do, but I see no benefit of "null_ptr" over "nullptr".
>
> To be clear, I don't think I personally suggested null_ptr for the
> following reasons. What I had in mind was that if there were going to be
> two or more reserved pointer values then it might make sense for them to
> be clearly related. And so I suggested ptr as a prefix - something like
>
>   ptr_undef
>   ptr_null
>
> By contrast, if I were to use the nullptr that you mentioned then the
> corresponding keywords would be
>
>   undefptr
>   nullptr
>
> which is perhaps getting to be a bit hard to read.

I'd question the usefulness of having these as distinct names or values
in the first place (especially when balanced against the run-time cost
of manual or automatic checking of pointer validity - a comparison to 0
is cheap, a comparison to something else is not). And I'd question the
usefulness of having "ptr" as part of the name here at all. Remember,
C++ only has the name "nullptr" because it could not use "null".

You'd perhaps be better having "undefined" as a keyword and allowing it
for value types as well as pointers. Perhaps it would be a meta-value -
generating no code, but being useful for the compiler to check that the
programmer has put a real value in the variable.

> The situation would
> be worse if there were many such reserved names as in
>
>   undefptr
>   nullptr
>   debugptr
>   signalptr
>   chainendptr
>

These are definitely getting bad. Whatever you are thinking of here,
it's unlikely that making these reserved names is a good idea. A
well-designed language should aim to minimise the reserved name count,
not maximise it.

> I know that discussion of what name to use for a reserved pointer value
> is somewhat about minutiae but as I think the list shows even choices
> such as this can make a difference to the readability of the program.
>
> Not to be overlooked are the words which get reserved by the language and
> so become unavailable for the programmer to use as identifier names. For
> that reason, I think a better list than the above would be
>
>   ptr_undef
>   ptr_null
>   ptr_debug
>   ptr_signal
>   ptr_chainend
>
> Although they are arguably more ugly such names would, perhaps, be
> easier to read and recognise, and that would keep the reserved words as
> having a common prefix. A programmer scanning the text of the program
> and seeing the prefix would be able to immediately recognise it as a
> special pointer value and then would only really need to notice the
> specific if it was relevant to why he was looking at the code.
>
> As I say, minor points.
>

These are important points that are at the heart of how your language is
read and written.

Things that are used often, should be easy to read and write. Things
that are used rarely, can be hard. I don't know what you want to do
with your language, but for the sake of argument let's guess it should
be useable where C is used today. How often are null pointers used in C
code? Very often - so make it short, simple, and a keyword in your
language (such as "null"). How often are pointers to signals used?
Almost never - so it's fine if the type is pulled in from system
libraries as "system::signals::signal_pointer", and it most certainly
should /not/ be a reserved keyword.

(As a guide rule, never put anything into the language itself if it can
equally well be put in a system library.)

Bart

Aug 19, 2021, 11:28:38 AM
On 19/08/2021 15:09, David Brown wrote:
> On 19/08/2021 14:57, Bart wrote:

> I find it very hard to believe that "uint32" is a common typo for
> "int32" and commonly goes unnoticed, and that "u32" vs "i32" is
> significantly less likely to happen or go unnoticed. But unless someone has
> statistics on such errors, we'll never know for sure.

It probably takes me 50% longer to distinguish int32_t from uint32_t
than i32 from u32. Telling int32 from word32 is a bit quicker.

BTW those compact types are also used by Odin and Zig languages, not
just Rust.

> I used C as an example here, not because I think the details of its
> types should be copied. C does things the way it does because they made
> sense at the time, and history has passed since its conception. Despite
> all your moanings and groanings, C's system has worked well for the last
> 50 years and continues to work well now

No, it is still failing now. /You/ might want to adopt the [u]intN_t
types, but you still need to interact with other software that uses char
(especially char*) with its indeterminate signedness; or with int, long
and long long where long may match one of the other two but is
compatible with neither.

And then you have those off_t types mentioned below.

>> This is the whole reason for the ugly size_t in C. If 'int' was 64 bits,
>> you could forget about size_t, off_t, time_t, clock_t, and all the rest
>> of that zoo.
>>
>
> No, you could not. And I assume you are just being your usual perverse
> argumentative self, rather than actually wanting to learn anything.

I think you can since you just don't see such types anywhere else.

At best there might be a special type such as usize, but that's likely
because many languages are still dominated by a 32-bit int type which is
too small for current data and memory and file sizes.

James Harris

Aug 19, 2021, 3:25:27 PM
On 19/08/2021 15:23, David Brown wrote:
> On 19/08/2021 14:51, James Harris wrote:
>> On 17/08/2021 22:08, David Brown wrote:

...

>>   ptr_undef
>>   ptr_null
>>
>> By contrast, if I were to use the nullptr that you mentioned then the
>> corresponding keywords would be
>>
>>   undefptr
>>   nullptr
>>
>> which is perhaps getting to be a bit hard to read.
>
> I'd question the usefulness of having these as distinct names or values
> in the first place

So would I. This is just an idea, as yet.

>
> (especially when balanced against the run-time cost
> of manual or automatic checking of pointer validity - a comparison to 0
> is cheap, a comparison to something else is not).

Performance should not be a problem. It will be largely unaffected even
if there are quite a few such constants. For example, say that there
were many (more than two) values of the pointer constants starting with
these

0 = undefined
1 = null
2 = debug
etc

In a paging environment all of them would be in the lowest page. It
would be unmapped. So attempts to dereference any of them would
automatically lead to an exception - at no cost.

Where a bad pointer would have to be detected programmatically (e.g. in
the absence of paging) then instead of the nominal

cmp eax, 0
je badpointer

the generated code could have something like

cmp eax, 16
jb badpointer

Further, many of those tests could be either hoisted to be outside the
inner loop or omitted altogether where it can be proven that the
pointer's value must be in a certain range.
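
As a rough illustration of the range test above, here is a sketch in C++. The enumerator names, the limit of 16 and both helper functions are assumptions made up for the example; they are not part of any existing language or library. Converting small integers to pointers is implementation-defined, so this only shows the shape of the check.

```cpp
#include <cstdint>

// Hypothetical reserved pointer values, numbered as in the list above.
enum class BadPtr : std::uintptr_t { Undefined = 0, Null = 1, Debug = 2 };

// Assumption: addresses 0..15 are set aside for such constants.
constexpr std::uintptr_t kReservedLimit = 16;

// One unsigned comparison covers every reserved value
// (the counterpart of "cmp eax, 16 / jb badpointer").
inline bool is_bad_pointer(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) < kReservedLimit;
}

// Build one of the reserved values as a pointer.
// The integer-to-pointer conversion is implementation-defined.
inline void* make_bad_pointer(BadPtr kind) {
    return reinterpret_cast<void*>(static_cast<std::uintptr_t>(kind));
}
```

With this, is_bad_pointer(make_bad_pointer(BadPtr::Debug)) is true, while the address of an ordinary object is expected to fall outside the reserved range on typical hosted systems.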

>
> And I'd question the
> usefulness of having "ptr" as part of the name here at all. Remember,
> C++ only has the name "nullptr" because it could not use "null".

I wondered why C++ added nullptr. From what I've found, it seems that
NULL can be automatically converted to an integer and that can cause
problems for C++'s overloading whereas nullptr cannot be so converted. I
expect there's more to it than that but it suggests that a new language
would not have to have both.

>
> You'd perhaps be better having "undefined" as a keyword and allowing it
> for value types as well as pointers. Perhaps it would be a meta-value -
> generating no code, but being useful for the compiler to check that the
> programmer has put a real value in the variable.

I am considering a course which would implement something you suggested
earlier where pointers are declared as either of these:

(pointer to T)
(pointer to T) or (null)

For an identifier declared as the former, setting the pointer to null
would be prohibited. If declared as the latter, however, then
dereferences would essentially need to be wrapped in case statements.

However, that would be part of variant typing where an object is
declared as

(T0) or (T1) or (T2) or (T3) ....

for arbitrary types Tn. Again, uses would need to be wrapped in case
statements. There would only be the one mechanism for variants but it
could be applied to pointers which could potentially be null.

But ATM that's a long way off as it would be complex to implement and I
am at a much earlier stage.

...

>> As I say, minor points.
>>
>
> These are important points that are at the heart of how your language is
> read and written.

Thanks for saying that. It's true that language (and standard library)
design is filled with a myriad of small decisions that each have a
bearing - some large, some small.

...

> (As a guide rule, never put anything into the language itself if it can
> equally well be put in a system library.)

Agreed.


--
James Harris

James Harris

Aug 19, 2021, 5:06:28 PM
On 19/08/2021 13:51, James Harris wrote:

...

>   ptr_undef
>   ptr_null
>   ptr_debug
>   ptr_signal
>   ptr_chainend

On the topic of having more 'bad pointer' identifications than just NULL
I came across this:

https://lwn.net/Articles/236920/

It suggests a distinction over kmalloc returns such as

not initialized
allocation failed due to lack of space
allocation failed due to depletion of slots
allocated OK but was of zero bytes so do not try to dereference

One suggestion "causes kmalloc(0) to return a special ZERO_SIZE_PTR
value. It is a non-NULL value which looks like a legitimate pointer, but
which causes a fault on any attempt at dereferencing it. Any attempt to
call kfree() with this special value will do the right thing".

Bottom line: there /may/ (and I'd put it no higher than that) be good
reason to have more than one pointer value which cannot be dereferenced.
As written in a reply earlier today it looks as though the different
values could be implemented virtually for free.


--
James Harris

Bart

Aug 19, 2021, 6:42:12 PM
'equally well'. Things implemented with a library are usually inferior.
Either in how they can be used, or by incurring extra overheads (eg.
slower compilation).

It can also mean users replacing standard features with their own.

David Brown

Aug 20, 2021, 3:05:32 AM
On 19/08/2021 17:28, Bart wrote:
> On 19/08/2021 15:09, David Brown wrote:
>> On 19/08/2021 14:57, Bart wrote:
>
>> I find it very hard to believe that "uint32" is a common typo for
>> "int32" and commonly goes unnoticed, and that "u32" vs "i32" is
>> significantly less likely to happen or go unnoticed.  But unless someone has
>> statistics on such errors, we'll never know for sure.
>
> It probably takes me 50% longer to distinguish int32_t from uint32_t
> than i32 from u32. Distinguishing int32 from word32 is a bit quicker.
>

Um, okay. I guess I'll take your word for it, rather than asking for
the timing measurements you've made to back up the figures.

> BTW those compact types are also used by Odin and Zig languages, not
> just Rust.

The world is full of obscure and minor programming languages, most of
which almost no one has ever heard of. Some have useful niche
application areas, others are "general purpose" and therefore never used
in practice. Occasionally, one will break out and become popular -
perhaps for good technical reasons, but usually for non-technical
reasons. So who cares how many languages there are on Wikipedia's lists
or Rosetta Stone's language comparisons that happen to use a particular
name for their types? It's like telling us there is this guy in
Russia who drives a tank to work - therefore a tank is a perfectly
reasonable choice of commuter car.

>
>> I used C as an example here, not because I think the details of its
>> types should be copied.  C does things the way it does because they made
>> sense at the time, and history has passed since its conception.  Despite
>> all your moanings and groanings, C's system has worked well for the last
>> 50 years and continues to work well now
>
> No, it is still failing now. /You/ might want to adopt the [u]intN_t
> types, but you still need to interact with other software that uses char
> (especially char*) with its indeterminate signedness; or with int, long
> and long long where long may match one of the other two but is
> compatible with neither.

These are all perfectly good types in their place, and are often exactly
what you want. /Sometimes/ you want size-specific types, but often they
are not necessary and a type that adapts to the target for efficiency is
better for portability. When you are dealing with strings and
characters, signedness doesn't matter - it's only in quite niche
situations that you'd want to do arithmetic with 8-bit types.

The problems come when people misunderstand what a type is, or how it
works, or use the wrong type in the wrong circumstance. This may come
as a revelation to you, but C is not unique here - people who don't
understand the language they are using or the code they are writing, or
who write bad code, get poor results.

The only thing special about C is that there is a /vast/ amount of code
written in the language, and vast amounts of it are available for
scrutiny. If there were any measurable amount of code written in your
languages, and any users other than yourself, you'd find approximately
the same percentage of programmers writing the same percentage of poor
code as with any other language.


>
> And then you have those off_t types mentioned below.

"off_t" is not part of C. But don't let that get in the way of another
rant.

>
>>> This is the whole reason for the ugly size_t in C. If 'int' was 64 bits,
>>> you could forget about size_t, off_t, time_t, clock_t, and all the rest
>>> of that zoo.
>>>
>>
>> No, you could not.  And I assume you are just being your usual perverse
>> argumentative self, rather than actually wanting to learn anything.
>
> I think you can since you just don't see such types anywhere else.
>

Ah, ignorance is bliss!

Dmitry A. Kazakov

Aug 20, 2021, 3:27:10 AM
On 2021-08-20 09:05, David Brown wrote:

> It's like telling us there is this guy in
> Russia who drives a tank to work - therefore a tank is a perfectly
> reasonable choice of commuter car.

Depends on the line of work, how many checkpoints are on the road, if
there is a road at all. Sorry, could not resist.

P.S. As in any non-free state, gun laws in Russia are very strict. In
Germany, I believe, you could buy a decommissioned disarmed tank, but
you could not certify it for the public road. If Elon Musk built a Tesla
tank...

--
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de

James Harris

Aug 20, 2021, 3:29:05 AM
On 18/08/2021 19:02, Bart wrote:
> On 18/08/2021 16:16, David Brown wrote:


>> int32 and uint32 are simple, clear, unambiguous, easy to type, and
>> cannot seriously be mistaken for anything else.
>
> int32 is OK, that's what I use mostly when I need that specific,
> narrower type.
>
> But I need a bigger difference between signed and unsigned integers.
> Just sticking a 'u' at the front doesn't cut it, unless it replaces 'i'
> with 'u' as happens with i32 and u32.

If you like

i32
u32

are you reserving all identifiers of the form iN and uN where N is an
integer?

What if the programmer wants to use, say, i2 and i3 as identifiers?

On the other hand, if the programmer wants to define a 128-bit integer
and a 21-bit unsigned integer would he write

i128
u21

?

If he wants a 1024-bit unsigned integer would he write

u1024

?

IOW the iN and uN forms are tempting but they seem to be rather limiting.


--
James Harris

David Brown

Aug 20, 2021, 3:33:09 AM
On 19/08/2021 21:25, James Harris wrote:
> On 19/08/2021 15:23, David Brown wrote:
>> On 19/08/2021 14:51, James Harris wrote:
>>> On 17/08/2021 22:08, David Brown wrote:
>
> ...
>
>>>    ptr_undef
>>>    ptr_null
>>>
>>> By contrast, if I were to use the nullptr that you mentioned then the
>>> corresponding keywords would be
>>>
>>>    undefptr
>>>    nullptr
>>>
>>> which is perhaps getting to be a bit hard to read.
>>
>> I'd question the usefulness of having these as distinct names or values
>> in the first place
>
> So would I. This is just an idea, as yet.
>

Fair enough.

>>
>> (especially when balanced against the run-time cost
>> of manual or automatic checking of pointer validity - a comparison to 0
>> is cheap, a comparison to something else is not).
>
> Performance should not be a problem. It will be largely unaffected even
> if there are quite a few such constants. For example, say that there
> were many (more than two) values of the pointer constants starting with
> these
>
>   0 = undefined
>   1 = null
>   2 = debug
>   etc
>
> In a paging environment all of them would be in the lowest page. It
> would be unmapped. So attempts to dereference any of them would
> automatically lead to an exception - at no cost.
>

That has several costs. One is that it requires a paging environment.
That might be fine - I don't know what your targets are here. And it
might need cooperation with the OS if you want to handle these in
different ways than just a program crash. (You've covered the other big
cost yourself below.)

> Where a bad pointer would have to be detected programmatically (e.g. in
> the absence of paging) then instead of the nominal
>
>   cmp eax, 0
>   je badpointer
>
> the generated code could have something like
>
>   cmp eax, 16
>   jb badpointer
>
> Further, many of those tests could be either hoisted to be outside the
> inner loop or omitted altogether where it can be proven that the
> pointer's value must be in a certain range.
>

On many processors, the difference is bigger. (And on x86, "test eax,
eax" is more likely to be used than "cmp eax, 0", because it is smaller
and faster IIRC.) For many RISC processors, you would have to load a
constant value 16 into a register before doing the comparison. Some
processors have a "branch if register is 0" instruction, but no flag
register at all. Even the x86 has this kind of thing internally - the
x86 instruction stream may look similar for both comparisons, but the
way they are handled in modern x86 processors can be very different,
with highly optimised paths for the extremely popular "compare to 0"
pattern.

>>
>> And I'd question the
>> usefulness of having "ptr" as part of the name here at all.  Remember,
>> C++ only has the name "nullptr" because it could not use "null".
>
> I wondered why C++ added nullptr. From what I've found, it seems that
> NULL can be automatically converted to an integer and that can cause
> problems for C++'s overloading whereas nullptr cannot be so converted. I
> expect there's more to it than that but it suggests that a new language
> would not have to have both.

That is basically it. It means you can have :

void foo(int);
void foo(char *);

and call "foo(nullptr)" rather than "foo(0)".

Equally, it means you can have :

void bar(int);

and "bar(nullptr)" is a compile-time error, unlike "bar(NULL)".

It also means programmers distinguish their null pointers more clearly
in code, reducing the risk of mistakes and increasing the static
checking that can be done (such as by using gcc's
"-Wzero-as-null-pointer-constant" warning). The use of "0" to mean
either the integer 0 or a null pointer comes from C's history - I
believe in BCPL there was no distinction between integers and pointers
at all. "nullptr" is a step towards improving the language here, though
the historical baggage from C and earlier C++ standards cannot be removed.
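
A minimal, self-contained version of the overloading case above might look like the following sketch. The string return values are an addition for the example, only so that the chosen overload can be observed; the original declarations return void.

```cpp
#include <string>

// Two overloads, as in the declarations above.
std::string foo(int)   { return "int"; }
std::string foo(char*) { return "pointer"; }

// foo(0) selects the int overload, because the literal 0 is an exact match for int.
// foo(nullptr) selects the pointer overload, because std::nullptr_t converts
// to any pointer type but never to an integer.
inline std::string call_with_zero()    { return foo(0); }
inline std::string call_with_nullptr() { return foo(nullptr); }
```

foo(NULL) is deliberately left out: depending on how the implementation defines NULL, it may pick the int overload or be rejected as ambiguous, which is the problem nullptr was introduced to avoid.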

>
>>
>> You'd perhaps be better having "undefined" as a keyword and allowing it
>> for value types as well as pointers.  Perhaps it would be a meta-value -
>> generating no code, but being useful for the compiler to check that the
>> programmer has put a real value in the variable.
>
> I am considering a course which would implement something you suggested
> earlier where pointers are declared as either of these:
>
>   (pointer to T)
>   (pointer to T) or (null)
>
> For an identifier declared as the former, setting the pointer to null
> would be prohibited. If declared as the latter, however, then
> dereferences would essentially need to be wrapped in case statements.

I wouldn't use quite that syntax, but I agree with the principle.

>
> However, that would be part of variant typing where an object is
> declared as
>
>   (T0) or (T1) or (T2) or (T3) ....
>
> for arbitrary types Tn. Again, uses would need to be wrapped in case
> statements. There would only be the one mechanism for variants but it
> could be applied to pointers which could potentially be null.
>
> But ATM that's a long way off as it would be complex to implement and I
> am at a much earlier stage.
>

Summation types and pattern matching are a very nice feature in a
language, IMHO.
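
For comparison, the "(pointer to T) or (null)" idea can be approximated in C++17 with std::variant. This is only a sketch under assumptions: Null, MaybePtr and value_or are hypothetical names for the example, and unlike the proposed language feature nothing here stops the T* alternative from itself holding a null pointer.

```cpp
#include <type_traits>
#include <variant>

struct Null {};                            // stand-in for the "(null)" alternative

template <typename T>
using MaybePtr = std::variant<Null, T*>;   // "(pointer to T) or (null)"

// The visitor plays the role of the case statement: every alternative
// has to be handled before the pointer can be dereferenced.
inline int value_or(const MaybePtr<int>& p, int fallback) {
    return std::visit([fallback](auto alt) -> int {
        if constexpr (std::is_same_v<decltype(alt), Null>) {
            return fallback;               // null case: nothing to dereference
        } else {
            return *alt;                   // pointer case: dereference allowed here
        }
    }, p);
}
```

The same mechanism extends to "(T0) or (T1) or (T2) ..." by adding alternatives to the variant, which matches the idea of having a single mechanism for all variants.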

David Brown

Aug 20, 2021, 3:39:08 AM
On 20/08/2021 09:27, Dmitry A. Kazakov wrote:
> On 2021-08-20 09:05, David Brown wrote:
>
>> It's like telling us there is this guy in
>> Russia who drives a tank to work - therefore a tank is a perfectly
>> reasonable choice of commuter car.
>
> Depends on the line of work, how many checkpoints are on the road, if
> there is a road at all. Sorry, could not resist.
>
> P.S. As in any non-free state, gun laws in Russia are very strict.

In any /free/ state as well, gun laws are very strict. The only
countries where gun laws are not strict are those without working laws,
and those that misunderstand "free" and think the freedom to shoot
people trumps the freedom to not be shot.

> In
> Germany, I believe, you could buy a decommissioned disarmed tank, but
> you could not certify it for the public road. If Elon Musk built a Tesla
> tank...
>

I should, of course, have used Lithuania as an example for the tank.
After all, even the mayor drives a tank there...

<http://www.baltic-course.com/eng/transport/?doc=46548>

Dmitry A. Kazakov

Aug 20, 2021, 3:56:23 AM
On 2021-08-20 09:39, David Brown wrote:
> On 20/08/2021 09:27, Dmitry A. Kazakov wrote:

>> P.S. As in any non-free state, gun laws in Russia are very strict.
>
> In any /free/ state as well, gun laws are very strict. The only
> countries where gun laws are not strict are those without working laws,
> and those that misunderstand "free" and think the freedom to shoot
> people trumps the freedom to not be shot.

You confuse free with a benign state which bestows its subjects with
permissions, allowances and licenses. A free state is where free
citizens decide what the state is allowed to do, not the other way around.

But it becomes off-topic...

Bart

Aug 20, 2021, 6:47:14 AM
On 20/08/2021 08:29, James Harris wrote:
> On 18/08/2021 19:02, Bart wrote:
>> On 18/08/2021 16:16, David Brown wrote:
>
>
>>> int32 and uint32 are simple, clear, unambiguous, easy to type, and
>>> cannot seriously be mistaken for anything else.
>>
>> int32 is OK, that's what I use mostly when I need that specific,
>> narrower type.
>>
>> But I need a bigger difference between signed and unsigned integers.
>> Just sticking a 'u' at the front doesn't cut it, unless it replaces
>> 'i' with 'u' as happens with i32 and u32.
>
> If you like
>
>   i32
>   u32
>
> are you reserving all identifiers of the form iN and uN where N is an
> integer?
>
> What if the programmer wants to use, say, i2 and i3 as identifiers?

I only decided on this scheme (which was int32 and word32) when I
realised I was never going to use anything other than this small set of
power-of-two sizes.

Neither do most other languages of this kind.

Before that I was using general forms like int*4 (i32) or int:64 (i64)
which allowed the possibility of arbitrary sizes. However the language
then needs to decide what to do about int:53 or int:5. Or int*100000.

The compact forms came about later. As I mentioned (though David Brown
doesn't believe it), they are commonly used either in actual languages,
or as colloquial ways of referring to such types.

In my case however I also have bittypes which I call u1, u2 and u4
(which then continue as u8, u16 etc).

Then it sometimes happens that I want variables called t1 and u1, but I
can't!

> On the other hand, if the programmer wants to define a 128-bit integer
> and a 21-bit unsigned integer would he write
>
>   i128
>   u21
>
> ?
>
> If he wants a 1024-bit unsigned integer would he write
>
>   u1024
>
> ?

That's not going to happen. Not in any language of mine. Fixed size
numeric types like that go up to i128/u128 (conceivably one or two
levels further) but that's it. (It's enough of a nightmare just
implementing 128 bits!)

It would anyway need a more general syntax than reserving millions of
identifiers of the form i846464 and u345! Probably int:846464, with the
number part most likely a constant expression.

I do use such syntax for bitfields inside records, for example:

int32 pos : (lineno:23, fileno:9)

Here bitfields are unsigned values.
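
For readers more used to C-family syntax, a rough C++ counterpart of that record might be the sketch below. The struct name Pos is an assumption, and bitfield layout in C++ is implementation-defined, so the size check reflects what mainstream compilers do rather than a guarantee of the language.

```cpp
#include <cstdint>

// Two unsigned bitfields sharing one 32-bit storage unit:
// 23 bits for the line number, 9 bits for the file number.
struct Pos {
    std::uint32_t lineno : 23;
    std::uint32_t fileno : 9;
};

// Holds on common ABIs (GCC, Clang, MSVC), but is not required by the standard.
static_assert(sizeof(Pos) == sizeof(std::uint32_t),
              "expected both fields to pack into 32 bits");

// Largest values each field can hold.
constexpr std::uint32_t kMaxLineno = (1u << 23) - 1;   // 8388607
constexpr std::uint32_t kMaxFileno = (1u << 9) - 1;    // 511
```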

Anyway, arbitrary-sized integers have been discussed here before.

> IOW the iN and uN forms are tempting but they seem to be rather limiting.

Why, what are you planning? Some languages just give you 'integer' (or
even 'number') and that's it. Specifying 8/16/32/64/128-bit sizes is
usually sufficient.

Anything else sounds like a great idea but is probably not practical and
likely not worthwhile. It's a specialist area that is better covered
with bitfields for 1-127 bits, or arbitrary precision for larger sizes,
each with their own syntax.

Bart

Aug 20, 2021, 7:18:19 AM
On 20/08/2021 08:05, David Brown wrote:
> On 19/08/2021 17:28, Bart wrote:

> The world is full of obscure and minor programming languages, most of
> which almost no one has ever heard of.

Rust is well known. Zig comes up frequently in forums (even this one in
past threads). Odin less often, but it is mentioned. You can download
all of them and try them out.

There are some really obscure languages on rosettacode, like kapab or
wisp, but the ones above aren't really that obscure.

> The only thing special about C is that

Its type system is not fit for purpose.

>> And then you have those off_t types mentioned below.
>
> "off_t" is not part of C. But don't let that get in the way of another
> rant.

So what is it a part of? Since it came up extensively in a recent clc
thread.

It's a file-offset type used inside struct stat, defined in sys/stat.h,
which is to do with the stat() functions.

If I want to use such functions via a FFI, then I might need to find out
what it actually is. But you say it doesn't exist, so that's OK then!

The funny thing is, if I compile this C program:

#include <stdio.h>
#include <sys/types.h>

int main() {
    printf("%d\n",(int)sizeof(off_t));
}

I don't get 'unknown identifier' for off_t or some such message; it
seems to know what it is!

(Here, off_t has a concrete type which is i32, but internally is long
int which is distinct from both int (i32 here) and int32_t (also i32).
That's why no one in their right mind would take inspiration from C.)


>> I think you can since you just don't see such types anywhere else.
>>
>
> Ah, ignorance is bliss!

So, enlighten us. It will most likely be languages where you tie
yourself up in knots having a special type for everything.

James Harris

Aug 20, 2021, 7:55:24 AM
On 20/08/2021 11:47, Bart wrote:
> On 20/08/2021 08:29, James Harris wrote:

...

>> are you reserving all identifiers of the form iN and uN where N is an
>> integer?
>>
>> What if the programmer wants to use, say, i2 and i3 as identifiers?
>
> I only decided on this scheme (which was int32 and word32) when I
> realised I was never going to use anything other than this small set of
> power-of-two sizes.
>
> Neither do most other languages of this kind.
>
> Before that I was using general forms like int*4 (i32) or int:64 (i64)
> which allowed the possibility of arbitrary sizes. However the language
> then needs to decide what to do about int:53 or int:5. Or int*100000.
>
> The compact forms came about later. As I mentioned (though David Brown
> doesn't believe it), they are commonly used either in actual languages,
> or as colloquial ways of referring to such types.
>
> In my case however I also have bittypes which I call u1, u2 and u4
> (which then continue as u8, u16 etc).
>
> Then it sometimes happens that I want variables called t1 and u1, but I
> can't!

So under that scheme

i1 is a type name
i2 is a type name
i3 could be an identifier
i4 is a type name
i5 could be an identifier
i6 could be an identifier
etc

?

Naming integers iN was tempting but I felt that it either took away too
much of the namespace or, as illustrated, would be irregular and fiddly.

...

>> IOW the iN and uN forms are tempting but they seem to be rather limiting.
>
> Why, what are you planning?

If possible (and I haven't implemented it yet) I'd rather have the
number of bits as a qualifier which goes after the type name as follows

int 8 a
int 16 b
int 32 c

etc. I think there's little difference in readability compared with

i8 a
i16 b
i32 c

while the former are flexible, regular and consistent, and they consume
less of the namespace which could otherwise be used for identifiers.

I may need to add some punctuation but I'd avoid it if I could so as to
keep the appearance clean.
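
To make the "size as a qualifier" idea concrete, here is a sketch of how something similar can be expressed in existing C++ with a template parameter. IntOf and Int are hypothetical names for the example, and mapping only 8/16/32/64 is an assumption; any other size is simply rejected at compile time.

```cpp
#include <cstdint>

// Primary template is left undefined, so Int<21> or Int<1024>
// is a compile-time error rather than a silently chosen type.
template <int Bits> struct IntOf;

template <> struct IntOf<8>  { using type = std::int8_t;  };
template <> struct IntOf<16> { using type = std::int16_t; };
template <> struct IntOf<32> { using type = std::int32_t; };
template <> struct IntOf<64> { using type = std::int64_t; };

// Int<8> a; Int<16> b; Int<32> c;  corresponds to  int 8 a / int 16 b / int 32 c
template <int Bits>
using Int = typename IntOf<Bits>::type;

static_assert(sizeof(Int<32>) == 4, "Int<32> is a 32-bit type");
```

Because the size is an ordinary constant expression, no identifiers of the form iN need to be reserved, and the names i2 or i3 stay free for variables.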

>
> Some languages just give you 'integer' (or
> even 'number') and that's it. Specifying 8/16/32/64/128-bit sizes is
> usually sufficient.

Agreed, but such sizes could still be specified in a more regular way.


>
> Anything else sounds like a great idea but is probably not practical and
> likely not worthwhile. It's a specialist area that is better covered
> with bitfields for 1-127 bits, or arbitrary precision for larger sizes,
> each with their own syntax.
>

In another comment you said you didn't like uint but I've tried other
names such as

uns - for unsigned
nat - for natural
nneg - for non-negative

and I have to say that after looking at those

uint

doesn't seem quite so bad! That's especially the case if there's a space
between uint and the size so we are not talking about

uint64

but

uint 64

Having said that, what do you make of uns when compared with uint?


--
James Harris

Bart

Aug 20, 2021, 9:19:18 AM
I don't use i1 i2 i4, only i8/i16/i32/i64/i128.

Some languages, like Rust, allow i32 to be used as a variable name as
well, since types only appear in certain contexts.

I don't allow that. (I'd need to write `i32 for the variable name. Not
practical for normal coding, but for generated code, it provides a
workaround.)

Many languages now which allow size-specific types will have them as one of:

i32
int32
Int32
int32_t etc

You could say that all these are irregular since int31/int33 are legal
user identifiers, but int32 isn't (well apart from Rust).

This applies to 'int' too:

hnt int jnt ...
ins int inu ...

And actually to most keywords unless the language has a peculiar enough
syntax to allow keywords as identifiers (I think PL/I allowed if if=if ...)

> ...
>
>>> IOW the iN and uN forms are tempting but they seem to be rather
>>> limiting.
>>
>> Why, what are you planning?
>
> If possible (and I haven't implemented it yet) I'd rather have the
> number of bits as a qualifier which goes after the type name as follows
>
>   int 8 a
>   int 16 b
>   int 32 c

This is more flexible (I'd prefer some punctuation or other way of
connecting the number with the type) but as I said, you then have to
deal with extra possibilities:

* Could the number be an expression?

* Could it be the name of a macro or constant that expands to a number?

* If the number is a name, then int a ... becomes ambiguous; are you
defining an int called 'a', or is 'a' a name that expands to '32', and
the actual variable name follows?

* What to do about invalid sizes?

* Could such a number appear also after a user-defined type; for example
if an alias 'T' for 'int' was created, would 'T 8 a' be allowed?

This is where I decided to just define a handful of fixed names and be
done with it.

>

>> Some languages just give you 'integer' (or even 'number') and that's
>> it. Specifying 8/16/32/64/128-bit sizes is usually sufficient.
>
> Agreed, but such sizes could still be specified in a more regular way.

I don't really agree because the possibilities are so small, and
generally hard-coded at each place where they are used. It's not like an
array:

int A[N]

where N can be literally anything, or might not be known until runtime.

However, you only have to look at other languages:

Java Odin Zig Rust C# D Nim Go ...

and they all follow the same pattern: a set of fixed names, either
like byte/short/int/long, or with size suffixes: int8/int16/int32/int64.


>
>> Anything else sounds like a great idea but is probably not practical
>> and likely not worthwhile. It's a specialist area that is better
>> covered with bitfields for 1-127 bits, or arbitrary precision for
>> larger sizes, each with their own syntax.
>>
>
> In another comment you said you didn't like uint but I've tried other
> names such as
>
>   uns - for unsigned
>   nat - for natural
>   nneg - for non-negative
>
> and I have to say that after looking at those
>
>   uint
>
> doesn't seem quite so bad! That's especially the case if there's a space
> between uint and the size so we are not talking about
>
>   uint64
>
> but
>
>   uint 64
>
> Having said that, what do you make of uns when compared with uint?

Here I agree, uint is better than uns, nat, and nneg! Uint or variations
is also commonly used so that wouldn't be a bad choice.

But in my case, I've always had distinct type names for unsigned integers.

For example, 'byte', 'word', 'long' were all unsigned (these were u8,
u16, u32). Now 'word' is u64, except in my assembler where a 'w' in some
reserved words indicates u16. I no longer use 'long'.

Signed integers have always either included 'int', or 'i' for the
compact form.

Notice however how I'm using u16, u64 etc to clarify exactly what a type
is? So if those forms are that unambiguous in discussions, why not have
them in a language too?

Bart

Aug 20, 2021, 9:42:50 AM
On 20/08/2021 08:33, David Brown wrote:
> On 19/08/2021 21:25, James Harris wrote:

>> I wondered why C++ added nullptr. From what I've found, it seems that
>> NULL can be automatically converted to an integer and that can cause
>> problems for C++'s overloading whereas nullptr cannot be so converted. I
>> expect there's more to it than that but it suggests that a new language
>> would not have to have both.
>
> That is basically it. It means you can have :
>
> void foo(int);
> void foo(char *);
>
> and call "foo(nullptr)" rather than "foo(0)".
>
> Equally, it means you can have :
>
> void bar(int);
>
> and "bar(nullptr)" is a compile-time error, unlike "bar(NULL)".
>
> It also means programmers distinguish their null pointers more clearly
> in code, reducing the risk of mistakes and increasing the static
> checking that can be done (such as by using gcc's
> "-Wzero-as-null-pointer-constant" warning). The use of "0" to mean
> either the integer 0 or a null pointer comes from C's history - I
> believe in BCPL there was no distinction between integers and pointers
> at all. "nullptr" is a step towards improving the language here, though
> the historical baggage from C and earlier C++ standards cannot be removed.

This is what I did recently. While I've always had 'nil' for a null
pointer (null is better as an adjective than nil!), integers could be
used freely with pointers, at least on my older implementations.

Now the null pointer value can only be denoted as 'nil', not 0.

David Brown

Aug 20, 2021, 9:49:29 AM
On 20/08/2021 13:18, Bart wrote:
> On 20/08/2021 08:05, David Brown wrote:
>> On 19/08/2021 17:28, Bart wrote:
>
>> The world is full of obscure and minor programming languages, most of
>> which almost no one has ever heard of.
>
> Rust is well known. Zig comes up frequently in forums (even this one in
> past threads). Odin less often, but it is mentioned. You can download
> all of them and try them out.

Rust has a lot of hype, and is more known than used. Zig is just
another "safer than Ada, faster than C" wannabe. Odin likewise (though
it hasn't even reached the stage of having a Wikipedia entry).

Maybe one of these will become popular and reach the mainstream. Almost
certainly they all have technical advantages compared to C or existing
mainstream languages - almost certainly they also have disadvantages,
and absolutely certainly they have features that are subjective and
controversial.

None are big enough that compatibility or familiarity with them is
remotely relevant in defining a new language. (But by all means look at
them and copy any good ideas you see.)

>
> There are some really obscure languages on rosettacode, like kapab or
> wisp, but the ones above aren't really that obscure.
>
>> The only thing special about C is that
>
> Its type system is not fit for purpose.
>

I'm sure some of the millions of C programmers would have noticed that
during the last 50 years, and realised if C did not work.

>>> And then you have those off_t types mentioned below.
>>
>> "off_t" is not part of C.  But don't let that get in the way of another
>> rant.
>
> So what is it a part of? Since it came up extensively in a recent clc
> thread.

It is part of POSIX. It is used to hold offsets within files, for some
kinds of file functions. (I don't know the details.) On some systems,
it will be 32-bit, and on others it will be 64-bit. It may even be
64-bit on a 32-bit system with C90 and no C99 support - in which case
there are no portable 64-bit types. But because it is a named type
defined in the headers for your POSIX system, it means you can use it to
write portable code that can be compiled on a range of POSIX targets and
work correctly. That would not be possible if you used a fundamental C
type (like "long int"), or a fixed-size type.
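
A small sketch of what relying on the named type can look like, assuming a POSIX system where <sys/types.h> is available. The helper names off_t_bits and offset_between are hypothetical and not part of POSIX.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <limits>
#include <sys/types.h>   // POSIX header that defines off_t; not part of ISO C or C++

// POSIX specifies off_t as a signed integer type; its width is left to the system.
static_assert(std::numeric_limits<off_t>::is_integer, "off_t should be an integer type");
static_assert(std::numeric_limits<off_t>::is_signed, "off_t should be signed");

// Width of off_t in bits on the system this is compiled for.
constexpr std::size_t off_t_bits() { return sizeof(off_t) * CHAR_BIT; }

// Hypothetical helper: signed distance between two file positions, kept in
// off_t so that it is as wide as the platform's own file offsets.
off_t offset_between(off_t from, off_t to) { return to - from; }
```

The same source compiles whether the target uses 32-bit or 64-bit file offsets, and the width is discovered at compile time rather than assumed.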

>
> It's a file-offset type used inside struct stat, declared in sys/stat.h,
> which is to do with the stat() functions.
>
> If I want to use such functions via a FFI, then I might need to find out
> what it actually is. But you say it doesn't exist, so that's OK then!
>
> The funny thing is, if I compile this C program:
>
>   #include <stdio.h>
>   #include <sys/types.h>
>
>   int main() {
>       printf("%d\n",(int)sizeof(off_t));
>   }
>
> I don't get 'unknown identifier' for off_t or some such message; it
> seems to know what it is!

What part of "#include <sys/types.h>" is confusing you?

>
> (Here, off_t has a concrete type which is i32, but internally is long
> int which is distinct from both int (i32 here) and int32_t (also i32).
> That's why no one in their right mind would take inspiration from C.)
>
>
>>> I think you can since you just don't see such types anywhere else.
>>>
>>
>> Ah, ignorance is bliss!
>
> So, enlighten us. It will most likely be languages where you tie
> yourself up in knots having a special type for everything.
>

Every portable language that has decent wrappings for OS calls will use
OS-specific named types at the lowest level (though they might translate
them to language-specific types in the wrappings). Of course, lots of
languages just assume "all the world is 64-bit *nix", or "all the world
is 32-bit windows".

But of course if you think types are a bad idea, and type-safety is for
wimps who are scared of a little program crash, then you will consider
any use of these types as "tying yourself up in knots".

Bart

Aug 20, 2021, 10:25:28 AM
On 20/08/2021 14:49, David Brown wrote:
> On 20/08/2021 13:18, Bart wrote:
>> On 20/08/2021 08:05, David Brown wrote:
>>> On 19/08/2021 17:28, Bart wrote:
>>
>>> The world is full of obscure and minor programming languages, most of
>>> which almost no one has ever heard of.
>>
>> Rust is well known. Zig comes up frequently in forums (even this one in
>> past threads). Odin less often, but it is mentioned. You can download
>> all of them and try them out.
>
> Rust has a lot of hype, and is more known than used. Zig is just

The point is, does everyone know or can make an excellent guess as to
what is meant by 'i64' in the context of programming languages and
primitive types?


> It is part of POSIX. It is used to hold offsets within files, for some
> kinds of file functions. (I don't know the details.) On some systems,
> it will be 32-bit, and on others it will be 64-bit. It may even be
> 64-bit on a 32-bit system with C90 and no C99 support - in which case

So it is likely to be one of i32 u32 i64 u64.

If I had to write a function right now to return a file offset (that is,
the +ve or -ve difference between two locations in one file), I'd use an
i64 type which covers offsets of +/- 9 billion billion bytes approx.

> What part of "#include <sys/types.h>" is confusing you?

What part of those angle brackets is confusing /you/?

On Windows systems such a header file is only located within the system
headers of a C implementation.

But if your point is that off_t doesn't appear within the C standard,
then take your pick from size_t, time_t, clock_t, fpos_t, ptrdiff_t,
max_align_t, wint_t, rsize_t, errno_t, ...

There are actually 100s of such types. I'd say the vast majority are
merely one of the u8-u64 and i8-i64 types.



>>>> I think you can since you just don't see such types anywhere else.
>>>>
>>>
>>> Ah, ignorance is bliss!
>>
>> So, enlighten us. It will most likely be languages where you tie
>> yourself up in knots having a special type for everything.
>>
>
> Every portable language that has decent wrappings for OS calls will use
> OS-specific named types at the lowest level (though they might translate
> them to language-specific types in the wrappings). Of course, lots of
> languages just assume "all the world is 64-bit *nix", or "all the world
> is 32-bit windows".

Pretty much every processor including most microcontrollers will be
using power-of-two types. So lots of languages will assume that too;
why not?


> But of course if you think types are a bad idea, and type-safety is for
> wimps who are scared of a little program crash, then you will consider
> any use of these types as "tying yourself up in knots".

There might be some point to it if a language stopped me from doing
time_t + size_t + errno_t, but in the case of C, it doesn't.

It just makes it harder for people using libraries via FFIs when the API
makes use of those highly specific C types.

I have to use elaborate tools to do the conversion. Then that struct
stat which typically uses one custom typedef per member ends up looking
like this:

    record stat = $caligned
        u32 st_dev
        u16 st_ino
        u16 st_mode
        i16 st_nlink
        i16 st_uid
        i16 st_gid
        u32 st_rdev
        u32 st_size
        u64 st_atime
        u64 st_mtime
        u64 st_ctime
    end

No other input is now needed. Now go and look at some actual declarations
of 'struct stat', if you can even pinpoint the one used for your
platform of interest, amongst all the multiple declarations and
conditional code blocks, once you've located the header file from the
tower of include statements.
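
One way to pin down the layout for a given platform without reading the headers by hand is to let the compiler report it. The following is only a sketch, assuming a POSIX system; FieldInfo, STAT_FIELD and stat_layout are hypothetical names, and only a handful of members are listed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>
#include <sys/stat.h>    // POSIX: struct stat
#include <sys/types.h>   // POSIX: off_t and the other member types

// One row of the reported layout (hypothetical structure).
struct FieldInfo {
    std::string name;       // member name as written in the header
    std::size_t offset;     // byte offset within struct stat
    std::size_t size;       // size of the member in bytes
    bool        is_signed;  // signedness of the member's type
};

// A macro is used because offsetof and decltype need the member name itself.
// The null pointer is never dereferenced: sizeof and decltype do not evaluate.
#define STAT_FIELD(member)                                                  \
    FieldInfo{ #member,                                                     \
               offsetof(struct stat, member),                               \
               sizeof(((struct stat *)nullptr)->member),                    \
               std::is_signed<decltype(((struct stat *)nullptr)->member)>::value }

// Layout of a few struct stat members as seen by this compiler and target.
std::vector<FieldInfo> stat_layout() {
    return { STAT_FIELD(st_dev),   STAT_FIELD(st_ino), STAT_FIELD(st_mode),
             STAT_FIELD(st_nlink), STAT_FIELD(st_uid), STAT_FIELD(st_gid),
             STAT_FIELD(st_size) };
}
```

Printing the rows of stat_layout() on each target gives the concrete widths an FFI declaration needs, at the cost of having a C or C++ compiler available for that target.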

James Harris

Aug 20, 2021, 10:45:52 AM
On 20/08/2021 14:19, Bart wrote:
> On 20/08/2021 12:55, James Harris wrote:

...

>>    int 8 a
>>    int 16 b
>>    int 32 c
>
> This is more flexible (I'd prefer some punctuation or other way of
> connecting the number with the type) but as I said, you then have to
> deal with extra possibilities:

...

> * Could such a number appear also after a user-defined type; for example
> if an alias 'T' for 'int' was created, would 'T 8 a' be allowed?

That's a good question and it leads to a bigger topic. As we are already
OT on this one I'll start a new thread.


--
James Harris

James Harris

Aug 20, 2021, 11:27:54 AM
On 20/08/2021 14:42, Bart wrote:
> On 20/08/2021 08:33, David Brown wrote:
>> On 19/08/2021 21:25, James Harris wrote:

...

>> The use of "0" to mean
>> either the integer 0 or a null pointer comes from C's history - I
>> believe in BCPL there was no distinction between integers and pointers
>> at all.  "nullptr" is a step towards improving the language here, though
>> the historical baggage from C and earlier C++ standards cannot be
>> removed.
>
> This is what I did recently. While I've always had 'nil' for a null
> pointer


> (null is better as an adjective than nil!),

Most apropos!

And "nil points" can sound very French. ;-)


> integers could be
> used freely with pointers, at least on my older implementations.
>
> Now the null pointer value can only be denoted as 'nil', not 0.
>

What if you want to set a pointer to a specific address or you want to
compare addresses? Do you now have conversion functions?


--
James Harris

James Harris

Aug 20, 2021, 11:49:18 AM
On 20/08/2021 08:33, David Brown wrote:

...

> The use of "0" to mean
> either the integer 0 or a null pointer comes from C's history - I
> believe in BCPL there was no distinction between integers and pointers
> at all.

Just checked. BCPL treated memory as an array of words. "Pointers to
consecutive words of memory are consecutive integers" and "words of
memory have consecutive integer addresses".

https://www.cl.cam.ac.uk/~mr10/bcplman.pdf

I guess that means that addresses would be scaled by the number of bytes
per word.
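
The scaling guessed at above is just a multiplication. As a sketch, assuming a byte-addressed machine and a fixed number of bytes per word (the function name byte_address is hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Convert a BCPL-style word address into a byte address, assuming the
// machine is byte addressed and every word occupies bytes_per_word bytes.
constexpr std::uint64_t byte_address(std::uint64_t word_address,
                                     std::uint64_t bytes_per_word) {
    return word_address * bytes_per_word;
}
```

With 4-byte words, word addresses 10 and 11 map to byte addresses 40 and 44, so consecutive integers still name consecutive words.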


--
James Harris

James Harris

Aug 20, 2021, 11:54:09 AM
On 20/08/2021 15:25, Bart wrote:
> On 20/08/2021 14:49, David Brown wrote:
>> On 20/08/2021 13:18, Bart wrote:
>>> On 20/08/2021 08:05, David Brown wrote:
>>>> On 19/08/2021 17:28, Bart wrote:
>>>
>>>> The world is full of obscure and minor programming languages, most of
>>>> which almost no one has ever heard of.
>>>
>>> Rust is well known. Zig comes up frequently in forums (even this one in
>>> past threads). Odin less often, but it is mentioned. You can download
>>> all of them and try them out.
>>
>> Rust has a lot of hype, and is more known than used.  Zig is just
>
> The point is, does everyone know or can make an excellent guess as to
> what is meant by 'i64' in the context of programming languages and
> primitive types?

Isn't there more to it than that in that if the person guessed correctly
for i64 would he also guess correctly for i48 or i8 or i3? I would
suggest that under your scheme a reasonable person would be likely to
guess some of them wrong! :-)

So as I say it's not as simple as getting someone to guess at an easy name.


--
James Harris

Bart

Aug 20, 2021, 12:49:22 PM
Probably an educated guess would be a better term, as they may have come
across i64 etc before, and recognised those power-of-two values as also
used in similar type names.

Not so for i3. i8 is a funny one; I'm a little uneasy with single-digit
widths like this, but there is only u8 and i8.

(In the past I used more byte widths than bit widths, so such names may
have been i1 i2 i4 i8, and actually were, internally. Now I mainly use
bit widths, partly to avoid the crossover point at 'i8', which might
mean 64 bits under the old system, or 8 bits under the new.)

Andy Walker

Aug 20, 2021, 12:49:41 PM
On 20/08/2021 16:27, James Harris wrote:
> What if you want to set a pointer to a specific address or you want
> to compare addresses? Do you now have conversion functions?

I've asked this before, and don't recall getting a definitive
answer. Is your language intended to be high-level and portable, which
is what I understood, or not? If so, then "specific address" should
not be part of it, nor should comparing addresses [exc, as in C, for
(in)equality or within one array/structure]. If you need to address,
[eg] the hardware clock, that would need to be done via a library
facility, not [eg] by "clockptr := 123456;". You don't need to ask
other posters about "life imitating Bart".

As for questions about whether "int16" is "reserved", isn't
that best dealt with by a notional outer layer in which standard
library facilities [inc types] are declared? Then programmers who
want to can declare their own versions of [eg] "sqrt" and "int",
without trampling on the library facilities. [Eg, you could want
an own-version of these for debugging purposes while leaving the
underlying program unchanged.]

--
Andy Walker, Nottingham.
Andy's music pages: www.cuboid.me.uk/andy/Music
Composer of the day: www.cuboid.me.uk/andy/Music/Composers/Soler

James Harris

Aug 20, 2021, 1:19:39 PM
On 20/08/2021 17:49, Andy Walker wrote:
> On 20/08/2021 16:27, James Harris wrote:


>> What if you want to set a pointer to a specific address or you want
>> to compare addresses? Do you now have conversion functions?
>
>     I've asked this before, and don't recall getting a definitive
> answer.  Is your language intended to be high-level and portable, which
> is what I understood, or not?

If you are asking that of me rather than Bart then Yes, it's meant to be
/very/ portable :-) but that doesn't prevent it from being used to write
hardware drivers or prevent it from reading in addresses as parameters.


> If so, then "specific address" should
> not be part of it, nor should comparing addresses [exc, as in C, for
> (in)equality or within one array/structure].

Well, take as an example a VGA driver. On a common-or-garden PC which is
running its video in VGA mode there's a standard address for the frame
buffer which is, specifically, 0xb8000. The address might be different
on different hardware but it can be provided as a parameter to the VGA
driver, e.g. from a configuration file. The language would then need to
allow a program to use that address to form a pointer. Hence the need
for a conversion function.

I was asking Bart if he had conversion functions in both directions:
address-to-pointer and pointer-to-address.
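
For comparison, here is roughly how the two conversions look in C++. This is a sketch only: cells_at, address_of and put_cell are hypothetical names, the 80-column width is an assumption, and the cell format (character code in the low byte, attribute in the high byte) is that of VGA text mode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Address-to-pointer: turn a numeric address (e.g. read from a configuration
// file) into a pointer to 16-bit text cells.
inline volatile std::uint16_t *cells_at(std::uintptr_t address) {
    return reinterpret_cast<volatile std::uint16_t *>(address);
}

// Pointer-to-address: recover the numeric address held by a pointer.
inline std::uintptr_t address_of(const volatile void *p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

// Write one cell of an assumed 80-column text buffer.
inline void put_cell(volatile std::uint16_t *buffer, std::size_t row,
                     std::size_t col, char ch, std::uint8_t attribute) {
    const std::size_t columns = 80;  // assumption: 80 cells per row
    buffer[row * columns + col] = static_cast<std::uint16_t>(
        (attribute << 8) | static_cast<unsigned char>(ch));
}
```

A driver would call cells_at(0xb8000) or whatever address it was configured with; for testing on a hosted system the same functions can be pointed at an ordinary array instead of real video memory.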


> If you need to address,
> [eg] the hardware clock, that would need to be done via a library
> facility, not [eg] by "clockptr := 123456;".

Indeed. But the library facility also needs to be written!


> You don't need to ask
> other posters about "life imitating Bart".

:-)


>
>     As for questions about whether "int16" is "reserved", isn't
> that best dealt with by a notional outer layer in which standard
> library facilities [inc types] are declared?  Then programmers who
> want to can declare their own versions of [eg] "sqrt" and "int",
> without trampling on the library facilities.  [Eg, you could want
> an own-version of these for debugging purposes while leaving the
> underlying program unchanged.]
>

Again, if that's a question for me rather than for Bart I have a plan to
define all types (possibly except boolean) in the standard library
rather than making them part of the language. The language would define
keywords, control structures, operators (without defining their
meanings), and namespace management rules. The standard library (or
libraries) would supply standard types along with their behaviour, and
various data structures. And programmers could supply their own types in
exactly the same way. At least that's the theory!


--
James Harris

James Harris

Aug 20, 2021, 1:26:09 PM
On 20/08/2021 17:49, Bart wrote:
> On 20/08/2021 16:54, James Harris wrote:
>> On 20/08/2021 15:25, Bart wrote:

...

>>> The point is, does everyone know or can make an excellent guess as to
>>> what is meant by 'i64' in the context of programming languages and
>>> primitive types?
>>
>> Isn't there more to it than that in that if the person guessed
>> correctly for i64 would he also guess correctly for i48 or i8 or i3? I
>> would suggest that under your scheme a reasonable person would be
>> likely to guess some of them wrong! :-)
>>
>> So as I say it's not as simple as getting someone to guess at an easy
>> name.
>
> Probably an educated guess would be a better term, as they may have come
> across i64 etc before, and recognised those power-of-two values as also
> used in similar type names.
>
> Not so for i3. i8 is a funny one; I'm a little uneasy with single-digit
> widths like this, but there is only u8 and i8.
>
> (In the past I used more byte widths than bit widths, so such names may
> have been i1 i2 i4 i8, and actually were, internally. Now I mainly use
> bit widths, partly to avoid the crossover point at 'i8', which might
> mean 64 bits under the old system, or 8 bits under the new.)

Sounds as though your educated guesser may have to do an awful lot of
educated guessing! ;-)

But I agree with you that bit widths rule the day. I'd even use bits for
allocation of storage. For digital computing bits are fundamental.
Eight-bit bytes, by contrast, may one day turn out to be just what we
used in the 20th and 21st centuries.


--
James Harris

David Brown

Aug 20, 2021, 2:34:56 PM
On 20/08/2021 16:25, Bart wrote:
> On 20/08/2021 14:49, David Brown wrote:
>> On 20/08/2021 13:18, Bart wrote:
>>> On 20/08/2021 08:05, David Brown wrote:
>>>> On 19/08/2021 17:28, Bart wrote:
>>>
>>>> The world is full of obscure and minor programming languages, most of
>>>> which almost no one has ever heard of.
>>>
>>> Rust is well known. Zig comes up frequently in forums (even this one in
>>> past threads). Odin less often, but it is mentioned. You can download
>>> all of them and try them out.
>>
>> Rust has a lot of hype, and is more known than used.  Zig is just
>
> The point is, does everyone know or can make an excellent guess as to
> what is meant by 'i64' in the context of programming languages and
> primitive types?
>
>
>> It is part of POSIX.  It is used to hold offsets within files, for some
>> kinds of file functions.  (I don't know the details.)  On some systems,
>> it will be 32-bit, and on others it will be 64-bit.  It may even be
>> 64-bit on a 32-bit system with C90 and no C99 support - in which case
>
> So it is likely to be one of i32 u32 i64 u64.

It is going to be signed (since the POSIX standard says so). But it is
not necessarily the same type as int32_t or int64_t (we are talking
about C here - whether you like them or not, those are the standard
names) even if it is the same size. (Or have you forgotten how type
compatibility works in C?) It is quite possible for it to be an extra
implementation-specific type.

Remember, there can be different integer types with the same size and
signedness which are not compatible with each other. I am quite sure
you don't like that, and will tell us that in /your/ language things are
different. I'm not sure I like it myself, though it is not something
that bothers me in my programming. (I'd prefer them either to be fully
compatible, or strong types that are not compatible at all without
explicit type changes.)

To be fair, however, I agree that it is quite likely that off_t is the
same type as either int32_t or int64_t. I just can't imagine writing
decent code that would rely on that. (I have used systems where int32_t
and uint32_t, etc., are completely independent from the fundamental C
types despite having the same size.)
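
A short C++ sketch of the "same size, different type" point. The overload set named which is hypothetical and exists only to make the distinction visible.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Three overloads on three distinct integer types. On many 64-bit Unix
// targets long and long long have the same size, yet all three coexist.
std::string which(int)       { return "int"; }
std::string which(long)      { return "long"; }
std::string which(long long) { return "long long"; }

// The types are distinct regardless of whether their sizes happen to match.
static_assert(!std::is_same<int, long>::value, "int and long are distinct types");
static_assert(!std::is_same<long, long long>::value,
              "long and long long are distinct types");

// Reports whether this target gives long and long long the same size.
bool long_matches_long_long_in_size() { return sizeof(long) == sizeof(long long); }

// long long value = 0;
// long *p = &value;   // rejected: pointers to the two types are not compatible
```

Whatever long_matches_long_long_in_size() returns on a given target, the overloads and the rejected pointer assignment behave the same way, which is why code that assumes a typedef such as off_t "is" int64_t can break on another platform.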

>
> If I had to write a function right now to return a file offset (that is,
> the +ve or -ve difference between two locations in one file), I'd use an
> i64 type which covers offsets of +/- 9 billion billion bytes approx.
>

Some people, on the other hand, write code that uses the specified types
for the tasks and thus have code that is more portable. You write code
for yourself in your own languages for your own computers, and don't
have to be concerned about the world outside.

>> What part of "#include <sys/types.h>" is confusing you?
>
> What part of those angle brackets is confusing /you/?
>

Nothing - but then, I know what they mean in C. They don't mean the
file is part of the C implementation. I could give you a reference in
the C standards, but I know that would only annoy you since you want C
to be how /you/ define it, not how anyone else defines it.

> On Windows systems such a header file is only located within the system
> headers of a C implementation.

That may be true on your Windows system. It is not true in general for C.

And even more relevant, the standard headers that are part of the C
standard are all documented in the C standard (oddly enough) - and
"sys/types.h" is not there.

>
> But if your point is that off_t doesn't appear within the C standard,
> then take your pick from size_t, time_t, clock_t, fpos_t, ptrdiff_t,
> max_align_t, wint_t, rsize_t, errno_t, ...
>

A few of these are actually in the C standard - well done!

> There are actually 100s of such types. I'd say the vast majority are
> merely one of the u8-u64 and i8-i64 types.
>

There are quite a lot of types defined in the C standard - I doubt if it
is as many as 100, but I haven't counted them. The solid majority are not
the same as any of the intNN_t or uintNN_t types, even if you get the
naming right. (clock_t and time_t are only required to be arithmetic
types and may be floating point, all the atomic types are different from
the non-atomic types, the div_t types are structures, etc.)

As for the scalar integer POSIX types, these are most likely to be one
of the intNN_t or uintNN_t types. But they are given names because they
may be of different sizes on different systems. Some will be 32-bit on
one system and 64-bit on another one. Some are different for different
versions of the same OS on the same target. Some are different for
different configuration options on the same version of the same OS on
the same target. So you use the named POSIX types and get the right
size at compile time.

The same applies to other OS's, and other portable libraries. The only
exception is in the DOS and Windows world, where programmers have a long
tradition of writing non-portable shite code that makes unwarranted
assumptions about how /they/ don't need to follow rules about type names
or other portability issues. It's because of cowboy programmers with
their "i32 should be good enough for any use" attitude that the
world is stuck with polished turds for computers, because you can't get
their crap code to run on anything better.

>
>
>>>>> I think you can since you just don't see such types anywhere else.
>>>>>
>>>>
>>>> Ah, ignorance is bliss!
>>>
>>> So, enlighten us. It will most likely be languages where you tie
>>> yourself up in knots having a special type for everything.
>>>
>>
>> Every portable language that has decent wrappings for OS calls will use
>> OS-specific named types at the lowest level (though they might translate
>> them to language-specific types in the wrappings).  Of course, lots of
>> languages just assume "all the world is 64-bit *nix", or "all the world
>> is 32-bit windows".
>
> Pretty much every processor including most microcontrollers will be
> using power-of-two types. So lots of languages will assume that too;
> why not?
>

I didn't say they shouldn't assume power-of-two types.

>
>> But of course if you think types are a bad idea, and type-safety is for
>> wimps who are scared of a little program crash, then you will consider
>> any use of these types as "tying yourself up in knots".
>
> There might be some point to it if a language stopped me from doing
> time_t + size_t + errno_t, but in the case of C, it doesn't.
>

I'd be happier with stronger typing - I use strong types in C++.

But that's not the point of these types.
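
For readers unfamiliar with the idea, a minimal sketch of a strong type in C++ follows. Strong, Seconds, Bytes and can_add are hypothetical names; this is one common pattern, not a description of any particular code base.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Strong-typedef wrapper: the Tag parameter makes each instantiation a
// distinct type even though the stored representation is identical.
template <typename Tag>
struct Strong {
    long long value;
};

// Addition exists only between two values that carry the same Tag.
template <typename Tag>
constexpr Strong<Tag> operator+(Strong<Tag> a, Strong<Tag> b) {
    return Strong<Tag>{a.value + b.value};
}

struct SecondsTag {};
struct BytesTag {};
using Seconds = Strong<SecondsTag>;
using Bytes   = Strong<BytesTag>;

// Compile-time probe: true when the expression "A + B" is well formed.
template <typename A, typename B, typename = void>
struct can_add : std::false_type {};

template <typename A, typename B>
struct can_add<A, B, decltype(void(std::declval<A>() + std::declval<B>()))>
    : std::true_type {};

static_assert(can_add<Seconds, Seconds>::value, "same-tag addition is allowed");
static_assert(!can_add<Seconds, Bytes>::value, "mixed-tag addition is rejected");
```

Here Seconds{2} + Seconds{3} compiles, while Seconds{2} + Bytes{3} is a compile-time error, which is the kind of check that plain typedefs such as time_t and size_t do not give.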

> It just makes it harder for people using libraries via FFIs when the API
> makes use of those highly specific C types.
>
> I have to use elaborate tools to do the conversion. Then that struct
> stat which typically uses one custom typedef per member ends up looking
> like this:
>
>     record stat = $caligned
>         u32 st_dev
>         u16 st_ino
>         u16 st_mode
>         i16 st_nlink
>         i16 st_uid
>         i16 st_gid
>         u32 st_rdev
>         u32 st_size
>         u64 st_atime
>         u64 st_mtime
>         u64 st_ctime
>     end
>
> No other input is now needed. Now go and look at some actual declarations
> of 'struct stat', if you can even pinpoint the one used for your
> platform of interest, amongst all the multiple declarations and
> conditional code blocks, once you've located the header file from the
> tower of include statements.

If you use a tool to generate system-specific and target-specific
interface wrappers (and that's a perfectly good solution), then what's
your complaint? Is it just that when the C and Unix founding fathers
designed their language and OS, they didn't have enough
consideration for how those design decisions would cost Bart several
hours of extra effort?

Bart

Aug 20, 2021, 2:39:43 PM
It turns out the i32-style is also used by LLVM for the source format of
its intermediate language:

    define dso_local void @Main() #0 {
      %1 = alloca i32, align 4
      %2 = alloca i32, align 4
      %3 = alloca i32, align 4
      %4 = alloca i32, align 4

      store i32 2, i32* %2, align 4
      ....

(I remember reading that LLVM allows integer types up to 2**23 or
possibly 2**24; I don't know how those would be represented.)

I think even David Brown must have heard of LLVM!

I use them also in my own intermediate language; a new project with a
discrete textual format, where an instruction looks like this:

add i64

This language directly supports arithmetic on i8-i128 and u8-u128 types.
Anything else has to be programmed on top. All this IL does is allow you
to specify and manipulate (push/pop/pass/return) block data types in
general, eg. block:72 for an int573 type (types must be byte multiples).
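
The block size in that example is the bit width rounded up to whole bytes. As a sketch (bytes_for_bits is a hypothetical helper, assuming 8-bit bytes):

```cpp
#include <cassert>
#include <cstddef>

// Number of 8-bit bytes needed to hold an integer of the given bit width,
// rounding up to a whole byte.
constexpr std::size_t bytes_for_bits(std::size_t bits) {
    return (bits + 7) / 8;
}
```

So an int573 needs bytes_for_bits(573), that is 72 bytes, matching block:72 above.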

I think if you have in mind completely arbitrary integer widths, and
intend to define an independent IL, then you need to consider what
capabilities /that/ will have, and how much effort it would require.

Bart

Aug 20, 2021, 2:49:26 PM
On 20/08/2021 18:19, James Harris wrote:
> On 20/08/2021 17:49, Andy Walker wrote:

>
>> If so, then "specific address" should
>> not be part of it, nor should comparing addresses [exc, as in C, for
>> (in)equality or within one array/structure].
>
> Well, take as an example a VGA driver. On a common-or-garden PC which is
> running its video in VGA mode there's a standard address for the frame
> buffer which is, specifically, 0xb8000. The address might be different
> on different hardware but it can be provided as a parameter to the VGA
> driver, e.g. from a configuration file. The language would then need to
> allow a program to use that address to form a pointer. Hence the need
> for a conversion function.
>
> I was asking Bart if he had conversion functions in both directions:
> address-to-pointer and pointer-to-address.

I don't have anything as fancy as separate pointer and address types.

Just explicit conversions between pointers and integers.

Since everything is 64 bits, mostly they are no-ops.

> Again, if that's a question for me rather than for Bart I have a plan to
> define all types (possibly except boolean) in the standard library
> rather than making them part of the language.

I think Julia does something like that:

primitive type Int32 <: Signed 32 end

(https://docs.julialang.org/en/v1/manual/types/)

As I said, I'm not a fan of that approach, as I think it's simpler,
tidier and more efficient just to build these in.


Dmitry A. Kazakov

Aug 20, 2021, 2:53:15 PM
On 2021-08-20 19:19, James Harris wrote:

> Again, if that's a question for me rather than for Bart I have a plan to
> define all types (possibly except boolean) in the standard library
> rather than making them part of the language.

Your language lacks the necessary abstraction level for that. Higher level
languages have type-algebraic operations to produce types. Like "long"
in Algol 68, or type range <>, type mod <> in Ada. The syntax of such
operations is less relevant, if you like unreadable languages it could
well be

u(<number>)

where <number> is a static positive-valued expression. Note an
*expression*. If you have that in the language you do not need any
library, if you don't you will have to write your library in C.
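
To make the idea of a type produced from a static expression concrete, here is a rough C++ analogue. It is only a sketch: the alias template u and the constant word_bits are hypothetical, it stops at 64 bits, and it picks the smallest standard type with at least the requested width rather than an exact-width type.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// u<N> names the smallest standard unsigned type with at least N bits.
// Widths above 64 are not handled in this sketch and fall back to 64 bits.
template <unsigned Bits>
using u = typename std::conditional<
    (Bits <= 8), std::uint_least8_t,
    typename std::conditional<
        (Bits <= 16), std::uint_least16_t,
        typename std::conditional<
            (Bits <= 32), std::uint_least32_t,
            std::uint_least64_t>::type>::type>::type;

// The argument is an expression evaluated at compile time, not just a literal.
constexpr unsigned word_bits = 16;   // hypothetical static value
using Doubled = u<2 * word_bits>;

static_assert(std::is_same<Doubled, std::uint_least32_t>::value,
              "2 * 16 bits needs the 32-bit type");
```

The point of contention in the thread is whether such a type-producing operation belongs in the language itself or can be supplied by a library; in C++ it can be written as above only because templates and constant expressions are already language features.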

Bart

Aug 20, 2021, 5:01:22 PM