On 21/08/2021 20:57, Bart wrote:
> On 21/08/2021 19:24, David Brown wrote:
>> On 21/08/2021 15:08, Bart wrote:
>
>>> However there is no direct equivalent for C's 'long', which is
>>> apparently 64 bits on 64-bit Linux, and 32 bits everywhere else (on my
>>> x86 and arm targets).
>>>
>>
>> "long" is 64-bit on every 64-bit architecture except Windows. It is
>> /always/ Windows that is the odd one out in these things, not Linux.
>
> You can equally say that "long" is 32-bit on every 32/64-bit
> architecture except Linux.
You could say that, if you wanted to be wrong, to the extent that you
were actually making any sense (Linux is not an architecture). Perhaps
you are not aware that the computer world does not revolve around
Windows, and that Linux is not the only non-Windows system around.
"long" always has to be at least 32 bits. So there is no option for its
size (on "normal" power-of-two systems) until you have 64-bit as a
native size. And then you find that on /every/ system, except Windows,
"long" is 64-bit. This applies to mainframes, supercomputers and
workstations from before Windows was even properly 32-bit and before
Linux was a twinkle in Linus' eye. It applies to embedded systems, *nix
systems, Macs, and anything else.
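If you want to see what your own implementation picked, a quick sketch
(standard C, nothing else assumed) is enough:

#include <stdio.h>
#include <limits.h>

/* C only guarantees minimum widths: char at least 8 bits, short and
   int at least 16, long at least 32, long long at least 64.  A
   typical 64-bit *nix (LP64) prints 32/64/64 here; 64-bit Windows
   (LLP64) prints 32/32/64. */
int main(void)
{
    printf("int:       %zu bits\n", sizeof(int) * CHAR_BIT);
    printf("long:      %zu bits\n", sizeof(long) * CHAR_BIT);
    printf("long long: %zu bits\n", sizeof(long long) * CHAR_BIT);
    return 0;
}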
>
> And that "long long" is 64-bit on every 32/64-bit architecture.
You could say that if you wanted to be a bit pointless, given that by
definition it must be at least 64-bit, and no systems in production
have native integer sizes greater than 64 bits.
>
> From that it follows that "long long" is twice the width of "long" on
> every architecture, 32 /and/ 64-bit, except Linux.
>
Except it doesn't. It's bollocks.
> It seems to me that Linux is the odd one out!
It seems to the rest of the world that you don't know what you are
talking about.
>
> Actually neither get it right. Probably 'int' should have been 32 bits
> on every 32/64-bit system, and 'long' should have been 64 bits on those
> same systems.
On that point, I agree to some extent. I don't think the way C defines
its fundamental types is ideal, and I don't think C implementations have
always made the best choices.
The mistake, as I see it, is that the emphasis has been wrong. In
practice in the C world, the implementation gives you type "int" as
your most convenient type, but its size depends on the implementation.
The way it /should/ work is that the /programmer/ should ask for a fast
type that works up to at least N, and the type provided by the
implementation should be able to vary to suit.
Now, in theory, C has this, for particular values of N. The signed
types signed char, short, int, long and long long are precisely such
types for N = 2^7 - 1, 2^15 - 1, 2^15 - 1 (int's guaranteed minimum
is the same as short's), 2^31 - 1 and 2^63 - 1. But in practice,
that's not how people use them.
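C99's <stdint.h> makes the "at least N" request explicit, if only
people would use it. A minimal sketch:

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

/* Ask for "a fast type good to at least N" and let the
   implementation pick the width - 32 bits on one machine, 64 on
   another, whatever is quickest there. */
int main(void)
{
    int_fast32_t counter = 0;    /* fast, at least 32 bits */
    int_least16_t small = 1234;  /* smallest, at least 16 bits */

    counter = 100000;
    printf("counter = %" PRIdFAST32 ", small = %" PRIdLEAST16 "\n",
           counter, small);
    return 0;
}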
>
> "long long" would either not be necessary, or could be reserved for 128
> bits, and you have int/long types that are guaranteed sizes on those
> architectures. (And, in C, could be confident about continuing to use
> the standard printf formats and literal suffixes for those types.)
>
> At the moment you have programs where developers have either used "long"
> interchangeably with "int" on 32-bit systems with little thought, that
> will be inefficient or go subtly wrong on Linux64.
Yes, programmers get things wrong, and make incorrect assumptions. You
do it all the time, no matter how often people tell you your assumptions
don't apply in a wider context.
This is much more of a problem in the Windows world than outside it, as
in the wider world programmers have been using a variety of different
systems and sizes all their lives.
Perhaps you don't understand /why/ every 64-bit system has 64-bit long,
except Windows? So here is a little history lesson.
In the world at large, 64-bit has been mainstream since the 1990s, with
supercomputers using 64 bits long before that. This preceded C99, and
certainly preceded the large-scale adoption of C99 (which was quite
slow). Programmers needed a 64-bit type - it had to be "long" because
there was no other standard type available (excluding weird Crays,
where everything from "short" upwards was 64-bit).
And programmers for non-Windows systems got used to this quickly, and
rarely made mistakes, mixups or assumptions about type sizes. *nix
programmers in particular have always been used to writing code that
would work on a variety of different *nix systems and a variety of
different processors, with different sizes and endianness. They were
not perfect, of course, but pretty good - code written for one cpu size
would usually work fine on the other size. One thing that did crop up
was using "long" when they needed an integer type for pointers, as
"uintptr_t" did not exist prior to C99. However, that still worked
across 32/64 bit changes.
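The C99 fix is a one-liner. A sketch of the round-trip (the function
name is just for illustration, and note that uintptr_t is formally an
optional type):

#include <stdint.h>

/* uintptr_t is guaranteed (where it exists) to hold any object
   pointer - which "long" only happened to do on those systems. */
void *round_trip(void *p)
{
    uintptr_t bits = (uintptr_t)p;
    return (void *)bits;    /* converts back to the original pointer */
}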
Back in the DOS/Windows world, programmers were still using 16-bit while
others were at 32-bit or more. There people were used to "long" for
32-bit, and to programming with the assumption that their code would
never have to be portable to anything else, as 16-bit DOS/Windows would
last forever. People gradually moved to 32-bit Windows, but again
assumed that was the whole world. And MS didn't support C99 types at
all. So when you needed 32-bit types, you used "long" just as you
always had done, and you assumed it would always be 32-bit - it was
easier than trying to understand the myriad of Windows-specific type
names, and you had nothing to gain from using them (unlike in the *nix
world) because your code would never be used on anything different.
So when MS finally started getting 64-bit Windows underway, they had a
choice. Should they make "long" 32-bit, so that code that assumed that
size would still work? Should they make it 64-bit, so that code that
assumed it matched a pointer would still work? They picked 32-bit,
different from everyone else before them (they also picked a different,
and less efficient, ABI for x86-64 than everyone else uses). Whichever
they picked, it would mean a great deal of C code written for Windows
could not be compiled as 64-bit because of invalid assumptions and poor
typing.
I suppose they picked the size that they thought would involve the least
work to fix. If you like conspiracy theories (and MS has a proven track
record of being willing to make life hard for third-party developers and
end-users if it also annoys their competition), you might note that
because they did not support C99, using 32-bit "long" means that C
programmers for 64-bit Windows have no standard way of dealing with
64-bit integers. Was this to further discourage people from programming
in C, writing code that might work on other systems, and to move them
over to C#, which was Windows-only and controlled by MS?
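The usual workaround looked something like this - a sketch only, and
the MSVC branch relies on the compiler-specific __int64 keyword rather
than anything standard:

/* A common pre-C99 portability shim for a 64-bit integer type. */
#if defined(_MSC_VER)
typedef __int64 i64;
typedef unsigned __int64 u64;
#else
typedef long long i64;
typedef unsigned long long u64;
#endif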
>
> Or developers who have assumed 64-bit 'long' on Linux64 without
> realising that on Linux32 or on Windows, it will be 32-bit and their
> programs might not work.
>
> The actual situation for me is that I never use 'long' because it is too
> badly defined; I need types that are a known size across platforms.
>
>>> However types that need to represent file offsets for example, would
>>> probably still need to be signed 64 bits; a 32-bit machine will not
>>> necessarily be full of small files!
>>>
>>
>> Equally, it will not necessarily have any need of /large/ files - and
>> not want the waste of using 64-bit types when 32-bit will do.
>>
>> And remember that while /you/ have the choice of what systems you want
>> to support with your languages, C supports almost /everything/. Some
>> programs written in C are written to support a wide range of systems -
>> including systems that come from an era where 32-bit was more than
>> enough for file sizes because disk size was measured in hundreds of
>> megabytes. Use "off_t" instead of "i64" (or whatever you want to call
>> it), and the program works on systems written in the 90's as well as
>> systems written in 2020 - whether the program was written in 1990 or in
>> 2020.
>
> Except that in 2020 it is likely to be used for much bigger data than in
> 1990.
>
The vast majority of programs don't need big data. And while it is
obviously true that most PC code written now will run on 64-bit
systems, /some/ people still write code that is usable on older
or more limited systems. For C, and for any libraries or headers
useable from C, that flexibility is vital. For /new/ languages, you can
of course decide that you don't care about anything except 64-bit
systems.
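To make the off_t point concrete, here is a POSIX sketch (not standard
C, and file_size is just an illustrative name):

#include <sys/types.h>
#include <sys/stat.h>

/* Returns the size of a file using the system's own offset type.
   The same code compiles whether off_t is 32 or 64 bits here. */
off_t file_size(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return (off_t)-1;    /* error; caller checks */
    return st.st_size;       /* st_size is an off_t */
}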
> In any case, there is always a choice: choose one of 32 bits or 64 bits,
> or provide both functions or both sets of data structure. But tell
> people what the choice was.
That's fine for some structures - but certainly not all. Some of the
ones you complain about, such as uid_t or pid_t, have changed size over
time /without/ corresponding changes to the size of "long" or
other aspects of the hardware. And they don't all change at the same
time. There is no problem for programs that use the types correctly,
only a problem for people who want to oversimplify things to the point
that they no longer work.
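If you ever need to print one of these types, the portable idiom is to
cast to the widest standard type instead of guessing the width - a
sketch:

#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    pid_t pid = getpid();
    uid_t uid = getuid();

    /* Cast to intmax_t and print with %jd - no assumption about
       the actual width of pid_t or uid_t on this system. */
    printf("pid = %jd, uid = %jd\n", (intmax_t)pid, (intmax_t)uid);
    return 0;
}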
>
> If you need to provide one choice, don't hide it behind an opaque
> typedef and then be cagey about what it actually is. APIs need to be
> open and transparent, just like a datasheet.
>
They are.
Have you ever looked at datasheets for electronics? They are full of
details that refer to other data. The entry that shows the "high"
voltage level on an input doesn't say "minimum 3.6 V", it says "minimum
0.7 VDD" - referring to the supply voltage. The timings are often not
in MHz or ns, but in units of the maximum frequency for the chip, or its
system frequency. And so on.
When you want an input to be "high", you don't connect the input to 3.3V
or to 2.5V. You connect it to VDD - and know it is correct for the
chip, regardless of the voltage being used.
Treat your types in C in the same way. The API says "type pid_t". Use
that type. Don't say "pid_t should be 32-bit, so I'll use my own u32".
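In practice that looks something like this (a POSIX sketch; run_child
is just an illustrative name):

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* pid_t goes in, pid_t comes out - no casts, no width assumptions. */
int run_child(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;               /* fork failed */
    if (child == 0)
        _exit(0);                /* child's work goes here */

    int status;
    waitpid(child, &status, 0);  /* same pid_t handed straight back */
    return status;
}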