On Sunday, September 18, 2016 at 9:27:52 AM UTC-5, Keith Thompson wrote:
> Too complicated. If non-two's-complement systems are really obsolete,
> why not just have the next C standard require two's-complement? If
> there are any remaining one's-complement or sign-and-magnitude systems,
> let them claim conformance to the old C11 standard, or just let them be
> non-conforming.
There are many tasks which can be performed most efficiently using features
or guarantees that could be cheaply supported by most platforms. If
using some feature would make 5% of programs more efficient than they would
be without it, but only 95% of implementations can support it, which would be
most helpful:
1. Say that the Standard will be inapplicable to those programs unless they
are written inefficiently.
2. Say that the Standard will be inapplicable to the implementations that
can't support that feature or guarantee.
3. Say that the Standard will describe the behavior of the programs that use
that feature or guarantee on the 95% of platforms where it is available,
while also describing the behavior of the 95% of programs that have no
need for the feature on 100% of platforms, including those that don't
support the feature.
I would think #3 would be the most useful course of action, and that it's
what the authors of the Standard intended, but they didn't imagine that
compiler writers targeting the x86 architecture in the 21st century would
not want to support the same behaviors as compiler writers for that platform
were supporting in 1987 (the present x86 architecture debuted with the 80386
in 1985).
> That's assuming we can make that assumption. It's likely that there's
> no good reason to use anything other than two's-complement, but I'm not
> 100% confident that that will never change. This requires input from
> people who know more about this stuff than I do.
Any platform which can perform Standard-compliant unsigned math efficiently
will perform two's-complement math reasonably efficiently. The only
platforms which can perform other forms of integer math significantly more
efficiently than two's-complement are those where either:
1. Compatibility with other systems compels the use of other formats,
something which isn't likely to happen.
2. Unsigned math is significantly less efficient than signed math because
it has to be "emulated" using sequences of signed-math instructions.
There are some implementations where unsigned math is less efficient than
signed math, but that's mainly because C doesn't have any unsigned types
that don't require a compiler to truncate values deterministically after
each computation.
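The deterministic truncation mentioned above is visible directly in the language
rules: unsigned arithmetic must wrap modulo 2^N after every operation, whereas
no such guarantee exists for signed types. A minimal sketch in standard C (the
function names are mine, for illustration only):

```c
#include <stdint.h>

/* Unsigned arithmetic must be truncated modulo 2^N, so incrementing the
   maximum value is required to yield 0 on every conforming implementation. */
uint32_t wrap32(uint32_t x) {
    return x + 1u;              /* 0xFFFFFFFF + 1 wraps to 0; never UB */
}

/* For narrow types the operands promote to int, so the deterministic
   truncation happens at the conversion back to uint16_t. */
uint16_t wrap16(uint16_t x) {
    return (uint16_t)(x + 1);   /* 0xFFFF + 1 == 0x10000, truncated to 0 */
}
```

A platform whose hardware only performs signed arithmetic has to emit extra
masking instructions to honor these truncations, which is the inefficiency
described above.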
> And of course two's-complement representation doesn't necessarily imply
> the usual two's complement behavior in all cases; it tells us how
> negative integers are represented, but it doesn't tell us the result of
> evaluating INT_MAX+1 or -5<<3. That's another set of assumptions that
> might or might not be sensible to make in a hypothetical new standard.
There are many cases where a compiler could reap huge advantages from being
able to treat integers as though they promote to arbitrary larger types
whose upper bits may be non-deterministically retained or dropped whenever
the compiler sees fit. There are also cases where it may be useful to have
a compiler guarantee that it will trap in implementation-defined fashion
in cases where arithmetic overflow occurs that might affect a computed
result, but not necessarily in cases where any lost precision would end up
being ignored anyway. In cases where the above behaviors would meet an
application's requirements, a compiler that could guarantee to uphold them
would be capable of meeting application requirements more efficiently than
one which couldn't. (If the result of a particular computation is sometimes
ignored, a compiler that recognized those cases could omit not just the
overflow check but the entire computation; if user code checks for and
traps on overflows, however, a compiler would be required to perform
the checks even for the otherwise-irrelevant computations.)
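To make the trade-off concrete: once user code performs its own overflow
check, the check (and hence the addition) must be evaluated even when the sum
is later ignored. A sketch of such a user-level check, using the GCC/Clang
`__builtin_add_overflow` intrinsic (a compiler extension, not standard C):

```c
#include <limits.h>

/* Returns 1 and stores the sum if a + b is representable in an int;
   returns 0 if the addition would overflow. Relies on the GCC/Clang
   __builtin_add_overflow extension. */
int checked_add(int a, int b, int *sum) {
    return !__builtin_add_overflow(a, b, sum);
}
```

Because `checked_add` makes overflow observable in all cases, a compiler
cannot silently discard the addition the way it could under a
"trap only when the result would matter" guarantee of the kind described
above.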
> C already explicitly allows the two's-complement representation with
> sign bit 1 and all value bits 0 to be a trap representation. As for
> -0x7FFFFFFF00000000, I'm skeptical that any such systems exist or are
> worth considering.
If a 32-bit system recognizes 0x80000000 as a NaN, it could easily
recognize any 64-bit number whose upper word is 0x80000000 as a NaN,
and uphold NaN semantics. If the system needs to add 0x123456789 to
0x7FFFFFFFFFFFFFFF, it could process the lower word (yielding 0x23456788
plus a carry), and then the upper word (yielding an overflow, and thus
a 0x80000000 NaN). If the only valid NaN form were 0x8000000000000000
it would be necessary to revisit the lower word after each computation
to ensure that it was zeroed if the upper portion of the number yielded
a NaN result.
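The multi-word addition just described can be sketched in standard C, assuming
the hypothetical architecture's rule that any 64-bit value whose upper word is
0x80000000 is a NaN (all names here are illustrative, not from any real
instruction set):

```c
#include <stdint.h>

#define NAN_HI 0x80000000u  /* hypothetical upper-word NaN pattern */

/* 64-bit add performed as two 32-bit word operations. If the signed
   addition overflows in the upper word, that word is replaced by the
   NaN pattern; the lower word is left as-is, since ANY value with
   upper word 0x80000000 counts as NaN on this hypothetical machine. */
uint64_t nan_add(uint64_t a, uint64_t b) {
    uint32_t ahi = (uint32_t)(a >> 32), alo = (uint32_t)a;
    uint32_t bhi = (uint32_t)(b >> 32), blo = (uint32_t)b;

    if (ahi == NAN_HI || bhi == NAN_HI)     /* NaN operands propagate */
        return (uint64_t)NAN_HI << 32;

    uint32_t lo = alo + blo;                /* lower word first ... */
    uint32_t carry = (lo < alo);            /* ... then carry out */
    uint32_t hi = ahi + bhi + carry;

    /* Signed overflow occurred iff the operands' sign bits agree and
       the result's sign bit differs from them. */
    if (~(ahi ^ bhi) & (ahi ^ hi) & 0x80000000u)
        hi = NAN_HI;            /* lower word need not be revisited */

    return ((uint64_t)hi << 32) | lo;
}
```

For the example in the text, `nan_add(0x7FFFFFFFFFFFFFFF, 0x123456789)`
produces lower word 0x23456788 and upper word 0x80000000, with no need to
zero the lower word after the fact.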
Some applications require that the usable range of integer values extend
all the way from -2^63 to +2^63-1, but most applications don't need the
entire range. Being able to detect that overflows have occurred would be
a useful feature, and including integer NaN support in future architectures
would not be hard, but unless a language would allow implementations to
expose it to a programmer in useful fashion there wouldn't be much point.
> > How would you multiply a signed number by 2^N in such a fashion as to
> > yield defined behavior in all cases where the result would be
> > representable, without relying upon implementation-defined behavior?
>
> I'd probably use the "*" operator, for example -5 * (1<<3).
That operator will yield UB in cases where << would not, and I haven't seen
any nice expressions which don't rely upon implementation-defined
conversions from unsigned to signed, and which don't fail for either -1<<(#
of value bits) or for INT_MIN<<0, both of which are perfectly defined on
two's-complement systems. Conversion to unsigned would work on commonplace
systems, but I think it's more intuitive to say that shifting a number
which contains an infinite number of 1 bits followed by "1011" left by
three bits should yield an infinite number of 1 bits followed by "1011000"
than to say that one should (using 16 bits for numerical simplicity) add
65536 to convert -5 to an unsigned int (yielding 65531), multiply that by
8, subtract 458752 to bring the result back into range (yielding 65496),
and then convert that to a signed value by subtracting 65536 (yielding
-40).
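For completeness, the unsigned round-trip can be written as a one-line helper
(the name is mine); note that converting an out-of-range unsigned value back
to int is implementation-defined in C, though every commonplace
two's-complement compiler reduces it modulo 2^32:

```c
/* Multiply a signed value by 2^n via unsigned arithmetic. The shift
   itself is fully defined, since unsigned math wraps mod 2^N; only the
   conversion of the result back to int is implementation-defined. */
int shl_signed(int x, int n) {
    return (int)((unsigned)x << n);
}
```

On such an implementation, `shl_signed(-5, 3)` yields -40, and
`shl_signed(INT_MIN, 0)` yields INT_MIN, covering the cases where plain `*`
or a naive portable expression would fail.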