On Sunday, September 18, 2016 at 9:27:52 AM UTC-5, Keith Thompson wrote:
> Too complicated. If non-two's-complement systems are really obsolete,
> why not just have the next C standard require two's-complement? If
> there are any remaining one's-complement or sign-and-magnitude systems,
> let them claim conformance to the old C11 standard, or just let them be
> non-conforming.
There are many tasks which can be performed most efficiently using features
or guarantees that could be cheaply supported by most platforms. If
using some feature would make 5% of programs more efficient than they would
be without it, but only 95% of implementations can support it, which would be
most helpful:
1. Say that the Standard will be inapplicable to those programs unless they
are written inefficiently.
2. Say that the Standard will be inapplicable to the implementations that
can't support that feature or guarantee.
3. Say that the Standard will describe the behavior of the programs that use
that feature or guarantee on the 95% of platforms where it is available,
while also describing the behavior of the 95% of programs that have no
need for the feature on 100% of platforms, including those that don't
support the feature.
I would think #3 would be the most useful course of action, and that it's
what the authors of the Standard intended, but they didn't imagine that
compiler writers targeting the x86 architecture in the 21st century would
not want to support the same behaviors as compiler writers for that platform
were supporting in 1987 (the present x86 architecture debuted with the 80386
in 1985).
> That's assuming we can make that assumption. It's likely that there's
> no good reason to use anything other than two's-complement, but I'm not
> 100% confident that that will never change. This requires input from
> people who know more about this stuff than I do.
Any platform which can perform Standard-compliant unsigned math efficiently
will perform two's-complement math reasonably efficiently. The only
platforms which can perform other forms of integer math significantly more
efficiently than two's-complement are those where either:
1. Compatibility with other systems compels the use of other formats,
something which isn't likely to happen.
2. Unsigned math is significantly less efficient than signed math because
it has to be "emulated" using sequences of signed-math instructions.
There are some implementations where unsigned math is less efficient than
signed math, but that's mainly because C doesn't have any unsigned types
that don't require a compiler to truncate values deterministically after
each computation.
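The deterministic truncation mentioned above is visible directly in the language
rules: unsigned arithmetic must wrap modulo 2^N after every operation, whereas
no such guarantee exists for signed types. A minimal sketch in standard C (the
function names are mine, for illustration only):

```c
#include <stdint.h>

/* Unsigned arithmetic must be truncated modulo 2^N, so incrementing the
   maximum value is required to yield 0 on every conforming implementation. */
uint32_t wrap32(uint32_t x) {
    return x + 1u;              /* 0xFFFFFFFF + 1 wraps to 0; never UB */
}

/* For narrow types the operands promote to int, so the deterministic
   truncation happens at the conversion back to uint16_t. */
uint16_t wrap16(uint16_t x) {
    return (uint16_t)(x + 1);   /* 0xFFFF + 1 == 0x10000, truncated to 0 */
}
```

A platform whose hardware only performs signed arithmetic has to emit extra
masking instructions to honor these truncations, which is the inefficiency
described above.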
> And of course two's-complement representation doesn't necessarily imply
> the usual two's complement behavior in all cases; it tells us how
> negative integers are represented, but it doesn't tell us the result of
> evaluating INT_MAX+1 or -5<<3. That's another set of assumptions that
> might or might not be sensible to make in a hypothetical new standard.
There are many cases where a compiler could reap huge advantages from being
able to treat integers as though they promote to arbitrary larger types
whose upper bits may be non-deterministically retained or dropped whenever
the compiler sees fit. There are also cases where it may be useful to have
a compiler guarantee that it will trap in implementation-defined fashion
in cases where arithmetic overflow occurs that might affect a computed
result, but not necessarily in cases where any lost precision would end up
being ignored anyway. In cases where the above behaviors would meet an
application's requirements, a compiler that could guarantee to uphold them
would be capable of meeting application requirements more efficiently than
one which couldn't. (If the result of a particular computation is sometimes
ignored, a compiler that recognized those cases could omit not just the
overflow check but the entire computation; if user code checks for and
traps on overflows, however, a compiler would be required to perform
the checks even for the otherwise-irrelevant computations.)
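To make the trade-off concrete: once user code performs its own overflow
check, the check (and hence the addition) must be evaluated even when the sum
is later ignored. A sketch of such a user-level check, using the GCC/Clang
`__builtin_add_overflow` intrinsic (a compiler extension, not standard C):

```c
#include <limits.h>

/* Returns 1 and stores the sum if a + b is representable in an int;
   returns 0 if the addition would overflow. Relies on the GCC/Clang
   __builtin_add_overflow extension. */
int checked_add(int a, int b, int *sum) {
    return !__builtin_add_overflow(a, b, sum);
}
```

Because `checked_add` makes overflow observable in all cases, a compiler
cannot silently discard the addition the way it could under a
"trap only when the result would matter" guarantee of the kind described
above.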
> C already explicitly allows the two's-complement representation with
> sign bit 1 and all value bits 0 to be a trap representation. As for
> -0x7FFFFFFF00000000, I'm skeptical that any such systems exist or are
> worth considering.
If a 32-bit system recognizes 0x80000000 as a NaN, it could easily
recognize any 64-bit number whose upper word is 0x80000000 as a NaN,
and uphold NaN semantics. If the system needs to add 0x123456789 to
0x7FFFFFFFFFFFFFFF, it could process the lower word (yielding 0x23456788
plus a carry), and then the upper word (yielding an overflow, and thus
a 0x80000000 NaN). If the only valid NaN form were 0x8000000000000000
it would be necessary to revisit the lower word after each computation
to ensure that it was zeroed if the upper portion of the number yielded
a NaN result.
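The multi-word addition just described can be sketched in standard C, assuming
the hypothetical architecture's rule that any 64-bit value whose upper word is
0x80000000 is a NaN (all names here are illustrative, not from any real
instruction set):

```c
#include <stdint.h>

#define NAN_HI 0x80000000u  /* hypothetical upper-word NaN pattern */

/* 64-bit add performed as two 32-bit word operations. If the signed
   addition overflows in the upper word, that word is replaced by the
   NaN pattern; the lower word is left as-is, since ANY value with
   upper word 0x80000000 counts as NaN on this hypothetical machine. */
uint64_t nan_add(uint64_t a, uint64_t b) {
    uint32_t ahi = (uint32_t)(a >> 32), alo = (uint32_t)a;
    uint32_t bhi = (uint32_t)(b >> 32), blo = (uint32_t)b;

    if (ahi == NAN_HI || bhi == NAN_HI)     /* NaN operands propagate */
        return (uint64_t)NAN_HI << 32;

    uint32_t lo = alo + blo;                /* lower word first ... */
    uint32_t carry = (lo < alo);            /* ... then carry out */
    uint32_t hi = ahi + bhi + carry;

    /* Signed overflow occurred iff the operands' sign bits agree and
       the result's sign bit differs from them. */
    if (~(ahi ^ bhi) & (ahi ^ hi) & 0x80000000u)
        hi = NAN_HI;            /* lower word need not be revisited */

    return ((uint64_t)hi << 32) | lo;
}
```

For the example in the text, `nan_add(0x7FFFFFFFFFFFFFFF, 0x123456789)`
produces lower word 0x23456788 and upper word 0x80000000, with no need to
zero the lower word after the fact.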
Some applications require that the usable range of integer values extend
all the way from -2^63 to +2^63-1, but most applications don't need the
entire range. Being able to detect that overflows have occurred would be
a useful feature, and including integer NaN support in future architectures
would not be hard, but unless a language would allow implementations to
expose it to a programmer in useful fashion there wouldn't be much point.
> > How would you multiply a signed number by 2^N in such a fashion as to
> > yield defined behavior in all cases where the result would be
> > representable, without relying upon implementation-defined behavior?
>
> I'd probably use the "*" operator, for example -5 * (1<<3).
That operator will yield UB in cases where << would not, and I haven't seen
any nice expressions which don't rely upon implementation-defined
conversions from unsigned to signed, and which don't fail for either -1<<(#
of value bits) or for INT_MIN<<0, both of which are perfectly defined on
two's-complement systems. Conversion to unsigned would work on commonplace
systems, but I think it's more intuitive to say that shifting a number
which contains an infinite number of 1 bits followed by "1011" left by
three bits should yield an infinite number of 1 bits followed by "1011000"
than to say that one should (using 16 bits for numerical simplicity) add
65536 to convert -5 to an unsigned int (yielding 65531), multiply that by
8, subtract 458752 to bring the result back into range (yielding 65496),
and then convert that to a signed value by subtracting 65536 (yielding
-40).
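For completeness, the unsigned round-trip can be written as a one-line helper
(the name is mine); note that converting an out-of-range unsigned value back
to int is implementation-defined in C, though every commonplace
two's-complement compiler reduces it modulo 2^32:

```c
/* Multiply a signed value by 2^n via unsigned arithmetic. The shift
   itself is fully defined, since unsigned math wraps mod 2^N; only the
   conversion of the result back to int is implementation-defined. */
int shl_signed(int x, int n) {
    return (int)((unsigned)x << n);
}
```

On such an implementation, `shl_signed(-5, 3)` yields -40, and
`shl_signed(INT_MIN, 0)` yields INT_MIN, covering the cases where plain `*`
or a naive portable expression would fail.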