Fixing left shift of negative numbers

4,464 views
Skip to first unread message

Myriachan

unread,
Jul 27, 2014, 10:25:59 PM7/27/14
to std-pr...@isocpp.org
It seems really broken to me that left shift of negative numbers is completely undefined.  I don't see why this is necessary at all.  It goes against everything my whole generation of programmers was taught, especially in college classes like "discrete mathematics".

I think x << y for negative x should work like this:  If the shift as implemented as multiplies by 2 does not cause a change in the sign bit, the result is defined; otherwise it is undefined.  This definition should work for sign-magnitude, one's-complement and two's-complement implementations.

Only sign-magnitude implementations would have some difficulty on some hardware; depending on the hardware design, they may have to mask off the high bit as they do the shift in order to preserve its value during the shift if the hardware shift instruction will shift "through" the sign bit.  Sign-magnitude implementations are rare these days, so really, this difference ought not be such a huge deal that we need to make two's-complement and one's-complement implementations suffer some annoying math rules that only some small percentage of C/C++ programmers even know.  (Even though shifting left a negative number is as bad as dividing by zero, very few among C/C++ users know that this is a fatal operation with an optimizing compiler!)

Melissa

Thiago Macieira

unread,
Jul 28, 2014, 12:28:22 AM7/28/14
to std-pr...@isocpp.org
On Sunday 27 July 2014 19:25:59 Myriachan wrote:
> I think x << y for negative x should work like this: If the shift as
> implemented as multiplies by 2 does not cause a change in the sign bit, the
> result is defined; otherwise it is undefined. This definition should work
> for sign-magnitude, one's-complement and two's-complement implementations.

How does this change anything?

Since a left shifting of a negative number may change the sign in one's and
two's complement (depending on the second-to-last bit), the result is undefined
as per your rule.

So it's undefined for most people.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358

David Krauss

unread,
Jul 28, 2014, 12:31:38 AM7/28/14
to std-pr...@isocpp.org

On 2014–07–28, at 12:28 PM, Thiago Macieira <thi...@macieira.org> wrote:

> On Sunday 27 July 2014 19:25:59 Myriachan wrote:
>> I think x << y for negative x should work like this: If the shift as
>> implemented as multiplies by 2 does not cause a change in the sign bit, the
>> result is defined; otherwise it is undefined. This definition should work
>> for sign-magnitude, one's-complement and two's-complement implementations.
>
> How does this change anything?
>
> Since a left shifting of a negative number may change the sign in one's and
> two's complement (depending on the second-to-last bit), the result is undefined
> as per your rule.
>
> So it's undefined for most people.

That’s true of positive numbers too. On a two’s complement machine, signed and unsigned left shifts are the same operation. The signed overflow bit is defined the same way for positive and negative operands.

Myriachan

unread,
Jul 28, 2014, 1:12:17 PM7/28/14
to std-pr...@isocpp.org
I'm not saying it's undefined if it *may* change the sign bit, I'm saying it's undefined if it *does* change the sign bit. This means that it will be defined unless you overflow or underflow the result. The end definition thus makes left shift an identical operation for signed and unsigned integers in one's-complement and two's complement representation so long as overflow/underflow does not occur; only sign-magnitude changes.

Note that for purposes of undefinition, the result is undefined if the sign bit flips at any multiply by 2 along the way, even if the end result would have the same sign bit.

Finally, I think that if an implementation is configured such that it may set <int>::is_modulo to true, regardless if whether it actually does, such as in GCC and Clang -fwrapv, then overflow/underflow should be defined as (int)((unsigned)x << y). is_modulo's definition would be updated to include <<.

Thiago Macieira

unread,
Jul 28, 2014, 3:55:32 PM7/28/14
to std-pr...@isocpp.org
On Monday 28 July 2014 10:12:17 Myriachan wrote:
> I'm not saying it's undefined if it *may* change the sign bit, I'm saying
> it's undefined if it *does* change the sign bit. This means that it will
> be defined unless you overflow or underflow the result. The end definition
> thus makes left shift an identical operation for signed and unsigned
> integers in one's-complement and two's complement representation so long as
> overflow/underflow does not occur; only sign-magnitude changes.

How do you define whether it changes the sign bit? Sounds to me like you're
asking for it to be "defined if it worked, undefined if it didn't", which sounds
a lot like "it's undefined". If you can't give me the rules for when it does
work reliably, what's the point?

> Note that for purposes of undefinition, the result is undefined if the sign
> bit flips at any multiply by 2 along the way, even if the end result would
> have the same sign bit.

Signed integers have UB overflow so you can't define it like that.

> Finally, I think that if an implementation is configured such that it may
> set <int>::is_modulo to true, regardless if whether it actually does, such
> as in GCC and Clang -fwrapv, then overflow/underflow should be defined as
> (int)((unsigned)x << y). is_modulo's definition would be updated to
> include <<.

David Krauss

unread,
Jul 28, 2014, 7:05:47 PM7/28/14
to std-pr...@isocpp.org
On 2014–07–29, at 3:55 AM, Thiago Macieira <thi...@macieira.org> wrote:

How do you define whether it changes the sign bit? Sounds to me like you're
asking for it to be "defined if it worked, undefined if it didn't", which sounds
a lot like "it's undefined". If you can't give me the rules for when it does
work reliably, what's the point?

The spec doesn’t need to talk about a sign bit. It could only say the value is implementation-defined, as it already does for right shifts of negative values.

Like -8>>3, -1<<3 “works” on all two’s-complement and most sign-magnitude machines. Only the tiny minority of one’s-complement machines are the exception. And it’s probably been a looong time since anyone actually built a C++ compiler for such.

A quick check shows that GCC does not flag -1<<3 as UB in a constant expression. It does flag -8<<30.

Myriachan

unread,
Jul 28, 2014, 7:17:00 PM7/28/14
to std-pr...@isocpp.org


On Monday, July 28, 2014 12:55:32 PM UTC-7, Thiago Macieira wrote:
On Monday 28 July 2014 10:12:17 Myriachan wrote:
> I'm not saying it's undefined if it *may* change the sign bit, I'm saying
> it's undefined if it *does* change the sign bit.  This means that it will
> be defined unless you overflow or underflow the result.  The end definition
> thus makes left shift an identical operation for signed and unsigned
> integers in one's-complement and two's complement representation so long as
> overflow/underflow does not occur; only sign-magnitude changes.

How do you define whether it changes the sign bit? Sounds to me like you're
asking for it to be "defined if it worked, undefined if it didn't", which sounds
a lot like "it's undefined". If you can't give me the rules for when it does
work reliably, what's the point?

> Note that for purposes of undefinition, the result is undefined if the sign
> bit flips at any multiply by 2 along the way, even if the end result would
> have the same sign bit.

Signed integers have UB overflow so you can't define it like that.

> Finally, I think that if an implementation is configured such that it may
> set <int>::is_modulo to true, regardless if whether it actually does, such
> as in GCC and Clang -fwrapv, then overflow/underflow should be defined as
> (int)((unsigned)x << y).  is_modulo's definition would be updated to
> include <<.

In one's-complement or two's-complement representation, x << y would be defined when the high y+1 bits of x are equal.  In sign-magnitude representation, x << y would be defined when the magnitude portion does not overflow.

To mathematically describe this: For x of signed type, and y of integer type in range (std::numeric_limits<decltype(+x)>::digits, 0]:

x << y for nonnegative x is defined when x <= std::numeric_limits<decltype(+x)>::max() / pow(2, y), with this division rounded toward zero.
x << y for negative x is defined when x >= std::numeric_limits<decltype(+x)>::min() / pow(2, y), with this division rounded toward zero.

pow(2, y) is meant to be of integer type of infinite precision; I can't write superscript y here.  I don't literally mean the C function pow(), for that would be of type double.

Justification for the rounding toward zero:

For 5 of 6 cases--the 6th case being negative numbers on two's-complement--the maximal value is a Mersenne number, such as 0x7FFFFFFF for a 32-bit int.  The negative case on one's complement and sign-magnitude works identically, with the inequality inverted.  Dividing by some number of bits with rounding toward zero results in something like 0x007FFFFF for e.g. y=8.  (Rounding toward zero results in the same behavior for positive and negative.)  Input x=0x007FFFFF, y=8 results in 0x7FFFFF00, which is valid.  Input x=0x00800000, y=8--x being one higher--results in 0x80000000, which has overflowed and is therefore undefined.  Thus 0x007FFFFF is the correct highest valid input x for y=8 and 32-bit promoted x.

For the two's-complement negative case, rounding is irrelevant, because the minimum integer is a power of two, and is thus divisible by pow(2, y).  The resulting defined domain x >= -0x00800000 occurs in the example y=8 with 32-bit promoted x.

Melissa

David Krauss

unread,
Jul 28, 2014, 7:25:47 PM7/28/14
to std-pr...@isocpp.org
On 2014–07–29, at 7:16 AM, Myriachan <myri...@gmail.com> wrote:

In one's-complement or two's-complement representation, x << y would be defined when the high y+1 bits of x are equal.

Negative numbers do not naturally shift at all in one’s-complement representations. Support for such machines precludes requiring support for shifting negative numbers.

x << y for nonnegative x is defined when x <= std::numeric_limits<decltype(+x)>::max() / pow(2, y), with this division rounded toward zero.
x << y for negative x is defined when x >= std::numeric_limits<decltype(+x)>::min() / pow(2, y), with this division rounded toward zero.

This is closer to [expr.shift] §5.8 than your previous descriptions. It’s usually a good idea to use the existing spec as a basis for proposals.

Justification for the rounding toward zero:

For 5 of 6 cases--the 6th case being negative numbers on two's-complement--the maximal value is a Mersenne number,

You lost me. Is this a practical application? The rounding toward zero is just how fractions are always converted to integers.

An overall justification is that GCC already implements your proposal, and it’s the most popular C++ compiler (and the only up-to-date one targeted to exotic machines).

Thiago Macieira

unread,
Jul 28, 2014, 7:40:10 PM7/28/14
to std-pr...@isocpp.org
On Monday 28 July 2014 16:16:59 Myriachan wrote:
> In one's-complement or two's-complement representation, x << y would be
> defined when the high y+1 bits of x are equal. In sign-magnitude
> representation, x << y would be defined when the magnitude portion does not
> overflow.

What if we just leave it implementation-defined?

Shifting of unsigned integers is a well-defined operation. Shifting signed
integers is implementation-defined and may change the sign bit.

Done. It works, we just don't know what it will do.

I'm not sure you considered a sign-magnitude machine that stores the sign in
the lowest bit instead of the highest. That is:

magnitude:sign shifted
0000 001:1 0000 011:0 (-1 << 1 == 3)
or 0000 001:1 0000 011:1 (-1 << 1 == -3)

Depending on whether it shifts in a zero or a sign bit.

David Krauss

unread,
Jul 28, 2014, 7:44:01 PM7/28/14
to std-pr...@isocpp.org

On 2014–07–29, at 7:40 AM, Thiago Macieira <thi...@macieira.org> wrote:

> What if we just leave it implementation-defined?

Specifically, an implementation-defined value.

> I'm not sure you considered a sign-magnitude machine that stores the sign in
> the lowest bit instead of the highest. That is:

That’s not allowed by C. C++ specifically inherits the numeric limits of C, not specifically the allowed representations, but nobody is designing machines with C++-only integer representations.

Myriachan

unread,
Jul 28, 2014, 9:29:43 PM7/28/14
to std-pr...@isocpp.org


On Monday, July 28, 2014 4:40:10 PM UTC-7, Thiago Macieira wrote:
On Monday 28 July 2014 16:16:59 Myriachan wrote:
> In one's-complement or two's-complement representation, x << y would be
> defined when the high y+1 bits of x are equal.  In sign-magnitude
> representation, x << y would be defined when the magnitude portion does not
> overflow.

What if we just leave it implementation-defined?

That would be fine by me, because that's already an improvement over the status quo, which is undefined rather than implementation-defined.

However, I think my definition is what pretty much everyone is used to and avoids unexpected behavior.
 
Shifting of unsigned integers is a well-defined operation. Shifting signed
integers is implementation-defined and may change the sign bit.

Done. It works, we just don't know what it will do.

I'm not sure you considered a sign-magnitude machine that stores the sign in
the lowest bit instead of the highest. That is:

        magnitude:sign                shifted
         0000 001:1                0000 011:0         (-1 << 1 == 3)
or        0000 001:1                0000 011:1        (-1 << 1 == -3)

Depending on whether it shifts in a zero or a sign bit.

The way I defined shifting in my replies, on all sign-magnitude implementations, shifting would have to use bit masking operations in order to achieve correct results.  Use unsigned integers if you want shifting to match underlying memory bits; the difference between unsigned and signed bit shifting has always been known as "logical" and "arithmetic" shifting for a reason.  Effectively, left shift becomes a multiply by 2 to the power of the right side.

On one's-complement machines, shift left would probably have to be implemented as a loop of add instructions to repeatedly add a number to itself, since true shifting would have strange results.

In my opinion, the stronger burden should always be on the esoteric architectures.  That's just how I am.

Melissa

Douglas Boffey

unread,
Jul 30, 2014, 9:05:53 AM7/30/14
to std-pr...@isocpp.org

On Tuesday, 29 July 2014 02:29:43 UTC+1, Myriachan wrote:

On one's-complement machines, shift left would probably have to be implemented as a loop of add instructions to repeatedly add a number to itself, since true shifting would have strange results.
 
What’s wrong with negating, shifting left and negating again?
 
To me, the only logical way to mandate operator<<(signed whatever, unsigned whatever) would be to preserve any sign bit (e.g. on a 2’s complement machine, S-x(1)-x(2)-...-x(n) << 1U should become S-x(2)-x(3)-...-x(n)-0, etc.  But to mandate a specific behaviour could/would? make the operation inefficient.

Bengt Gustafsson

unread,
Jul 30, 2014, 1:40:50 PM7/30/14
to std-pr...@isocpp.org
Can anyone mention a computer that does not use 2's complement and is supported by a compiler that is being updated? 

Myriachan

unread,
Aug 1, 2014, 12:14:12 PM8/1/14
to std-pr...@isocpp.org


On Wednesday, July 30, 2014 6:05:53 AM UTC-7, Douglas Boffey wrote:

On Tuesday, 29 July 2014 02:29:43 UTC+1, Myriachan wrote:

On one's-complement machines, shift left would probably have to be implemented as a loop of add instructions to repeatedly add a number to itself, since true shifting would have strange results.
 
What’s wrong with negating, shifting left and negating again?

If you defined operator << this way, INT_MIN << 0 would be undefined on two's-complement implementations, because -INT_MIN is undefined, and thus wouldn't survive the first step.  If (INT_MIN << 0) == INT_MIN were mandated as a special case, then your idea is actually a simple way to define it.

To me, the only logical way to mandate operator<<(signed whatever, unsigned whatever) would be tois convenient so long as it achieves the same result.  If the Standard defined shifting left as iterative multiplication by 2, compilers would still use a shift-left immediat preserve any sign bit (e.g. on a 2’s complement machine, S-x(1)-x(2)-...-x(n) << 1U should become S-x(2)-x(3)-...-x(n)-0, etc.  But to mandate a specific behaviour could/would? make the operation inefficient.

The "as-if" rule of C++ means that a compiler can implement shifting more or less whichever way it wants, so long as it returns the same result.  Even if the Standard defined shifting left as iterative multiplication by 2, you can bet that x86-series compilers would still implement << 17 as "shl eax, 17" or "imul eax, eax, 131072".


I've translated my ideas to what I think would be the Standardese.  I feel that defining the shift operators the way two's-complement processors implement them in machine code is the best way to go, since this is the way we all learn it in school or whatever learning means we used.  It's also the way that other programming languages that've copied C syntax work.  Python, Perl, Java, C#, etc. all use two's-complement semantics with their shift operators.  (And in the case of Python, the "sign bit" is considered to be at infinity, since its integer type is arbitrary-precision.)

Non-two's-complement implementations of C/C++ already have to deal with two's-complement semantics; converting from signed integer type to unsigned integer type on such architectures is a conversion to two's-complement representation ([conv.integral] 4.7.2).

(1)  The shift operators << and >> group left to right.
    shift-expression:
        additive-expression
        shift-expression << additive-expression
        shift-expression >> additive-expression
    The operands shall be of integral or unscoped enumeration type and integral promotions are performed.  The type of the result is the promoted left operand.  The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.

(2)  The value of E1 << E2 is E1 × 2^E2.  If E1 has an unsigned type, the value of this result is E1 × 2^E2 reduced modulo one more than the maximum value representable in the result type.  Otherwise, if E1 has a signed type, and the implementation gives the result type modulo semantics ([numeric.limits.members] 18.3.2.4.60-62), then E1 × 2^E2, first reduced modulo one more than the maximum value representable in the corresponding unsigned type, converted to the result type, is the resulting value.  Otherwise, if E1 has a signed type, and U1 × 2^E2, where U1 is E1 converted to the corresponding unsigned type of the result type, is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value.  Otherwise, the behavior is undefined.  [ Note: this operation on unsigned and non-negative signed integers, as well as shift left of negative signed integers on two's-complement and sign-and-magnitude implementations, has a result equivalent to E1 left-shifted E2 bit positions, with vacated bits zero-filled, if the result is defined. — end note ] [ Note: by clause 3 of this paragraph, it is possible for signed integers to change their sign, depending upon the representation. — end note ] [ Example: 1 << 31 has a value of INT_MIN on a two's-complement implementation whose int type has 32 bits, because the shift is calculated as unsigned int then converted back, and that result is representable as int.  This is regardless of whether the implementation gives int modulo semantics. — end example ]

(3)  The value of E1 >> E2 is E1 / 2^E2, rounded toward −∞.  [ Example: (-4) >> 3 has a value of -1 and not 0. — end example ] [ Note: on all integer representations, for E1 of unsigned type or E1 of signed type and non-negative value, the result is equivalent to the integer portion of the quotient of E1 / 2^E2.  The result is also equivalent to E1 right-shifted E2 bit positions, with vacated bits zero-filled, and with bits shifted past the ones' position discarded. — end note ] [ Note: on two's-complement implementations, for E1 of signed type, the result is equivalent to E1 right-shifted E2 bit positions, with vacated bits filled with a copy of the sign bit, and with bits shifted past the ones' position discarded. — end note ]

This also requires a minor adjustment to std::numeric_limits<T>::is_modulo to note that << is included alongside +, - and * as operators that wrap.

The changes in the above compared to the current Standard are basically these:
0. No change in the rule about E2 needing to be non-negative and less than the number of bits in E1.
1. << is fully-defined on signed integers if -fwrapv is enabled.  is_modulo's definition would change to include <<.
2. << is fully-defined on signed integers if the result, when converted to the unsigned type, fits in the unsigned type.  The unsigned type rule is already in the Standard; (1 << 31) == INT_MIN is actually already the case on 32-bit two's-complement implementations by the rules of the Standard.  It's only negative values of E1 (the input value) for which my proposal changes anything.  Since I used conversion to unsigned as the definition, this effectively gives it two's-complement semantics regardless of implementation.  It will still be undefined on overflowing the unsigned type, though, as in (-2) << 31.
3. Defined >> to have two's-complement semantics.

Reply all
Reply to author
Forward
0 new messages