Weird gotcha with integer 0x80000000

Mark Hammond

Jan 10, 1997, 3:00:00 AM

Hi all,

In almost all cases, the following is true:

eval(hex(n))==n

However, hex(0x80000000) yields "-0x80000000", which is an invalid
integer literal. Using it results in an error. Consider:

>>> def test(n):
...     if n <> eval(hex(n)): print "Value bad"
...
>>> test(0)
>>> test(-0x7FFFFFFF)
>>> test(0x7FFFFFFF)
>>> test(0x80000000)
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 2, in test
  File "<string>", line 0, in ?
OverflowError: integer negation

Any clues?

Mark.
----------------------------------------------------------------------
Mark Hammond - MHam...@skippinet.com.au
Check out Python - _the_ language for the Web/CGI/Windows/MFC/Unix/etc
<http://www.python.org> & <http://www.python.org/ftp/python/pythonwin>

Piet van Oostrum

Jan 10, 1997, 3:00:00 AM

>>>>> pe...@sorted.org (Pete Bentley) (PB) writes:

PB> Python's behaviour is technically correct, although perhaps
PB> hex(0x80000000) could be special-cased to return "-0x0" and the
PB> parser modified so that -0x0 gets represented as 0x80000000 but
PB> I think the current behaviour is more in tune with the principle of
PB> least surprise.

No, it should return "0x80000000".
--
Piet van Oostrum <pi...@cs.ruu.nl>
URL: http://www.cs.ruu.nl/~piet [PGP]

Guido van Rossum

Jan 10, 1997, 3:00:00 AM

> In almost all cases, the following is true:
>
> eval(hex(n))==n
>
> However, hex(0x80000000) yields "-0x80000000", which is an invalid
> integer literal. Using it results in an error.

This is an old problem with no easy solution. First of all, this
number is *not* "some form of negative zero" as some reader commented.
It is "the most negative number" in 32-bit "two's complement"
integer representation (which all current CPUs use).

The following discussion assumes 32-bit integers; on 64-bit machines,
the numbers are (much) larger, but the principle remains the same.

Two's complement uses 2**31 bit patterns to represent nonnegative
numbers, from zero to 2**31-1 (== 2147483647). Note that all these
patterns have a 0 for their highest bit. The remaining 2**31 bit
patterns, all with a 1 for the high bit, represent negative numbers,
from -1 to -2**31 (== -2147483648). The representation for negative
numbers uses all 1 bits (0xFFFFFFFF) for -1, 0xFFFFFFFE for -2, and so
on. It so happens that -2147483647 (the negative of the largest
positive number) is 0x80000001. So there's one bit pattern left:
0x80000000 (a sign bit followed by 31 zero bits). The obvious
interpretation is -2147483648 (this is obvious if you consider what
happens if you add 1 to it). But this is a negative number which is
its own negated value! If you negate it, you get 2147483648, which is
0x80000000 -- but interpreted as a signed 32-bit integer, this is
negative, and in fact the same number we started out with...
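
To make the wrap-around concrete, here is a minimal sketch in modern Python 3. Its integers never overflow, so the 32-bit behavior is emulated by masking; the helper names to_signed32 and neg32 are made up for illustration and are not part of any Python release.

```python
MASK32 = 0xFFFFFFFF  # keep only the low 32 bits

def to_signed32(bits):
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    bits &= MASK32
    # Patterns with the high bit set stand for negative numbers.
    return bits - (1 << 32) if bits & 0x80000000 else bits

def neg32(bits):
    """Negate the way unchecked 32-bit hardware does: invert, add one, wrap."""
    return (~bits + 1) & MASK32

print(to_signed32(0xFFFFFFFF))   # -1
print(to_signed32(0x80000001))   # -2147483647
print(to_signed32(0x80000000))   # -2147483648
print(hex(neg32(0x80000000)))    # 0x80000000, its own negation
```

Every other pattern negates to a different pattern; only 0x80000000 (and zero) map back to themselves.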

In machine language, and in most versions of C (C normally doesn't
check for arithmetic overflows and such, even though the standard
doesn't forbid an application that does), this is no big deal. If x
happens to be 0x80000000, then -x has the same value as x. This is
usually easy to avoid (just don't try to add up the national debt :-),
occasionally useful (though the "unsigned" C data type should be used
where this behavior is relied upon), and occasionally causes
mysteriously wrong results (like when the national debt comes out as 2
dollars).

It is for this latter kind of program failure that Python implements
thorough overflow checking on all arithmetic operations. Thus, while
in C, 2147483647+1 (which generates an overflow, which is ignored)
equals -2147483648, in Python, the overflow is noticed, and the
OverflowError exception is raised. For the same reason, entering the
integer literal 2147483648 also raises OverflowError, and when x
contains the value -2147483648 (0x80000000, the most negative 32-bit
integer), the expression -x raises OverflowError, since its negation
cannot be represented as a 32-bit signed integer. (There are a few
cases where Python does not check for overflow -- these are the input
of octal and hex literals that represent values in the range
[2**31...2**32), and the shift operators << and >>.)
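
The checking described above can be sketched roughly as follows. This is an illustration in modern Python 3 (where plain ints never overflow), not the interpreter's actual C code; checked, checked_add and checked_neg are hypothetical names.

```python
INT_MIN = -2**31
INT_MAX = 2**31 - 1

def checked(result):
    """Return result if it fits in a signed 32-bit integer, else raise."""
    if not INT_MIN <= result <= INT_MAX:
        raise OverflowError("result does not fit in 32 bits: %d" % result)
    return result

def checked_add(a, b):
    return checked(a + b)

def checked_neg(a):
    return checked(-a)

print(checked_add(2147483646, 1))   # 2147483647

# Both of these have a mathematically valid result that needs a 33rd bit.
for attempt in (lambda: checked_add(INT_MAX, 1), lambda: checked_neg(INT_MIN)):
    try:
        attempt()
    except OverflowError as exc:
        print("OverflowError:", exc)
```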

Now, what can be done about this???

- Forbidding the integer value -2147483648 (raising OverflowError when
it is generated) has the disadvantage that for some system or library
functions, this value can "naturally" occur or may be needed as input
to represent some kind of special condition; it also would be an
anomaly when using 32-bit integers to represent bit masks (again,
these occur frequently when interfacing with other software).

- Not raising OverflowError on the expression -x, when x has the value
-2147483648, breaks the promise that Python will raise OverflowError
when the mathematically expected result of an expression involving
32-bit integers cannot be represented as a signed 32-bit integer.

- Automatically converting to a long integer breaks the promise that
the type of any arithmetic expression involving only 32-bit integers is
also a 32-bit integer. (This solution has some merits but would
require a major restructuring of Python's integer implementation, and
opens the door to a whole can of worms about its type system. That's
why I don't want to do this now.)

My favorite:

- I tend to like the solution for hex() and oct() which simply omits
the '-' sign when the value happens to be the most negative value;
then hex(x) and oct(x) would yield '0x80000000' and '020000000000'.

- This is not a solution for str(x), repr(x) and `x` though -- these
will yield a decimal integer with a sign. This has the same problem
as hex -- eval(`x`) equals x unless x is the most negative integer. I
think we'll just have to live with the fact that sometimes, eval(`x`)
will not yield x -- the same is true when x is not a concrete type,
anyway.
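
A rough sketch of the hex() proposal, written for modern Python 3 under the assumption that 32-bit literal parsing behaves as described in this thread. hex32 and eval_hex32 are hypothetical stand-ins for hex() and for eval() of a hex literal; they are not real built-ins.

```python
INT_MIN = -2**31

def hex32(x):
    """Signed hex, except that the most negative value drops its '-' sign."""
    if x == INT_MIN:
        return "0x80000000"
    return "-0x%x" % -x if x < 0 else "0x%x" % x

def eval_hex32(text):
    """Read a hex literal as a 32-bit int: literals in [2**31, 2**32)
    come out negative, and negating the most negative value overflows."""
    negative = text.startswith("-")
    value = int(text.lstrip("-"), 16)
    if value >= 2**32:
        raise OverflowError("literal too large for 32 bits")
    if value >= 2**31:
        value -= 2**32          # reinterpret the bit pattern as signed
    if negative:
        if value == INT_MIN:
            raise OverflowError("integer negation")
        value = -value
    return value

# With the sign omitted, the round trip holds for the awkward case too.
for n in (0, -0x7FFFFFFF, 0x7FFFFFFF, INT_MIN):
    assert eval_hex32(hex32(n)) == n
print(hex32(INT_MIN))   # 0x80000000
```

Feeding eval_hex32 the string "-0x80000000" reproduces the OverflowError from the original report.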

Some additional tidbits of Python knowledge relating to this subject:

- This whole discussion affects Python's "standard" integers, which
are limited to the size of the "long int" data type of the C compiler
used to build the Python interpreter that is used -- normally, either
32 or 64 bits.

- Python's "long" integers, which are really *arbitrarily long*,
aren't affected by the limitations of 32-bit arithmetic, and in fact
have been used to verify some of the numbers above.

- To emulate C's "unsigned" types in Python, you can use Python long
integers and the % operator, using a modulus of 1L<<32; Python's %
operator (unlike C's) always yields a result in the range
[0...modulus). For example:

>>> x = 0xFFFFFFFFL
>>> y = 0xFFFFFFFFL
>>> hex((x+y) % (1<<32L))
'0xFFFFFFFEL'

- You can't input the expression -2147483648, because it is evaluated
as the (positive) literal 2147483648 followed by a negation; but that
literal is too large to be represented in 32 bits.

- You *can* input this same value as string.atoi('-2147483648').

- You can also input it as 0x80000000 or 020000000000. Hex and octal
literals are accepted in the range [0...2**32) -- although positive
hex or octal literals in the range [2**31...2**32) will result in
*negative* integer values! This is such common practice in C, and the
use of hex/octal literals is mostly required for interfacing with code
written in C (e.g. device drivers), that it would be a major hassle if
this were not allowed.

- Python internally works in binary, not in hex, decimal or octal.
This is true for nearly all computer languages (COBOL and related
languages are an exception).
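
For readers on a current interpreter, the unsigned-emulation tidbit above looks roughly like this in Python 3, where there is a single arbitrary-precision int type and no L suffix. Here int() stands in for string.atoi, and MODULUS is just a name chosen for this sketch.

```python
MODULUS = 1 << 32

x = 0xFFFFFFFF
y = 0xFFFFFFFF
total = (x + y) % MODULUS            # wraps like a C unsigned 32-bit add
print(hex(total))                    # 0xfffffffe

# Python's % takes the sign of the modulus, so negative inputs land in
# the range [0, MODULUS) as well.
print((-1) % MODULUS == 0xFFFFFFFF)  # True

# Parsing the decimal string yields the most negative 32-bit value directly.
most_negative = int("-2147483648")
print(most_negative % MODULUS == 0x80000000)  # True
```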

--Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum

Jan 10, 1997, 3:00:00 AM

> No, it should return "0x80000000".

Agreed (this is what I proposed). Next question: should perhaps *all*
hex/octal conversions produce the "unsigned" representation (which is
acceptable to the parser)? So hex(-1) would return 0xFFFFFFFF?
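
As an illustration of what an all-unsigned conversion would produce, here is a small sketch in modern Python 3. hex_unsigned32 is a hypothetical helper, and the range check stands in for the fixed 32-bit width that Python 3 ints no longer have.

```python
def hex_unsigned32(x):
    """Format a signed 32-bit value as its raw two's complement bit pattern."""
    if not -2**31 <= x < 2**31:
        raise OverflowError("not a signed 32-bit value")
    return "0x%X" % (x & 0xFFFFFFFF)

print(hex_unsigned32(-1))       # 0xFFFFFFFF
print(hex_unsigned32(-2**31))   # 0x80000000
print(hex_unsigned32(255))      # 0xFF
```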

Skip Montanaro

Jan 10, 1997, 3:00:00 AM

Guido> Next question: should perhaps *all* hex/octal conversions produce
Guido> the "unsigned" representation (which is acceptable to the
Guido> parser)? So hex(-1) would return 0xFFFFFFFF?

That sounds like the best option to me. When I start fiddling with hex or
octal numbers, it's strictly to look at how the bits are set, not what the
magnitude of the number is. Bit 31's role as a sign bit is unimportant to
me.

--
Skip Montanaro | Musi-Cal: http://concerts.calendar.com/
sk...@calendar.com | "It doesn't matter where you get your appetite as
(518)372-5583 | long as you eat at home." -- Sloan Wainwright

Christian Tismer

Jan 10, 1997, 3:00:00 AM

Guido van Rossum wrote:
>
> > No, it should return "0x80000000".
>

> Agreed (this is what I proposed). Next question: should perhaps *all*
> hex/octal conversions produce the "unsigned" representation (which is
> acceptable to the parser)? So hex(-1) would return 0xFFFFFFFF?

For Integers: Yes.
In general: No, that's impossible.
Since longs never will overflow, the hex would be a bit too
long for my main memory ;^)

- chris

Christian Tismer

Jan 10, 1997, 3:00:00 AM

Mark Hammond wrote:
>
> Hi all,

>
> In almost all cases, the following is true:
>
> eval(hex(n))==n
>
> However, hex(0x80000000) yields "-0x80000000", which is an invalid

> integer literal. Using it results in an error. Consider:

This is indeed an omission in the implementation since this special
case always must be handled specially on two's complement machines.
More painful, since there are machines with more than 32 bits, you
may happen to get the correct value, which would indeed be
'0x80000000'. The X86 CPU correctly sets the overflow bit for
the "neg" instruction, because this singleton does not have an inverse
under 2's complement.
About 15 years ago I worked with a Telefunken TR440, having one's
complement. There was a total symmetry, with the drawback of having
two zeroes :-))
As a usual practice, compilers mapped boolean values onto +/- zero,
which was really funny.
AND it was true that -x == not x; negation was one machine cycle,
therefore its name.
Much better to have just one zero, of course :-))

Today, we almost everywhere find -x == not x + 1 which once was
two machine cycles and therefore called two's complement.
The drawback is the asymmetry, but I found it useful sometimes
to take 0x80000000 as a missing data value or overflow marker.
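
The two identities can be checked with a short sketch in modern Python 3, emulating a 32-bit word by masking (Python's own ints are unbounded, so the mask is an assumption of this example; the function names are invented here).

```python
MASK32 = 0xFFFFFFFF

def ones_complement_neg(bits):
    """One's complement negation: flip every bit."""
    return ~bits & MASK32

def twos_complement_neg(bits):
    """Two's complement negation: flip every bit, then add one."""
    return (~bits + 1) & MASK32

print(hex(ones_complement_neg(0)))           # 0xffffffff, the "negative zero"
print(hex(twos_complement_neg(0)))           # 0x0, only one zero
print(hex(twos_complement_neg(0x80000000)))  # 0x80000000, no inverse
```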

>
> >>> def test(n):
> ...     if n <> eval(hex(n)): print "Value bad"
> ...
> >>> test(0)
> >>> test(-0x7FFFFFFF)
> >>> test(0x7FFFFFFF)
> >>> test(0x80000000)
> Traceback (innermost last):
>   File "<stdin>", line 1, in ?
>   File "<stdin>", line 2, in test
>   File "<string>", line 0, in ?
> OverflowError: integer negation
>
> Any clues?

Really weird. I'll look into the sources, but this time a simple
patch will probably be impossible.

Do you think it makes sense to change the language in such a way
that an overflow condition always results in a long value?
Or, why do we have the distinction at all? Wouldn't it suffice
to generate an overflow when it doesn't fit, say into an array,
but who needs restricted numbers?

- chris

support: <http://www.appliedbiometrics.com/python>

Kragen Sittler

Jan 21, 1997, 3:00:00 AM

In article <199701101610.LAA09839@monty>,

Guido van Rossum <gu...@CNRI.Reston.VA.US> wrote:
>> However, hex(0x80000000) yields "-0x80000000", which is an invalid
>> integer literal. Using it results in an error.
>
>This is an old problem with no easy solution. First of all, this
>number is *not* "some form of negative zero" as some reader commented.
>It is "the most negative number" in 32-bit "two's complement"
>integer representation (which all current CPUs use).

All statements containing the word 'all' are false. :)

Notably, the MuP21 does not have a subtract instruction or a negate
instruction; if you want to represent negative numbers, it's your
choice whether you do it with two's complement or one's complement, or
even sign-magnitude -- although when you add one to 0x1fffff, you get
0x000000. (The MuP21 is a 21-bit processor; the 21st bit is a carry
bit, and is not stored in RAM.)

I believe the same to be true of its descendants, the i21 and F21.

I am not aware of any other current CPUs that don't use two's
complement.

Peace,
Kragen
--
Peace,
Kragen
+1 408 975 2632