
Precision issue


Ladvánszky Károly

Oct 10, 2003, 4:56:43 AM

Entering 3.4 in Python yields 3.3999999999999999.
I know it is due to the fact that 3.4 cannot be precisely expressed by
powers of 2. Can the float handling rules of the underlying layers be set
from Python so that 3.4 yields 3.4?

Thanks,

Károly


Alex Martelli

Oct 10, 2003, 5:54:12 AM

Ladvánszky Károly wrote:

It seems, from the question, that you might not have entirely understood
and grasped the explanations you can find at:
http://www.python.org/doc/current/tut/node14.html
and I quote, in particular:
"""
no matter how many base 2 digits you're willing to use, the decimal value
0.1 cannot be represented exactly as a base 2 fraction.
"""
and the same holds for 3.4 for exactly the same reason. As long as
binary is used -- and today's machines don't offer options -- that's it.
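
For instance, at the interactive prompt (which displays results with
repr; print uses str, which rounds to 12 significant digits):

>>> 3.4
3.3999999999999999
>>> print 3.4
3.4
>>> 0.1 + 0.1 + 0.1 == 0.3
False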

Only by using Decimal or Rational fractional numbers would that be possible,
and today's hardware doesn't really support them, so you would need to do
everything in software. If you don't mind the resulting huge slowdown in
computation speed (many apps don't really do many computations, so don't
care) there are quite a few packages on the net, though none, AFAIK, which
is considered "ready for production use". The speediest way to do Rational
arithmetic is, I suspect, with gmpy (the mpq type) -- but "speedy" is in
the eye of the beholder. Let me give you an example...:

according to timeit.py, after x=3.4 (a native float), int(x*10) takes
2.46 microseconds; but after x=mpq(3.4) [having imported mpq fm gmpy],
int(x*10) takes 9.72 microseconds! That's FOUR times slower...

Also, mpq(3.4)'s default representation is as a fraction, 17/5; so,
you would still need some formatting work to display it as 3.4 instead.
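
If you wanted to do that formatting yourself, here's a rough, untested
sketch (mpq_to_decimal is a hypothetical helper; it assumes only that
mpq supports comparison, mixed arithmetic with ints, and int()
truncation, which the timing example above already relies on):

def mpq_to_decimal(q, digits=12):
    # Format an exact rational as a decimal string, truncating
    # after at most 'digits' fractional digits.
    sign = ''
    if q < 0:
        sign = '-'
        q = -q
    whole = int(q)        # integer part
    frac = q - whole      # exact fractional remainder
    out = []
    while frac != 0 and len(out) < digits:
        frac = frac * 10
        d = int(frac)     # next decimal digit
        out.append(str(d))
        frac = frac - d
    if out:
        return sign + str(whole) + '.' + ''.join(out)
    return sign + str(whole)

print mpq_to_decimal(mpq(3.4))   # -> 3.4, exactly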


Alex

Gerhard Häring

Oct 10, 2003, 5:19:23 AM

Ladvánszky Károly wrote:

A float is a float is a float ;)

What can be done is to change the formatting of floats in print
statements, for example. IIRC there was some magic in Python to that
effect that was removed somewhere in the 2.x line.

If you're concerned about the output, why don't you just explicitly
format your float numbers? Something like:

>>> print "%.2f" % 3.4
3.40

-- Gerhard

Duncan Booth

Oct 10, 2003, 6:36:16 AM

Alex Martelli <al...@aleax.it> wrote in
news:8lvhb.258000$R32.8...@news2.tin.it:

> Ladvánszky Károly wrote:
>
>> Entering 3.4 in Python yields 3.3999999999999999.
>> I know it is due to the fact that 3.4 cannot be precisely expressed
>> by powers of 2. Can the float handling rules of the underlying
>> layers be set from Python so that 3.4 yields 3.4?
>
> It seems, from the question, that you might not have entirely
> understood and grasped the explanations you can find at:
> http://www.python.org/doc/current/tut/node14.html
> and I quote, in particular:

I know this is an FAQ, but the one thing I've never seen explained
satisfactorily is why repr(3.4) has to be '3.3999999999999999' rather than
'3.4'?

Surely the important thing is that the equality eval(repr(x))==x has to
hold for floating point numbers, and that holds just as true for the short
3.4 as it does for the 17 digit version?

Microsoft .Net has a numeric format "R" which does a similar job. The R
specifier guarantees that a floating point numeric value converted to a
string will be parsed back into the same numeric value. It does this by
first trying a general format with 15 digits of precision then parsing that
back to a number. If the result is not the same as the original it then
falls back to the 17 digit value. There's no reason why Python couldn't do
the same:

def float_repr(x):
    s = "%.15g" % x
    if float(s)==x: return s
    return "%.17g" % x

This would be MUCH friendlier for newcomers to the language.
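
For example (untested, but on any 754 platform I'd expect):

>>> float_repr(3.4)
'3.4'
>>> float_repr(0.1 + 0.2)     # this value really needs 17 digits
'0.30000000000000004'
>>> float(float_repr(3.4)) == 3.4
True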

--
Duncan Booth dun...@rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?

Michael Hudson

Oct 10, 2003, 7:41:07 AM

Duncan Booth <dun...@NOSPAMrcp.co.uk> writes:

> Alex Martelli <al...@aleax.it> wrote in
> news:8lvhb.258000$R32.8...@news2.tin.it:
>
> > Ladvánszky Károly wrote:
> >
> >> Entering 3.4 in Python yields 3.3999999999999999.
> >> I know it is due to the fact that 3.4 cannot be precisely expressed
> >> by powers of 2. Can the float handling rules of the underlying
> >> layers be set from Python so that 3.4 yields 3.4?
> >
> > It seems, from the question, that you might not have entirely
> > understood and grasped the explanations you can find at:
> > http://www.python.org/doc/current/tut/node14.html
> > and I quote, in particular:
>
> I know this is an FAQ, but the one thing I've never seen explained
> satisfactorily is why repr(3.4) has to be '3.3999999999999999' rather than
> '3.4'?

I believe "computational and code complexity" is the main answer to
that one.

Start here

http://citeseer.nj.nec.com/gay90correctly.html

?

> Surely the important thing is that the equality eval(repr(x))==x has to
> hold for floating point numbers, and that holds just as true for the short
> 3.4 as it does for the 17 digit version?
>
> Microsoft .Net has a numeric format "R" which does a similar job. The R
> specifier guarantees that a floating point numeric value converted to a
> string will be parsed back into the same numeric value. It does this by
> first trying a general format with 15 digits of precision then parsing that
> back to a number. If the result is not the same as the original it then
> falls back to the 17 digit value. There's no reason why Python couldn't do
> the same:
>
> def float_repr(x):
>     s = "%.15g" % x
>     if float(s)==x: return s
>     return "%.17g" % x
>
> This would be MUCH friendlier for newcomers to the language.

It would be nice, but I think it's pretty hard to do efficiently. Tim
Peters would be more certain than me :-)

"Patches welcome" might apply, too. I don't think your suggested
float repr will fly, I'm afraid...

Cheers,
mwh

--
34. The string is a stark data structure and everywhere it is
passed there is much duplication of process. It is a perfect
vehicle for hiding information.
-- Alan Perlis, http://www.cs.yale.edu/homes/perlis-alan/quotes.html

Stephen Horne

Oct 10, 2003, 7:47:45 AM

On Fri, 10 Oct 2003 09:54:12 GMT, Alex Martelli <al...@aleax.it>
wrote:

>Ladvánszky Károly wrote:
>
>> Entering 3.4 in Python yields 3.3999999999999999.
>> I know it is due to the fact that 3.4 cannot be precisely expressed by
>> powers of 2. Can the float handling rules of the underlying layers be
>> set from Python so that 3.4 yields 3.4?
>
>It seems, from the question, that you might not have entirely understood
>and grasped the explanations you can find at:
>http://www.python.org/doc/current/tut/node14.html
>and I quote, in particular:
>"""
>no matter how many base 2 digits you're willing to use, the decimal value
>0.1 cannot be represented exactly as a base 2 fraction.
>"""

There are simple workarounds for this, though. For instance, if
someone needs one or two decimal digits of precision, they can simply
hold all values scaled by 10 or 100 - while neither 0.01 nor 0.1 can
be precisely represented as a binary value, 1 can be.

Actually, scaling by 100 is overkill - 100 factors as 4 * 25, and the
power-of-two factor is exactly representable in binary anyway, so a
scale factor of 25 should be sufficient to allow two decimal digits of
precision. However, there is probably no advantage to scaling by 25
instead of 100 - just the disadvantage that the purpose of the scaling
is less obvious.

Anyway, this could be what Ladvánszky Károly meant, I suppose, by
'float handling rules of the underlying layers'. Of course this can't
be done using the existing float class as Python doesn't define the
float handling rules - they are presumably defined in most cases by
the floating point logic built into the CPU.

Perhaps Ladvánszky Károly has used Ada, where you can request a fixed
point or floating point type with particular properties and it is up
to the compiler to find or create one to suit. Though IIRC its floats
are still always binary floats - only its fixed point values can
handle decimals as Ladvánszky Károly has requested.

There are also, of course, languages which support different numeric
types such as a decimal type - Java has BigDecimal and C# has Decimal
(the C# one works using a fixed point scaling where the scaling must
be a power of 10, Java BigDecimal is IIRC more powerful - arbitrary
scale and precision, I think).

The issue of alternate numeric representations does get raised from
time to time, as I'm sure Alex knows better than me. There are
packages around. One key problem is that different people want
different things. A person who wants a fixed-point number class, for
instance, is not going to want the additional overhead from a rational
number class. Even a symbolic expression class has been suggested in
the past.

One common need for decimals is for currency values. This need can be
avoided by simply storing currency values in pence/cents rather than
pounds/dollars. Similarly, percentages can be handled using integer
calculations. For example, adding 17.5% (for UK VAT, perhaps) can be
handled using floats as follows...

result = value * 1.175

or using integers as follows...

result = (value * 1175) / 1000

In the example above, the parentheses are unnecessary but included to
emphasise the order of the calculations, which is important.

In my experience, this method handles most cases where results need to
be consistent with decimal arithmetic - store values using appropriate
units and the problem usually goes away.
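
For instance, a minimal sketch of the pence-based version of the VAT
example (plain integer arithmetic; integer division truncates, so add
half the divisor first if you want rounding to the nearest penny):

price = 1999                             # 19.99 pounds, held as pence
with_vat = (price * 1175 + 500) / 1000   # add 17.5%, rounded
print "%d.%02d" % (with_vat / 100, with_vat % 100)   # prints 23.49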


--
Steve Horne

steve at ninereeds dot fsnet dot co dot uk

Duncan Booth

Oct 10, 2003, 8:09:15 AM

Michael Hudson <m...@python.net> wrote in
news:7h3r81l...@pc150.maths.bris.ac.uk:

>> I know this is an FAQ, but the one thing I've never seen explained
>> satisfactorily is why repr(3.4) has to be '3.3999999999999999' rather
>> than '3.4'?
>
> I believe "computational and code complexity" is the main answer to
> that one.
>
> Start here
>
> http://citeseer.nj.nec.com/gay90correctly.html

<snip>

The code I gave isn't exactly complex, even when you rewrite it in C.

>> def float_repr(x):
>>     s = "%.15g" % x
>>     if float(s)==x: return s
>>     return "%.17g" % x
>>
>> This would be MUCH friendlier for newcomers to the language.
>
> It would be nice, but I think it's pretty hard to do efficiently. Tim
> Peters would be more certain than me :-)
>
> "Patches welcome" might apply, too. I don't think your suggested
> float repr will fly, I'm afraid...
>

I'm happy to do a patch if there is any chance of it being accepted.
Obviously doing the conversion twice makes the code slower, but I'm not
sure how critical it would be given that it's a pretty fast operation to
begin with:

C:\Pythonsrc\python\dist\src\PCbuild>python ..\lib\timeit.py "repr(3.4)"
100000 loops, best of 3: 10.5 usec per loop

C:\Pythonsrc\python\dist\src\PCbuild>\python23\python ..\lib\timeit.py
"repr(3.4)"
100000 loops, best of 3: 7.58 usec per loop

So it's about a third slower, but you have to do 300,000 reprs before you
lose 1 second of CPU time.

Ben Finney

Oct 10, 2003, 8:48:22 AM

On Fri, 10 Oct 2003 10:36:16 +0000 (UTC), Duncan Booth wrote:
> I know this is an FAQ, but the one thing I've never seen explained
> satisfactorily is why repr(3.4) has to be '3.3999999999999999' rather
> than '3.4'?

Because '3.4' is what str(3.4) returns. If repr(3.4) lies about the
value stored, what function will you leave us to discover the actual
value?

The str() function is for getting the working output of the value. The
repr() function is for discovering, as precisely as possible, the actual
value.

--
\ "I know the guy who writes all those bumper stickers. He hates |
`\ New York." -- Steven Wright |
_o__) |
Ben Finney <http://bignose.squidly.org/>

Duncan Booth

Oct 10, 2003, 9:55:34 AM

Ben Finney <bignose-h...@and-benfinney-does-too.id.au> wrote in
news:slrnbodbq6.2fj.b...@rose.localdomain.fake:

> On Fri, 10 Oct 2003 10:36:16 +0000 (UTC), Duncan Booth wrote:
>> I know this is an FAQ, but the one thing I've never seen explained
>> satisfactorily is why repr(3.4) has to be '3.3999999999999999' rather
>> than '3.4'?
>
> Because '3.4' is what str(3.4) returns. If repr(3.4) lies about the
> value stored, what function will you leave us to discover the actual
> value?

In what way is 3.3999999999999999 any more the value than 3.4?

>>> print 3.3999999999999999 == 3.4
True

The exact value stored is neither of these, it is somewhere in between the
two (perhaps 3.399999999999999911182158029987476766109466552734375 if I
counted it right). repr gives a representation of the float which is
guaranteed to convert back to the same sequence of bits, 3.4 will do just
as well for this case as the longer value.

Try a different value, say 3.333*30. Repr gives you 99.990000000000009, str
gives you 99.99. I'm not proposing that should change because
99.990000000000009 != 99.99.

> The str() function is for getting the working output of the value. The
> repr() function is for discovering, as precisely as possible, the actual
> value.

It doesn't do that. It currently shows you the value to sufficient
precision to allow you to reconstruct the bits exactly.

Documentation on repr:
repr(...)
repr(object) -> string

Return the canonical string representation of the object.
For most object types, eval(repr(object)) == object.

Stephen Horne

Oct 10, 2003, 10:14:52 AM

On 10 Oct 2003 22:38:22 +0950, Ben Finney
<bignose-h...@and-benfinney-does-too.id.au> wrote:

>On Fri, 10 Oct 2003 10:36:16 +0000 (UTC), Duncan Booth wrote:
>> I know this is an FAQ, but the one thing I've never seen explained
>> satisfactorily is why repr(3.4) has to be '3.3999999999999999' rather
>> than '3.4'?
>
>Because '3.4' is what str(3.4) returns. If repr(3.4) lies about the
>value stored, what function will you leave us to discover the actual
>value?
>
>The str() function is for getting the working output of the value. The
>repr() function is for discovering, as precisely as possible, the actual
>value.

Is there a basis for that claim?

My impression has always been that 'repr' gives a representation of
the value which, when parsed (using 'eval', for instance),
reconstructs the original value. In this respect, '3.4' is just as
good as '3.3999999999'.

IIRC, a binary float can always be given a precise decimal
representation - it simply tends to take a lot of digits. The fact
that repr doesn't give a perfect representation of the binary float
value suggests that it is not 'for discovering, as precisely as
possible, the actual value'.

Out of curiosity, I wrote the function at the bottom of this post to
convert a Python float into two string representations - a rational
and a decimal - both having precisely the same value as the float. I
got the following results starting with 3.4...

Rational : 7656119366529843/2251799813685248
Decimal : 3.399999999999999911182158029987476766109466552734375

I don't guarantee that the code is bug free - it may well be very
fragile, depending on platform specific float handling - but I believe
these results are accurate. For the record, I'm running Python 2.3
under Windows 2000 on a Pentium 4.

I am not aware of a Python standard function which will give this
rather impractical level of precision. But if Python's repr function
was intended 'for discovering, as precisely as possible, the actual
value', it really should give the decimal value from above which it is
clearly possible to discover. The truth is, however, that such
discovery is rarely if ever useful - floats are inherently approximate
values.

Converting float values to decimal is almost always either for the
benefit of human readers, or for creating text representations that
will be converted back to floats at some point. str serves the first
purpose well. For the second, the important identity is that
eval(repr(x)) == x (or at least a sufficiently close approximation -
I'm not sure if repr currently preserves the full precision of the
float).


Here's the code...

def perfect (val) :
    # Convert to rational

    num = 0
    denom = 1

    # handle integer part

    num = int(val)
    val -= num

    # handle fractional part

    while val != 0 :
        val *= 2
        num *= 2
        denom *= 2

        if val >= 1 :
            num += 1
            val -= 1

    rat = str(num)+"/"+str(denom)

    # convert to decimal form

    dec = str(num/denom) + "."
    num = num % denom

    while num > 0 :
        num *= 10
        dec += str(num / denom)
        num = num % denom

    return (rat, dec)
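
Calling it reproduces the values quoted above:

>>> rat, dec = perfect(3.4)
>>> print rat
7656119366529843/2251799813685248
>>> print dec
3.399999999999999911182158029987476766109466552734375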

Terry Reedy

Oct 10, 2003, 1:01:56 PM


"Duncan Booth" <dun...@NOSPAMrcp.co.uk> wrote in message
news:Xns9410970CA71...@127.0.0.1...

> Ben Finney <bignose-h...@and-benfinney-does-too.id.au> wrote in
> news:slrnbodbq6.2fj.b...@rose.localdomain.fake:
> > Because '3.4' is what str(3.4) returns. If repr(3.4) lies about the
> > value stored, what function will you leave us to discover the actual
> > value?
>
> In what way is 3.3999999999999999 any more the value than 3.4?

In the same way that 0 is a better approximation of .3 than 1, and
vice versa for .7. repr(<float>) attempts to return the closest 17-digit
decimal, or perhaps the closest that will yield the same binary when
evaled. Sometimes adding or subtracting 1 to or from the last digit will
give a decimal that also evals to the same binary, sometimes not.

Let's turn the question around. Suppose you started with
a=3.3999999999999999
Now, would you want repr(a) to be the number entered, or the less
accurate 3.4?

Or suppose 'a' resulted from a calculation rather than an entered
literal. Why should repr() do anything but report the closest
approximation possible? Especially given that one can explicitly choose
any level of rounding one wants. As Ben said, if repr() fudged its
output, then we would need another function to replace it. But we
already have round(), formats, and str() to do the fudging.
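
For instance (a quick sketch of those tools; output as I'd expect on a
754 platform with Python 2.3):

>>> x = 3.3999999999999999
>>> str(x)             # rounds to 12 significant digits
'3.4'
>>> "%.2f" % x         # explicit formatting
'3.40'
>>> round(x, 1)        # rounds the value; the prompt still shows repr
3.3999999999999999
>>> repr(x)            # the 17-digit round-trippable form
'3.3999999999999999'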

Terry J. Reedy


Terry Reedy

Oct 10, 2003, 1:26:27 PM


"Stephen Horne" <$$$$$$$$$$$$$$$$$@$$$$$$$$$$$$$$$$$$$$.co.uk> wrote
in message news:odcdovgdf7mib8emi...@4ax.com...

> My impression has always been that 'repr' gives a representation of
> the value which, when parsed (using 'eval', for instance),
> reconstructs the original value. In this respect, '3.4' is just as
> good as '3.3999999999'.

Not just *a* representation, but the *most accurate*. '3.4' is (as
you show below) less accurate, or that would have been chosen instead.
The internal value is what it is, regardless of whether it results
from this literal or that literal or from calculation. Why the
opposition to having a way to get the closest 17-digit decimal
approximation?

> IIRC, a binary float can always be given a precise decimal
> representation - it simply tends to take a lot of digits. The fact
> that repr doesn't give a perfect representation of the binary float
> value suggests that it is not 'for discovering, as precisely as
> possible, the actual value'.

It is for decimally representing, as precisely as possible *with 17
digits*, the actual value. I presume that 17 is the minimum necessary
to guarantee a unique, back-convertible representation for every
float.

> Out of curiosity, I wrote the function at the bottem of this post to
> convert a Python float into two string representations - a rational
> and a decimal - both having precisely the same value as the float. I
> got the following results starting with 3.4...
>
> Rational : 7656119366529843/2251799813685248
> Decimal : 3.399999999999999911182158029987476766109466552734375

If this is correct, then rounding up to 3.4 would be like rounding .11
to 1 instead of 0.

Terry J. Reedy

Alex Martelli

Oct 10, 2003, 1:36:37 PM

Duncan Booth wrote:

I like this idea, actually. Care to try your hand at a patch for
2.4 ...?


Alex

Cameron Laird

Oct 10, 2003, 2:51:05 PM

In article <Ooicne5U3dh...@comcast.com>,
Terry Reedy <tjr...@udel.edu> wrote:

<snip>

>It is for decimally representing, as precisely as possible *with 17
>digits*, the actual value. I presume that 17 is the minimum necessary
>to guarantee a unique, back-convertible representation for every
>float.

<snip>
Stuff in this area is difficult to express precisely. I'm
not sure what your "I presume that ..." means. Here's one
way to think about that magic number: there are "floats"
which are distinct, but agree to sixteen (decimal-)digits
of accuracy.

Some (seventeen-digit) decimals canNOT be achieved through
a round trip.
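
A concrete pair (a sketch assuming 754 doubles, where above 2**53 the
spacing between adjacent doubles is 2):

>>> a = 1e16
>>> b = a + 2                    # the next representable double
>>> a == b
False
>>> "%.16g" % a, "%.16g" % b     # indistinguishable at 16 digits
('1e+16', '1e+16')
>>> "%.17g" % a, "%.17g" % b     # 17 digits tells them apart
('10000000000000000', '10000000000000002')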
--

Cameron Laird <cla...@phaseit.net>
Business: http://www.Phaseit.net

Tim Peters

Oct 10, 2003, 2:37:51 PM

[Duncan Booth]

> I know this is an FAQ, but the one thing I've never seen explained
> satisfactorily is why repr(3.4) has to be '3.3999999999999999' rather
> than '3.4'?

Python doesn't do float<->string conversion itself. That's done by the
platform C library.

The IEEE-754 standard requires that if a 754 double is converted to a string
with 17 significant decimal digits, then converted back to a 754 double
again, you'll get back exactly the double you started with.

Python does not guarantee that, and it can't, because the C library does the
heavy lifting in both directions. But because Python use a C %.17g format,
Python eval(repr(x)) == x holds on any platform whose C library meets the
minimal relevant requirements of the 754 standard. I believe all major C
libraries do meet this now.

The 754 standard does not require that string->double or double->string
round correctly in all cases. That's a (much) stronger requirement than
that eval(repr(x)) == x.

> ...


> There's no reason why Python couldn't do the same:
>
> def float_repr(x):
>     s = "%.15g" % x
>     if float(s)==x: return s
>     return "%.17g" % x

Sorry, but there is a reason: if done on a platform whose C library
implements perfect-rounding double->string (e.g., I think gcc does now),
this can hit cases where the string may not reproduce x when eval'ed back on
a different platform whose C library isn't so conscientious but which
nevertheless meets the 754 standard's more forgiving (than perfect rounding)
requirements.

This is acutely important because Python's marshal format (used for .pyc
files) represents floats as repr'ed strings. By making repr() pump out 17
digits, we maximize the odds that .pyc files ported across platforms load
back exactly the same 754 doubles across (754) platforms.

> This would be MUCH friendlier for newcomers to the language.

A decimal floating type would be truly friendlier for them. In extensive
experience with Python using %.12g for repr(float) in the old days, the
primary effect of that was to delay the point at which newcomers bumped into
their first fatal fp "surprise", and *recognized* it as being fatal to them.
I've come to love seeing newcomers whine about the output for, e.g., 0.1:
they hit it early, and are immediately directed to the Appendix explaining
what's going on. This spurs a healthy and necessary mental reset about how
binary floating-point arithmetic really works. In return, what we see much
less often now are complaints about binary fp surprises in much subtler
contexts. If 3.4 got displayed as exactly "3.4", newcomers would face the
much harder task of recognizing the subtle consequences of the fact that,
no, in reality it's not exactly 3.4 at all, and assuming that it is can
have catastrophic consequences.

All that said, there's an implementation of IBM's (Mike Cowlishaw's)
proposed standard decimal arithmetic in the Python CVS sandbox, begging for
use, docs and improvement. That would match newcomer expectations much
better, without contorted hacks trying to make it appear that it's something
it isn't. Effort put there would address a cause instead of a symptom.
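
To give a flavor of that interface, a sketch (the names here are those
this sandbox code eventually shipped under, as the "decimal" module in
Python 2.4; illustrative, not a claim about the sandbox's exact API):

>>> from decimal import Decimal
>>> Decimal("3.4")            # constructed from a string, held exactly
Decimal("3.4")
>>> print Decimal("3.4") + Decimal("0.1")
3.5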


Stephen Horne

Oct 10, 2003, 8:02:33 PM

On Fri, 10 Oct 2003 13:26:27 -0400, "Terry Reedy" <tjr...@udel.edu>
wrote:

>
>"Stephen Horne" <$$$$$$$$$$$$$$$$$@$$$$$$$$$$$$$$$$$$$$.co.uk> wrote
>in message news:odcdovgdf7mib8emi...@4ax.com...
>> My impression has always been that 'repr' gives a representation of
>> the value which, when parsed (using 'eval', for instance),
>> reconstructs the original value. In this respect, '3.4' is just as
>> good as '3.3999999999'.
>
>Not just *a* representation, but the *most accurate*. '3.4' is (as
>you show below) less accurate, or that would have been chosen instead.
>The internal value is what it is, regardless of whether it results
>from this literal or that literal or from calculation. Why the
>opposition to having a way to get the closest 17-digit decimal
>approximation?

I'm not strongly opposed - in fact, I'm not really opposed at all. I
didn't start the discussion, I just countered an argument which I
still believe is simply wrong.

Even so, what is so advantageous about using the closest 17-digit
decimal approximation? That doesn't seem to me to be particulary
suited to the purpose of repr - alternative schemes for choosing the
repr may potentially be better suited.

Certainly it is *not* the most accurate representation possible.

>I presume that 17 is the minimum necessary
>to guarantee a unique, back-convertible representation for every
>float.

In other words, the choice of 17 digits precision is supporting the
goal of a sufficient (ie not overkill) backward-compatible
representation.

The given result is not the optimum in either sufficiency or
precision. If precision is the goal, the result should be
'3.399999999999999911182158029987476766109466552734375'. If
sufficiency is the goal, the result should be '3.4'.

This isn't a criticism of the current system - a balance between
extremes is often appropriate, and in this case the key advantage is
presumably a simpler and faster algorithm. But it may well be valid to
discuss alternate schemes and their rationales.

>> Out of curiosity, I wrote the function at the bottem of this post to
>> convert a Python float into two string representations - a rational
>> and a decimal - both having precisely the same value as the float. I
>> got the following results starting with 3.4...
>>
>> Rational : 7656119366529843/2251799813685248
>> Decimal : 3.399999999999999911182158029987476766109466552734375
>
>If this is correct, then rounding up to 3.4 would be like rounding .11
>to 1 instead of 0.

Yes, if the logic *must* be about rounding. But that isn't necessarily
the best scheme given the purpose of repr. As I said, there are other
possible rationales that give different best representations - the
ones relevant here being 'most precise possible' (which Ben Finney
wrongly seemed to think repr provides - the whole point of my reply)
or 'sufficient'.

Using the representation '3.4' instead of '3.399999...' has advantages
both for human readers and for use in files/data packets - in the
latter case, for instance, it saves bytes. 'Sufficient' does not mean
providing 17 digits of precision when two will do.

Of course, I wouldn't mind a function which could give me the exact
level of precision I want. At present, the '%' operator gives the
closest thing to this, but even that refuses to give more digits of
precision than the 17 (or whatever) that repr gives - extra digits
just get filled in as zeros irrespective of the precise value.
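
(For instance, on my platform:

>>> "%.25f" % 3.4
'3.3999999999999999000000000'

whereas a C library that did exact conversions would print the true
digits of the stored value instead; the exact output varies by
platform.)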

Whether there is a need for this, of course, is a different thing.

If I were to argue against, my argument would be that there is the
risk of introducing bugs - either in the repr function itself
(conversion to decimal can be more fiddly than some people realise,
especially when optimised) or in code which relies on the way the repr
function currently works (which I believe has been fixed since Python
prehistory).

The truth is, however, that I really don't care much either way. Just
because I disagree with an argument made by one clan, that doesn't
automatically mean I've joined the other clan. I was simply pointing
out what I see as an error - not taking sides.

Terry Reedy

Oct 10, 2003, 8:13:50 PM


"Cameron Laird" <cla...@lairds.com> wrote in message
news:vodvspi...@corp.supernews.com...

> In article <Ooicne5U3dh...@comcast.com>,
> Terry Reedy <tjr...@udel.edu> wrote:
> >It is for decimally representing, as precisely as possible *with 17
> >digits*, the actual value. I presume that 17 is the minimum necessary
> >to guarantee a unique, back-convertible representation for every
> >float.

> Stuff in this area is difficult to express precisely. I'm
> not sure what your "I presume that ..." means. Here's one
> way to think about that magic number: there are "floats"
> which are distinct, but agree to sixteen (decimal-)digits
> of accuracy.

That is what I meant. 16 digits is not enough for the binary float =>
decimal rep mapping to be one-to-one.

> Some (seventeen-digit) decimals canNOT be achieved through
> a round trip.

If you mean s != (sometimes) repr(eval(s)), of course; there are (I
believe) fewer than 10**17 floats (ignoring exponents), so mapping in
that direction cannot be onto. This is the fundamental problem; for
any positive number of bits and decimals, the two sets have different
sizes.

Terry J. Reedy


Duncan Booth

Oct 13, 2003, 4:19:06 AM

"Tim Peters" <tim...@comcast.net> wrote in
news:mailman.1065811102...@python.org:

>> There's no reason why Python couldn't do the same:
>>
>> def float_repr(x):
>>     s = "%.15g" % x
>>     if float(s)==x: return s
>>     return "%.17g" % x
>
> Sorry, but there is a reason: if done on a platform whose C library
> implements perfect-rounding double->string (e.g., I think gcc does
> now), this can hit cases where the string may not reproduce x when
> eval'ed back on a different platform whose C library isn't so
> conscientious but which nevertheless meets the 754 standard's more
> forgiving (than perfect rounding) requirements.
>
> This is acutely important because Python's marshal format (used for
> .pyc files) represents floats as repr'ed strings. By making repr()
> pump out 17 digits, we maximize the odds that .pyc files ported across
> platforms load back exactly the same 754 doubles across (754)
> platforms.

Thanks for giving me the reason, but I find this argument unconvincing on
several counts.

If a system has an inaccurate floating point library, then introducing
further inconsistencies based on whether the .pyc file was compiled locally
or copied from another system doesn't sound like a good solution. Surely if
the library is inaccurate you are going to get inaccurate results no matter
what tweaks Python tries to apply?

Also the marshal code doesn't actually use repr. For that matter the
interactive prompt which is what causes the problems I want to avoid in the
first place doesn't use repr either! (Marshal uses PyFloat_AsReprString
which comments say should be deprecated, repr uses float_repr, and
interactive mode uses float_print.) If you think it is important, I don't
have any problems with leaving the marshalling code generating as many
digits as it wants.

Tim Peters

Oct 13, 2003, 10:29:26 AM

[Duncan Booth]

>>> There's no reason why Python couldn't do the same:
>>>
>>> def float_repr(x):
>>>     s = "%.15g" % x
>>>     if float(s)==x: return s
>>>     return "%.17g" % x

[Tim]


>> Sorry, but there is a reason: if done on a platform whose C library
>> implements perfect-rounding double->string (e.g., I think gcc does
>> now), this can hit cases where the string may not reproduce x when
>> eval'ed back on a different platform whose C library isn't so
>> conscientious but which nevertheless meets the 754 standard's more
>> forgiving (than perfect rounding) requirements.
>>
>> This is acutely important because Python's marshal format (used for
>> .pyc files) represents floats as repr'ed strings. By making repr()
>> pump out 17 digits, we maximize the odds that .pyc files ported
>> across platforms load back exactly the same 754 doubles across (754)
>> platforms.

[Duncan]


> Thanks for giving me the reason, but I find this argument
> unconvincing on several counts.
>
> If a system has an inaccurate floating point library, then introducing
> further inconsistencies based on whether the .pyc file was compiled
> locally or copied from another system doesn't sound like a good
> solution. Surely if the library is inaccurate you are going to get
> inaccurate results no matter what tweaks Python tries to apply?

You snipped most of my msg. As explained in the parts not reproduced here,
Python is aiming to work correctly across (at least) platforms where the
native C library meets the minimal requirements of the 754 standard for
float <-> string accuracy. That doesn't require perfect rounding in all
cases, but to call a system meeting no more than the minimal requirements
"inaccurate" is quite a stretch. It can require multi-thousand bit
arithmetic (in some cases) to do perfect rounding, and that's why the
standard allowed for a small bit of slop. Perfect rounding isn't necessary
for eval(str(float)) == float to hold always; it's enough that platforms
meet the minimal 754 requirements and at least 17 significant digits are
produced in the float->string direction.

> Also the marshal code doesn't actually use repr. For that matter the
> interactive prompt which is what causes the problems I want to avoid in
> the first place doesn't use repr either! (Marshal uses
> PyFloat_AsReprString which comments say should be deprecated, repr
> uses float_repr, and interactive mode uses float_print.)

PyFloat_AsReprString(afloat) is the C API spelling of the Python-level
repr(afloat), as documented in floatobject.h. The comments say it should be
deprecated because it "pass[es] a char buffer without passing a length",
which has nothing to do with the result it produces; adding a buffer length
argument would satisfy the complaint.

It's a general rule that repr(obj) is produced at the interactive prompt
regardless of the type of obj; the specific function called to produce that
result in the specific case of isinstance(obj, float) isn't really
interesting; what's relevant is that it *does* produce repr(float), however
it's implemented. It's also a general rule that eval(repr(obj)) == obj
should hold when sanely possible, again without regard to type(obj). That
last rule is why repr(float) does what it does; marshal exploits it.

There are other complaints that can be made about the interactive prompt
using repr(), and many such have been made over the years. sys.displayhook
was introduced in the hopes that people would build prompt format functions
they like better, and share them. It's remarkable (to me -- that's why I'm
remarking <wink>) that so few have.

> If you think it is important, I don't have any problems with leaving
> the marshalling code generating as many digits as it wants.

It's vital for marshal to try to reproduce floats across platforms. It does
OK at that now, but I think it would be better for marshal to move to a
binary format. That's got problems of its own, due to compatibility
hassles.

Regardless of what marshal does, it's still a general rule that Python
strive to maintain that eval(repr(x)) == x. This is true now for all
builtin scalar types, and for lists, tuples and dicts composed (possibly
recursively) of those.

repr(obj) can be an undesirable thing to produce at an interactive prompt
for many reasons, some depending on taste. That's why sys.displayhook
exists, so you can change interactive prompt behavior. The reason I like,
e.g., 0.1 *not* to display as "0.1" by default was given toward the end of
my msg (and had nothing to do with marshal, btw).


Paul Rubin

Oct 13, 2003, 11:44:17 AM

"Tim Peters" <tim...@comcast.net> writes:
> It's vital for marshal to try to reproduce floats across platforms. It does
> OK at that now, but I think it would be better for marshal to move to a
> binary format. That's got problems of its own, due to compatibility
> hassles.

If the binary format is ieee754, that should be enough for
compatibility between all ieee754 machines (adjusting for endianness,
right)?

Some library code to convert ieee754 to native format on non-ieee754
machines might be needed, but surely it wouldn't cause worse problems
than the existing decimal encoding already does. (Do you know for a
fact that anyone is actually using python floats on any machines like
that these days anyway? I can imagine a few vaxes still running, but
the cray-1's have to all be gone by now).
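
A sketch of what such a binary encoding could look like, using the
struct module's fixed-endianness 754 formats (illustrative only - this
is not what marshal actually does):

import struct

def dump_double(x):
    # 8 bytes, little-endian IEEE-754 double: exact round trip
    return struct.pack('<d', x)

def load_double(data):
    return struct.unpack('<d', data)[0]

assert load_double(dump_double(3.4)) == 3.4   # all 64 bits preserved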

Stephen Horne

Oct 13, 2003, 12:50:36 PM

On 13 Oct 2003 08:44:17 -0700, Paul Rubin
<http://phr...@NOSPAM.invalid> wrote:

My guess would be that it's not just a hardware issue.

There may still be software around that is doing real arithmetic
without using hardware floats. Possible examples may include software
compiled on old PC compilers, before co-processors were always
available - I certainly remember Turbo BASIC having a different binary
float representation than some common BASIC interpreter (GW BASIC?)
years ago.

Basically, people may be using Python in contexts where interacting
with old systems is important - and interacting with old systems can
throw up some interesting surprises.
