Comparing float and decimal

D'Arcy J.M. Cain

Sep 23, 2008, 7:20:12 AM
to pytho...@python.org
I'm not sure I follow this logic. Can someone explain why float and
integer can be compared with each other and decimal can be compared to
integer but decimal can't be compared to float?

>>> from decimal import Decimal
>>> i = 10
>>> f = 10.0
>>> d = Decimal("10.00")
>>> i == f
True
>>> i == d
True
>>> f == d
False

This seems to break the rule that if A is equal to B and B is equal to
C then A is equal to C.
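
The rule being appealed to here can be checked mechanically. The following is a minimal sketch, assuming a current Python 3 interpreter; `equality_is_transitive` is a hypothetical helper name, not something from the standard library. On Python 3.2 and later, where Decimal and float are compared by exact value, it is expected to report True for these three values, whereas the 2008 session above shows the case where it would not.

```python
from decimal import Decimal
from itertools import permutations


def equality_is_transitive(values):
    """Return True if == is transitive over every ordered triple of values."""
    for a, b, c in permutations(values, 3):
        # A violation is a triple where a == b and b == c, yet a != c.
        if a == b and b == c and not a == c:
            return False
    return True


samples = [10, 10.0, Decimal("10.00")]
print(equality_is_transitive(samples))
```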

--
D'Arcy J.M. Cain <da...@druid.net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.

Gerhard Häring

Sep 23, 2008, 8:45:33 AM
to pytho...@python.org
D'Arcy J.M. Cain wrote:
> I'm not sure I follow this logic. Can someone explain why float and
> integer can be compared with each other and decimal can be compared to
> integer but decimal can't be compared to float?
>
>>>> from decimal import Decimal
>>>> i = 10
>>>> f = 10.0
>>>> d = Decimal("10.00")
>>>> i == f
> True
>>>> i == d
> True
>>>> f == d
> False

I can give you the technical answer after reading the sources of the
decimal module: you can only compare to Decimal what can be converted to
Decimal. And that is int, long and another Decimal.

Everything else will return False when comparing.

> This seems to break the rule that if A is equal to B and B is equal to
> C then A is equal to C.

Yes, but only if comparison from type(A) to type(C) is supported at all.
Instead of raising ValueError or NotImplementedError, the decimal
module returns False here.
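
The "returns False instead of raising" behaviour can be reproduced with any class that declines comparisons it does not support. This is only a sketch of the general mechanism, not the decimal module's source; `Amount` is a hypothetical class. When both operands return `NotImplemented` from `__eq__`, Python falls back to an identity test and the result is False, while an ordering comparison declined by both sides raises TypeError.

```python
class Amount:
    """Hypothetical value type that can only be compared with Amount and int."""

    def __init__(self, units):
        self.units = units

    def _coerce(self, other):
        if isinstance(other, Amount):
            return other.units
        if isinstance(other, int):
            return other
        return None  # unsupported type, e.g. float

    def __eq__(self, other):
        value = self._coerce(other)
        if value is None:
            return NotImplemented  # let Python ask the other operand
        return self.units == value

    def __lt__(self, other):
        value = self._coerce(other)
        if value is None:
            return NotImplemented
        return self.units < value

    def __hash__(self):
        return hash(self.units)


a = Amount(10)
print(a == 10)     # supported: compared by value
print(a == 10.0)   # both sides decline, so the fallback answer is False
try:
    a < 10.0
except TypeError as exc:
    print("TypeError:", exc)
```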

-- Gerhard

Robert Lehmann

Sep 23, 2008, 8:58:57 AM
to
On Tue, 23 Sep 2008 07:20:12 -0400, D'Arcy J.M. Cain wrote:

> I'm not sure I follow this logic. Can someone explain why float and
> integer can be compared with each other and decimal can be compared to
> integer but decimal can't be compared to float?

In comparisons, `Decimal` tries to convert the other type to a `Decimal`.
If this fails -- and it does for floats -- the equality comparison
evaluates to False. For ordering comparisons, e.g. ``D("10") < 10.0``, it
fails more verbosely::

TypeError: unorderable types: Decimal() < float()

The `decimal tutorial`_ states:

"To create a Decimal from a float, first convert it to a string. This
serves as an explicit reminder of the details of the conversion
(including representation error)."

See the `decimal FAQ`_ for a way to convert floats to Decimals.
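
As a rough illustration of the two conversion routes, assuming Python 2.7 / 3.1 or later (where `Decimal.from_float` exists and `repr` of a float gives the shortest round-tripping string): going through a string yields the decimal literal the user probably typed, while the exact route exposes the binary representation error the tutorial warns about.

```python
from decimal import Decimal

f = 0.1
via_string = Decimal(repr(f))      # goes through the string '0.1'
exact = Decimal.from_float(f)      # exact value of the underlying double

print(via_string)
print(exact)
print(via_string == exact)
```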


>>>> from decimal import Decimal
>>>> i = 10
>>>> f = 10.0
>>>> d = Decimal("10.00")
>>>> i == f
> True
>>>> i == d
> True
>>>> f == d
> False
>
> This seems to break the rule that if A is equal to B and B is equal to C
> then A is equal to C.

I don't see why transitivity should apply to Python objects in general.

HTH,

.. _decimal tutorial: http://docs.python.org/lib/decimal-tutorial.html
.. _decimal FAQ: http://docs.python.org/lib/decimal-faq.html

--
Robert "Stargaming" Lehmann

Michael Palmer

Sep 23, 2008, 10:08:07 AM
to

> > This seems to break the rule that if A is equal to B and B is equal to C
> > then A is equal to C.
>
> I don't see why transitivity should apply to Python objects in general.

Well, for numbers it surely would be a nice touch, wouldn't it.
May be the reason for Decimal to accept float arguments is that
irrational numbers or very long rational numbers cannot be converted
to a Decimal without rounding error, and Decimal doesn't want any part
of it. Seems pointless to me, though.

Michael Palmer

Sep 23, 2008, 11:50:15 AM
to
On Sep 23, 10:08 am, Michael Palmer <m_palme...@yahoo.ca> wrote:
> May be the reason for Decimal to accept float arguments is that

NOT to accept float arguments.

Terry Reedy

Sep 23, 2008, 2:31:49 PM
to pytho...@python.org
Gerhard Häring wrote:
> D'Arcy J.M. Cain wrote:
>> I'm not sure I follow this logic. Can someone explain why float and
>> integer can be compared with each other and decimal can be compared to
>> integer but decimal can't be compared to float?
>>
>>>>> from decimal import Decimal
>>>>> i = 10
>>>>> f = 10.0
>>>>> d = Decimal("10.00")
>>>>> i == f
>> True
>>>>> i == d
>> True
>>>>> f == d
>> False
>
> I can give you the technical answer after reading the sources of the
> decimal module: you can only compare to Decimal what can be converted to
> Decimal. And that is int, long and another Decimal.

The new fractions module acts differently, which is to say, as most
would want.

>>> from fractions import Fraction as F
>>> F(1) == 1.0
True
>>> F(1.0)
Traceback (most recent call last):
  File "<pyshell#20>", line 1, in <module>
    F(1.0)
  File "C:\Program Files\Python30\lib\fractions.py", line 97, in __new__
    numerator = operator.index(numerator)
TypeError: 'float' object cannot be interpreted as an integer
>>> F(1,2) == .5
True
>>> .5 == F(1,2)
True

so Fraction obviously does comparisons differently.
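
A short sketch of what "comparing differently" means in practice, assuming Python 3.2 or later (where, unlike the 3.0 session above, the Fraction constructor also accepts a float): Fraction compares against the exact value of the float, so the result depends on whether the float is exactly representable in binary.

```python
from fractions import Fraction

half = Fraction(1, 2)
tenth = Fraction(1, 10)

print(half == 0.5)            # 0.5 is exactly representable as a double
print(tenth == 0.1)           # the double nearest 0.1 is not exactly 1/10
print(Fraction(0.1) == 0.1)   # the exact conversion round-trips
```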

Decimal is something of an anomaly in Python because it was written to
exactly follow an external standard, with no concessions to what would
be sensible for Python. It is possible that that standard mandates that
Decimals not compare to floats.

tjr

Marc 'BlackJack' Rintsch

Sep 24, 2008, 12:21:10 AM
to

Is 0.1 a very long number? Would you expect ``0.1 == Decimal('0.1')`` to
be `True` or `False` given that 0.1 actually is

In [98]: '%.50f' % 0.1
Out[98]: '0.10000000000000000555111512312578270211815834045410'

?

Ciao,
Marc 'BlackJack' Rintsch

Nick Craig-Wood

Sep 24, 2008, 5:30:03 AM
to
Terry Reedy <tjr...@udel.edu> wrote:
> The new fractions module acts differently, which is to say, as most
> would want.
>
> >>> from fractions import Fraction as F
> >>> F(1) == 1.0
> True
> >>> F(1.0)
> Traceback (most recent call last):
> File "<pyshell#20>", line 1, in <module>
> F(1.0)
> File "C:\Program Files\Python30\lib\fractions.py", line 97, in __new__
> numerator = operator.index(numerator)
> TypeError: 'float' object cannot be interpreted as an integer

Both the Fraction module and the Decimal module could represent floats
exactly and reversibly since floats are of the form

mantissa * 2**exponent

which is exactly representable as a fraction (rational) and also as

mantissa2 * 10**exponent2

as to why we don't do this...

I guess it is to preserve the sanity of the user: when they write
fraction(0.1) or decimal(0.1), did they really mean fraction(1,10) and
decimal("0.1"), or the exact representations, which are
decimal("0.1000000000000000055511151231257827021181583404541015625")
and fraction(3602879701896397,2**55)?

Given that we let the exact representations leak out anyway (which
causes a lot of FAQs in this list), eg

>>> 0.1
0.10000000000000001

IMHO We should put the exact conversions in for floats to Decimal and
Fraction by default and add a new section to the FAQ!

In that way people will see floats for what they really are - a crude
approximation to a rational number ;-)
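
The "mantissa * 2**exponent" argument can be made concrete. A minimal sketch, assuming IEEE 754 doubles with 53 significant bits and a Python 3 interpreter; `float_to_fraction` is a hypothetical helper written for illustration, and it only handles finite floats.

```python
import math
from fractions import Fraction


def float_to_fraction(x):
    """Exact rational value of a finite float, built from mantissa and exponent."""
    mantissa, exponent = math.frexp(x)   # x == mantissa * 2**exponent
    # Scaling by a power of two is exact, and a double carries at most
    # 53 significant bits, so the scaled mantissa is an exact integer.
    scaled = int(mantissa * 2**53)
    return Fraction(scaled) * Fraction(2) ** (exponent - 53)


print(float_to_fraction(0.1))
print(float_to_fraction(0.1) == Fraction(*(0.1).as_integer_ratio()))
```

The built-in `float.as_integer_ratio()` gives the same pair directly; the helper only spells out the reasoning.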

--
Nick Craig-Wood <ni...@craig-wood.com> -- http://www.craig-wood.com/nick

Mark Dickinson

Sep 24, 2008, 6:14:38 AM
to
On Sep 23, 7:31 pm, Terry Reedy <tjre...@udel.edu> wrote:
> Decimal is something of an anomaly in Python because it was written to
> exactly follow an external standard, with no concessions to what would
> be sensible for Python.  It is possible that that standard mandates that
> Decimals not compare to floats.

I don't think the standard says anything about interactions between
Decimals and floats. But there's certainly been a feeling amongst at
least some of the developers that the job of Python's decimal module
is to implement the standard and no more, and that extensions to its
functionality belong elsewhere.

Regarding equality, there's at least one technical issue: the
requirement that objects that compare equal hash equal. How do you
come up with efficient hash operations for integers, floats, Decimals
and Fractions that satisfy this requirement?

For other arithmetic operations: should the sum of a float and a
Decimal produce a Decimal or a float? Why? It's not at all clear to me
that either of these types is 'higher up' the numerical tower than the
other.

Mark

Terry Reedy

Sep 24, 2008, 1:18:40 PM
to pytho...@python.org
Mark Dickinson wrote:
> On Sep 23, 7:31 pm, Terry Reedy <tjre...@udel.edu> wrote:
>> Decimal is something of an anomaly in Python because it was written to
>> exactly follow an external standard, with no concessions to what would
>> be sensible for Python. It is possible that that standard mandates that
>> Decimals not compare to floats.
>
> I don't think the standard says anything about interactions between
> Decimals and floats.

If there is not now, there could be in the future, and the decimal
authors are committed to follow the standard wherever it goes.
Therefore, the safe course, to avoid possible future deprecations due to
doing too much, is to only do what is mandated.

> But there's certainly been a feeling amongst at
> least some of the developers that the job of Python's decimal module
> is to implement the standard and no more, and that extensions to its
> functionality belong elsewhere.

For the reason just stated. A slightly different take is this. The
reason for following the standard is so that decimal code in Python can be
converted exactly both from and to decimal code in other languages.
(And one reason for *that* is that one purpose of the standard is to
reliably implement legal and contractual standards for financial
calculations.) Using extensions in Python could break/deprecate code
translated away from Python.

> Regarding equality, there's at least one technical issue: the
> requirement
> that objects that compare equal hash equal. How do you come up with
> efficient hash operations for integers, floats, Decimals and Fractions
> that satisfy this requirement?

For integral values, this is no problem.
>>> import decimal, fractions
>>> hash(1) == hash(1.0) == hash(decimal.Decimal(1)) == hash(fractions.Fraction(1)) == 1
True
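
For comparison, here is a small check that also covers a non-integral value. It is a sketch that assumes Python 3.2 or later, where the numeric types share one hashing scheme; on the interpreters discussed in this thread only the integral case was expected to agree.

```python
from decimal import Decimal
from fractions import Fraction

ones = [1, 1.0, Decimal(1), Fraction(1)]
halves = [0.5, Decimal("0.5"), Fraction(1, 2)]

# Each list should collapse to a single distinct hash value.
print(len({hash(x) for x in ones}))
print(len({hash(x) for x in halves}))
```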

> For other arithmetic operations: should the sum of a float and a
> Decimal produce a Decimal or a float? Why? It's not at all clear to me that
> either of these types is 'higher up' the numerical tower than the
> other.

Floats and fractions have the same issue. Fractions are converted to
floats. I can think of two reasons: float operations are faster; floats
are typically thought of as inexact and since the result is likely to
be inexact (rounded), float is the more appropriate type to express
that. Anyone who disagrees with the choice for their application can
explicitly convert the float to a fraction.

Decimals can also be converted to floats (they also have a __float__
method). But unlike fractions, the conversion must be explicit, using
float(decimal), instead of implicit, as with ints and fractions.
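
The difference between the implicit and explicit routes looks roughly like this on a current Python 3 interpreter (a sketch; the variable names are only illustrative):

```python
from decimal import Decimal
from fractions import Fraction

# Fraction + float: the Fraction is converted implicitly, the result is a float.
mixed = Fraction(1, 2) + 0.25
print(type(mixed).__name__, mixed)

# Decimal + float: no implicit conversion, so the operation is refused.
try:
    Decimal("0.5") + 0.25
except TypeError as exc:
    print("TypeError:", exc)

# The conversion has to be spelled out, here via Decimal.__float__.
explicit = float(Decimal("0.5")) + 0.25
print(explicit)
```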

Someone *could* write a PyDecimal wrapper that would do implicit
conversion and thereby more completely integrate decimals with other
Python numbers, but I doubt that saving transitivity of equality will be
sufficient motivation ;-).

Terry Jan Reedy

Steven D'Aprano

Sep 24, 2008, 10:37:10 PM
to
On Wed, 24 Sep 2008 04:30:03 -0500, Nick Craig-Wood wrote:


> Both the Fraction module and the Decimal module could represent floats
> exactly and reversibly since floats are of the form
>
> mantissa * 2**exponent
>
> which is exactly representable as a fraction (rational) and also as
>
> mantissa2 * 10**exponent2
>
> as to why we don't do this...
>
> I guess it is to preserve the sanity of the user when they write
> fraction(0.1) or decimal(0.1) did they really mean fraction(1,10),
> decimal("0.1") or the exact representations which are
> decimal("0.1000000000000000055511151231257827021181583404541015625") and
> fraction(3602879701896397,2**55)


I would say that in practice the chances that somebody *actually* wanted
0.1000000000000000055511151231257827021181583404541015625 when they wrote
0.1 is about the same as the chances that the BDFL will support braces in
Python 3.0.

(Hint: "from __future__ import braces")


> Given that we let the exact representations leak out anyway (which
> causes a lot of FAQs in this list), eg
>
>>>> 0.1
> 0.10000000000000001
>
> IMHO We should put the exact conversions in for floats to Decimal and
> Fraction by default and add a new section to the FAQ!


But one of the reasons for having the Decimal module is to avoid such leaky
abstractions. Without Decimal (or fraction) there's no obvious way to get
0.1 exactly. It seems perverse to suggest that by default Decimal should
deliberately expose the same leaky abstraction that causes so much
trouble when people use floats.

> In that way people will see floats for what they really are - a crude
> approximation to a rational number ;-)

You can already see that just by printing a float:

>>> 0.3
0.29999999999999999
>>> 0.1
0.10000000000000001


--
Steven

Tim Roberts

Sep 25, 2008, 3:55:52 AM
to

Actually, it's not. Your C run-time library is generating random digits
after it runs out of useful information (which is the first 16 or 17
digits). 0.1 in an IEEE 754 double is this:

0.100000000000000088817841970012523233890533447265625
--
Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.

Mark Dickinson

Sep 25, 2008, 5:42:20 AM
to
On Sep 24, 6:18 pm, Terry Reedy <tjre...@udel.edu> wrote:
> If there is not now, there could be in the future, and the decimal
> authors are committed to follow the standard wherever it goes.
> Therefore, the safe course, to avoid possible future deprecations due to
> doing too much, is to only do what is mandated.

Makes sense. It looks as though the standard's pretty stable now
though; I'd be quite surprised to see it evolve to include discussion
of floats. But then again, people thought it was stable just before
all the extra transcendental operations appeared. :-)

> For integral values, this is no problem.
>  >>> hash(1) == hash(1.0) == hash(decimal.Decimal(1)) ==
> hash(fractions.Fraction(1)) == 1
> True

Getting integers and Decimals to hash equal was actually
something of a pain, and required changing the way that
the hash of a long was computed. The problem in a nutshell:
what's the hash of Decimal('1e100000000')? The number is
clearly an integer, so its hash should be the same as that
of 10**100000000. But computing 10**100000000, and then
finding its hash, is terribly slow... (Try
hash(Decimal('1e100000000')) in Python 2.5 and see
what happens! It's fixed in Python 2.6.)
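
One way to see why this need not be slow: if an integer's hash is its value reduced modulo a fixed prime, the hash of a power of ten can be found by modular exponentiation without ever building the huge integer. The sketch below assumes CPython 3.2 or later, where `sys.hash_info.modulus` exposes that prime; it illustrates the idea and is not the decimal module's actual code. A smaller exponent is used so that the direct computation stays cheap.

```python
import sys
from decimal import Decimal

P = sys.hash_info.modulus      # prime used by CPython for numeric hashes

n = 10**1000                   # small enough to build directly for comparison
print(hash(Decimal("1e1000")) == hash(n))

# Modular exponentiation gives the same value without constructing n.
print(hash(n) == pow(10, 1000, P))
```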

As more numeric types get added to Python, this
'equal implies equal hash' requirement becomes more
and more untenable, and difficult to maintain. I also find
it a rather unnatural requirement: numeric equality
is, to me, a weaker equivalence relation than the one
that should be used for identifying keys in dictionaries,
elements of sets, etc. Fraction(1, 2) and 0.5 should, to my
eyes, be considered different elements of a set. But the only
way to 'fix' this
would be to have Python recognise two different types of
equality, and then it wouldn't be Python any more.

The SAGE folks also discovered that they couldn't
maintain the hash requirement.

> Decimals can also be converted to floats (they also have a  __float__
> method).  But unlike fractions, the conversion must be explicit, using
> float(decimal), instead of implicit, as with ints and fractions.

Maybe: if I *had* to pick a direction, I'd make float + Decimal
produce a Decimal, on the basis that Decimal is arbitrary precision
and that the float->Decimal conversion can be made losslessly.
But then there are a whole host of decisions one has to make
about rounding, significant zeros, ... (And then, as you point
out, Cowlishaw might come out with a new version of the standard
that does include interactions with floats, and makes an entirely
different set of decisions...)

Mark

Mark Dickinson

Sep 25, 2008, 6:05:13 AM
to

I get (using Python 2.6):

>>> n, d = 0.1.as_integer_ratio()
>>> from decimal import Decimal, getcontext
>>> getcontext().prec = 100
>>> Decimal(n)/Decimal(d)
Decimal('0.1000000000000000055511151231257827021181583404541015625')

which is a lot closer to Marc's answer. Looks like your float
approximation to 0.1 is 6 ulps out. :-)

Mark

Nick Craig-Wood

Sep 25, 2008, 6:30:02 AM
to

Not according to the decimal FAQ

http://docs.python.org/lib/decimal-faq.html

------------------------------------------------------------
import math
from decimal import *

def floatToDecimal(f):
    "Convert a floating point number to a Decimal with no loss of information"
    # Transform (exactly) a float to a mantissa (0.5 <= abs(m) < 1.0) and an
    # exponent. Double the mantissa until it is an integer. Use the integer
    # mantissa and exponent to compute an equivalent Decimal. If this cannot
    # be done exactly, then retry with more precision.

    mantissa, exponent = math.frexp(f)
    while mantissa != int(mantissa):
        mantissa *= 2.0
        exponent -= 1
    mantissa = int(mantissa)

    oldcontext = getcontext()
    setcontext(Context(traps=[Inexact]))
    try:
        while True:
            try:
                return mantissa * Decimal(2) ** exponent
            except Inexact:
                getcontext().prec += 1
    finally:
        setcontext(oldcontext)

print "float(0.1) is", floatToDecimal(0.1)
------------------------------------------------------------

Prints this

float(0.1) is 0.1000000000000000055511151231257827021181583404541015625

On my platform

Python 2.5.2 (r252:60911, Aug 8 2008, 09:22:44),
[GCC 4.3.1] on linux2
Linux 2.6.26-1-686
Intel(R) Core(TM)2 CPU T7200

Mark Dickinson

Sep 25, 2008, 7:02:49 AM
to
On Sep 23, 1:58 pm, Robert Lehmann <stargam...@gmail.com> wrote:
> I don't see why transitivity should apply to Python objects in general.

Hmmm. Lack of transitivity does produce some, um, interesting
results when playing with sets and dicts. Here are sets s and
t such that the unions s | t and t | s have different sizes:

>>> from decimal import Decimal
>>> s = set([Decimal(2), 2.0])
>>> t = set([2])
>>> len(s | t)
2
>>> len(t | s)
1

This opens up some wonderful possibilities for hard-to-find bugs...
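
The same experiment makes a handy regression check. A sketch, assuming Python 3.2 or later, where Decimal(2) and 2.0 compare equal and hash equal: both sets then hold a single element and the two unions are expected to have the same size, unlike the session above.

```python
from decimal import Decimal

s = {Decimal(2), 2.0}
t = {2}

# With transitive equality, union size cannot depend on operand order.
print(len(s), len(s | t), len(t | s))
```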

Mark

Terry Reedy

Sep 26, 2008, 2:20:08 AM
to pytho...@python.org
Mark Dickinson wrote:
> On Sep 24, 6:18 pm, Terry Reedy <tjre...@udel.edu> wrote:
>> If there is not now, there could be in the future, and the decimal
>> authors are committed to follow the standard wherever it goes.
>> Therefore, the safe course, to avoid possible future deprecations due to
>> doing too much, is to only do what is mandated.
>
> Makes sense. It looks as though the standard's pretty stable now
> though; I'd be quite surprised to see it evolve to include discussion
> of floats. But then again, people thought it was stable just before
> all the extra transcendental operations appeared. :-)

What got me were the bizarre new 'logical' operations whose addition
was rather nonsensical from a Python viewpoint (though probably
sensible from an IBM profit business viewpoint). With those added, and
with this thread, I have decided that Decimals best be thought of as a
separate universe, not to be mixed with other numbers unless one has
good reason to and understands the possible anomalies of doing so. For
pure finance apps, I would think that there should be little reason to mix.

tjr

Tim Roberts

Sep 26, 2008, 11:34:43 PM
to
Mark Dickinson <dick...@gmail.com> wrote:

Hmmph, that makes the vote 3 to 1 against me. I need to go re-examine my
"extreme float converter".

Tim Roberts

Sep 26, 2008, 11:44:28 PM
to
Mark Dickinson <dick...@gmail.com> wrote:

>On Sep 25, 8:55 am, Tim Roberts <t...@probo.com> wrote:
>> Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
>> >0.1 actually is
>>
>> >In [98]: '%.50f' % 0.1
>> >Out[98]: '0.10000000000000000555111512312578270211815834045410'
>> >?
>>

>> ....  0.1 in an IEEE 754 double is this:


>>
>>      0.100000000000000088817841970012523233890533447265625
>
>I get (using Python 2.6):
>
>>>> n, d = 0.1.as_integer_ratio()
>>>> from decimal import Decimal, getcontext
>>>> getcontext().prec = 100
>>>> Decimal(n)/Decimal(d)
>Decimal('0.1000000000000000055511151231257827021181583404541015625')
>
>which is a lot closer to Marc's answer. Looks like your float
>approximation to 0.1 is 6 ulps out. :-)

Yes, foolishness on my part. The hex is 3FB99999_9999999A,
so we're looking at 19999_9999999A / 2^56 or
7205759403792794
-------------------
72057594037927936

which is the number that Marc, Nick, and you all describe. Apologies all
around. I actually dropped one 9 the first time around.
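
The hex arithmetic can be replayed in a few lines. This is a sketch that assumes the platform's float is an IEEE 754 double and that the value is a normal number (so the implicit leading mantissa bit is 1); the variable names are only illustrative.

```python
import struct
from fractions import Fraction

# Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
bits = struct.unpack(">Q", struct.pack(">d", 0.1))[0]
print(hex(bits))

biased_exponent = (bits >> 52) & 0x7FF
fraction_bits = bits & ((1 << 52) - 1)

mantissa = (1 << 52) | fraction_bits       # restore the implicit leading 1
exponent = biased_exponent - 1023 - 52     # weight of the mantissa's lowest bit
value = Fraction(mantissa) * Fraction(2) ** exponent

print(mantissa, exponent)
print(value == Fraction(7205759403792794, 72057594037927936))
```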

Adding one more weird data point, here's what I get trying Marc's original
sample on my Windows box:

C:\tmp>python
Python 2.4.4 (#71, Oct 18 2006, 08:34:43) [MSC v.1310 32 bit (Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> '%.50f' % 0.1
'0.10000000000000001000000000000000000000000000000000'
>>>
I assume this is the Microsoft C run-time library at work.

Gabriel Genellina

Sep 29, 2008, 10:42:20 PM
to pytho...@python.org
En Thu, 25 Sep 2008 08:02:49 -0300, Mark Dickinson <dick...@gmail.com>
escribió:

> On Sep 23, 1:58 pm, Robert Lehmann <stargam...@gmail.com> wrote:
>> I don't see why transitivity should apply to Python objects in general.
>
> Hmmm. Lack of transitivity does produce some, um, interesting
> results when playing with sets and dicts. Here are sets s and
> t such that the unions s | t and t | s have different sizes:
>
>>>> from decimal import Decimal
>>>> s = set([Decimal(2), 2.0])
>>>> t = set([2])
>>>> len(s | t)
> 2
>>>> len(t | s)
> 1

Ouch!

> This opens up some wonderful possibilities for hard-to-find bugs...

And I was thinking all this thread was just a theoretical question without
practical consequences...

--
Gabriel Genellina

Terry Reedy

Sep 30, 2008, 4:21:13 AM
to pytho...@python.org
Gabriel Genellina wrote:
> En Thu, 25 Sep 2008 08:02:49 -0300, Mark Dickinson <dick...@gmail.com>
> escribió:
>> On Sep 23, 1:58 pm, Robert Lehmann <stargam...@gmail.com> wrote:
>>> I don't see why transitivity should apply to Python objects in general.
>>
>> Hmmm. Lack of transitivity does produce some, um, interesting
>> results when playing with sets and dicts. Here are sets s and
>> t such that the unions s | t and t | s have different sizes:
>>
>>>>> from decimal import Decimal
>>>>> s = set([Decimal(2), 2.0])
>>>>> t = set([2])
>>>>> len(s | t)
>> 2
>>>>> len(t | s)
>> 1
>
> Ouch!

>
>> This opens up some wonderful possibilities for hard-to-find bugs...
>
> And I was thinking all this thread was just a theoretical question
> without practical consequences...

To explain this anomaly more clearly, here is a recursive definition of
set union.

if b: a|b = a.add(x)|(b-x) where x is arbitrary member of b
else: a|b = a

Since Python only defines set-set and not set-ob, we would have to
subtract {x} to directly implement the above. But b.pop() removes an
arbitrary member and returns it so we can add it. So here is a Python
implementation of the definition.

def union(a,b):
    a = set(a) # copy to preserve original
    b = set(b) # ditto
    while b:
        a.add(b.pop())
    return a

from decimal import Decimal
d1 = Decimal(1)
fd = set((1.0, d1))
i = set((1,))
print(union(fd,i))
print(union(i,fd))

# prints

{1.0, Decimal('1')}
{1}

This is a bug in relation to the manual:
"union(other, ...)
set | other | ...
Return a new set with elements from both sets."


Transitivity is basic to logical deduction:
equations: a == b == c ... == z implies a == z
implications: (a implies b) and (b implies c) implies (a implies c)
The latter covers syllogism and other deduction rules.

The induction part of an induction proof of set union commutativity is a
typical equality chain:

if b:
    a | b
    = a.add(x) | b-x for x in b     # definition for non-empty b
    = b-x | a.add(x)                # induction hypothesis
    = (b-x).add(x) | a.add(x)-x     # definition for non-empty a
    = b | a.add(x)-x                # definitions of - and .add
    if x not in a:
        = b | a                     # .add and -
    if x in a:
        = b | a-x                   # .add and -
        = b.add(x) | a-x            # definition of .add for x in b
        = b | a                     # definition for non-empty a
    = b | a                         # in either case, by case analysis

By transitivity of =, a | b = b | a !

So where does this go wrong for our example? This shows the problems.
>>> fd - i
set()

This pretty much says that 2-1=0, or that 2=1. Not good.

The fundamental idea of a set is that it only contains something once.
This definition assumes that equality is defined sanely, with the usual
properties. So, while fd having two members implies d1 != 1.0, the fact
that d1 == 1 and 1.0 == 1 implies that they are really the same thing,
so that d1 == 1.0, a contradiction.

To put this another way: The rule of substitution is that if E, F, and G
are expressions and E == F and E is a subexpression of G and we
substitute F for E in G to get H, then G == H. Again, this rule, which
is a premise of all formal expressional systems I know of, assumes the
normal definition of =. When we apply this,

fd == {d1, 1.0} == {1,1.0} == {1} == i

But Python says
>>> fd == i
False

Conclusion: fd is not a mathematical set.

Yet another anomaly:

>>> f = set((1.0,))
>>> i == f
True
>>> i.add(d1)
>>> f.add(d1)
>>> i == f
False

So much for "adding the same thing to equals yields equals", which is a
special case of "doing the same thing to equals, where the thing done
only depends on the properties that make the things equal, yields equals."


And another

>>> d1 in i
True
>>> 1.0 in i
True
>>> fd <= i
False

Manual: "set <= other
Test whether every element in the set is in other"

I bet Python first tests the sizes because the implementer *assumed*
that every member of a larger set could not be in a smaller set. I
presume the same assumption is used for equality testing.
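
To make that guess concrete on a current interpreter (where real Decimals and floats no longer misbehave this way), the anomaly can be reproduced with stand-in classes. Everything named here is hypothetical: `Legacy`, `LegacyDecimal` and `LegacyFloat` imitate the 2008 behaviour (equal to ints, never equal to each other), and `sized_issubset` is only a guess at a size shortcut, not a transcription of the set implementation.

```python
class Legacy:
    """Stand-in number: equal to ints and to its own type, never to other types."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if type(other) is int:
            return self.value == other
        if type(other) is type(self):
            return self.value == other.value
        return NotImplemented

    def __hash__(self):
        return hash(self.value)


class LegacyDecimal(Legacy):
    pass


class LegacyFloat(Legacy):
    pass


def naive_issubset(a, b):
    """Follow the manual literally: is every element of a in b?"""
    return all(x in b for x in a)


def sized_issubset(a, b):
    """Same test, but assume a larger set can never be a subset."""
    if len(a) > len(b):
        return False
    return naive_issubset(a, b)


fd = {LegacyDecimal(1), LegacyFloat(1)}
i = {1}
print(len(fd), naive_issubset(fd, i), sized_issubset(fd, i), fd <= i)
```

The two helper functions disagree only because equality among the elements is not transitive; with well-behaved elements the size shortcut is a safe optimisation.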

Or

Manual: "symmetric_difference(other)
set ^ other
Return a new set with elements in either the set or other but not both."

>>> d1 in fd
True
>>> d1 in i
True
>>> d1
Decimal('1')
>>> fd ^ i
{Decimal('1')}

If no one beats me to it, I will probably file a bug report or two, but
I am still thinking about what to say and to suggest.

Terry Jan Reedy


Mark Dickinson

Sep 30, 2008, 7:42:52 AM
to
On Sep 30, 9:21 am, Terry Reedy <tjre...@udel.edu> wrote:
> If no one beats me to it, I will probably file a bug report or two, but
> I am still thinking about what to say and to suggest.

I can't see many good options here. Some possibilities:

(0) Do nothing besides documenting the problem
somewhere (perhaps in a manual section entitled
'Infrequently Asked Questions', or
'Uncommon Python Pitfalls'). I guess the rule is
simply that Decimals don't mix well with other
numeric types besides integers: if you put both
floats and Decimals into a set, or compare a
Decimal with a Fraction, you're asking for
trouble. I suppose the obvious place for such
a note would be in the decimal documentation,
since non-users of decimal are unlikely to encounter
these problems.

(1) 'Fix' the Decimal type to do numerical comparisons
with other numeric types correctly, and fix up the
Decimal hash appropriately.

(2) I wonder whether there's a way to make Decimals
and floats incomparable, so that an (in)equality check
between them always raises an exception, and any
attempt to have both Decimals and floats in the same
set (or as keys in the same dict) also gives an error.
(Decimals and integers should still be allowed to
mix happily, of course.) But I can't see how this could
be done without adversely affecting set performance.

Option (1) is certainly technically feasible, but I
don't like it much: it means adding a whole load
of code to the Decimal module that benefits few users
but slows down hash computations for everyone.
And then any new numeric type that wants to fit in
with Python's rules had better worry about hashing
equal to ints, floats, Fractions, complexes, *and*
Decimals...

Option (2) appeals to me, but I can't see how to
implement it.

So I guess that just leaves updating the docs.
Other thoughts?

Mark

Terry Reedy

Sep 30, 2008, 3:07:08 PM
to pytho...@python.org
Mark Dickinson wrote:
> On Sep 30, 9:21 am, Terry Reedy <tjre...@udel.edu> wrote:
>> If no one beats me to it, I will probably file a bug report or two, but
>> I am still thinking about what to say and to suggest.
>
> I can't see many good options here. Some possibilities:

Thanks for responding. Agreeing on a fix would make it more likely to
happen sooner ;-)

> (0) Do nothing besides documenting the problem
> somewhere (perhaps in a manual section entitled
> 'Infrequently Asked Questions', or
> 'Uncommon Python Pitfalls'). I guess the rule is
> simply that Decimals don't mix well with other
> numeric types besides integers: if you put both
> floats and Decimals into a set, or compare a
> Decimal with a Fraction, you're asking for
> trouble. I suppose the obvious place for such
> a note would be in the decimal documentation,
> since non-users of decimal are unlikely to encounter
> these problems.

Documenting the problem properly would mean changing the set
documentation to change at least the definitions of union (|), issubset
(<=), issuperset (>=), and symmetric_difference (^) from their current
math set based definitions to implementation based definitions that
describe what they actually do instead of what they intend to do. I do
not like this option.

> (1) 'Fix' the Decimal type to do numerical comparisons
> with other numeric types correctly, and fix up the
> Decimal hash appropriately.

(1A) All that is needed to fix equality transitivity corruption and the
consequent set/dictview problems is to correctly compare integral
values. For this, Decimal hash seems fine already. For the int i I
tried, hash(i) == hash(float(i)) == hash(Decimal(i)) ==
hash(Fraction(i)) == i.

It is fine for transitivity that all fractional decimals are unequal to
all fractional floats (and all fractions) since there is no integer (or
fraction) that either is equal to, let alone both.

This is what I would choose unless there is some 'hidden' problem. But
it seems to me this should work: when a float and decimal are both
integral (easy to determine) convert either to an int and use the
current int-whichever comparison.
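
A rough sketch of what option (1A) could look like as a free function. This illustrates the proposal only and is not how the decimal module behaves; `integral_values_equal` is a hypothetical name. Non-integral or non-finite operands are always reported as unequal, which is the part of the rule that keeps transitivity intact.

```python
from decimal import Decimal


def integral_values_equal(d, f):
    """Option 1A sketch: a Decimal and a float are equal only as the same integer."""
    # NaNs, infinities and fractional floats are never equal under this rule.
    if not d.is_finite() or not f.is_integer():
        return False
    # The Decimal must itself be integral, e.g. Decimal("10.00").
    if d != d.to_integral_value():
        return False
    # Both are integral: defer to exact int comparison.
    return int(d) == int(f)


print(integral_values_equal(Decimal("10.00"), 10.0))
print(integral_values_equal(Decimal("0.5"), 0.5))
```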

> (2) I wonder whether there's a way to make Decimals
> and floats incomparable, so that an (in)equality check
> between them always raises an exception, and any
> attempt to have both Decimals and floats in the same
> set (or as keys in the same dict) also gives an error.
> (Decimals and integers should still be allowed to
> mix happily, of course.) But I can't see how this could
> be done without adversely affecting set performance.

I pretty strongly believe that equality checks should always work (at
least in Python as delivered) just as boolean checks should (and do).

> Option (1) is certainly technically feasible, but I
> don't like it much: it means adding a whole load
> of code to the Decimal module that benefits few users
> but slows down hash computations for everyone.
> And then any new numeric type that wants to fit in
> with Python's rules had better worry about hashing
> equal to ints, floats, Fractions, complexes, *and*
> Decimals...

I believe (1A) would be much easier both to implement and for new
numeric types.


>
> Option (2) appeals to me, but I can't see how to
> implement it.
>
> So I guess that just leaves updating the docs.
> Other thoughts?

(3) Further isolate decimals by making decimals also unequal to all
ints. Like (1A), this would easily fix transitivity breakage, but I
would consider the result less desirable.

My ranking: 1A > 3 > 0 > 2. I might put 1 between 1A and 3, but I am
not sure.

> Mark

Terry Jan Reedy

Mark Dickinson

Oct 1, 2008, 5:21:50 AM
to
On Sep 30, 8:07 pm, Terry Reedy <tjre...@udel.edu> wrote:
> Documenting the problem properly would mean changing the set
> documentation to change at least the definitions of union (|), issubset
> (<=), issuperset (>=), and symmetric_difference (^) from their current
> math set based definitions to implementation based definitions that
> describe what they actually do instead of what they intend to do.  I do
> not like this option.

I was thinking more of a single-line warning in the set documentation
to the effect that funny things happen in the absence of transitivity
of equality, perhaps pointing the finger at Decimal as the most
obvious troublemaker; the Decimal documentation could elaborate on
this. That is, rather than documenting exactly what the set operations
do, document what they're supposed to do (just as now) and declare that
behaviour is undefined for sets of elements for which transitivity
fails.

> (1A) All that is needed for fix equality transitivity corruption and the
> consequent set/dictview problems is to correctly compare integral
> values.  For this, Decimal hash seems fine already.  For the int i I
> tried, hash(i) == hash(float(i)) == hash(Decimal(i)) ==
> hash(Fraction(i)) == i.

Good point. Though I'd be a bit uncomfortable with having
Decimal(1) == 1.0 return True, but Decimal('0.5') == 0.5 return False.
Not sure what the source of my discomfort is; partly I think it's
that I want to be able to explain the comparison rules at the
level of types; having some floats behave one way and some behave
another feels odd. And explaining to confused users exactly
why Decimal behaves this way could be fun.

I think I'd prefer option 1 to option 1a.

> (3) Further isolate decimals by making decimals also unequal to all
> ints.  Like (1A), this would easily fix transitivity breakage, but I
> would consider the result less desirable.

I'd oppose this. I think having decimals play nicely with integers
is important, both practically and theoretically. There's probably
also already existing code that depends on comparisons between
integers and Decimals working as expected.

So I guess my ranking is 0 > 1 > 1a > 3, though I could live
with any of 0, 1, or 1a.

It's really the decimal module that's breaking the rules here;
I feel it's the decimal module's responsibility to either
fix or document the resulting problems.

It would also be nice if it were made more obvious somewhere
in the docs that transitivity of equality is important
for correct set and dict behaviour.

Mark

greg

Oct 3, 2008, 2:49:47 AM
to
Mark Dickinson wrote:

> Option (2) appeals to me, but I can't see how to
> implement it.

It could be implemented for the special case of floats
and Decimals by keeping flags in each set indicating
whether any elements of those types have been added.

But doing this just for those two types would be
rather hackish, and wouldn't do anything for any
other incomparable types that might come along.
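
For what it's worth, the flag idea might look something like the sketch below. `NoMixSet` is a hypothetical class, it only guards construction and `add()` (not `update()`, `|=` and friends), and the flags are never cleared when elements are removed, so it is a toy rather than a proposal for the built-in set.

```python
from decimal import Decimal


class NoMixSet(set):
    """Hypothetical set that refuses to hold floats and Decimals together."""

    def __init__(self, iterable=()):
        super().__init__()
        self._has_float = False
        self._has_decimal = False
        for item in iterable:
            self.add(item)

    def add(self, item):
        is_float = isinstance(item, float)
        is_decimal = isinstance(item, Decimal)
        # Reject the element if the other kind has already been seen.
        if (is_float and self._has_decimal) or (is_decimal and self._has_float):
            raise TypeError("cannot mix float and Decimal in the same set")
        self._has_float = self._has_float or is_float
        self._has_decimal = self._has_decimal or is_decimal
        super().add(item)


s = NoMixSet([1, Decimal(2)])    # ints and Decimals mix happily
try:
    s.add(2.5)
except TypeError as exc:
    print("TypeError:", exc)
```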

--
Greg

greg

Oct 3, 2008, 2:53:41 AM
to
Terry Reedy wrote:

> Documenting the problem properly would mean changing the set

> documentation ... from their current

> math set based definitions to implementation based definitions

It could be documented that the mathematical definitions
hold only if the equality relations between all the elements
involved are transitive, and leave the semantics in other
cases undefined.

Then in the Decimal module it could be warned that the
equality relations between int-float and int-Decimal are
not transitive, perhaps noting that this can cause
problems with sets and dicts.

--
Greg
