Sep 23, 2008, 7:20:12 AM

to pytho...@python.org

I'm not sure I follow this logic. Can someone explain why float and

integer can be compared with each other and decimal can be compared to

integer but decimal can't be compared to float?

>>> from decimal import Decimal

>>> i = 10

>>> f = 10.0

>>> d = Decimal("10.00")

>>> i == f

True

>>> i == d

True

>>> f == d

False

This seems to break the rule that if A is equal to B and B is equal to

C then A is equal to C.

--

D'Arcy J.M. Cain <da...@druid.net> | Democracy is three wolves

http://www.druid.net/darcy/ | and a sheep voting on

+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.

Sep 23, 2008, 8:45:33 AM

to pytho...@python.org

D'Arcy J.M. Cain wrote:

> I'm not sure I follow this logic. Can someone explain why float and

> integer can be compared with each other and decimal can be compared to

> integer but decimal can't be compared to float?

>

>>>> from decimal import Decimal

>>>> i = 10

>>>> f = 10.0

>>>> d = Decimal("10.00")

>>>> i == f

> True

>>>> i == d

> True

>>>> f == d

> False

I can give you the technical answer after reading the sources of the

decimal module: you can only compare to Decimal what can be converted to

Decimal. And that is int, long and another Decimal.

Everything else will return False when comparing.

> This seems to break the rule that if A is equal to B and B is equal to

> C then A is equal to C.

Yes, but only if comparison from type(A) to type(C) is supported at all.

Instead of raising ValueError or NotImplementedError, the decimal

module returns False here.

-- Gerhard

Sep 23, 2008, 8:58:57 AM

to

On Tue, 23 Sep 2008 07:20:12 -0400, D'Arcy J.M. Cain wrote:

> I'm not sure I follow this logic. Can someone explain why float and

> integer can be compared with each other and decimal can be compared to

> integer but decimal can't be compared to float?

In comparisons, `Decimal` tries to convert the other type to a `Decimal`.

If this fails -- and it does for floats -- the equality comparison

evaluates to False. For ordering comparisons, e.g. ``D("10") < 10.0``, it

fails more verbosely::

TypeError: unorderable types: Decimal() < float()

The `decimal tutorial`_ states:

"To create a Decimal from a float, first convert it to a string. This

serves as an explicit reminder of the details of the conversion

(including representation error)."

See the `decimal FAQ`_ for a way to convert floats to Decimals.

>>>> from decimal import Decimal

>>>> i = 10

>>>> f = 10.0

>>>> d = Decimal("10.00")

>>>> i == f

> True

>>>> i == d

> True

>>>> f == d

> False

>

> This seems to break the rule that if A is equal to B and B is equal to C

> then A is equal to C.

I don't see why transitivity should apply to Python objects in general.

HTH,

.. _decimal tutorial: http://docs.python.org/lib/decimal-tutorial.html

.. _decimal FAQ: http://docs.python.org/lib/decimal-faq.html

--

Robert "Stargaming" Lehmann
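
The mechanism described in this post (convert the other operand, and give up quietly on equality but loudly on ordering) can be sketched with a toy class. This is only an illustration under assumptions: `Cents` and `_convert` are hypothetical names, not anything from the decimal module, and the sketch relies on the general Python 3 rule that `==` falls back to an identity check when both operands return `NotImplemented`, while `<` raises `TypeError`.

```python
class Cents:
    """Toy exact-amount type (hypothetical; not the real Decimal)."""

    def __init__(self, cents):
        self.cents = cents

    @staticmethod
    def _convert(other):
        # Only Cents and int convert cleanly; floats are refused.
        if isinstance(other, Cents):
            return other
        if isinstance(other, int) and not isinstance(other, bool):
            return Cents(other * 100)
        return NotImplemented

    def __eq__(self, other):
        other = self._convert(other)
        if other is NotImplemented:
            return NotImplemented  # Python then falls back to identity
        return self.cents == other.cents

    def __lt__(self, other):
        other = self._convert(other)
        if other is NotImplemented:
            return NotImplemented  # Python then raises TypeError
        return self.cents < other.cents


ten = Cents(1000)
print(ten == 10)    # True: int converts
print(ten == 10.0)  # False: conversion refused, identity fallback
try:
    ten < 10.0
except TypeError as exc:
    print("TypeError:", exc)
```

So `10 == 10.0` and `10 == ten` both hold while `10.0 == ten` does not, which is the same shape as the f/i/d example that started the thread.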

Sep 23, 2008, 10:08:07 AM

to

> > This seems to break the rule that if A is equal to B and B is equal to C

> > then A is equal to C.

>

> I don't see why transitivity should apply to Python objects in general.

Well, for numbers it surely would be a nice touch, wouldn't it.

May be the reason for Decimal to accept float arguments is that

irrational numbers or very long rational numbers cannot be converted

to a Decimal without rounding error, and Decimal doesn't want any part

of it. Seems pointless to me, though.

Sep 23, 2008, 11:50:15 AM

to

On Sep 23, 10:08 am, Michael Palmer <m_palme...@yahoo.ca> wrote:

> May be the reason for Decimal to accept float arguments is that

NOT to accept float arguments.

Sep 23, 2008, 2:31:49 PM

to pytho...@python.org

Gerhard Häring wrote:

> D'Arcy J.M. Cain wrote:

>> I'm not sure I follow this logic. Can someone explain why float and

>> integer can be compared with each other and decimal can be compared to

>> integer but decimal can't be compared to float?

>>

>>>>> from decimal import Decimal

>>>>> i = 10

>>>>> f = 10.0

>>>>> d = Decimal("10.00")

>>>>> i == f

>> True

>>>>> i == d

>> True

>>>>> f == d

>> False

>

> I can give you the technical answer after reading the sources of the

> decimal module: you can only compare to Decimal what can be converted to

> Decimal. And that is int, long and another Decimal.

The new fractions module acts differently, which is to say, as most

would want.

>>> from fractions import Fraction as F

>>> F(1) == 1.0

True

>>> F(1.0)

Traceback (most recent call last):

File "<pyshell#20>", line 1, in <module>

F(1.0)

File "C:\Program Files\Python30\lib\fractions.py", line 97, in __new__

numerator = operator.index(numerator)

TypeError: 'float' object cannot be interpreted as an integer

>>> F(1,2) == .5

True

>>> .5 == F(1,2)

True

so Fraction obviously does comparisons differently.

Decimal is something of an anomaly in Python because it was written to

exactly follow an external standard, with no concessions to what would

be sensible for Python. It is possible that that standard mandates that

Decimals not compare to floats.

tjr

Sep 24, 2008, 12:21:10 AM

to

Is 0.1 a very long number? Would you expect ``0.1 == Decimal('0.1')`` to

be `True` or `False` given that 0.1 actually is

In [98]: '%.50f' % 0.1

Out[98]: '0.10000000000000000555111512312578270211815834045410'

?

Ciao,

Marc 'BlackJack' Rintsch

Sep 24, 2008, 5:30:03 AM

to

Terry Reedy <tjr...@udel.edu> wrote:

> The new fractions module acts differently, which is to say, as most

> would want.

>

> >>> from fractions import Fraction as F

> >>> F(1) == 1.0

> True

> >>> F(1.0)

> Traceback (most recent call last):

> File "<pyshell#20>", line 1, in <module>

> F(1.0)

> File "C:\Program Files\Python30\lib\fractions.py", line 97, in __new__

> numerator = operator.index(numerator)

> TypeError: 'float' object cannot be interpreted as an integer

Both the Fraction module and the Decimal module could represent floats

exactly and reversibly since floats are of the form

mantissa * 2**exponent

which is exactly representable as a fraction (rational) and also as

mantissa2 * 10**exponent2

as to why we don't do this...

I guess it is to preserve the sanity of the user when they write

fraction(0.1) or decimal(0.1) did they really mean fraction(1,10),

decimal("0.1") or the exact representations which are

decimal("0.1000000000000000055511151231257827021181583404541015625")

and fraction(3602879701896397,2**55)

Given that we let the exact representations leak out anyway (which

causes a lot of FAQs in this list), eg

>>> 0.1

0.10000000000000001

IMHO We should put the exact conversions in for floats to Decimal and

Fraction by default and add a new section to the FAQ!

In that way people will see floats for what they really are - a crude

approximation to a rational number ;-)

--

Nick Craig-Wood <ni...@craig-wood.com> -- http://www.craig-wood.com/nick
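
The exact representations quoted in this post can be checked directly. A minimal sketch, assuming a current Python 3 interpreter, where the `Decimal` and `Fraction` constructors accept a float and convert it exactly (that was not yet the case on the 2.5, 2.6 and 3.0 interpreters used elsewhere in this thread):

```python
from decimal import Decimal
from fractions import Fraction

x = 0.1

# Every finite float is an integer divided by a power of two.
n, d = x.as_integer_ratio()
print(n, d == 2**55)

# Exact conversions: nothing is rounded, so the binary error shows up.
exact_fraction = Fraction(x)
exact_decimal = Decimal(x)
print(exact_fraction)
print(exact_decimal)

# Going through the string form gives back what the user typed instead.
typed_decimal = Decimal(repr(x))
print(typed_decimal)

# The exact conversions are reversible.
print(float(exact_fraction) == x, float(exact_decimal) == x)
```

The choice between `Decimal(x)` and `Decimal(repr(x))` is exactly the "what did they really mean" question raised above.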

Sep 24, 2008, 6:14:38 AM

to

On Sep 23, 7:31 pm, Terry Reedy <tjre...@udel.edu> wrote:

> Decimal is something of an anomaly in Python because it was written to

> exactly follow an external standard, with no concessions to what would

> be sensible for Python. It is possible that that standard mandates that

> Decimals not compare to floats.

I don't think the standard says anything about interactions between

Decimals and floats. But there's certainly been a feeling amongst at

least some of the developers that the job of Python's decimal module

is to implement the standard and no more, and that extensions to its

functionality belong elsewhere.

Regarding equality, there's at least one technical issue: the

requirement that objects that compare equal hash equal. How do you

come up with efficient hash operations for integers, floats, Decimals

and Fractions that satisfy this requirement?

For other arithmetic operations: should the sum of a float and a

Decimal produce a Decimal or a float? Why? It's not at all clear to me

that either of these types is 'higher up' the numerical tower than the

other.

Mark

Sep 24, 2008, 1:18:40 PM

to pytho...@python.org

Mark Dickinson wrote:

> On Sep 23, 7:31 pm, Terry Reedy <tjre...@udel.edu> wrote:

>> Decimal is something of an anomaly in Python because it was written to

>> exactly follow an external standard, with no concessions to what would

>> be sensible for Python. It is possible that that standard mandates that

>> Decimals not compare to floats.

>

> I don't think the standard says anything about interactions between

> Decimals and floats.

If there is not now, there could be in the future, and the decimal

authors are committed to follow the standard wherever it goes.

Therefore, the safe course, to avoid possible future deprecations due to

doing too much, is to only do what is mandated.

> But there's certainly been a feeling amongst at

> least some of the developers that the job of Python's decimal module

> is to implement the standard and no more, and that extensions to its

> functionality belong elsewhere.

For the reason just stated. A slightly different take is this. The

reason for following the standard is so that decimal code in Python can

be converted exactly both from and to decimal code in other languages.

(And one reason for *that* is that one purpose of the standard is to

reliably implement legal and contractual standards for financial

calculations.) Using extensions in Python could break/deprecate code

translated away from Python.

> Regarding equality, there's at least one technical issue: the

> requirement

> that objects that compare equal hash equal. How do you come up with

> efficient hash operations for integers, floats, Decimals and Fractions

> that satisfy this requirement?

For integral values, this is no problem.

>>> hash(1) == hash(1.0) == hash(decimal.Decimal(1)) == hash(fractions.Fraction(1)) == 1

True

> For other arithmetic operations: should the sum of a float and a

> Decimal produce a Decimal or a float? Why? It's not at all clear to me that

> either of these types is 'higher up' the numerical tower than the

> other.

Floats and fractions have the same issue. Fractions are converted to

floats. I can think of two reasons: float operations are faster; floats

are typically thought of as inexact and since the result is likely to

be inexact (rounded), float is the more appropriate type to express

that. Anyone who disagrees with the choice for their application can

explicitly convert the float to a fraction.

Decimals can also be converted to floats (they also have a __float__

method). But unlike fractions, the conversion must be explicit, using

float(decimal), instead of implicit, as with ints and fractions.

Someone *could* write a PyDecimal wrapper that would do implicit

conversion and thereby more completely integrate decimals with other

Python numbers, but I doubt that saving transitivity of equality will be

sufficient motivation ;-).

Terry Jan Reedy
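
The implicit versus explicit conversion rules described in this post can be sketched as follows. The variable names are illustrative only, and the sketch assumes a current Python 3 interpreter; to my knowledge mixed Decimal and float arithmetic is still refused there.

```python
from decimal import Decimal
from fractions import Fraction

half_f = Fraction(1, 2)
half_d = Decimal("0.5")

# Fraction + float: the Fraction is converted implicitly, result is a float.
mixed = half_f + 0.25
print(type(mixed).__name__, mixed)

# Decimal + float: no implicit conversion in either direction.
try:
    half_d + 0.25
    decimal_plus_float_failed = False
except TypeError:
    decimal_plus_float_failed = True
print("Decimal + float raises TypeError:", decimal_plus_float_failed)

# The caller has to pick a direction explicitly.
as_float = float(half_d) + 0.25
as_decimal = half_d + Decimal("0.25")
print(as_float, as_decimal)

# Decimal + int is fine, since ints convert exactly.
print(half_d + 1)
```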

Sep 24, 2008, 10:37:10 PM

to

On Wed, 24 Sep 2008 04:30:03 -0500, Nick Craig-Wood wrote:

> Both the Fraction module and the Decimal module could represent floats

> exactly and reversibly since floats are of the form

>

> mantissa * 2**exponent

>

> which is exactly representable as a fraction (rational) and also as

>

> mantissa2 * 10**exponent2

>

> as to why we don't do this...

>

> I guess it is to preserve the sanity of the user when they write

> fraction(0.1) or decimal(0.1) did they really mean fraction(1,10),

> decimal("0.1") or the exact representations which are

> decimal("0.1000000000000000055511151231257827021181583404541015625") and

> fraction(3602879701896397,2**55)

I would say that in practice the chances that somebody *actually* wanted

0.1000000000000000055511151231257827021181583404541015625 when they wrote

0.1 is about the same as the chances that the BDFL will support braces in

Python 3.0.

(Hint: "from __future__ import braces")

> Given that we let the exact representations leak out anyway (which

> causes a lot of FAQs in this list), eg

>

>>>> 0.1

> 0.10000000000000001

>

> IMHO We should put the exact conversions in for floats to Decimal and

> Fraction by default and add a new section to the FAQ!

But one of the reasons for having the Decimal module is to avoid such leaky

abstractions. Without Decimal (or fraction) there's no obvious way to get

0.1 exactly. It seems perverse to suggest that by default Decimal should

deliberately expose the same leaky abstraction that causes so much

trouble when people use floats.

> In that way people will see floats for what they really are - a crude

> approximation to a rational number ;-)

You can already see that just by printing a float:

>>> 0.3

0.29999999999999999

>>> 0.1

0.10000000000000001

--

Steven

Sep 25, 2008, 3:55:52 AM

to

Actually, it's not. Your C run-time library is generating random digits

after it runs out of useful information (which is the first 16 or 17

digits). 0.1 in an IEEE 754 double is this:

0.100000000000000088817841970012523233890533447265625

--

Tim Roberts, ti...@probo.com

Providenza & Boekelheide, Inc.

Sep 25, 2008, 5:42:20 AM

to

On Sep 24, 6:18 pm, Terry Reedy <tjre...@udel.edu> wrote:

> If there is not now, there could be in the future, and the decimal

> authors are committed to follow the standard wherever it goes.

> Therefore, the safe course, to avoid possible future deprecations due to

> doing too much, is to only do what is mandated.

Makes sense. It looks as though the standard's pretty stable now

though; I'd be quite surprised to see it evolve to include discussion

of floats. But then again, people thought it was stable just before

all the extra transcendental operations appeared. :-)

> For integral values, this is no problem.

> >>> hash(1) == hash(1.0) == hash(decimal.Decimal(1)) ==

> hash(fractions.Fraction(1)) == 1

> True

Getting integers and Decimals to hash equal was actually

something of a pain, and required changing the way that

the hash of a long was computed. The problem in a nutshell:

what's the hash of Decimal('1e100000000')? The number is

clearly an integer, so its hash should be the same as that

of 10**100000000. But computing 10**100000000, and then

finding its hash, is terribly slow... (Try

hash(Decimal('1e100000000')) in Python 2.5 and see

what happens! It's fixed in Python 2.6.)

As more numeric types get added to Python, this

'equal implies equal hash' requirement becomes more

and more untenable, and difficult to maintain. I also find

it a rather unnatural requirement: numeric equality

is, to me, a weaker equivalence relation than the one

that should be used for identifying keys in dictionaries,

elements of sets, etc. Fraction(1, 2) and 0.5 should, to my

eyes, be considered

different elements of a set. But the only way to 'fix' this

would be to have Python recognise two different types of

equality, and then it wouldn't be Python any more.

The SAGE folks also discovered that they couldn't

maintain the hash requirement.

> Decimals can also be converted to floats (they also have a __float__

> method). But unlike fractions, the conversion must be explicit, using

> float(decimal), instead of implicit, as with ints and fractions.

Maybe: if I *had* to pick a direction, I'd make float + Decimal

produce a Decimal, on the basis that Decimal is arbitrary precision

and that the float->Decimal conversion can be made losslessly.

But then there are a whole host of decisions one has to make

about rounding, significant zeros, ... (And then, as you point

out, Cowlishaw might come out with a new version of the standard

that does include interactions with floats, and makes an entirely

different set of decisions...)

Mark
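
The 'equal implies equal hash' requirement discussed in this post can be spot-checked with a small helper. This is a sketch only: `check_hash_invariant` is a hypothetical name, and the results assume a current Python 3 interpreter, where equal numbers of these four types do compare and hash equal (on the 2.5/2.6 interpreters in this thread the float/Decimal pairs would not even compare equal).

```python
from decimal import Decimal
from fractions import Fraction


def check_hash_invariant(values):
    """Return True if every pair of equal values also hashes equal."""
    return all(
        hash(a) == hash(b)
        for a in values
        for b in values
        if a == b
    )


ones = [1, 1.0, Decimal(1), Fraction(1)]
halves = [0.5, Decimal("0.5"), Fraction(1, 2)]

print(check_hash_invariant(ones))
print(check_hash_invariant(halves))

# Because they are equal and hash equal, each group is a single set element.
print(len(set(ones)), len(set(halves)))
```

Note that under this scheme Fraction(1, 2) and 0.5 are the same set element, which is the outcome the post argues is not obviously desirable.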

Sep 25, 2008, 6:05:13 AM

to

I get (using Python 2.6):

>>> n, d = 0.1.as_integer_ratio()

>>> from decimal import Decimal, getcontext

>>> getcontext().prec = 100

>>> Decimal(n)/Decimal(d)

Decimal('0.1000000000000000055511151231257827021181583404541015625')

which is a lot closer to Marc's answer. Looks like your float

approximation to 0.1 is 6 ulps out. :-)

Mark

Sep 25, 2008, 6:30:02 AM

to

Not according to the decimal FAQ

http://docs.python.org/lib/decimal-faq.html

------------------------------------------------------------

import math
from decimal import *

def floatToDecimal(f):
    "Convert a floating point number to a Decimal with no loss of information"
    # Transform (exactly) a float to a mantissa (0.5 <= abs(m) < 1.0) and an
    # exponent. Double the mantissa until it is an integer. Use the integer
    # mantissa and exponent to compute an equivalent Decimal. If this cannot
    # be done exactly, then retry with more precision.

    mantissa, exponent = math.frexp(f)
    while mantissa != int(mantissa):
        mantissa *= 2.0
        exponent -= 1
    mantissa = int(mantissa)

    oldcontext = getcontext()
    setcontext(Context(traps=[Inexact]))
    try:
        while True:
            try:
                return mantissa * Decimal(2) ** exponent
            except Inexact:
                getcontext().prec += 1
    finally:
        setcontext(oldcontext)

print "float(0.1) is", floatToDecimal(0.1)

------------------------------------------------------------

Prints this

float(0.1) is 0.1000000000000000055511151231257827021181583404541015625

On my platform

Python 2.5.2 (r252:60911, Aug 8 2008, 09:22:44),

[GCC 4.3.1] on linux2

Linux 2.6.26-1-686

Intel(R) Core(TM)2 CPU T7200

Sep 25, 2008, 7:02:49 AM

to

On Sep 23, 1:58 pm, Robert Lehmann <stargam...@gmail.com> wrote:

> I don't see why transitivity should apply to Python objects in general.

Hmmm. Lack of transitivity does produce some, um, interesting

results when playing with sets and dicts. Here are sets s and

t such that the unions s | t and t | s have different sizes:

>>> from decimal import Decimal

>>> s = set([Decimal(2), 2.0])

>>> t = set([2])

>>> len(s | t)

2

>>> len(t | s)

1

This opens up some wonderful possibilities for hard-to-find bugs...

Mark
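
The asymmetric union above depends on the 2008 comparison rules for Decimal. The same effect can be reproduced on any interpreter with a toy type whose equality is deliberately non-transitive. This is a sketch under assumptions: `IntOnly` is a hypothetical class that, like Decimal in this thread, compares equal to ints but never to floats, and the sizes shown rely on CPython's set implementation.

```python
class IntOnly:
    """Hypothetical wrapper: equal to ints and IntOnly, never to floats."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if isinstance(other, IntOnly):
            return self.value == other.value
        if isinstance(other, int):
            return self.value == other
        return NotImplemented  # floats end up comparing unequal

    def __hash__(self):
        # Must match hash(int) because IntOnly(2) == 2.
        return hash(self.value)

    def __repr__(self):
        return f"IntOnly({self.value})"


s = {IntOnly(2), 2.0}  # two elements: IntOnly(2) != 2.0
t = {2}                # 2 equals both members of s

print(len(s | t))  # 2: starts from s, and 2 is already "in" it
print(len(t | s))  # 1: starts from t, and both members of s are "in" it
```

Each union starts from a copy of its left operand and only adds elements that are not already present, so which side comes first decides the result.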

Sep 26, 2008, 2:20:08 AM

to pytho...@python.org

Mark Dickinson wrote:

> On Sep 24, 6:18 pm, Terry Reedy <tjre...@udel.edu> wrote:

>> If there is not now, there could be in the future, and the decimal

>> authors are committed to follow the standard wherever it goes.

>> Therefore, the safe course, to avoid possible future deprecations due to

>> doing too much, is to only do what is mandated.

>

> Makes sense. It looks as though the standard's pretty stable now

> though; I'd be quite surprised to see it evolve to include discussion

> of floats. But then again, people thought it was stable just before

> all the extra transcendental operations appeared. :-)

What got me were the bizarre new 'logical' operations whose addition

was rather nonsensical from a Python viewpoint (though probably

sensible from an IBM profit business viewpoint). With those added, and

with this thread, I have decided that Decimals best be thought of as a

separate universe, not to be mixed with other numbers unless one has

good reason to and understands the possible anomalies of doing so. For

pure finance apps, I would think that there should be little reason to mix.

tjr

Sep 26, 2008, 11:34:43 PM

to

Mark Dickinson <dick...@gmail.com> wrote:

Hmmph, that makes the vote 3 to 1 against me. I need to go re-examine my

"extreme float converter".

Sep 26, 2008, 11:44:28 PM

to

Mark Dickinson <dick...@gmail.com> wrote:

>On Sep 25, 8:55 am, Tim Roberts <t...@probo.com> wrote:

>> Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:

>> >0.1 actually is

>>

>> >In [98]: '%.50f' % 0.1

>> >Out[98]: '0.10000000000000000555111512312578270211815834045410'

>> >?

>>

>> .... 0.1 in an IEEE 754 double is this:

>>

>> 0.100000000000000088817841970012523233890533447265625

>

>I get (using Python 2.6):

>

>>>> n, d = 0.1.as_integer_ratio()

>>>> from decimal import Decimal, getcontext

>>>> getcontext().prec = 100

>>>> Decimal(n)/Decimal(d)

>Decimal('0.1000000000000000055511151231257827021181583404541015625')

>

>which is a lot closer to Marc's answer. Looks like your float

>approximation to 0.1 is 6 ulps out. :-)

Yes, foolishness on my part. The hex is 3FB99999_9999999A,

so we're looking at 19999_9999999A / 2^56 or

7205759403792794

-------------------

72057594037927936

which is the number that Marc, Nick, and you all describe. Apologies all

around. I actually dropped one 9 the first time around.

Adding one more weird data point, here's what I get trying Marc's original

sample on my Windows box:

C:\tmp>python

Python 2.4.4 (#71, Oct 18 2006, 08:34:43) [MSC v.1310 32 bit (Intel)] on

win32

Type "help", "copyright", "credits" or "license" for more information.

>>> '%.50f' % 0.1

'0.10000000000000001000000000000000000000000000000000'

>>>

I assume this is the Microsoft C run-time library at work.
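
The hex arithmetic in this post can be redone without relying on the C run-time library's formatting at all. A minimal sketch, assuming the usual IEEE 754 binary64 layout for Python floats (1 sign bit, 11 exponent bits, 52 fraction bits); the variable names are illustrative.

```python
import struct
from fractions import Fraction

x = 0.1

# Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
bits = struct.unpack(">Q", struct.pack(">d", x))[0]
print(f"{bits:016X}")

biased_exp = (bits >> 52) & 0x7FF
fraction_bits = bits & ((1 << 52) - 1)

# Normal number: restore the implicit leading 1 bit of the significand.
significand = (1 << 52) | fraction_bits
exponent = biased_exp - 1023 - 52
print(hex(significand), exponent)

# Exact value as a rational: significand * 2**exponent.
exact = Fraction(significand) * Fraction(2) ** exponent
print(exact)
print(exact == Fraction(7205759403792794, 72057594037927936))

# float.hex() shows the same digits in C99 hex-float notation.
print(x.hex())
```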

Sep 29, 2008, 10:42:20 PM

to pytho...@python.org

En Thu, 25 Sep 2008 08:02:49 -0300, Mark Dickinson <dick...@gmail.com>

escribió:

> On Sep 23, 1:58 pm, Robert Lehmann <stargam...@gmail.com> wrote:

>> I don't see why transitivity should apply to Python objects in general.

>

> Hmmm. Lack of transitivity does produce some, um, interesting

> results when playing with sets and dicts. Here are sets s and

> t such that the unions s | t and t | s have different sizes:

>

>>>> from decimal import Decimal

>>>> s = set([Decimal(2), 2.0])

>>>> t = set([2])

>>>> len(s | t)

> 2

>>>> len(t | s)

> 1

Ouch!

> This opens up some wonderful possibilities for hard-to-find bugs...

And I was thinking all this thread was just a theoretical question without

practical consequences...

--

Gabriel Genellina

Sep 30, 2008, 4:21:13 AM

to pytho...@python.org

Gabriel Genellina wrote:

> En Thu, 25 Sep 2008 08:02:49 -0300, Mark Dickinson <dick...@gmail.com>

> escribió:

>> On Sep 23, 1:58 pm, Robert Lehmann <stargam...@gmail.com> wrote:

>>> I don't see why transitivity should apply to Python objects in general.

>>

>> Hmmm. Lack of transitivity does produce some, um, interesting

>> results when playing with sets and dicts. Here are sets s and

>> t such that the unions s | t and t | s have different sizes:

>>

>>>>> from decimal import Decimal

>>>>> s = set([Decimal(2), 2.0])

>>>>> t = set([2])

>>>>> len(s | t)

>> 2

>>>>> len(t | s)

>> 1

>

> Ouch!

>

>> This opens up some wonderful possibilities for hard-to-find bugs...

>

> And I was thinking all this thread was just a theoretical question

> without practical consequences...

To explain this anomaly more clearly, here is a recursive definition of

set union.

if b: a|b = a.add(x)|(b-x) where x is arbitrary member of b

else: a|b = a

Since Python only defines set-set and not set-ob, we would have to

subtract {x} to directly implement the above. But b.pop() removes an

arbitrary member and returns it so we can add it. So here is a Python

implementation of the definition.

def union(a,b):
    a = set(a) # copy to preserve original
    b = set(b) # ditto
    while b:
        a.add(b.pop())
    return a

from decimal import Decimal

d1 = Decimal(1)
fd = set((1.0, d1))
i = set((1,))

print(union(fd,i))
print(union(i,fd))

# prints
# {1.0, Decimal('1')}
# {1}

This is a bug in relation to the manual:

"union(other, ...)

set | other | ...

Return a new set with elements from both sets."

Transitivity is basic to logical deduction:

equations: a == b == c ... == z implies a == z

implications: (a implies b) and (b implies c) implies (a implies c)

The latter covers syllogism and other deduction rules.

The induction part of an induction proof of set union commutativity is a

typical equality chain:

if b:

a | b

= a.add(x)| b-x for x in b # definition for non-empty b

= b-x | a.add(x) # induction hypothesis

= (b-x).add(x) | a.add(x)-x # definition for non-empty a

= b | a.add(x)-x # definitions of - and .add

if x not in a:

= b | a # .add and -

if x in a:

= b | a-x # .add and -

= b.add(x) | a-x # definition of .add for x in b

= b | a # definition for non-empty a

= b | a # in either case, by case analysis

By transitivity of =, a | b = b | a !

So where does this go wrong for our example? This shows the problems.

>>> fd - i

set()

This pretty much says that 2-1=0, or that 2=1. Not good.

The fundamental idea of a set is that it only contains something once.

This definition assumes that equality is defined sanely, with the usual

properties. So, while fd having two members implies d1 != 1.0, the fact

that d1 == 1 and 1.0 == 1 implies that they are really the same thing,

so that d1 == 1.0, a contradiction.

To put this another way: The rule of substitution is that if E, F, and G

are expressions and E == F and E is a subexpression of G and we

substitute F for E in G to get H, then G == H. Again, this rule, which

is a premise of all formal expressional systems I know of, assumes the

normal definition of =. When we apply this,

fd == {d1, 1.0} == {1,1.0} == {1} == i

But Python says

>>> fd == i

False

Conclusion: fd is not a mathematical set.

Yet another anomaly:

>>> f = set((1.0,))

>>> i == f

True

>>> i.add(d1)

>>> f.add(d1)

>>> i == f

False

So much for "adding the same thing to equals yields equals", which is a

special case of "doing the same thing to equals, where the thing done

only depends on the properties that make the things equal, yields equals."

And another

>>> d1 in i

True

>>> 1.0 in i

True

>>> fd <= i

False

Manual: "set <= other

Test whether every element in the set is in other"

I bet Python first tests the sizes because the implementer *assumed*

that every member of a larger set could not be in a smaller set. I

presume the same assumption is used for equality testing.

Or

Manual: "symmetric_difference(other)

set ^ other

Return a new set with elements in either the set or other but not both."

>>> d1 in fd

True

>>> d1 in i

True

>>> d1

Decimal('1')

>>> fd ^ i

{Decimal('1')}

If no one beats me to it, I will probably file a bug report or two, but

I am still thinking about what to say and to suggest.

Terry Jan Reedy
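
The subset and difference anomalies in this post can also be reproduced without Decimal, which makes them independent of the interpreter version. A sketch under assumptions: `IntOnly` is a hypothetical class that equals ints but never floats (mimicking Decimal as it behaved in this thread), and the `<=` result relies on CPython comparing set sizes before looking at any element.

```python
class IntOnly:
    """Hypothetical wrapper: equal to ints and IntOnly, never to floats."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if isinstance(other, IntOnly):
            return self.value == other.value
        if isinstance(other, int):
            return self.value == other
        return NotImplemented

    def __hash__(self):
        return hash(self.value)  # consistent with hash(int)

    def __repr__(self):
        return f"IntOnly({self.value})"


d1 = IntOnly(1)
fd = {1.0, d1}   # two members, since d1 != 1.0
i = {1}          # one member, equal to both members of fd

every_member_in_i = all(x in i for x in fd)
print(every_member_in_i)  # True: each member of fd is "in" i
print(fd <= i)            # False: the size check rejects the larger set
print(fd - i)             # set(): removing i's one element empties fd
print(fd == i)            # False: different sizes
```

The documented definitions ("every element in the set is in other") and the results disagree here only because equality on the elements is not transitive.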

Sep 30, 2008, 7:42:52 AM

to

On Sep 30, 9:21 am, Terry Reedy <tjre...@udel.edu> wrote:

> If no one beats me to it, I will probably file a bug report or two, but

> I am still thinking about what to say and to suggest.

I can't see many good options here. Some possibilities:

(0) Do nothing besides documenting the problem

somewhere (perhaps in a manual section entitled

'Infrequently Asked Questions', or

'Uncommon Python Pitfalls'). I guess the rule is

simply that Decimals don't mix well with other

numeric types besides integers: if you put both

floats and Decimals into a set, or compare a

Decimal with a Fraction, you're asking for

trouble. I suppose the obvious place for such

a note would be in the decimal documentation,

since non-users of decimal are unlikely to encounter

these problems.

(1) 'Fix' the Decimal type to do numerical comparisons

with other numeric types correctly, and fix up the

Decimal hash appropriately.

(2) I wonder whether there's a way to make Decimals

and floats incomparable, so that an (in)equality check

between them always raises an exception, and any

attempt to have both Decimals and floats in the same

set (or as keys in the same dict) also gives an error.

(Decimals and integers should still be allowed to

mix happily, of course.) But I can't see how this could

be done without adversely affecting set performance.

Option (1) is certainly technically feasible, but I

don't like it much: it means adding a whole load

of code to the Decimal module that benefits few users

but slows down hash computations for everyone.

And then any new numeric type that wants to fit in

with Python's rules had better worry about hashing

equal to ints, floats, Fractions, complexes, *and*

Decimals...

Option (2) appeals to me, but I can't see how to

implement it.

So I guess that just leaves updating the docs.

Other thoughts?

Mark

Sep 30, 2008, 3:07:08 PM9/30/08

to pytho...@python.org

Mark Dickinson wrote:

> On Sep 30, 9:21 am, Terry Reedy <tjre...@udel.edu> wrote:

>> If no one beats me to it, I will probably file a bug report or two, but

>> I am still thinking about what to say and to suggest.

>

> I can't see many good options here. Some possibilities:

Thanks for responding. Agreeing on a fix would make it more likely to

happen sooner ;-)

> (0) Do nothing besides documenting the problem

> somewhere (perhaps in a manual section entitled

> 'Infrequently Asked Questions', or

> 'Uncommon Python Pitfalls'). I guess the rule is

> simply that Decimals don't mix well with other

> numeric types besides integers: if you put both

> floats and Decimals into a set, or compare a

> Decimal with a Fraction, you're asking for

> trouble. I suppose the obvious place for such

> a note would be in the decimal documentation,

> since non-users of decimal are unlikely to encounter

> these problems.

Documenting the problem properly would mean changing the set

documentation to change at least the definitions of union (|), issubset

(<=), issuperset (>=), and symmetric_difference (^) from their current

math set based definitions to implementation based definitions that

describe what they actually do instead of what they intend to do. I do

not like this option.

> (1) 'Fix' the Decimal type to do numerical comparisons

> with other numeric types correctly, and fix up the

> Decimal hash appropriately.

(1A) All that is needed to fix equality transitivity corruption and the

consequent set/dictview problems is to correctly compare integral

values. For this, Decimal hash seems fine already. For the int i I

tried, hash(i) == hash(float(i)) == hash(Decimal(i)) ==

hash(Fraction(i)) == i.
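
The hash check described above can be repeated with a few lines like these. This is only a sketch: hashes_agree and the sample values are made up for illustration, and a handful of ints proves nothing in general.

```python
from decimal import Decimal
from fractions import Fraction


def hashes_agree(i):
    """Return True if int i hashes the same as its float, Decimal and Fraction forms."""
    return hash(i) == hash(float(i)) == hash(Decimal(i)) == hash(Fraction(i))


sample = [0, 1, 2, 10, 1000]
print(all(hashes_agree(i) for i in sample))
```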

It is fine for transitivity that all fractional decimals are unequal to

all fractional floats (and all fractions) since there is no integer (or

fraction) that either is equal to, let alone both.

This is what I would choose unless there is some 'hidden' problem. But

it seems to me this should work: when a float and decimal are both

integral (easy to determine) convert either to an int and use the

current int-whichever comparison.
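
A minimal sketch of that rule as a standalone function, assuming it would eventually live inside Decimal's own comparison code (eq_1a is a hypothetical name used only here):

```python
from decimal import Decimal


def eq_1a(f, d):
    """Option (1A) equality between a float f and a Decimal d.

    Equal only when both are finite integral values with the same int value;
    every fractional, infinite or NaN value compares unequal.
    """
    float_is_integral = f.is_integer()  # False for inf and nan
    decimal_is_integral = d.is_finite() and d == d.to_integral_value()
    if not (float_is_integral and decimal_is_integral):
        return False
    return int(f) == int(d)


print(eq_1a(1.0, Decimal("1.00")))
print(eq_1a(0.5, Decimal("0.5")))
```

Under this rule the first call reports equality and the second does not, even though 0.5 is exactly representable in both types.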

> (2) I wonder whether there's a way to make Decimals

> and floats incomparable, so that an (in)equality check

> between them always raises an exception, and any

> attempt to have both Decimals and floats in the same

> set (or as keys in the same dict) also gives an error.

> (Decimals and integers should still be allowed to

> mix happily, of course.) But I can't see how this could

> be done without adversely affecting set performance.

I pretty strongly believe that equality checks should always work (at

least in Python as delivered) just as boolean checks should (and do).

> Option (1) is certainly technically feasible, but I

> don't like it much: it means adding a whole load

> of code to the Decimal module that benefits few users

> but slows down hash computations for everyone.

> And then any new numeric type that wants to fit in

> with Python's rules had better worry about hashing

> equal to ints, floats, Fractions, complexes, *and*

> Decimals...

I believe (1A) would be much easier both to implement and for new

numeric types.

>

> Option (2) appeals to me, but I can't see how to

> implement it.

>

> So I guess that just leaves updating the docs.

> Other thoughts?

(3) Further isolate decimals by making decimals also unequal to all

ints. Like (1A), this would easily fix transitivity breakage, but I

would consider the result less desirable.

My ranking: 1A > 3 > 0 > 2. I might put 1 between 1A and 3, but I am

not sure.

> Mark

Terry Jan Reedy

Oct 1, 2008, 5:21:50 AM10/1/08

to

On Sep 30, 8:07 pm, Terry Reedy <tjre...@udel.edu> wrote:

> Documenting the problem properly would mean changing the set

> documentation to change at least the definitions of union (|), issubset

> (<=), issuperset (>=), and symmetric_difference (^) from their current

> math set based definitions to implementation based definitions that

> describe what they actually do instead of what they intend to do. I do

> not like this option.

I was thinking more of a single-line warning in the set documentation

to the effect that funny things happen in the absence of transitivity

of equality, perhaps pointing the finger at Decimal as the most

obvious troublemaker; the Decimal documentation could elaborate on

this.

That is, rather than documenting exactly what the set operations do,

document what they're supposed to do (just as now) and declare that

behaviour is undefined for sets of elements for which transitivity

fails.
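
To make "funny things" concrete, here is a small sketch using a hypothetical stand-in class (IntOnly, invented for this example) in place of Decimal, so that it does not depend on how any particular Python version's Decimal compares with floats. It equals ints, never equals floats, and hashes like the int it wraps:

```python
class IntOnly:
    """Hypothetical number-like type: equal to ints, never equal to floats."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        if isinstance(other, IntOnly):
            return self.value == other.value
        if isinstance(other, int):
            return self.value == other
        return NotImplemented  # floats end up comparing unequal

    def __hash__(self):
        return hash(self.value)  # same hash as the equal int


x = IntOnly(1)
# 1 == 1.0 and 1 == x, but x != 1.0, so equality is not transitive here.
a = set([x, 1.0, 1])
b = set([1, x, 1.0])
print(len(a), len(b))
```

The same three values are added to both sets, only in a different order, yet on CPython the two sets do not end up the same size.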

> (1A) All that is needed to fix equality transitivity corruption and the

> consequent set/dictview problems is to correctly compare integral

> values. For this, Decimal hash seems fine already. For the int i I

> tried, hash(i) == hash(float(i)) == hash(Decimal(i)) ==

> hash(Fraction(i)) == i.

Good point. Though I'd be a bit uncomfortable with having

Decimal(1) == 1.0 return True, but Decimal('0.5') == 0.5 return False.

Not sure what the source of my discomfort is; partly I think it's

that I want to be able to explain the comparison rules at the

level of types; having some floats behave one way and some behave

another feels odd. And explaining to confused users exactly

why Decimal behaves this way could be fun.

I think I'd prefer option 1 to option 1a.

> (3) Further isolate decimals by making decimals also unequal to all

> ints. Like (1A), this would easily fix transitivity breakage, but I

> would consider the result less desirable.

I'd oppose this. I think having decimals play nicely with integers

is important, both practically and theoretically. There's probably

also already existing code that depends on comparisons between

integers and Decimals working as expected.

So I guess my ranking is 0 > 1 > 1a > 3, though I could live

with any of 0, 1, or 1a.

It's really the decimal module that's breaking the rules here;

I feel it's the decimal module's responsibility to either

fix or document the resulting problems.

It would also be nice if it were made more obvious somewhere

in the docs that transitivity of equality is important

for correct set and dict behaviour.

Mark

Oct 3, 2008, 2:49:47 AM10/3/08

to

Mark Dickinson wrote:

> Option (2) appeals to me, but I can't see how to

> implement it.

It could be implemented for the special case of floats

and Decimals by keeping flags in each set indicating

whether any elements of those types have been added.

But doing this just for those two types would be

rather hackish, and wouldn't do anything for any

other incomparable types that might come along.
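
For what it's worth, a sketch of the flag idea as a set subclass. GuardedSet is a hypothetical name; only add() and the constructor are guarded, the flags are never cleared when elements are removed, and other mutating methods such as update() bypass the check entirely:

```python
from decimal import Decimal


class GuardedSet(set):
    """Hypothetical set that refuses to hold both floats and Decimals."""

    def __init__(self, iterable=()):
        super().__init__()
        self._has_float = False
        self._has_decimal = False
        for item in iterable:
            self.add(item)

    def add(self, item):
        if isinstance(item, float):
            if self._has_decimal:
                raise TypeError("set already holds a Decimal; cannot add a float")
            self._has_float = True
        elif isinstance(item, Decimal):
            if self._has_float:
                raise TypeError("set already holds a float; cannot add a Decimal")
            self._has_decimal = True
        super().add(item)


s = GuardedSet([1, 2.0])
try:
    s.add(Decimal("3"))
except TypeError as exc:
    print("refused:", exc)
```

Ints mix freely with either type here; only the float/Decimal combination is rejected.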

--

Greg

Oct 3, 2008, 2:53:41 AM10/3/08

to

Terry Reedy wrote:

> Documenting the problem properly would mean changing the set

> documentation ... from their current

> math set based definitions to implementation based definitions

It could be documented that the mathematical definitions

hold only if the equality relations between all the elements

involved are transitive, and leave the semantics in other

cases undefined.

Then in the Decimal module it could be warned that the

equality relations between int-float and int-Decimal are

not transitive, perhaps noting that this can cause

problems with sets and dicts.

--

Greg
