Google Groups

Re: [sympy] Re: Changing the default sorting for Add and Mul


Aaron Meurer May 29, 2012 3:33 PM
Posted in group: sympy
It's a rule that's not enforced.  Actually, I think what you're doing
is OK.  As long as the hash (and equality comparison) of the object
does not depend on the cached state, it shouldn't be an issue.

The main thing is that if you change the hash of an object during its
lifetime, or change what is used to compute the hash, bad things will
happen.  For example:

class Bad(object):
    def __init__(self, arg):
        self.arg = arg

    def __hash__(self):
        return hash(self.arg)

In [26]: Bad(1)
Out[26]: <__main__.Bad object at 0x106ca7fd0>

In [27]: hash(Bad(1))
Out[27]: 1

In [28]: a = Bad(1)

In [29]: b = {a:1}

In [30]: a.arg = 2

In [31]: hash(a)
Out[31]: 2

In [32]: b[a]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-32-9f1bfc533672> in <module>()
----> 1 b[a]

KeyError: <__main__.Bad object at 0x106ca7f50>

Here, we implicitly changed the hash of a by mutating the argument of
a that is used to compute the hash.  Note that Python doesn't prevent
me from doing that.  I could make it if I want, by making __setattr__
raise AttributeError, but even then I could forcibly change it by
modifying the object's __dict__.  This is what is meant when we say
that Python is a "consenting adults" language.

Bad things also happen if we change the object, but keep the hash the same:

class Bad2(object):
    def __init__(self, arg):
        self.arg = arg
        self.hash_cache = None
    def __hash__(self):
        if self.hash_cache is None:
            self.hash_cache = hash(self.arg)
        return self.hash_cache
    def __eq__(self, other):
        if not isinstance(other, Bad2):
            return False
        return self.arg == other.arg

(this is the pattern used by Basic, by the way)

Here we see that finding the key in a dictionary still works:

In [47]: a = Bad2(1)

In [48]: b = Bad2(2)

In [50]: d = {b:2}

In [51]: b.arg = 1

In [52]: hash(b)
Out[52]: 2

In [53]: d[b]
Out[53]: 2

but it no longer keeps the invariant that sets (and dicts) keep unique
objects (keys):

In [68]: a = Bad2(1)

In [69]: b = Bad2(2)

In [70]: hash(a)
Out[70]: 1

In [71]: hash(b)
Out[71]: 2

In [72]: b.arg = 1

In [73]: a == b
Out[73]: True

In [74]: hash(a) == hash(b)
Out[74]: False

In [75]: set([a, b])
Out[75]: set([<__main__.Bad2 object at 0x10692b3d0>, <__main__.Bad2
object at 0x10692b590>])

and this can lead to very subtle issues.  For example, look what
happens when I don't compute the hash before changing the object:

In [56]: a = Bad2(1)

In [59]: b = Bad2(2)

In [60]: b.arg = 1

In [61]: a == b
Out[61]: True

In [62]: hash(a) == hash(b)
Out[62]: True

In [63]: set([a, b])
Out[63]: set([<__main__.Bad2 object at 0x10692b910>])

This time, the set contains one element!

So, yes, Python won't enforce it, but if you don't follow the rule
that hash(object) remains the same for the lifetime of object and that
a == b implies hash(a) == hash(b), then you will run into very subtle
issues, because very basic Python operations expect these invariants.

Aaron Meurer

On Tue, May 29, 2012 at 4:11 PM, krastano...@gmail.com
<krastano...@gmail.com> wrote:
> On 30 May 2012 00:04, Aaron Meurer <asme...@gmail.com> wrote:
>> On Tue, May 29, 2012 at 11:57 AM, Joachim Durchholz <j...@durchholz.org> wrote:
>>> Am 29.05.2012 07:10, schrieb Aaron Meurer:
>>>
>>>> _hashable_content generally contains
>>>> other Basic objects. So what I really want is the "fully denested"
>>>> _hashable_content, where each Basic object is recursively replaced
>>>> with its _hashable_content.  I've no idea how expensive that would be
>>>> to compute.
>>>
>>>
>>> The behavior will be quadratic with number of levels of sets. This could
>>> start to impact us on deeply nested expressions - not with anything a human
>>> would enter, but if somebody wants to use SymPy from a script, this might be
>>> different. (I'm talking of thousands of nesting levels.)
>>> One way to set that might be to simplify x+0+0+...+0 (1000 terms, then retry
>>> with 100,000 ;-) ).
>>>
>>> The downside of caching is that each mutator must invalidate the cached
>>> hash. (Unless mutators are disallowed once an object is entered into a dict.
>>> Or the mutable objects are really copy-on-write ones. I don't know whether
>>> any of these paths is available already, or feasible if unavailable - we're
>>> deep into Systems Programming Land here.)
>>
>> In Python, only immutable objects are hashable, and in SymPy, all
>> Basic subclasses are immutable, so this isn't an issue.
>>
> One can easily cheat this rule:
>
> class A(object):
>    def __init__(self, immut1, immut2, mut):
>       ...
>    def __hash__(self):
>        return hash((self.immut1, self.immut2))
>
> For instance this can be a Manifold, that must be "immutable" as
> mathematical object, however still be able to learn about new
> coordinate systems defined on it.
>
> I am mentioning it because I am not really sure what does this mean:
>> In Python, only immutable objects are hashable
> I suppose I can get some clarification that way.
>
> --
> You received this message because you are subscribed to the Google Groups "sympy" group.
> To post to this group, send email to sy...@googlegroups.com.
> To unsubscribe from this group, send email to sympy+un...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/sympy?hl=en.
>