The code of the method 'flatten` of the class 'sympy.core.add.Add'
contain the following lines.
# order args canonically
# Currently we sort things using hashes, as it is quite fast. A
# better solution is not to sort things at all - but this needs
# some more fixing. NOTE: this is used in primitive, too, so if
# it changes here it should be changed there.
newseq.sort(key=hash) <-- this line.
Thus the arguments of Add sorted by their hashes.
>>> Add(x, y)
x + y
>>> Add(x, y).args
(y, x) <-- _args are sorted by hashes
>>> Add(y, x)
x + y <-- printer system sort arguments independently
This sort of arguments affects the calculation of the hash of the object
Add itself: by default the realization of the Basic.__hash__ is
calcullated as a hash of tuple of hashes of its arguments.
As it is related with the hashes therefore it is influent to compare
functions. So, when I tried to remove this line of sorting, I will get
>>> x + y == y + x
False
It is also related with commutative topics (as a flatten function itself
too).
So my questions: do I understand the situation correctly? That is we
still need to remove this sorting of arguments from a 'flatten' code and
reorganize other related things such as hash calculation or
commutativity in a right way.
For the last, as I understand, we need only to encapsulate this sorting
only where it is needed: at the hash function for commutative parts.
I want create an issue about it, but now I found only one issue which
related with it, but a little:
http://code.google.com/p/sympy/issues/detail?id=1908
And if I have omitted an precise issue about the topic, tell me please
the number of it.
Thanks you.
--
Alexey U.
The main problem with sorting by hash is that it is platform
dependent, which means that on different architectures (in particular,
based on the word size of the architecture), you will get different
orderings. Thus, any algorithm whose result depends on the order of
.args can potentially return different results (but both
mathematically correct) on different machines.
For example:
on 32-bit:
In [2]: (x + y + z).args
Out[2]: (y, z, x)
on 64-bit:
In [56]: (x + y + z).args
Out[56]: (z, y, x)
The solution suggested in Add.flatten, which is to not sort the args,
is one idea. This would be difficult to implement, because everything
that works based on arg ordering being canonicalize would have to be
rewritten. It also doesn't solve the problem of algorithms that
depend on what order the args are processed.
Another idea would be to replace hash() with something that is always
the same, no matter what architecture it is run on, similar to what
the printer uses. The problem is that the function would need to be
very fast, so for example, sort_key, which is what the printer uses,
would not work, because it is too slow.
I had an idea to refactor _hashable_content so that it actually
returns the expression that is hashed (right now, it doesn't include
the name, see Basic.__hash__), and then compare that directly. I
don't know if this would be faster, but at least it should be platform
independent.
On Fri, Nov 18, 2011 at 11:59 AM, Alexey U. Gudchenko <pr...@goodok.ru> wrote:
>
> Greetings to all.
>
> The code of the method 'flatten` of the class 'sympy.core.add.Add'
> contain the following lines.
>
> # order args canonically
> # Currently we sort things using hashes, as it is quite fast. A
> # better solution is not to sort things at all - but this needs
> # some more fixing. NOTE: this is used in primitive, too, so if
> # it changes here it should be changed there.
>
> newseq.sort(key=hash) <-- this line.
>
> Thus the arguments of Add sorted by their hashes.
>
>>>> Add(x, y)
> x + y
>
>>>> Add(x, y).args
> (y, x) <-- _args are sorted by hashes
>
>>>> Add(y, x)
> x + y <-- printer system sort arguments independently
>
> This sort of arguments affects the calculation of the hash of the object
> Add itself: by default the realization of the Basic.__hash__ is
> calcullated as a hash of tuple of hashes of its arguments.
> As it is related with the hashes therefore it is influent to compare
> functions. So, when I tried to remove this line of sorting, I will get
>
>>>> x + y == y + x
> False
I think there are two problems here. One is that the hash is computed
using the tuple of args. The second, is that __eq__ compares the .args
tuple (this second is actually what is happening here). I think both
could be solved by using a frozenset().
>
> It is also related with commutative topics (as a flatten function itself
> too).
>
> So my questions: do I understand the situation correctly? That is we
> still need to remove this sorting of arguments from a 'flatten' code and
> reorganize other related things such as hash calculation or
> commutativity in a right way.
>
> For the last, as I understand, we need only to encapsulate this sorting
> only where it is needed: at the hash function for commutative parts.
>
> I want create an issue about it, but now I found only one issue which
> related with it, but a little:
> http://code.google.com/p/sympy/issues/detail?id=1908
I don't think there's an issue for it. Feel free to open one.
>
> And if I have omitted an precise issue about the topic, tell me please
> the number of it.
>
> Thanks you.
>
> --
> Alexey U.
>
Aaron Meurer