Test failures in master: Hash randomization

47 views
Skip to first unread message

Aaron Meurer

unread,
Jun 24, 2012, 1:32:10 AM6/24/12
to sy...@googlegroups.com
Hi everyone.

I just merged pull request https://github.com/sympy/sympy/pull/1379,
which enables hash randomization by default in the test runner. What
this means is that if you are running Python 2.6.8, 2.7.3, 3.2.3, or
3.3, then you are going to start seeing a lot of test failures. Hash
randomization randomizes the hash values of strings, which results in
random hash values of all SymPy objects. Because we still rely on
hash values for ordering in a lot of places, this means that there are
a lot of failures when it is enabled.

In order to facilitate sane testing with this, hash randomization is
seeded. However, the only way to seed it is to set the environment
variable PYTHONHASHSEED before starting Python. Therefore, when the
tests are run in one of the above Python versions, they are not run in
a separate subprocess. If this is a problem for anyone, or if you
desire to disable hash randomization, you can run the tests with
./bin/test --no-subprocess or ./bin/doctest --no-subprocess. Note
that in all Python versions after 3.3 it will be enabled by default.

So here are some important things to remember:

- The test and doctest headers now looks like this:

============================= test process starts ==============================
executable: /usr/bin/python3 (3.2.3-candidate-2)
architecture: 64-bit
cache: yes
ground types: python
random seed: 45250937
hash randomization: on (PYTHONHASHSEED=61944319)

Notice the new item at the bottom, "hash randomization". The seed is
given there too. If hash randomization is not supported (e.g., in
Python 2.5), it will say "hash randomization: off".

- To run the tests with a specific seed, set the environment variable, like

$ PYTHONHASHSEED=61944319 python bin/test

- Note that the tests and doctests are run in separate subprocesses
with separate seeds with setup.py test.

- To reproduce failures, you need both the same seed *and* the same
architecture (32-bit or 64-bit), because both are used to compute hash
values.

- Finally, there is a pretty bad bug with some seeds where test_expand
hangs. We should put priority on this, but until then, you may want
to run the tests with --timeout=60 to timeout any test that runs for
longer than a minute.

Let's keep a log of all failures and seeds that can reproduce them at
http://code.google.com/p/sympy/issues/detail?id=3272 (or if it would
be easier, we could start a wiki page for them). Any help fixing any
of these problems would be greatly appreciated. We cannot release
until they are all fixed (and conversely, once they are all fixed, we
should be able to get a release candidate out almost right away).
Hopefully we won't have to resort to any XFAILing.

And lastly, if anyone has any thoughts on how we could canonically
order the arguments of Add and Mul independent of hash values, but is
still just as fast as hash values, I would love to hear it. If we
could do that, it would make fixing these errors a lot easier (on the
other hand, maybe we would be better off design-wise if we made
everything .arg ordering agnostic).

Aaron Meurer

Sergiu Ivanov

unread,
Jun 24, 2012, 3:41:15 AM6/24/12
to sy...@googlegroups.com
Hello,

On Sun, Jun 24, 2012 at 8:32 AM, Aaron Meurer <asme...@gmail.com> wrote:
>
> And lastly, if anyone has any thoughts on how we could canonically
> order the arguments of Add and Mul independent of hash values, but is
> still just as fast as hash values, I would love to hear it.  If we
> could do that, it would make fixing these errors a lot easier (on the
> other hand, maybe we would be better off design-wise if we made
> everything .arg ordering agnostic).

From my recent experience, using sympy.utilities.misc.default_sort_key
is a nice way to canonically order things. As far as I can see in the
code, it doesn't seem to rely on hashes for sorting; instead, it
provides sort keys which are tuples often (not sure how often)
including native numbers and strings. I *think* it's not going to be
just as fast as hash-based ordering, but, I guess, it's going to be
one of the fastest approaches, because, eventually, simple native
types will be compared.

Sergiu

Ondřej Čertík

unread,
Jun 24, 2012, 3:11:23 PM6/24/12
to sy...@googlegroups.com
P.S. The Travis CI says "hash randomization: off", so that means that the Python
executable there doesn't support "-R" yet, right? They have
version 2.7.2+.

>
> - To run the tests with a specific seed, set the environment variable, like
>
> $ PYTHONHASHSEED=61944319 python bin/test
>
> - Note that the tests and doctests are run in separate subprocesses
> with separate seeds with setup.py test.
>
> - To reproduce failures, you need both the same seed *and* the same
> architecture (32-bit or 64-bit), because both are used to compute hash
> values.
>
> - Finally, there is a pretty bad bug with some seeds where test_expand
> hangs.  We should put priority on this, but until then, you may want
> to run the tests with --timeout=60 to timeout any test that runs for
> longer than a minute.
>
> Let's keep a log of all failures and seeds that can reproduce them at
> http://code.google.com/p/sympy/issues/detail?id=3272 (or if it would
> be easier, we could start a wiki page for them).  Any help fixing any
> of these problems would be greatly appreciated.  We cannot release
> until they are all fixed (and conversely, once they are all fixed, we
> should be able to get a release candidate out almost right away).
> Hopefully we won't have to resort to any XFAILing.
>
> And lastly, if anyone has any thoughts on how we could canonically
> order the arguments of Add and Mul independent of hash values, but is
> still just as fast as hash values, I would love to hear it.  If we
> could do that, it would make fixing these errors a lot easier (on the
> other hand, maybe we would be better off design-wise if we made
> everything .arg ordering agnostic).

I think that the fastest and simplest is to simply use native Python data types
like dictionaries, and those will depend on the hash. The final results should
not depend on the hash, for example in printing we are simply sorting it.

One problem is the usage of .args to access the individual terms in Add and Mul.
If those depend on the hash, then I guess one should not really access
it directly,
except for some quick prototyping in ipython.

I just sent an email to the list with the subject "Testing:
differences between platforms", not realizing
that you already took care of this.

Ondrej

Aaron Meurer

unread,
Jun 24, 2012, 9:45:40 PM6/24/12
to sy...@googlegroups.com
Yes. It looks like all of their versions are one below the latest,
which is what is required for hash randomization. I opened
https://github.com/travis-ci/travis-ci/issues/614 for it.
I'm not exactly clear on what you are suggesting here.

At any rate, I'm now beginning to think that the better solution is to
just not worry about arg ordering, and instead make sure that
everything works no matter what the ordering. We might even at some
point remove ordering altogether (see for example
https://github.com/sympy/sympy/pull/722), which should make things a
little faster in the core if we can pull it off.

>
> One problem is the usage of .args to access the individual terms in Add and Mul.
> If those depend on the hash, then I guess one should not really access
> it directly,
> except for some quick prototyping in ipython.

I think the problem is also when you iterate over .args, and the
result depends on the order that you get things. For example, with
cse(), we are getting different results. All of them are common
subexpression eliminations of the expression, but different
expressions are taken out in different orders and things like that.
So we either need to restrict the definition of what cse() returns so
that it is unique, or else expand our tests for it so that any valid
result passes.

There's also the issue of functions that return lists when the result
is inherently unordered, like solve(). So solve(x**2 - 2, x) could
return either [sqrt(2), -sqrt(2)] or [-sqrt(2), sqrt(2)]. The best
solution here is probably to change solve to return a set, or some
similarly unordered structure, but it would break a lot of things too
(like anything that indexes the result of solve()).

And of course, the easy way out is always to arbitrarily but
canonically sort the args, like with default_sort_key. But if this is
really unnecessary, as I think it is with all cases except for
printing, the better way would be to do as I said above.

Aaron Meurer

>
> I just sent an email to the list with the subject "Testing:
> differences between platforms", not realizing
> that you already took care of this.
>
> Ondrej
>
> --
> You received this message because you are subscribed to the Google Groups "sympy" group.
> To post to this group, send email to sy...@googlegroups.com.
> To unsubscribe from this group, send email to sympy+un...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/sympy?hl=en.
>

Sergiu Ivanov

unread,
Jun 29, 2012, 4:06:13 PM6/29/12
to sy...@googlegroups.com
So, what is the official position as far as sorting the arguments of
Mul and Arg is concerned? I have seen you say that things should be
made to work independently of the ordering; is that the current
strategy?

Sergiu

Ondřej Čertík

unread,
Jun 29, 2012, 5:24:09 PM6/29/12
to sy...@googlegroups.com
On Fri, Jun 29, 2012 at 1:06 PM, Sergiu Ivanov
I think that there is a huge speed penalty for sorting the actual arguments.
So I think that the only solution really is to not sort the argument,
but make all results hash independent, by sorting the printing (which we do)
and fixing other bugs.

Ondrej

Sergiu Ivanov

unread,
Jun 29, 2012, 5:37:44 PM6/29/12
to sy...@googlegroups.com
OK, I see. Thank you for the explanation!

Sergiu

Joachim Durchholz

unread,
Jun 30, 2012, 3:14:21 AM6/30/12
to sy...@googlegroups.com
Am 29.06.2012 23:24, schrieb Ondřej Čertík:
> I think that there is a huge speed penalty for sorting the actual arguments.

Why do you think so?

Sergiu Ivanov

unread,
Jun 30, 2012, 7:13:16 AM6/30/12
to sy...@googlegroups.com
Since I did take a bit of a look at doctest failures recently, I
thought I would make my conclusions public, although a lot of this is
going to be self-evident.

Apparently, the reasons for doctest failures now belong to two groups:

1. instances of Basic are output, and some .args get output in
arbitrary order;

2. non-Basic thinks are output, and get output in arbitrary order.

I suggest solving (1) by altering the corresponding printing functions
to sort their arguments, just as it has recently been done for
FiniteSet. This is an elementary change, it shouldn't impact the
performance of tests, since doctests don't generally get sufficiently
large for that. Nevertheless, I believe it is going to sort out a lot
of failures.

Each failure of type (2) would require a more specific approach.
However, I'll try once again to push through the idea of defining
sympy.utilities.sympysort in the following way:

sympysort(x) = sorted(x, key=default_sort_key)

This will make the doctests which output non-Basic objects uglier, but
not less clearer with sympysort. Further, there are not that many
doctest failures of type (2), which makes me believe that resorting to
sympysort in a couple cases wouldn't be that fatal.

I'm looking forward to suggestions. Whenever something more or less
final is decided, I'll implement it.

Sergiu

P.S. As noted in [0], altering the actual testing code is impractical
in solving such kind of failures.

[0] https://github.com/sympy/sympy/pull/1393#issuecomment-6684465

Joachim Durchholz

unread,
Jun 30, 2012, 7:36:19 AM6/30/12
to sy...@googlegroups.com
Am 30.06.2012 13:13, schrieb Sergiu Ivanov:
> However, I'll try once again to push through the idea of defining
> sympy.utilities.sympysort in the following way:
>
> sympysort(x) = sorted(x, key=default_sort_key)
>
> This will make the doctests which output non-Basic objects uglier, but
> not less clearer with sympysort.

Doctests are code examples made executable to avoid the embarrassment of
having non-working example code.
Adding code to make the doctest pass will make the code example less
useful in its primary purpose.

Alternatively, we could push sympysort into standard equality testing.

This could be done in two ways:
1) Change SymPy to use its own equality testing function(s). I believe
that this could benefit other problems - I see recurring questions of
the "a and b should be equal but aren't" on the mailing list, and having
an array of different equality testers that work at different levels of
abstractions might help address this without the feel of ugly hacks that
the answers often carry.

2) If Python supports that, redefine Python's equality for SymPy dicts
to be agnostic of key ordering. That would push sympysort into the
equality function. (Making this efficient might be an interesting task.)

Sergiu Ivanov

unread,
Jun 30, 2012, 7:49:52 AM6/30/12
to sy...@googlegroups.com
On Sat, Jun 30, 2012 at 2:36 PM, Joachim Durchholz <j...@durchholz.org> wrote:
> Am 30.06.2012 13:13, schrieb Sergiu Ivanov:
>
>> However, I'll try once again to push through the idea of defining
>> sympy.utilities.sympysort in the following way:
>>
>> sympysort(x) = sorted(x, key=default_sort_key)
>>
>> This will make the doctests which output non-Basic objects uglier, but
>> not less clearer with sympysort.
>
>
> Doctests are code examples made executable to avoid the embarrassment of
> having non-working example code.
> Adding code to make the doctest pass will make the code example less useful
> in its primary purpose.

Good point, will keep in mind, thank you.

> Alternatively, we could push sympysort into standard equality testing.

Note that I suggested applying sympysort to _non-Basic_ things only.
Basic objects (i.e., inherently SymPyish objects) will be handled by
sorting their .args at printing. Therefore, the change I propose
would be totally transparent in "most" cases.

> This could be done in two ways:
> 1) Change SymPy to use its own equality testing function(s). I believe that
> this could benefit other problems - I see recurring questions of the "a and
> b should be equal but aren't" on the mailing list, and having an array of
> different equality testers that work at different levels of abstractions
> might help address this without the feel of ugly hacks that the answers
> often carry.

This is beyond the scope I had in mind when writing my previous
E-mail, which means that I don't claim to be of any authority on this
point :-)

Do bear in mind, however, that playing around with equality testing
functions will _not_ address the doctest failures because the problem
most often arises from _string_ comparison in doctesters.

> 2) If Python supports that, redefine Python's equality for SymPy dicts to be
> agnostic of key ordering. That would push sympysort into the equality
> function. (Making this efficient might be an interesting task.)

SymPy dicts are Basic objects, so you I believe you can do whatever
you want with them.

Nevertheless, my impression from the test (not doctest) failures we
see is that they have nothing to do with comparison of SymPy dicts.

Again, do remember that I have only considered doctest failures so
far, so what I say about unrelated things is pure speculation :-)

Sergiu

Chris Smith

unread,
Jun 30, 2012, 10:44:28 AM6/30/12
to sy...@googlegroups.com
> Each failure of type (2) would require a more specific approach.
> However, I'll try once again to push through the idea of defining
> sympy.utilities.sympysort in the following way:
>
> sympysort(x) = sorted(x, key=default_sort_key)

Using this in the codebase still requires that something be imported.
An advantage of
importing only default_sort_key (which could be aliased as sympy_key) is that
you can sort in place and not create a new list:

```
foo.sort(key=sympy_key) # assuming sympy_key = default_sort_key
```

BTW, in 1375 I include fixes (hopefully) for circuitutils, expr,
guide, recurr, and iterables. Someone who understands the quantum gate
workings should review the sorting that is implemented in
circuitutils.

/c

Aaron Meurer

unread,
Jun 30, 2012, 3:29:48 PM6/30/12
to sy...@googlegroups.com
As I noted on your pull request, I think a better way is to just wrap
the output in an unordered type (set or dict), and let either the
pretty printer or the doctest printer sort the output (both will do
so). That way, we don't have to add any extra boilerplate to the
doctests, which, as was noted, does make them harder to read as
examples.

For the regular tests, from what I've seen so far, the problem is
either sorted data types coming out in a different order (like lists
from solve), or results coming out completely different (but still
correct). The former case is easy to fix: just compare sets. For the
latter, I'm not sure. A good example of this is cse(). You can get
pretty different results. I looked at the code, and it's not only
dependent on the order of .args, but also on an iteration through a
dictionary (which is not guaranteed to be the same even without hash
randomization).

If someone wants to look at the cse() issue specifically, take a look
at the failing test_issue_3022() from sympy/core/tests/test_expand.py.

Aaron Meurer

Chris Smith

unread,
Jun 30, 2012, 7:41:09 PM6/30/12
to sy...@googlegroups.com
After pull request 1375 gets processed I can take a look at the cse
(unless someone else wants to handle it).

/c

Aaron Meurer

unread,
Jun 30, 2012, 8:33:42 PM6/30/12
to sy...@googlegroups.com
Thanks. The differences can be strange. For example, sometimes you
will get (x8, I*x5), and sometimes you will get (x2, -I), (x8, -x5)
(in this case, x2 is also used in some, but not all, other places
where I appears).

So I think the algorithm needs to be modified to give the best
substitution, if that's possible. Other than that, sometimes
independent substitutions can come out in different orders, but that
should be easy to fix by default_sort_key sorting.

Aaron Meurer

Ondřej Čertík

unread,
Jul 1, 2012, 3:02:32 AM7/1/12
to sy...@googlegroups.com
On Sat, Jun 30, 2012 at 12:29 PM, Aaron Meurer <asme...@gmail.com> wrote:
> As I noted on your pull request, I think a better way is to just wrap
> the output in an unordered type (set or dict), and let either the
> pretty printer or the doctest printer sort the output (both will do
> so). That way, we don't have to add any extra boilerplate to the
> doctests, which, as was noted, does make them harder to read as
> examples.
>
> For the regular tests, from what I've seen so far, the problem is
> either sorted data types coming out in a different order (like lists
> from solve), or results coming out completely different (but still
> correct). The former case is easy to fix: just compare sets. For the
> latter, I'm not sure. A good example of this is cse(). You can get
> pretty different results. I looked at the code, and it's not only
> dependent on the order of .args, but also on an iteration through a
> dictionary (which is not guaranteed to be the same even without hash
> randomization).
>
> If someone wants to look at the cse() issue specifically, take a look
> at the failing test_issue_3022() from sympy/core/tests/test_expand.py.

Why not to rewrite the tests for cse() to become functional -- that is,
we would write a function that takes the cse() result and it does the
substitution and verifies that
things work (are equal to the original). That should be independent of hash().

Ondrej

Ondřej Čertík

unread,
Jul 1, 2012, 3:07:49 AM7/1/12
to sy...@googlegroups.com
Because I tried it. We tried a lot of things when trying to make the core fast.
The fastest seems to be to reuse internal Python data structures
(dict) natively,
that is, without sorting, as much as we can. Any artificial (hash
independent) sorting makes things
slow.

If you want to play with such things, use this simple code:

https://github.com/certik/sympyx

Maybe you can figure out some way which is hash() independent and
still fast. At least I didn't figure it out.

Ondrej

Joachim Durchholz

unread,
Jul 1, 2012, 4:14:46 AM7/1/12
to sy...@googlegroups.com
Am 01.07.2012 09:07, schrieb Ondřej Čertík:
> Maybe you can figure out some way which is hash() independent and
> still fast. At least I didn't figure it out.

I'd try something like this:

a) Yes, use native dicts as much as possible. Trying to improve over a
core data structure that's getting unlimited love from the language team
doesn't sound like a winning idea to me :-)
b) Keep the sorted list of keys in a separate member variable.
c) Create the sorted key list lazily (i.e. only when it's actually needed).
d) Destroy the sorted list on every operation that changes the set of keys.

This is based on the assumption that operations that need the sorted
list are far, far rarer than operations that modify the set of keys.
(Without that, we're in serious trouble.)

Has this approach been tried already?

Ondřej Čertík

unread,
Jul 1, 2012, 2:13:17 PM7/1/12
to sy...@googlegroups.com
On Sun, Jul 1, 2012 at 1:14 AM, Joachim Durchholz <j...@durchholz.org> wrote:
> Am 01.07.2012 09:07, schrieb Ondřej Čertík:
>
>> Maybe you can figure out some way which is hash() independent and
>> still fast. At least I didn't figure it out.
>
>
> I'd try something like this:
>
> a) Yes, use native dicts as much as possible. Trying to improve over a core
> data structure that's getting unlimited love from the language team doesn't
> sound like a winning idea to me :-)

Yes, this is the main point in Python.

> b) Keep the sorted list of keys in a separate member variable.
> c) Create the sorted key list lazily (i.e. only when it's actually needed).

I am afraid it is needed pretty much all the time in the core. It would
have to be tried.

> d) Destroy the sorted list on every operation that changes the set of keys.

All SymPy types are immutable, so once you create them, the args will
never change.
So no need to free anything.

>
> This is based on the assumption that operations that need the sorted list
> are far, far rarer than operations that modify the set of keys. (Without
> that, we're in serious trouble.)

Yes, I think that's the case that you need to access the .args very often.
So the point is not to loose time sorting them.

This is at least my experience, but if you have time, please do try it out
in the sympyx code and compare the speed.

Ondrej

Chris Smith

unread,
Jul 1, 2012, 3:56:36 PM7/1/12
to sy...@googlegroups.com
>> a) Yes, use native dicts as much as possible. Trying to improve over a core
>> data structure that's getting unlimited love from the language team doesn't
>> sound like a winning idea to me :-)
>
> Yes, this is the main point in Python.
>
>> b) Keep the sorted list of keys in a separate member variable.

Are you aware that there is now an ordered dictionary in python? It
was backported to 2.7.3, I believe.

Sergiu Ivanov

unread,
Jul 1, 2012, 4:04:05 PM7/1/12
to sy...@googlegroups.com
This looks fine, but do you some data as to the speed? [0] says that
it keeps the items in the order in which they were inserted. That
smells of sequential storage and thus doesn't sound terribly cool
performance-wise, when random-access is needed.

Nevertheless, it's good to know that Python has such a feature, thank
you.

Sergiu

[0] http://docs.python.org/dev/whatsnew/2.7.html#pep-0372

Chris Smith

unread,
Jul 1, 2012, 4:36:11 PM7/1/12
to sy...@googlegroups.com
>> Are you aware that there is now an ordered dictionary in python? It
>> was backported to 2.7.3, I believe.
>
> This looks fine, but do you some data as to the speed? [0] says that
> it keeps the items in the order in which they were inserted. That
> smells of sequential storage and thus doesn't sound terribly cool
> performance-wise, when random-access is needed.
>

Some details at http://tinyurl.com/8y4h9ad

krastano...@gmail.com

unread,
Jul 1, 2012, 4:51:16 PM7/1/12
to sy...@googlegroups.com
> Some details at http://tinyurl.com/8y4h9ad
On the same topic, here is a future optional dependency for sympy:
http://anthon.home.xs4all.nl/Python/ordereddict/
An ordered dict module in C. (sorry, don't derail the discussion over
this, I am just leaving it for future reference)

Joachim Durchholz

unread,
Jul 1, 2012, 5:38:58 PM7/1/12
to sy...@googlegroups.com
To use that, we'd have to deprecate support for Python 2.5 and 2.6 first.

Jus' sayin', I don't have a fixed idea about whether we should actually
do this.

Sergiu Ivanov

unread,
Jul 1, 2012, 5:42:36 PM7/1/12
to sy...@googlegroups.com
On Mon, Jul 2, 2012 at 12:38 AM, Joachim Durchholz <j...@durchholz.org> wrote:
> Am 01.07.2012 21:56, schrieb Chris Smith:
>
>>>> a) Yes, use native dicts as much as possible. Trying to improve over a
>>>> core
>>>> data structure that's getting unlimited love from the language team
>>>> doesn't
>>>> sound like a winning idea to me :-)
>>>
>>>
>>> Yes, this is the main point in Python.
>>>
>>>> b) Keep the sorted list of keys in a separate member variable.
>>
>>
>> Are you aware that there is now an ordered dictionary in python? It
>> was backported to 2.7.3, I believe.
>
>
> To use that, we'd have to deprecate support for Python 2.5 and 2.6 first.

Take a look at two latest E-mails in this thread :-) There are
implementations of ordered dicts for younger Pythons than 2.7, which
are compatible with OrderedDict.

Sergiu

Joachim Durchholz

unread,
Jul 1, 2012, 5:46:22 PM7/1/12
to sy...@googlegroups.com
Am 01.07.2012 20:13, schrieb Ondřej Čertík:
> This is at least my experience, but if you have time, please do try it out
> in the sympyx code and compare the speed.

I'm a bit short on free time right now, so I may not be able to do
anything in the next few days.
That said, I'll try to find a few free minutes.

Is there some code that exercizes expression creation? That would be
useful to benchmark actual impact, and I don't know enough about actual
use cases to whip something up by myself.

Joachim Durchholz

unread,
Jul 2, 2012, 1:25:22 AM7/2/12
to sy...@googlegroups.com
Am 01.07.2012 20:13, schrieb Ondřej Čertík:
> On Sun, Jul 1, 2012 at 1:14 AM, Joachim Durchholz <j...@durchholz.org> wrote:
>> b) Keep the sorted list of keys in a separate member variable.
>> c) Create the sorted key list lazily (i.e. only when it's actually needed).
>
> I am afraid it is needed pretty much all the time in the core. It would
> have to be tried.

If that's the case, it might be even better to simply create the key
list after the last key has been entered.
Is the object creation workflow documented somewhere? That would be
helpful in picking the right place to add this initialization.

Chris Smith

unread,
Jul 2, 2012, 3:09:04 PM7/2/12
to sy...@googlegroups.com
Anyone interested in randomization corrections...#1375 is ready to go.
And the fix for solve's issues is to either 1) for the tests, use
set=True or 2) make solve return sets and change the usages which
don't want this, changing solution[0] with solution.pop() or
`solution=solve(...)` with `solution=list(solve())`. The latter would
be easier as a grep job.

Aaron Meurer

unread,
Jul 2, 2012, 3:38:45 PM7/2/12
to sy...@googlegroups.com
I vote for the first option. If we are going to break compatibility
for solve(), I'd rather just do it once, when we have a real solution
that supports things like infinite solutions.

Aaron Meurer

Ondřej Čertík

unread,
Jul 2, 2012, 6:14:09 PM7/2/12
to sy...@googlegroups.com
On Sun, Jul 1, 2012 at 2:46 PM, Joachim Durchholz <j...@durchholz.org> wrote:
> Am 01.07.2012 20:13, schrieb Ondřej Čertík:
>
>> This is at least my experience, but if you have time, please do try it out
>> in the sympyx code and compare the speed.
>
>
> I'm a bit short on free time right now, so I may not be able to do anything
> in the next few days.
> That said, I'll try to find a few free minutes.
>
> Is there some code that exercizes expression creation? That would be useful

Yep, it's in the sympyx repo:

https://github.com/certik/sympyx/blob/master/benchmarks.py

> to benchmark actual impact, and I don't know enough about actual use cases
> to whip something up by myself.

Just run the benchmarks.py. We put there some basic things. Feel free to extend
it if you think something important is missing.

Ondrej

Joachim Durchholz

unread,
Jul 1, 2012, 5:42:44 PM7/1/12
to sy...@googlegroups.com
They're discussing performance differences from a theoretical
perspective there.

From hands-on practice in the Java world, I can report that the
difference between hash tables and sorted trees is usually negligible.
(I.e. I never found any real difference, but I didn't do any systematic
tests.)

So... talk is cheap, show the benchmark numbers :-)

Sergiu Ivanov

unread,
Jul 1, 2012, 4:48:39 PM7/1/12
to sy...@googlegroups.com
Awesome, thank you!

This looks a lot like the implementation that has recently been
discussed in this thread: a regular Python dict (i.e., hash table)
accompanied by a list to know the order of the items.

By the way, for versions of Python younger than 2.7, there is [0].

There is a subtle point about ordered dicts vs. dicts, however. There
are algorithms which do depend on order, and those which don't. It
often happens so that at implementation a dependency on the order
slips by. I think this should be pointed out by the corresponding
test failures. On the other hand, for algorithms which inherently
need order, ordered dicts should be fine.

Sergiu

[0] http://pypi.python.org/pypi/ordereddict

Joachim Durchholz

unread,
Jul 3, 2012, 1:16:37 PM7/3/12
to sy...@googlegroups.com
Am 01.07.2012 22:48, schrieb Sergiu Ivanov:
> [0] http://pypi.python.org/pypi/ordereddict

We want key order.
OrderedDict remembers insertion order, so it's close but doesn't match.

Sergiu Ivanov

unread,
Jul 3, 2012, 2:34:44 PM7/3/12
to sy...@googlegroups.com
I was under the impression that any order would do, as long as it does
not depend on external factors and is thus the same for the same input
data.

Sergiu

Vinzent Steinberg

unread,
Jul 6, 2012, 2:22:59 PM7/6/12
to sy...@googlegroups.com
There is a pure Python implementation for sorted dicts. [1] (Unfortunately it uses GPLv3.)

Vinzent


Joachim Durchholz

unread,
Jul 6, 2012, 4:09:17 PM7/6/12
to sy...@googlegroups.com
That depends a lot on how the affected tests are set up.
My impression was that the test wants to see a certain result, and the
random-influenced SymPy code gives the right result but in a different
order.
So while the order as such is irrelevant, it needs to come out the same
for value built on different execution paths (here: SymPy algorithm vs.
constants built by the test code).
Reply all
Reply to author
Forward
0 new messages