Interest in adding timeout argument to potentially long running calls (potential implementation included)?


James Crist

Oct 23, 2014, 2:23:07 PM
to sy...@googlegroups.com
For the large expressions that we see in `mechanics`, calls to `simplify` can take an extremely long time. However, for simple expressions, simplification is desirable. Currently we don't simplify by default inside any of our library code, as it's impossible to tell whether an expression can be reasonably simplified or not. If a `timeout` argument were added to such functions, though, one could simply run

>>> simplify(expr, timeout=some_reasonable_max_time)

and get the best of both worlds.

However, this isn't the easiest thing to do in Python. The best *composable* option is to use signal.alarm, which is only available on *NIX systems. It can also cause problems in threaded applications. Checks for Windows or for not running in the main thread could be added to handle this, but they would limit its use.

---

A second option would be to implement a "pseudo-timeout". This only works for functions that have many calls, but each call is guaranteed to complete in a reasonable amount of time (recursive, simple rules, e.g. `fu`). The timeout won't be exact, but should limit excessively long recursive functions to approximately the timeout. I wrote up a quick implementation of this here. It requires some function boilerplate for each recursive call that *can't* be replaced with a decorator. However, it's only a few lines per function. I think this is the best option if we were to go about adding this.
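To make the idea concrete, here is a toy version of the pseudo-timeout (this is illustrative, not the gist code -- all names are made up): each recursive call checks an absolute deadline and bails out once it has passed.

```python
import time

class TimeoutExceeded(Exception):
    pass

def _check(deadline):
    # Cheap cooperative check; called at the top of each recursive step.
    if deadline is not None and time.time() > deadline:
        raise TimeoutExceeded()

def rec_simplify(expr, rules, deadline=None):
    """Toy recursive rewriter: apply each rule, recursing on any change,
    aborting once `deadline` (an absolute time.time() value) passes."""
    _check(deadline)
    for rule in rules:
        new = rule(expr)
        if new != expr:
            return rec_simplify(new, rules, deadline)
    return expr

def simplify_with_timeout(expr, rules, timeout=None):
    deadline = None if timeout is None else time.time() + timeout
    try:
        return rec_simplify(expr, rules, deadline)
    except TimeoutExceeded:
        return expr  # best effort: fall back to the unsimplified input
```

The timeout is approximate: it can only fire between rule applications, so one slow rule still overshoots it.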

Thoughts?

Joachim Durchholz

Oct 23, 2014, 3:39:01 PM
to sy...@googlegroups.com
On 23.10.2014 at 20:23, James Crist wrote:
> However, this isn't the easiest thing to do in Python. The best
> *composable* option

What's "composable" in this context?

> is to use signal.alarm, which is only available on *NIX
> systems. This can also cause problems with threaded applications.

That would not affect SymPy itself, as it is not multithreaded.
(This, in turn, has its reason in the default Python implementation
having weak-to-nonexistent support for multithreading.)

> Checks
> for windows

It should work without a problem under Windows - there's no mention that
signal.alarm() does not work on any platform.
In fact the timeout code for the unit tests uses this, and AFAIK it
works well enough.

> or not running in the main thread could be added to handle this
> though, but would limit its use.

Actually you'll get a Python exception if you try to set a signal
handler anywhere except in the main thread. Or at least the Python docs
claim so, I haven't tried.

OTOH anybody who wants a timeout on SymPy (or any other piece of Python
code) can set up signal.alarm() themselves. There's simply no need for
SymPy itself to cater for this.

> A second option would be to implement a "pseudo-timeout". This only works
> for functions that have many calls, but each call is guaranteed to complete
> in a reasonable amount of time (recursive, simple rules, e.g. `fu`). The
> timeout won't be exact, but should limit excessively long recursive
> functions to approximately the timeout. I wrote up a quick implementation
> of this here <https://gist.github.com/jcrist/c451f3bdd6d038521a12>. It
> requires some function boilerplate for each recursive call that *can't* be
> replaced with a decorator. However, it's only a few lines per function. I
> think this is the best option if we were to go about adding this.

It's quite intrusive.
It's also going to be broken with every new algorithm, because people
will (rightly) concentrate on getting it right first.
This means we'll always have a list of algorithms that do this kind of
cooperative multitasking less often than we'd like.

There's an alternative: sys.settrace(). I'm not sure what kind of overhead
is associated with that (depending on implementation specifics, it could
be quite big).
OT3H letting SymPy functions test for timeout on a regular basis isn't
going to come for free, either. People will always have to find the
right middle ground between checking too often (slowdown) or too rarely
(unresponsive).
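A rough sketch of what the settrace idea could look like (checking only on 'call' events to bound the overhead; the names are illustrative, the overhead is unmeasured, and it would interfere with debuggers and coverage tools, which also use settrace):

```python
import sys
import time

def run_with_timeout(func, seconds):
    """Run func(), aborting if it runs past `seconds`.

    The deadline is checked on every Python-level function call, so a
    single long-running C-level call can still overshoot the limit.
    """
    deadline = time.time() + seconds

    def tracer(frame, event, arg):
        if time.time() > deadline:
            raise TimeoutError("timed out")
        return None  # no per-line tracing, to keep the overhead down

    old_trace = sys.gettrace()
    sys.settrace(tracer)
    try:
        return func()
    finally:
        sys.settrace(old_trace)  # restore whatever tracer was installed
```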

So... my best advice (not necessarily THE best advice) would be to leave
this to people who call SymPy.

BTW here's the timeout code in sympy/utilities/runtest.py:
    def _timeout(self, function, timeout):
        def callback(x, y):
            signal.alarm(0)
            raise Skipped("Timeout")
        signal.signal(signal.SIGALRM, callback)
        signal.alarm(timeout)  # Set an alarm with a given timeout
        function()
        signal.alarm(0)  # Disable the alarm
It's noncomposable (it assumes exclusive use of SIGALRM), and it's
strictly limited to being run in the main thread. That said, it has been
working really well, and maybe the best approach would be to document it
and point people with timeout needs towards it - those who use Stackless
or whatever real multithreading options are out there will be able to
use a better timeout wrapper, so this should be the least intrusive way
to deal with timeout requirements.

just my 2c.
Jo

James Crist

Oct 23, 2014, 4:06:22 PM
to sy...@googlegroups.com
> What's "composable" in this context?

Easy to write without intruding too much into the actual function.


> That would not affect SymPy itself, as it is not multithreaded.

No, but it would affect anything that tried to run sympy functions that use this in a separate thread.
 
> It should work without a problem under Windows - there's no mention that
> signal.alarm() does not work on any platform.

From the Python documentation: "On Windows, signal() can only be called with SIGABRT, SIGFPE, SIGILL, SIGINT, SIGSEGV, or SIGTERM. A ValueError will be raised in any other case." To use `signal.alarm`, you need to call signal with `SIGALRM`, which is not supported there. The test suite runs on Travis, which uses Ubuntu, so it works fine there.


> It's quite intrusive.
> It's also going to be broken with every new algorithm, because people
> will (rightly) concentrate on getting it right first.

I don't think it's that intrusive (especially with precomposed checking functions). It wouldn't have to be available for every function, and it requires very little modification. In the last half hour I've almost finished applying an example of it to `fu`. It didn't take long at all, and requires only a few lines of code changed. Could it be better? Absolutely. I wish there were a way to do this without modifying any of the logic code (simply decorating functions/using a context manager would be ideal). But this doesn't seem to be possible in a cross-platform, robust way.

I did write up a second option that used a decorator and a context manager, but it assumes a single-threaded application (it uses a global TIMEOUT variable :( ). I prefer the method described above.
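Roughly, that decorator/context-manager variant looks like this (illustrative names, not the PR code; the module-level deadline is exactly what makes it single-threaded):

```python
import time
from contextlib import contextmanager
from functools import wraps

# Module-level deadline -- the single-threaded assumption mentioned above.
_DEADLINE = None

class TimedOut(Exception):
    pass

@contextmanager
def timeout(seconds):
    """Set a global deadline for the enclosed block (not thread-safe)."""
    global _DEADLINE
    _DEADLINE = time.time() + seconds
    try:
        yield
    finally:
        _DEADLINE = None

def checks_timeout(func):
    """Decorator: check the global deadline on every call to func."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        if _DEADLINE is not None and time.time() > _DEADLINE:
            raise TimedOut()
        return func(*args, **kwargs)
    return wrapper
```

The appeal is that library code only needs the decorator; the ugly part is the shared global, which breaks the moment two threads want different deadlines.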


> OT3H letting SymPy functions test for timeout on a regular basis isn't
> going to come for free, either.

For sure. However, I don't think the overhead of calling `time.time()` is too large:

In [2]: %timeit time.time()
1000000 loops, best of 3: 1.21 µs per loop

Could still be a problem though. This was just a proposal - I'm not adamant that sympy needs such a feature.

James Crist

Oct 23, 2014, 5:14:27 PM
to sy...@googlegroups.com
Proof of concept PR here. Adding a timeout to each function didn't take much code, but it seems the functions in `fu` are slowed down more by `expand` than anything else. Continuing to propagate timeout throughout the codebase could work, but I don't feel like doing that until I get some validation on the concept.

James Crist

Oct 23, 2014, 8:21:51 PM
to sy...@googlegroups.com
A second proof of concept PR, this time using a context manager. I actually like this more, but it has its own issues as well. https://github.com/sympy/sympy/pull/8297

Joachim Durchholz

Oct 24, 2014, 2:15:17 PM
to sy...@googlegroups.com
On 23.10.2014 at 22:06, James Crist wrote:
>
>>
>> What's "composable" in this context?
>>
>
> Easy to write without intruding too much into the actual function.

OK

>> That would not affect SymPy itself, as it is not multithreaded.
>
>
> No, but it would affect anything that tried to run sympy functions that use
> this in a separate thread.

Yep, I'm aware of that.
Given Python's generally weak support for multithreading, I don't see
that as a real shortcoming of SymPy though.
(I am aware that some Python implementations address that specifically.)

>> It should work without a problem under Windows - there's no mention that
>> signal.alarm() does not work on any platform.
>
>
> From the Python documentation: "On Windows, signal()
> <https://docs.python.org/2/library/signal.html#module-signal> can only be
> called with SIGABRT, SIGFPE, SIGILL, SIGINT, SIGSEGV, or SIGTERM. A
> ValueError
> <https://docs.python.org/2/library/exceptions.html#exceptions.ValueError>
> will be raised in any other case." To use `signal.alarm`, you need to call
> signal with `SIGALRM`, which is not supported. The test suite is run on
> travis, which uses ubuntu, so it works there fine.

Okay, I overlooked that.
Funny that nobody has complained that --timeout doesn't work on Windows.
(Or maybe I overlooked those complaints.)

>> It's quite intrusive.
>> It's also going to be broken with every new algorithm, because people
>> will (rightly) concentrate on getting it right first.
>
> I don't think it's that intrusive (especially with precomposed checking
> functions).

Those add overhead - the function call count doubles.
I don't have numbers, but I suspect it's going to be measurable unless
very carefully tuned.
AND it needs to be very carefully retuned whenever the algorithm changes.

Sorry, but I'm seeing a lot of unaddressed mid-term risks here.

> It wouldn't have to be available for every function,

No, only those that are called very often.
I.e. exactly those where overhead can hurt.

Unless I see some benchmarks that discount that as utterly irrelevant,
I'm strictly -1 on this proposal.

> and
> requires very little modification. In the last half hour I've almost
> finished applying an example of it to `fu`. Didn't take long at all, and
> requires only a few lines of code changed.

I can believe that, it's not hard to do.
I'm worried about the performance impact.

> I wish there was a way to go about doing this without modifying any of the
> logic code (simply decorating functions/using a context manager would be
> ideal). But this doesn't seem to be possible in a crossplatorm/robust way.

I suspect it's entirely doable, and entirely within Python.
However, decorators aren't that easy to do. Plus they incur the same
kind of overhead.

I'm still wondering whether the problem you're solving is worth solving
at all. Sure it's annoying having to Ctrl-C the application
occasionally, but a mere annoyance isn't worth it IMNSHO.

> I did write up a second option that used a decorator and a contextmanager,
> but it assumes single-thread application (uses a global TIMEOUT variable :(
> ).

Yeah, it can be difficult and intrusive to thread state through to subfunctions.
OTOH I doubt that this particular problem is tied to the decorator
approach; I'd be surprised if you don't have it in your normal code as
well. (It might be hidden because you're doing a simple case in your
testing, and the context manager solution would force you to full
generality.)

So... double-check whether your assumption about the decorator solution
is right.
(I see you tried it anyway.)

Note that SymPy should not depend on any globals.
There are a few cached values around, but these are either
not-initialized-yet or never-to-change-anymore - that's the harmless
variant of globals.

James Crist

Oct 24, 2014, 8:31:56 PM
to sy...@googlegroups.com
The first approach was hackish (actually both are, but the decorator way is *significantly* better). And I agree, I'm still not sure this is a problem worth solving. Just food for conversation. I posted some timings on the decorator PR https://github.com/sympy/sympy/pull/8297

The core tests ran about 6% slower on my machine, but the long running ones ran only negligibly slower. The overhead of calling `_ask` without a timeout set is only ~70 ns on my machine. With one set it's more intrusive, due to the need to look up the time. Polling is *not* the nicest approach to this, I fully agree.

Joachim Durchholz

Oct 25, 2014, 1:10:26 AM
to sy...@googlegroups.com
I have thought a bit more, and it *would* open up using SymPy inside
other applications (GUIs, as part of an array of problem solvers,
whatever). Can't name any concrete killer app, but making a piece of
software more generally useful and seeing what people make of it is
always a good idea.

I'm still very uneasy about adding something to _ask - I may be mistaken
but I think it's part of the "old assumptions system" and on the way
out. Any other part of SymPy would suffer from a similar insecurity,
plus hacking on SymPy would also require knowing about how timeouts are
implemented, contributing to the entry barrier (which isn't *that* high,
but keeping it low requires an active effort).

One alternative approach that I can think about is running SymPy in a
separate process, via Popen. The calling program would be just a stub
that serializes (via pickle) its parameters to the subprocess, sets a
timer, kills the subprocess if it times out, and reads back any results.