Propose different Scheduler default

Michael Toomim

Aug 11, 2012, 8:12:00 PM

to web2py-d...@googlegroups.com, nip...@gmail.com

I just got bit by the default timeout=60 seconds for tasks in the scheduler, and asked myself:

"Should we change the default task timeout to None?"

We'd replace:

Field('timeout','integer',default=60,comment='seconds'),

with:

Field('timeout','integer',default=None,comment='seconds'),

(Also, we should explain in the comment "None means no timeout.")
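
To make the "None means no timeout" semantics concrete, here is a minimal sketch using only the Python standard library. It is not the scheduler's actual implementation: run_task and the status strings it returns are hypothetical names chosen for illustration, and the task is assumed to be a snippet of Python source run in a child process.

```python
import subprocess
import sys


def run_task(source, timeout=None):
    """Run Python source in a child process (hypothetical helper).

    timeout is in seconds; timeout=None means wait for as long as it takes.
    """
    try:
        result = subprocess.run([sys.executable, "-c", source], timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising this exception.
        return "TIMEOUT"
    return "COMPLETED" if result.returncode == 0 else "FAILED"


if __name__ == "__main__":
    # A task that outlives its limit is cut off...
    print(run_task("import time; time.sleep(5)", timeout=0.5))
    # ...while the same kind of task with timeout=None runs to completion.
    print(run_task("import time; time.sleep(1)", timeout=None))
```

With a finite default, the first call is what a long-running task hits unless its author overrides the value; with a default of None, every task behaves like the second call unless a limit is set explicitly.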

The tradeoff:

Currently, a programmer who implements a long-running task has to realize that there is a default timeout of 60 seconds and change it to None or some huge number. His early tests might not reveal the timeout; the bug may only show up once his app is live, in production, running on large amounts of data, when his background tasks start to exceed 60 seconds. Imagine your app gets slashdotted, and suddenly everything is breaking, and you have to go find out why. This sucks! Removing the default timeout would fix this problem. (This is what happened to me, though not because of slashdotting.)
On the other hand, the benefit of the timeout is that it helps save users from runaway processes, right? Perhaps you are spawning 500 background tasks, and one of them gets stuck. If you have a single worker and no timeout, it will stop processing the other tasks because it's stuck on that one. With the timeout, you'll still have the bug, but your code might half work. And you'll have to notice the bug by looking for expired tasks, instead of by noticing that the tasks aren't working at all.
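
The two sides of this tradeoff can be sketched with a toy simulation of a single worker draining a queue in order. This is a simplified model under stated assumptions, not how the scheduler is implemented: simulate_single_worker is a hypothetical function, task durations are given up front in seconds, and a stuck task is modelled as having infinite duration.

```python
import math


def simulate_single_worker(durations, timeout=None):
    """Return a (status, finish_time) pair per task for one sequential worker.

    durations: seconds each task would need; math.inf models a stuck task.
    timeout:   per-task limit in seconds, or None for no limit.
    """
    clock = 0.0
    results = []
    for duration in durations:
        if math.isinf(clock):
            # The worker is stuck forever on an earlier task.
            results.append(("QUEUED", None))
        elif timeout is not None and duration > timeout:
            # The task is killed once it exceeds the limit.
            clock += timeout
            results.append(("TIMEOUT", clock))
        elif math.isinf(duration):
            # No limit applies, so this task never finishes.
            clock = math.inf
            results.append(("RUNNING", None))
        else:
            clock += duration
            results.append(("COMPLETED", clock))
    return results


if __name__ == "__main__":
    queue = [5, math.inf, 5]  # the second task gets stuck
    print(simulate_single_worker(queue, timeout=60))
    print(simulate_single_worker(queue, timeout=None))
    # A legitimate 90-second task is also cut off by the 60-second default.
    print(simulate_single_worker([90], timeout=60))
```

With timeout=60 the stuck task is cut off and the third task still completes (the "half works" case); with timeout=None the third task is never reached. The last call shows the opposite failure: a healthy task that simply needs 90 seconds is reported as timed out.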

I have to say that I prefer my code to break if there's a problem. And if I don't think I can solve the problem easily, and would prefer to make it half work, I would just add the timeout at that point.

So I propose we change the default to timeout=None. Agree? Disagree?

Niphlod

Aug 12, 2012, 5:48:11 PM

to web2py-d...@googlegroups.com, nip...@gmail.com

To me, a sane default value for the timeout is advisable, given that 95% of users are likely to queue simple tasks.
This also "forces" users to think about what is really going on and write their tasks accordingly. The scheduler needs to serve everyone from people queueing hundreds of "one-time-only" tasks to people "replacing" cron scripts, and I don't think as many people will be crippled by the timeout=60 seconds default as will be hurt by writing potentially never-ending tasks that later need to be managed by terminating processes and so on.

Michael Toomim

Aug 12, 2012, 6:07:03 PM

to web2py-d...@googlegroups.com, nip...@gmail.com

Thanks for the insight that the common case may be queueing simple tasks.

I believe it is more like 60 or 70% than 95%, however. In my experience it has been 1 out of 5, or 20%.

Can you please think about this in terms of the scenarios I described? I put a bit of work into thinking it through, and describing how the "enforcement" occurs to a developer. As I said, even if a user writes a never-ending task, he will notice this either as (a) his tasks not completing and his tests failing or (b) expirations in the scheduler table. He is more likely to notice (a) than (b), so the enforcement doesn't actually help him any. Anyway, I thought this through, so please tell me what you think.

Niphlod

Aug 12, 2012, 6:56:42 PM

to web2py-d...@googlegroups.com, nip...@gmail.com

It's probable that you put more thinking into it than I did; I was just reporting my initial ideas about it.
I think the main difference between my POV and yours is that, to me, a task is generally not useful if it returns (or churns) data outside a relatively small window of time (underlying changes in the data itself, simple network hiccups, etc. standing in the way).
From here on, I'll use "TIs" for other task implementations such as Celery, RQ, Resque, DJ and so on.
From my POV, TIs are generally a way to speed up execution times and give the users of your app more "responsiveness", bypassing the normal "serialized" development with out-of-band actor(s).
TIs give users the best "feature" of async programming (responsiveness) while avoiding all the hassle required to "switch" your way of doing things to "full-async-mode" (managing actors, events, timeframes, resources, processes, etc.).
Without timeouts, the algos would require an async approach that - with all the puns intended - is reaaally different from the standard "we all like it and we're very productive with it" approach of modern web frameworks.

The main issue here is that with no default timeout, a single "never-ending" slashdotted-like task will effectively block execution of all the other tasks in the queue (and I like to think that all tasks within the same group are equally important). Users of some facilities could also end up with unexpectedly high $ bills from a single "flawed" task.
Last but not least, I'd like users to be "concerned" about their execution time-frame rather than leaving them without time boundaries.

BTW: in other task implementations, a timeout/timelimit/etc. parameter is always assumed to be "in place". You have to explicitly declare the task as "possibly-never-ending/the-results-are-always-valid-even-if-they-come-back-in-12-hours".
