Scheduler misleading DEBUG logging when task reaches timeout

Francisco Ribeiro

Nov 19, 2014, 10:30:36 PM
to web...@googlegroups.com
hi,

After some debugging, I noticed that when tasks time out while using the scheduler, I get output like the following:
DEBUG:web2py.app.myapp:    new task report: FAILED
DEBUG:web2py.app.myapp:   traceback: Traceback (most recent call last):
  File "/../web2py/gluon/scheduler.py", line 303, in executor
    result = dumps(_function(*args, **vars))
  File "applications/myapp/models/db.py", line 337, in schedule_call
    time.sleep(3600)
  File "/.../web2py/gluon/scheduler.py", line 704, in <lambda>
    signal.signal(signal.SIGTERM, lambda signum, stack_frame: sys.exit(1))
SystemExit: 1

While the timeout behaviour works just as I expect and things get stored correctly in the database (scheduler_run.status = 'TIMEOUT'), this debug output is somewhat misleading, since 'FAILED' appears to be a separate state from 'TIMEOUT' according to the documentation ( http://www.web2py.com/books/default/image/29/ce8edcc3.png ).

Can someone explain why this happens? Is it expected?

Thank you.
Kind regards,
Francisco

Niphlod

Nov 20, 2014, 4:57:05 AM
to web...@googlegroups.com
the "new task report" line is logged when the status is either COMPLETED or FAILED. 
These are not the statuses of the task itself, it's the status of the task being returned by the "executor" process, that knows only if the task ended correctly or raised some exceptions. 
The "finer grained" statuses are "computed" back in the "worker" process (the report_task() routine, to be exact), that knows, e.g., if a task needs to be queued again, etc etc etc

Francisco Ribeiro

Nov 20, 2014, 6:07:50 AM
to web...@googlegroups.com
thank you,

A different yet related problem I found while testing the timeout behaviour with a simple task that just does a time.sleep(3000): it keeps the CPU load of its process close to 100% the whole time. This is not a CPU-intensive function, however, and you won't see this behaviour if you run it outside of the scheduler. There seems to be room for optimisation here, since it means that a small number of lightweight tasks that happen to need more time to complete will quickly consume CPU.

Kind regards,
Francisco 

Niphlod

Nov 20, 2014, 2:42:25 PM
to web...@googlegroups.com
if you care to post an app that reproduces the behaviour, I'd be glad to iron out the bug, if there's one.

Francisco Ribeiro

Nov 21, 2014, 6:42:54 AM
to web...@googlegroups.com
just create and trigger the following task:
def schedule_call():
    import time
    time.sleep(3600)
    return 'completed'


and queue it like:
myscheduler.queue_task(schedule_call, timeout=0)

once it's triggered, check the CPU load of your Python scheduler process; it should be at 100%.

If this is not enough to reproduce the issue, please let me know and I will send you a full app.

Thank you.

Kind regards,
Francisco

Niphlod

Nov 21, 2014, 8:13:04 AM
to web...@googlegroups.com
Will do. In the meantime... what are you trying to achieve with timeout=0?

Francisco Ribeiro

Nov 21, 2014, 10:47:53 AM
to web...@googlegroups.com
So, by disabling the timeout, I'm making sure the scheduler stays occupied by that task for 3600s rather than having it released on its own through a TERM signal triggered by a timeout. This way, you should be able to easily verify the high CPU load caused by any task loaded into the scheduler.

Kind regards,
Francisco

Niphlod

Nov 21, 2014, 3:02:15 PM
to web...@googlegroups.com
0 doesn't disable the timeout; it sets it to 0, which is kind of the nonsense I was trying to figure out ;-P

Francisco Ribeiro

Nov 21, 2014, 6:30:35 PM
to web...@googlegroups.com
Now, that is amusing :)
That timeout=0 is triggering an endless loop which actually works as a way to prevent the task from timing out! It also explains the problem at hand, i.e. the intense CPU load, so there is some progress here. Out of curiosity, the same behaviour (both disabling the timeout and the high CPU load) also happens if you use timeout=None.

So the question is: what is the appropriate way to disable the timeout? I could not find that in the docs.

Thank you,
Francisco

Niphlod

Nov 23, 2014, 2:20:52 PM
to web...@googlegroups.com
timeout gets passed as-is to the underlying base function. Python with timeout=0 seems to exhibit pretty strange behaviour, but I guess that is allowed just because in Python "we're all consenting adults".
Launching something that needs to return in 0 time is clearly something spectacularly wrong.
As it is, the scheduler is "bugged", in the sense that allowing timeout=0 is not safe. Until now, no "consenting adult" ever used it, and perhaps we should change the field to accept only integers starting from 1 rather than 0 (just to discourage "consenting children" ;-P ).

As web2py enforces "good manners", there's no way to disable the timeout, and I think that's a safe decision to keep. A task queued with no timeout can potentially block every other outstanding task in a task processor, which is plain silly.
That being said, if you don't want any (theoretical) timeout, pass, e.g., timeout=999999 and you'll get ~12 days of timeout.
If your task is still running at that point, killing it is something that won't bother your application at all.
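In code, the workaround is just a large value for the same parameter (a minimal sketch, reusing the myscheduler and schedule_call names from earlier in the thread):

# workaround sketch: a very large timeout instead of a disabled one
# (999999 seconds is roughly 11.6 days)
myscheduler.queue_task(schedule_call, timeout=999999)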

Francisco Ribeiro

Nov 23, 2014, 4:29:19 PM
to web...@googlegroups.com
Niphlod, 
let me tell you that "0" is very often interpreted as "disabled" in computing. For example, select(), the famous UNIX system call, uses that convention for its timeout argument, and the "same" happens with the snapshot length argument in "tcpdump -s 0", which is allowed but certainly doesn't mean a length of 0. It's not spectacularly wrong nor childish, it's just a common and useful convention.
So, since I did not find this in the documentation, I tried that, just as I tried timeout=None, hoping one of them would work.

I understand where you are coming from, but tbh I think at this stage it's too early for a scheduler engine to enforce that behaviour. There are scenarios where you don't expect tasks to block, but if that happens you might want to take different measures to analyse and handle what happened (while it is still running) rather than necessarily having them automatically killed. At the end of the day, a large timeout, as you suggest, lets me achieve the same result, but actually doing so is what makes me feel a bit silly ;)

Anyway, now I understand how it works, thank you.

Kind regards,
Francisco

Niphlod

Nov 24, 2014, 4:00:31 AM
to web...@googlegroups.com
There's a difference between "it's not documented" and "I expect it to do that".
In the Python standard library, the multiprocessing.Process join() method accepts a timeout, and that timeout argument can - theoretically - be 0.
If you try that "outside" the scheduler, the same thing happens: 100% CPU and erratic behaviour.
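For what it's worth, here is a minimal stand-alone sketch of that kind of behaviour (my own illustration of a parent polling its child in a loop; it assumes the worker polls join() repeatedly, which is not necessarily exactly what the scheduler does internally):

# Stand-alone illustration, outside web2py: a parent that polls its child
# with join(timeout=0) in a loop never sleeps between checks, so it spins
# at ~100% CPU for as long as the child process is alive.
import multiprocessing
import time

def long_task():
    time.sleep(3600)

if __name__ == '__main__':
    p = multiprocessing.Process(target=long_task)
    p.start()
    while p.is_alive():
        p.join(0)  # a zero timeout returns immediately, no pause in between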

In the standard library, timeout=None means "let it run with no timeout", but the scheduler doesn't allow timeout=None for the reasons explained before.

Summary: I'll send a patch to restrict the timeout value to be >= 1 instead of >= 0.

Francisco Ribeiro

Nov 24, 2014, 5:08:32 AM
to web...@googlegroups.com
Niphlod,
I have no expectations whatsoever, especially when it comes to undocumented features, merely reasonable hopes.

Now, let me be clear here: I also don't expect software to always be properly documented, especially open-source software that talented and busy people like yourself generously contribute to. I just mentioned it to explain my reasonable hope :)

Thank you for adding some input validation there; it is essential.

Francisco