Just tested on the latest web2py from trunk and Python 2.7.3 on Win7. Everything seems to be working as expected.
"Tasks with no stop_time set or picked up BEFORE stop_time are ASSIGNED to a worker. When a worker picks them up, they become RUNNING."
> Feel free to propose features you'd like to see in the scheduler, I have some time to spend implementing it.
Will (or could) the scheduler support multi-platform apps (EC2, GAE, ...)?
Hi Niphlod,
thanks for the great work on the scheduler - I'm using it in a project where it handles lots of big data imports into a database, and the migration to your version went without any problems.
One thing caught my eye in the old version, and it still seems to be a "problem/missing feature" in the new one: when a long-running process executes and produces output (print etc.), that output is written to the database only after the task has run (and finished). It would be really great if the output were written into the task table while the task runs, as this would give us a feedback mechanism (and we would not need another table just for that) - think of a progress meter, for example.
What I really miss, though, is the output of a task when it hits a timeout - there is nothing in the task table about the output...
Daniel
import time

def function1():
    time.sleep(3)
    print "first print"
    time.sleep(5)
    print "second print"
/web2py/gluon/scheduler.py", line 203, in executor
result = dumps(_function(*args,**vars))
File "applications/w2p_scheduler_tests/models/scheduler.py", line 21, in function1
time.sleep(5)
File "/home/niphlod/Scrivania/web2py_source/web2py/gluon/scheduler.py", line 446, in <lambda>
signal.signal(signal.SIGTERM, lambda signum, stack_frame: sys.exit(1))
SystemExit: 1
Uhm, serializing part of the output to the table every n seconds - with the output being a stream - would require a buffer/read/flush cycle to update the scheduler_run table that I'm not sure is feasible: I'll look into it, but ATM I'm more concerned with other small issues of the Scheduler.
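For illustration only, the buffer/flush idea could be sketched like this, with plain sqlite3 standing in for the DAL and a hypothetical run record (table and column names here mirror the scheduler's scheduler_run, but everything is an assumption, not the actual implementation):

```python
import sqlite3

def run_with_live_output(conn, run_id, steps=5, flush_every=2):
    # Accumulate the task's output and flush it into the run record
    # every `flush_every` steps, so another connection can poll it.
    buf = []
    for i in range(steps):
        buf.append('step %d done' % i)  # stand-in for the task's prints
        if (i + 1) % flush_every == 0:
            conn.execute('UPDATE scheduler_run SET run_output=? WHERE id=?',
                         ('\n'.join(buf), run_id))
            conn.commit()  # make the partial output visible to pollers
    # final flush when the task completes
    conn.execute('UPDATE scheduler_run SET run_output=? WHERE id=?',
                 ('\n'.join(buf), run_id))
    conn.commit()

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE scheduler_run (id INTEGER PRIMARY KEY, run_output TEXT)')
conn.execute('INSERT INTO scheduler_run (run_output) VALUES (NULL)')
conn.commit()
run_with_live_output(conn, 1)
print(conn.execute('SELECT run_output FROM scheduler_run WHERE id=1').fetchone()[0])
```

The commit after each partial UPDATE is the point: without it, other connections would only see the output after the task finished, which is exactly the current behaviour.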
I'll definitely add to the feature-list the possibility to "recover" the output from TIMEOUTted tasks.
Aaanyway - for both issues - the logging module (and not some random prints) is the right tool for the job ^_^ .
BTW: so far I have only seen a queue/task processor for node.js that reports a "percentage" (i.e. every small bit of change "intra-execution" of the task in the workers "bubbles" up to the queue manager). Could you point me to a queue/task messaging implementation with this feature, if you have seen it implemented already?
On Wednesday, August 8, 2012 3:25:13 PM UTC+2, Daniel Haag wrote:
I don't know if it would work this way, but I would be glad if you could give me some feedback (it's actually just a proof of concept - but I did already test it a little):
https://github.com/dhx/web2py/compare/scheduler_live_output
> I'll definitely add to the feature-list the possibility to "recover" the output from TIMEOUTted tasks.
> Aaanyway - for both issues - the logging module (and not some random prints) is the right tool for the job ^_^ .
Using the logging module is an option, but wouldn't I either end up writing to a file or to another table?
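To sketch the "another table" option: a custom logging.Handler could append each record to a table. This is a hypothetical illustration (sqlite3 and the `task_log` table stand in for the DAL and whatever table an app would actually use):

```python
import logging
import sqlite3

class SQLiteHandler(logging.Handler):
    """Logging handler that appends each formatted record to a table."""
    def __init__(self, conn):
        logging.Handler.__init__(self)
        self.conn = conn
        self.conn.execute('CREATE TABLE IF NOT EXISTS task_log '
                          '(id INTEGER PRIMARY KEY, message TEXT)')

    def emit(self, record):
        # Each log call becomes one row, visible to pollers immediately.
        self.conn.execute('INSERT INTO task_log (message) VALUES (?)',
                          (self.format(record),))
        self.conn.commit()

conn = sqlite3.connect(':memory:')
logger = logging.getLogger('task')
logger.setLevel(logging.INFO)
logger.addHandler(SQLiteHandler(conn))
logger.info('first print')
logger.info('second print')
print(conn.execute('SELECT COUNT(*) FROM task_log').fetchone()[0])  # 2
```

So yes - it is still "another table", but the table is owned by the application rather than by the scheduler, which matches where Niphlod draws the responsibility line.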
Well, actually I did not, but that doesn't mean a lot as I didn't have the requirement of a scheduler in a web framework until recently...
On 14.08.2012 00:45, "Niphlod" <nip...@gmail.com> wrote:
> On Monday, August 13, 2012 4:44:18 PM UTC+2, Daniel Haag wrote:
>> I don't know if it would work this way but I would be glad if you could give me some feedback (its actually just a proof of concept - but I did already test it a little):
>> https://github.com/dhx/web2py/compare/scheduler_live_output
> TY for the code (smart), I'll definitely check your implementation ASAP
>>> I'll definitely add to the feature-list the possibility to "recover" the output from TIMEOUTted tasks.
>>> Aaanyway - for both issues - the logging module (and not some random prints) is the right tool for the job ^_^ .
>> Using the logging module is an option, but wouldn't I either end up writing to a file or to another table?
> Yep, I was implying that the "burden" of updating the scheduler_run table just to record its output may be an unwanted feature for all those who don't need it. However, if it turns out it's not heavy at all, I see no problem implementing it.
So it might be best if the burden could be opted into, with the current behavior as the default. What do you think of another parameter in the task table, named update_frequency or similar, with 0 as the default (no updates of the output) and any higher number updating the output every n seconds?
>> Well, actually I did not, but that doesn't mean a lot as I didn't have the requirement of a scheduler in a web framework until recently...
> Ok, it's just that if - luckily - there is some code around the web, no one is forced to reinvent the wheel.
Ok, done (the "save output for TIMEOUTted tasks").
Small issue, but quite manageable: when a task "timeouts", the output is now saved, and you have the traceback to see "where" it stopped.
e.g. queue function1 with a timeout of 5 seconds:

import time

def function1():
    time.sleep(3)
    print "first print"
    time.sleep(5)
    print "second print"
The scheduler_run records will report:
- status = TIMEOUT
- output = first print
- traceback =
/web2py/gluon/scheduler.py", line 203, in executor
result = dumps(_function(*args,**vars))
File "applications/w2p_scheduler_tests/models/scheduler.py", line 21, in function1
time.sleep(5)
File "/home/niphlod/Scrivania/web2py_source/web2py/gluon/scheduler.py", line 446, in <lambda>
signal.signal(signal.SIGTERM, lambda signum, stack_frame: sys.exit(1))
SystemExit: 1
Is that ok? The "small issue" here is that the traceback is "full", starting from where the process was stopped after 5 seconds (the executor function) down to where it really stopped (line 21, function1, in models/scheduler.py), which is the useful information.
Should the scheduler report only the output and not the traceback for TIMEOUTted tasks?
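The mechanism behind that traceback can be demonstrated in isolation (Unix only; this is a simplified sketch, not the scheduler's actual code): a SIGTERM handler that calls sys.exit raises SystemExit at whatever line the process happens to be executing, which is why the traceback ends at the sleeping line inside function1.

```python
import os
import signal
import sys

# Install the same kind of handler the scheduler uses: on SIGTERM,
# raise SystemExit(1) inside the running frame.
signal.signal(signal.SIGTERM, lambda signum, stack_frame: sys.exit(1))

caught = None
try:
    # Simulate the scheduler killing a timed-out task: send SIGTERM
    # to ourselves; the handler fires and SystemExit propagates from here.
    os.kill(os.getpid(), signal.SIGTERM)
except SystemExit as e:
    caught = e.code
print('caught SystemExit: %s' % caught)  # caught SystemExit: 1
```

Because SystemExit is an ordinary exception until it reaches the top of the stack, the executor can catch it, record the partial output, and store the full traceback in scheduler_run.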
That's great! If you want I can test it.
I wouldn't consider this an issue, actually it's a feature, isn't it?
Nope, that goes waaaayyy over the scheduler's "responsibility". Prune all records, prune only completed, prune only failed, requeue timeoutted, prune every day, every hour, etc., etc. ... these are implementation details that belong to the application.
We thought that since it is all recorded and timestamped, it's just a matter of:
timelimit = datetime.datetime.utcnow() - datetime.timedelta(days=15)
db((db.scheduler_task.status == 'COMPLETED') & (db.scheduler_task.last_run_time < timelimit)).delete()
which is actually not so hard (scheduler_run records should be pruned automatically, because they reference scheduler_task).
(I like to have a function "maintenance" fired off every now and then with these things on it.)
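A self-contained version of that pruning query, with sqlite3 standing in for the DAL (same predicate as the snippet above - delete COMPLETED tasks older than 15 days; table and column names mirror the scheduler's, dates are stored as ISO strings so string comparison orders them correctly):

```python
import datetime
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE scheduler_task '
             '(id INTEGER PRIMARY KEY, status TEXT, last_run_time TEXT)')

now = datetime.datetime(2012, 8, 20)
rows = [('COMPLETED', (now - datetime.timedelta(days=30)).isoformat(' ')),
        ('COMPLETED', (now - datetime.timedelta(days=5)).isoformat(' ')),
        ('FAILED',    (now - datetime.timedelta(days=30)).isoformat(' '))]
conn.executemany('INSERT INTO scheduler_task (status, last_run_time) '
                 'VALUES (?, ?)', rows)

# Same condition as the DAL delete above: COMPLETED and older than 15 days.
timelimit = (now - datetime.timedelta(days=15)).isoformat(' ')
conn.execute('DELETE FROM scheduler_task WHERE status=? AND last_run_time<?',
             ('COMPLETED', timelimit))
conn.commit()

remaining = conn.execute('SELECT COUNT(*) FROM scheduler_task').fetchone()[0]
print(remaining)  # 2: the recent COMPLETED row and the FAILED row survive
```

Only the old COMPLETED row is removed; FAILED rows are deliberately left alone so the application can decide separately whether to requeue or prune them.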
Ok, got the example (but not the "the last go-round is forgotten" part).
Let's start by saying that your requirement can be fulfilled simply by wrapping your function in a loop and breaking after the first successful attempt (with repeats=0, retry_failed=-1). Given that, the current behaviour is not really a limit to what you are trying to achieve; it's only a matter of how to implement the requeue facilities in the scheduler.
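The "loop and break after the first successful attempt" approach could be sketched like this (with_retries and flaky are hypothetical names for illustration; the scheduler then only ever sees one run, successful or failed):

```python
import time

def with_retries(fn, attempts=3, delay=0):
    # Call fn until it succeeds or attempts run out; re-raise the last
    # error so the scheduler would still mark the task FAILED as usual.
    last_exc = None
    for _ in range(attempts):
        try:
            return fn()
        except Exception as e:
            last_exc = e
            if delay:
                time.sleep(delay)
    raise last_exc

calls = []
def flaky():
    # Fails twice, then succeeds: simulates a transient error.
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError('transient failure')
    return 'ok'

result = with_retries(flaky)
print(result)  # 'ok' on the third attempt, inside a single scheduler run
```

With this wrapper, period still governs how often the scheduler launches the task, while transient failures are absorbed inside a single run - which is exactly the division of responsibility discussed below.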
Let's keep the discussion open... if I got it correctly, you're basically asking to ignore period for failed tasks (requeue them and execute ASAP) and reset counters accordingly... right? Period right now ensures that no more than one task execution happens every period seconds (it protects you from "flapping", i.e. a continuously failing function, and is somewhat required e.g. for respecting webservice API limits, avoiding "db pressure" if you're doing heavy operations, etc.).
Respecting period in every case is "consistency" for me (because I decided that I can "afford" (or "consume" resources for) executing that function only once every hour).
You are suggesting to alter this for repeating tasks... what I didn't get is whether that is required always, or only when repeats=0 (which is, incidentally, not consistent :P)?!
i.e. What behaviour should you expect from (repeats=2, retry_failed=3, period=3600) ?
2.00 am, failed
2.00 am, failed
2.00 am, completed
3.00 am, failed
3.00 am, failed
3.00 am, failed
?
This is basically what I'm missing. What could possibly be wrong at 2.00am and be right a few seconds later ?
OK, I didn't understand that retries happened periodically - I indeed thought that it would retry right away, though I agree with you that that should be handled at the function level. But if we're handling failures within the scheduled function, then I'm wondering what the value is in having retries at all. Just because the scheduler is running asynchronously does not mean it should necessarily be responsible for the scheduled functions' unhandled exceptions (which is what the failures are, right?). In other words, since our scheduler is scheduling web2py functions in a known environment (unlike environment-agnostic task-queue systems, which don't know how their operations will resolve), shouldn't the onus be on the scheduled function to handle failures and reschedule if necessary? Maybe we should clarify this before discussing the rest - I may be missing something.
BTW, I'm leaving for the night but am interested in finishing the discussion - I'll be back in the morning if you don't hear from me.
On Saturday, August 18, 2012 5:14:46 PM UTC-4, Niphlod wrote:
--
---
You received this message because you are subscribed to the Google Groups "web2py-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to web2py+un...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
Sweet - looking forward to using the API. Schema changes are a pain, but done for the right reasons. Can you give more explanation of the immediate=True param?
As for patterns - a basic event calendar would be a good demo.