Scheduler: help us test it while learning


Niphlod

Jul 12, 2012, 4:36:38 PM
to web...@googlegroups.com
Hello everybody, in the last month several changes were committed to the scheduler in order to improve it.
Table schemas were changed to add some features that some users had been missing.
On the verge of releasing web2py v2.0.0, and seeing that the scheduler's potential is often overlooked by regular web2py users, I created a test app with two main objectives: documenting the new scheduler and testing its features.

The app is available on github (https://github.com/niphlod/w2p_scheduler_tests). All you need to do is download the trunk version of web2py, download the app, and play with it.

Current features:
- one-time-only tasks
- recurring tasks
- possibility to schedule functions at a given time
- possibility to schedule recurring tasks with a stop_time
- can operate distributed among machines, given a database reachable by all workers
- group_names to "divide" tasks among different workers
- group_names can also influence the "percentage" of assigned tasks to similar workers
- simple integration using modules for "embedded" tasks (i.e. you can use functions defined in modules directly in your app or have them processed in background)
- configurable heartbeat to reduce latency: with sane defaults and not too many tasks queued, a queued task normally waits no more than 5 seconds before execution
- option to start it, process all available tasks and then die automatically
- integrated tracebacks
- monitorable as state is saved on the db
- integrated app environment if started as web2py.py -K
- stop processes immediately (set them to "KILL")
- stop processes gracefully (set them to "TERMINATE")
- disable processes (set them to "DISABLED")
- functions that don't return results do not generate a scheduler_run entry
- added a discard_results parameter that skips storing results "no matter what"
- added a uuid record to tasks to simplify checking for "unique" tasks
- task_name is not required anymore
- you can skip passing the functions to the Scheduler instantiation: functions can be dynamically retrieved in the app's environment

So, your mission is:
- test the scheduler with the app and familiarize yourself with it
Your secondary mission is:
- report any bugs you find here or on github (https://github.com/niphlod/w2p_scheduler_tests/issues)
- propose new examples to be embedded in the app, or correct the current docs (English is not my mother tongue)

Once approved, the docs will probably be embedded in the book (http://web2py.com/book).

Feel free to propose features you'd like to see in the scheduler; I have some time to spend implementing them.



David Marko

Jul 13, 2012, 4:01:31 AM
to web...@googlegroups.com
Just tested on the latest web2py from trunk with Python 2.7.3 on Win7. Everything seems to be working as expected.

One question: is there a way to define a function that is run when a task fails? I mean a situation where I would like to add some mail notification on failure, e.g. onFailure=myFunction(id=<id of item in the scheduler_run table>). This would also be nice for other state changes...

BTW: Great work ...

Massimo Di Pierro

Jul 13, 2012, 12:54:04 PM
to web...@googlegroups.com
I guess you could create a task that checks for recent failed tasks and sends an email about them.
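Massimo's suggestion could be sketched roughly like this (a stand-alone Python sketch using plain dicts in place of real scheduler_run rows; in a web2py app the records would come from a query on db.scheduler_run, and the notification would go through the mail object):

```python
import datetime

def recent_failures(runs, since):
    """Pick out FAILED scheduler_run-like records newer than `since`."""
    return [r for r in runs
            if r["status"] == "FAILED" and r["stop_time"] >= since]

# toy records standing in for rows of db.scheduler_run
runs = [
    {"task_id": 1, "status": "COMPLETED",
     "stop_time": datetime.datetime(2012, 7, 13, 10, 0)},
    {"task_id": 2, "status": "FAILED",
     "stop_time": datetime.datetime(2012, 7, 13, 11, 30)},
]

failed = recent_failures(runs, datetime.datetime(2012, 7, 13, 0, 0))
for r in failed:
    # here you would send the notification, e.g. via web2py's mail.send(...)
    print("task %(task_id)s failed at %(stop_time)s" % r)
```

Queued as a recurring task, a function like this gives failure notifications without needing any hook inside the scheduler itself.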

spiffytech

Jul 14, 2012, 3:28:51 PM
to web...@googlegroups.com
All the tests passed for me.

Pystar

Jul 15, 2012, 3:58:55 PM
to web...@googlegroups.com
I am having issues installing the app in the web2py 2.0.0 dev version

Niphlod

Jul 15, 2012, 4:02:01 PM
to web...@googlegroups.com
Maybe if you tell us what issues you're having, we'll be able to help you...

spiffytech

Jul 15, 2012, 7:01:09 PM
to web...@googlegroups.com
By the way, I think this app is an excellent way to get features tested, and to introduce users to features they might not ordinarily think to use. I'd love to see the idea more widely adopted!



Pystar

Jul 16, 2012, 5:35:36 PM
to web...@googlegroups.com
I downloaded the app and tried to upload and install it, but it always gives the error message "failed to install app". I also renamed it, but no change.

Niphlod

Jul 17, 2012, 11:42:26 AM
to web...@googlegroups.com
Instructions: download the archive from https://github.com/niphlod/w2p_scheduler_tests/zipball/master. The zip contains a folder (currently named "niphlod-w2p_scheduler_tests-903ee75"). Decompress that folder under "applications" and rename it to "w2p_scheduler_tests".

That should be enough.

Andrew

Jul 18, 2012, 1:24:16 AM
to web...@googlegroups.com
Great job with packaging up the app and the documentation/instructions. Very impressive.
I'll now start testing / familiarising myself with the scheduler...


Andrew

Jul 18, 2012, 2:28:03 AM
to web...@googlegroups.com
When I KILL a task, I end up with a QUEUED task and a STOPPED scheduler_run. When I restart the worker, should I expect the queued task to then get assigned?

I tried killing the worker in the ASSIGNED state, and I noticed the task is assigned to a specific worker. If that worker is killed and another one restarted, is the task left assigned to a worker that doesn't exist? What happens next? Should I just clear out the task and create a new one?

What happens to scheduler_task and scheduler_run over time? Will there be only one row per task/function in scheduler_run, or will scheduler_run slowly fill up for each run, so that I would occasionally have to purge it?

I'll do some more testing in the next few days but I really like the way you've put this together.  

Niphlod

Jul 18, 2012, 3:51:57 AM
to web...@googlegroups.com
STOPPED tasks get requeued as soon as there is an ACTIVE worker around.
The "philosophy" behind this is that if a worker has been stopped (abruptly), your task never finished, so it gets "another shot" with the next worker.
The scheduler_run table will grow as long as you need results. As documented, if your function doesn't return anything, the corresponding scheduler_run record is deleted.
The scheduler_task table will grow as long as you queue tasks... you can always clean it up without issues by deleting all tasks marked COMPLETED, if you don't need to store those indefinitely.

Yarin

Aug 5, 2012, 10:54:22 AM
to web...@googlegroups.com
@Niphlod- First of all, thanks for taking this on. An effective scheduler is critically important to us, and I'll be glad to help out in any way. 

I've downloaded the test app and am making corrections to the documentation (per your request) for clarity, grammar, etc. 

One thing I'm stuck on is when the ASSIGNED status comes into play. According to the docs:
"Tasks with no stop_time set or picked up BEFORE stop_time are ASSIGNED to a worker. When a workers picks up them, they become RUNNING." 
- This doesn't make sense to me. If a QUEUED task is picked up by a worker, its status changes to RUNNING. So at what point is it ASSIGNED?



Niphlod

Aug 5, 2012, 11:13:38 AM
to web...@googlegroups.com
Hi Yarin, Thank you for testing it!
A QUEUED task is not picked up directly by a worker: it is first ASSIGNED to a worker, and each worker can pick up only the tasks ASSIGNED to it. The "assignment" phase is important because:
- the group_name parameter is honored (a task queued with the group_name 'foo' gets assigned only to workers that process 'foo' tasks (the group_names column in scheduler_workers))
- DISABLED, KILL and TERMINATE workers are "removed" from the assignment altogether
- in multiple-worker situations the QUEUED tasks are split evenly amongst workers, and workers "know in advance" what tasks they are allowed to execute (the assignment allows the scheduler to set up n "independent" queues for the n ACTIVE workers)
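The assignment phase just described can be modelled with a toy sketch (my own illustration, not the scheduler's actual code): eligible tasks are dealt out round-robin among the ACTIVE workers serving the task's group, so each worker ends up with its own independent queue.

```python
def assign_tasks(tasks, workers):
    """tasks: list of (task_id, group_name); workers: dict name -> info.

    Returns {task_id: worker_name}. Only ACTIVE workers whose group_names
    include the task's group are eligible; eligible tasks are dealt out
    round-robin so the queues are split evenly.
    """
    assignment = {}
    counters = {}  # round-robin position per group
    for task_id, group in tasks:
        eligible = sorted(
            name for name, info in workers.items()
            if info["status"] == "ACTIVE" and group in info["group_names"]
        )
        if not eligible:
            continue  # stays QUEUED until a suitable worker shows up
        idx = counters.get(group, 0)
        assignment[task_id] = eligible[idx % len(eligible)]
        counters[group] = idx + 1
    return assignment

workers = {
    "w1": {"status": "ACTIVE", "group_names": ["main"]},
    "w2": {"status": "ACTIVE", "group_names": ["main", "foo"]},
    "w3": {"status": "DISABLED", "group_names": ["foo"]},  # excluded
}
tasks = [(1, "main"), (2, "main"), (3, "foo"), (4, "foo")]
print(assign_tasks(tasks, workers))
# → {1: 'w1', 2: 'w2', 3: 'w2', 4: 'w2'}
```

Note how the DISABLED worker never receives the 'foo' tasks, matching the second bullet above.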

Yarin

Aug 5, 2012, 11:55:19 AM
to web...@googlegroups.com
Ok this is clearer to me- I'll see if I can clarify it in the docs..

On to the next issue, this one regarding implementation:

I think the following parameters need to be renamed:
  1. 'repeats' should be 'repeat'
  2. 'repeats_failed' should be 'retry_failed'
Let me explain:
  1. 'repeat' is a command, whereas 'repeats' sounds like a result. Because the task record stores both arguments and results, this becomes confusing.
  2. 'repeats_failed' is even worse, because it sounds like a result (like 'times_failed', which is a result), and because it is a misnomer. We are not instructing it to repeat failures, but to retry them. Moreover, it needs to be clear that this value is completely distinct from the 'repeat' value: retries will be applied to every execution attempt, regardless of whether those attempts will be repeated or not.

Yarin

Aug 5, 2012, 12:16:55 PM
to web...@googlegroups.com
Let me go further:

Field('repeats_failed', 'integer', default=1, comment="0=unlimited"),

Should really be:

Field('retry_failed', 'integer', default=0, comment="-1=unlimited"),

According to the docs, this param is supposed to "set how many times the function can raise an exception ... and be queued again instead of stopping in FAILED status". If that's the case, then 0 should mean we don't want the function to be queued again if it fails, and 1 should mean give it one more try. This is a lot clearer than having the number refer to the total number of failures allowed.
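The counting Yarin proposes (0 = no retries, 1 = one extra try, -1 = unlimited) can be sketched as a plain loop. This is a toy illustration of the proposed semantics, not the scheduler's code:

```python
def run_with_retries(func, retry_failed):
    """Call func() until it succeeds or the retries are exhausted.

    retry_failed counts *extra* tries: 0 = fail once and stop,
    1 = one more try, -1 = unlimited.
    Returns ('COMPLETED', result) or ('FAILED', None).
    """
    attempts_left = float("inf") if retry_failed == -1 else retry_failed + 1
    while attempts_left > 0:
        attempts_left -= 1
        try:
            return ("COMPLETED", func())
        except Exception:
            continue
    return ("FAILED", None)

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("boom")
    return "ok"

# Fails twice, succeeds on the third call: needs retry_failed >= 2.
print(run_with_retries(flaky, retry_failed=2))  # → ('COMPLETED', 'ok')
```

With the old `repeats_failed` counting, the same behaviour would have required the value 3, which is exactly the confusion being discussed.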

Niphlod

Aug 5, 2012, 12:25:48 PM
to web...@googlegroups.com
I like the idea.
The only problem is having people change 'repeats' to 'repeat' if they're using the scheduler included in the stable version.
I don't think the implementation would be cumbersome; I'll try to compose a patch and send it to Massimo ASAP.

Yarin

Aug 5, 2012, 12:53:55 PM
to web...@googlegroups.com
Great, let's get these issues ironed out now. I know the 'repeats' param was part of the older scheduler before you started work on it, but as far as I know it's been experimental up until now, and so future-readiness should trump backwards compatibility at this point. I hope...

I'll be testing all day and will bring up any more issues as I find them. Thanks for being so responsive, and good work--

Yarin

Aug 5, 2012, 11:58:29 PM
to web...@googlegroups.com
The next issue is a big one: It's absolutely crucial to be able to operate in UTC mode. Without the ability to store and process schedule data in universal time, there's no way to ensure a schedule's integrity across multiple servers, or even on the same server if time settings are changed. It's also essential for being able to perform accurate server-to-client/client-to-server time conversions. In my experience, the only sane way to handle scheduling is to work exclusively in UTC and avoid local server time altogether. 

Niphlod - I have experience with this and can help with the conversion functions. If you want to hash through this together you can reach me at ykes...@appgrinders.com.

Alan Etkin

Aug 6, 2012, 9:05:43 AM
to web...@googlegroups.com
> Feel free to propose features you'd like to see in the scheduler, I have some time to spend implementing it.

Will (or could) the scheduler support multi-platform apps (EC2, GAE, ...)?

Yarin Kessler

Aug 6, 2012, 9:58:40 AM
to web...@googlegroups.com
Alan- the scheduler relies on a normalized table structure that's impossible to implement on GAE, but GAE has its own task scheduler, if I recall correctly. EC2 should be fine as long as you've got a supported DB somewhere.


Niphlod

Aug 6, 2012, 10:04:46 AM
to web...@googlegroups.com
Since there is no way to store the tz in a web2py datetime field, either you work in UTC all the time or you work in local time.
For me it's a no-brainer: I'll add a utc_time=False parameter to the Scheduler.

People with your concerns (multi-server time sync, different tzs, etc.) can use myscheduler = Scheduler(db, utc_time=True). Simple apps with recurring tasks - which in 90% of cases run the scheduler in production on the same machine as the webapp queueing the tasks - will continue to use local-time calculations (i.e. start_time=request.now + datetime.timedelta(hours=2) will schedule the task 2 hours from now), so backward compatibility isn't broken.
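The idea boils down to picking the clock once and doing all timestamp arithmetic in that timescale (note that utc_time is only being proposed in this thread, so the name is tentative; this is a toy version of the arithmetic, not scheduler code):

```python
import datetime

def next_run(period_seconds, utc_time=False):
    """Compute the next run timestamp from one consistent clock."""
    base = datetime.datetime.utcnow() if utc_time else datetime.datetime.now()
    return base + datetime.timedelta(seconds=period_seconds)

# Two workers calling this with utc_time=True agree on the next run time
# even if their machines are configured with different local timezones.
print(next_run(3600, utc_time=True))
```

With utc_time=False the result depends on the machine's timezone settings, which is exactly the multi-server hazard Yarin describes.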

Niphlod

Aug 6, 2012, 10:10:42 AM
to web...@googlegroups.com
I must admit I have no experience on GAE and EC2.
On GAE the issues are two:
- no relational db available
- is it really allowed on GAE to have a long-running process that (possibly) never ends? Isn't GAE charging something for every query made on their BigTable db? I don't think the scheduler would be a fit for their structure, and they have their own Scheduled Tasks.

EC2, on the other hand, as far as I know is a "full" system, so as long as there is a relational db available, the Scheduler will happily work on it (I don't know what they actually charge, so here too there is the issue of a long-running process continuously polling for tasks potentially "raising" the $ consumed).

Yarin Kessler

Aug 6, 2012, 10:40:03 AM
to web...@googlegroups.com
Niphlod - utc_time param is the right call- thanks. Let me know when it's rocking and I'll help test.


niphlod

Aug 6, 2012, 11:01:10 AM
to web...@googlegroups.com
It's rolling right now ;-)
I'll send the patch to Massimo and wait for trunk inclusion to update w2p_scheduler_tests's docs (just a find/replace through the enqueueing calls and a rewrite for the "repeats failed" tasks).
I'll send the code to you privately to help test it.


Alan Etkin

Aug 6, 2012, 3:53:25 PM
to web...@googlegroups.com
> I don't think the scheduler would be a solution fit to their structure, and they have their own Scheduled Tasks

GAE scheduled tasks are configured when updating the app, with a list in a .yaml file, so I think switching between the normal scheduler and the gae scheduler would be very difficult; but there's also this, which exposes a python API allowing task management:

https://developers.google.com/appengine/docs/python/taskqueue/

Scheduled tasks could be simulated by using the Task class eta or countdown constructor parameters.

Niphlod

Aug 12, 2012, 6:13:48 PM
to web...@googlegroups.com
Uhm, serializing part of the output to the table every n seconds - with the output being a stream - would require a buffer/read/flush cycle to update the scheduler_run table that I'm not sure is feasible: I'll look into it, but ATM I'm more concerned with other small issues of the Scheduler.
I'll definitely add to the feature list the possibility to "recover" the output from TIMEOUTted tasks.
Aaanyway - for both issues - the logging module (and not some random prints) is the right tool for the job ^_^ .

BTW: as of now I've seen only one queue/task processor, for node.js, that reports the "percentage" (i.e. every small bit of change "intra-execution" of the task in the workers "bubbles" up to the queue manager). Could you point me to a queue/task messaging implementation with this feature, if you have seen it implemented already?

On Wednesday, August 8, 2012 3:25:13 PM UTC+2, Daniel Haag wrote:
Hi Niphlod,

thanks for the great work with the scheduler. I'm using it in a project where it handles lots of big data imports into a database, and the migration to your version went without any problems.

One thing that caught my eye in the old version, and still seems to be a "problem/missing feature" in the new one: when a long-running process gets executed and produces output (print etc.), this output is written to the database only after the task has run (and finished). It would be really great if the output were written into the task table while the task runs, as this would be a possible feedback mechanism (and we would not need another table etc. just for that) - just thinking of a progress meter, for example.

What I really miss, though, is the output of the task when it produces a timeout - there's nothing in the task table about the output...

Daniel

Niphlod

Aug 13, 2012, 4:33:14 PM
to web...@googlegroups.com
Ok, done (the "save output for TIMEOUTted tasks").
Small issue, but quite manageable: when a task "timeouts", the output is now saved, and you have the traceback to see "where" it stopped.
e.g. queue function1 with a timeout of 5 seconds

def function1():
    time.sleep(3)
    print "first print"
    time.sleep(5)
    print "second print"



The scheduler_run record will report:
  1. status = TIMEOUT
  2. output = first print
  3. traceback =
  File "/web2py/gluon/scheduler.py", line 203, in executor
    result = dumps(_function(*args,**vars))
  File "applications/w2p_scheduler_tests/models/scheduler.py", line 21, in function1
    time.sleep(5)
  File "/home/niphlod/Scrivania/web2py_source/web2py/gluon/scheduler.py", line 446, in <lambda>
    signal.signal(signal.SIGTERM, lambda signum, stack_frame: sys.exit(1))
SystemExit: 1



Is that ok? The "small issue" here is that the traceback is "full": it starts from where the process is stopped after 5 seconds (the executor function) and goes down to "where" it really stopped (line 21, function1, in models/scheduler.py), which is the useful information.
Should the scheduler report only the output and not the traceback for TIMEOUTted tasks?
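If the answer turns out to be "keep the traceback but trim it", the "restricted" version asked about above could be approximated with a rough heuristic like this (my own sketch, not scheduler code): keep only the frames under applications/ plus the final exception line.

```python
def restrict_traceback(tb_text):
    """Keep only the frames inside the app's own code (heuristic: paths
    under applications/) plus the final exception line."""
    lines = tb_text.splitlines()
    keep = []
    for i, line in enumerate(lines):
        if line.lstrip().startswith('File "applications/'):
            keep.append(line)
            if i + 1 < len(lines):
                keep.append(lines[i + 1])  # the source line of that frame
    if lines:
        keep.append(lines[-1])             # e.g. "SystemExit: 1"
    return "\n".join(keep)

tb = '''  File "/web2py/gluon/scheduler.py", line 203, in executor
    result = dumps(_function(*args,**vars))
  File "applications/w2p_scheduler_tests/models/scheduler.py", line 21, in function1
    time.sleep(5)
SystemExit: 1'''
print(restrict_traceback(tb))
```

Storing the full traceback and trimming it only at display time would keep both camps happy, since nothing is lost in the db.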

Daniel Haag

Aug 13, 2012, 10:44:18 AM
to web...@googlegroups.com
Thanks for your response,

2012/8/12 Niphlod <nip...@gmail.com>

Uhm, serializing part of the output to the table every n seconds - with the output being a stream - would require a buffer/read/flush to update the scheduler_run table that I'm not sure it's feasible: I'll look into that but ATM I'm more concerned with other small issues of the Scheduler.

I don't know if it will work this way, but I would be glad if you could give me some feedback (it's actually just a proof of concept, but I did already test it a little):

https://github.com/dhx/web2py/compare/scheduler_live_output

 
I'll definitely add to the feature-list the possibility to "recover" the output from TIMEOUTted tasks.
Aaanyway - for both issues - the logging module (and not some random prints) is the right tool for the job ^_^ .

Using the logging module is an option, but wouldn't I either end up writing to a file or to another table?


BTW: as of now I saw only a queue/task processor for node.js that reports the "percentage" (i.e. every small bit of change "intra-execution" of the task in the workers "bubbles" up to the queue manager). Could you pinpoint me to a queue/task messaging implementation with this feature, if you saw this feature implemented already ?

Well, actually I did not, but that doesn't mean a lot as I didn't have the requirement of a scheduler in a web framework until recently...



 


Niphlod

Aug 13, 2012, 6:45:11 PM
to web...@googlegroups.com

On Monday, August 13, 2012 4:44:18 PM UTC+2, Daniel Haag wrote:
I don't know if it would work this way but I would be glad if you could give me some feedback (its actually just a proof of concept - but I did already test it a little):

https://github.com/dhx/web2py/compare/scheduler_live_output

 
TY for the code (smart), I'll definitely check your implementation ASAP

 
I'll definitely add to the feature-list the possibility to "recover" the output from TIMEOUTted tasks.
Aaanyway - for both issues - the logging module (and not some random prints) is the right tool for the job ^_^ .

Using the logging module is an option, but wouldn't I either end up writing to a file or to another table?

 
Yep, I was implying that the "burden" of updating the scheduler_run table just to record its output is maybe a "non-wanted" feature for all those who don't need it. However, if it turns out it's not heavy at all, I see no problem implementing it.


Well, actually I did not, but that doesn't mean a lot as I didn't have the requirement of a scheduler in a web framework until recently...


Ok, it's just that if - luckily - there is some code around the web, no one is forced to reinvent the wheel.

 

Daniel Haag

Aug 14, 2012, 1:29:24 AM
to web...@googlegroups.com


On 14 Aug 2012, 00:45, "Niphlod" <nip...@gmail.com> wrote:
> [...]
> Yep, I was implying that the "burden" of updating the scheduler_run table just to record its output is maybe a "non-wanted" feature for all those who don't need it. However, if it turns out it's not heavy at all, I see no problem implementing it.

So it might be best if the burden can be selected, with the current behavior as the default. What do you think of another parameter in the task table, named update_frequency or similar, with 0 as the default resulting in no updates of the output, and any higher number updating the output every n seconds?
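The update_frequency idea could work along these lines (a stand-alone sketch with a dict standing in for the scheduler_run row; the real thing would issue an UPDATE on the table at most once every n seconds):

```python
import sys
from io import StringIO

class LiveOutput(object):
    """Capture a task's prints and periodically copy them to its run record."""
    def __init__(self, run_record):
        self.buffer = StringIO()
        self.run_record = run_record   # stands in for a scheduler_run row

    def write(self, text):
        self.buffer.write(text)

    def flush(self):                   # print() may call this
        pass

    def flush_to_record(self):
        # a real implementation would UPDATE scheduler_run here,
        # throttled to once every update_frequency seconds
        self.run_record["run_output"] = self.buffer.getvalue()

record = {}
out = LiveOutput(record)
old_stdout, sys.stdout = sys.stdout, out
print("step 1 done")                   # captured, not shown
sys.stdout = old_stdout
out.flush_to_record()
print(repr(record["run_output"]))      # → 'step 1 done\n'
```

With update_frequency=0 the worker would simply never call flush_to_record until the task ends, which is the current behavior.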


>>
>> Well, actually I did not, but that doesn't mean a lot as I didn't have the requirement of a scheduler in a web framework until recently...
>>
>
> Ok, it's just that if - luckily - there is some code around the web noone is forced to reinvent the wheel.
>
>  
>

> --
>  
>  
>  

Daniel Haag

Aug 14, 2012, 4:06:39 PM
to web...@googlegroups.com

On Monday, August 13, 2012 22:32:19 UTC+2, Niphlod wrote:
Ok, done (the "save output for TIMEOUTted tasks").

That's great! If you want I can test it.

 
Small issue, but quite manageable: when a task "timeouts" the output now is saved, and you have the traceback to see "where" it stopped.
[example and traceback snipped]
Is that ok? The "small issue" here is that the traceback is "full", starting from where the process is stopped after 5 seconds (the executor function) down to "where" it really stopped (line 21, function1, in models/scheduler.py), which is the useful information.

I wouldn't consider this an issue, actually it's a feature, isn't it?
 
Should the scheduler report only the output and not the traceback for TIMEOUTted tasks?

Niphlod

Aug 14, 2012, 5:30:31 PM
to web...@googlegroups.com

That's great! If you want I can test it.

No problem, that is easy to test and it's working well.
 
I wouldn't consider this an issue, actually it's a feature, isn't it?

I'm ok with that, as long as users don't start asking why there is the full traceback instead of a "restricted" version of it ^_^

Now I'm going to take a look at your patch and see if I can understand all the bits of it . TY again.

Yarin

Aug 14, 2012, 7:37:10 PM
to web...@googlegroups.com
Niphlod- has there been any discussion about a param for clearing out old records on the runs and tasks tables? Maybe a retain_results or retain_completed value that specifies a period for which records will be kept?

niphlod

Aug 14, 2012, 8:00:49 PM
to web...@googlegroups.com
Nope, that goes waaaayyy beyond the scheduler's "responsibility". Prune all records, prune only completed, prune only failed, requeue timeoutted, prune every day, every hour, etc., etc.... these are implementation details that belong to the application.

We thought that, since everything is recorded and timestamped, it's a matter of:

timelimit = datetime.datetime.utcnow() - datetime.timedelta(days=15)
db((db.scheduler_task.status == 'COMPLETED') & (db.scheduler_task.last_run_time < timelimit)).delete()

which is actually not so hard (scheduler_run records should be pruned away automatically because they are referenced).

(I like to have a "maintenance" function fired off every now and then with these things in it.)

Yarin Kessler

Aug 14, 2012, 8:17:40 PM
to web...@googlegroups.com
10-4, thanks.


Yarin

Aug 18, 2012, 1:45:46 PM
to web...@googlegroups.com
I've noticed that a repeating task that fails during a certain period is no longer repeated, and the task is turned to FAILED. I think this is inconsistent behavior. The better approach would be:
  • Allow a periodic task to fail during a given period
  • Reset the task to QUEUED, just like when a periodic task completes
  • Have the scheduler_run table record the failure
In other words, retry_failed should apply to the current repeat attempt, not to the totality of the task.

Another, smaller issue: in the scheduler_task table definition, can we place the last_run_time field between the start_time and next_run_time fields? This way they are grouped clearly in the appadmin screens.

Thanks--

Niphlod

Aug 18, 2012, 2:56:42 PM
to web...@googlegroups.com
Can you elaborate further on the inconsistent behaviour?
repeats requeues the task n times (by default only completed tasks), and retry_failed makes failed ones get requeued. You have parameters to make the task behave like a cron one (repeats=0, retry_failed=-1).
You also have all the bits to manage your tasks (and I don't "catch" the inconsistency). Are you asking for support for some kind of "requeue the task only if it failed at most 2 times in a 2-minute timeframe"?
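My reading of the rules just described can be written down as a toy decision function (an illustration only, not the actual scheduler code; the cron-like combination is the repeats=0, retry_failed=-1 one mentioned above):

```python
def after_run(status, times_run, times_failed, repeats, retry_failed):
    """What happens to a task after one execution.

    repeats: total runs wanted (0 = unlimited);
    retry_failed: extra tries allowed after failures (-1 = unlimited).
    """
    if status == "FAILED":
        if retry_failed == -1 or times_failed <= retry_failed:
            return "QUEUED"   # requeued, runs again next period
        return "FAILED"       # failure budget spent: task stops
    if repeats == 0 or times_run < repeats:
        return "QUEUED"       # recurring task: wait for the next period
    return "COMPLETED"

# cron-like task: repeat forever, never give up on failures
print(after_run("FAILED", 5, 3, repeats=0, retry_failed=-1))    # → QUEUED
# one-shot task that already ran successfully
print(after_run("COMPLETED", 1, 0, repeats=1, retry_failed=0))  # → COMPLETED
```

Yarin's proposal amounts to resetting times_failed after each successful period instead of counting it over the task's whole lifetime.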

Yarin Kessler

Aug 18, 2012, 4:32:14 PM
to web...@googlegroups.com
I think retry_failed and repeats are two distinct concepts and shouldn't be mixed.

For example, a task set to (repeats=0, retry_failed=0, period=3600) should be able to fail at 2:00pm but still try again at 3:00pm, regardless of what happened at 2:00. Likewise, if it was set to (repeats=0, retry_failed=2, period=3600) and failed all three times at 2:00pm, the retry count should be reset on the next go-around.

I think it's safer to presume that if a task is set up for indefinite repetition, a failure on one repeat should not bring down the whole task; rather, the transactional unit that constitutes a failure should be limited to any given attempt, repeated or not.

This was one of the reasons I pressed for renaming repeats_failed to retry_failed - distinct concepts.


Yarin Kessler

Aug 18, 2012, 4:36:14 PM
to web...@googlegroups.com
And the reason I think the behavior is inconsistent: when you complete an attempt on a repeating task, it is immediately requeued to fulfill its repeat obligations and the last go-around is forgotten - so I think failures should be handled the same way.

Niphlod

Aug 18, 2012, 5:16:06 PM
to web...@googlegroups.com
Ok, got the example (but not the "the last go-around is forgotten" part).
Let's start by saying that your requirements can be fulfilled simply by wrapping your function in a loop and breaking after the first successful attempt (with repeats=0, retry_failed=-1). Given that, the current behaviour is not properly a limit to what you are trying to achieve; it's only a matter of how the requeue facilities are implemented in the scheduler.

Let's keep the discussion open... if I got it correctly, you're basically asking to ignore period for failed tasks (requeue them and execute ASAP) and reset counters accordingly... right? Period right now ensures that no more than one execution happens every period seconds (it protects you from "flapping", i.e. a continuously failing function, and is somewhat required e.g. for respecting webservice API limits, avoiding "db pressure" if you're doing heavy operations, etc., etc.).
Respecting period in every case is "consistency" for me (because I decided that I can "afford" (or "consume" resources) executing that function only once every hour).
You are suggesting altering this for repeating tasks... what I didn't get is whether that is required always or only when repeats=0 (which is, incidentally, not consistent :P)?!
 
i.e. What behaviour should you expect from (repeats=2, retry_failed=3, period=3600) ?
2.00 am, failed
2.00 am, failed
2.00 am, completed
3.00 am, failed
3.00 am, failed
3.00 am, failed
?
This is basically what I'm missing. What could possibly be wrong at 2.00am and be right a few seconds later ?

Yarin

Aug 18, 2012, 6:02:14 PM
to web...@googlegroups.com
OK i didn't understand that retries happened periodically- i indeed thought that it would retry right away, though i agree with you that that should be handled at the function level. But if we're handling failures within the scheduled function, then now im wondering what is the value in having retries at all? Just because the scheduler is running asynchronously does not mean it should necessarily be responsible for the scheduled functions' unhandled exceptions (which is what the failures are, right)? In other words, since our scheduler is scheduling we2py functions in a known environment (unlike environment agnostic task-queue systems, which don't know how their operations will resolve), shouldn't the onus be on the scheduled function to handle failures and reschedule if necessary? Maybe we should clarify this before discussing the rest- i may be missing something.

BTW, I'm leaving for the night but am interested in finishing the discussion; I'll be back in the morning if you don't hear from me..


On Saturday, August 18, 2012 5:14:46 PM UTC-4, Niphlod wrote:
Ok, got the example (but not the "the last go-round is forgotten" part).
Let's start by saying that your requirements can be fulfilled simply by wrapping your function in a loop and breaking after the first successful attempt (with repeats=0, retry_failed=-1). Given that, the current behaviour is not really a limit on what you are trying to achieve; it's only a matter of how the requeue facilities are implemented in the scheduler.
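That "wrap your function in a loop" idea can be sketched in plain Python, independently of the scheduler (the retry decorator and the flaky function below are made-up illustrations, not scheduler code):

```python
import time

def retry(times=3, delay=0.0):
    """Re-run the wrapped function up to `times` attempts,
    breaking out after the first successful attempt."""
    def decorator(fn):
        def wrapper(*args, **kwargs):
            last_exc = None
            for attempt in range(times):
                try:
                    return fn(*args, **kwargs)  # success: stop retrying
                except Exception as exc:
                    last_exc = exc
                    time.sleep(delay)  # optional pause between attempts
            raise last_exc  # all attempts failed: propagate the last error
        return wrapper
    return decorator

# Example: a flaky function that succeeds on the third call
calls = {'n': 0}

@retry(times=3)
def flaky():
    calls['n'] += 1
    if calls['n'] < 3:
        raise RuntimeError("transient failure")
    return "ok"

print(flaky())  # prints "ok" after two silent failures
```

Queued with repeats=0 and retry_failed=-1, a task built this way keeps a single scheduler_task record while handling transient failures inside the function itself.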


Niphlod

unread,
Aug 19, 2012, 7:13:15 AM8/19/12
to web...@googlegroups.com
I didn't say that you have to handle exceptions exclusively in your functions, but if you want functionality of the kind "execute this for the next 2 minutes and retry ASAP 3 times at most", and you still want a single scheduler_task record, that's the way to go. Sometimes your functions rely on third-party services that can't be handled within your functions: you can manage the exception, but you still want that function executed (e.g. you want to send an email but your email server doesn't reply). The mail should be sent anyway, possibly as soon as the email server is available again.... here's where retry_failed comes in handy. Of course, if it fails e.g. 10 times, it's better to stop trying and inspect the email server :P
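As a sketch of that email scenario (again a raw insert into scheduler_task; send_newsletter is a hypothetical function and the numbers are arbitrary):

```python
# Queue an email task: retry up to 10 times if the mail server is down,
# waiting `period` seconds between attempts, then give up (task FAILED).
db.scheduler_task.insert(
    function_name='send_newsletter',  # hypothetical mail-sending task
    retry_failed=10,   # stop retrying after 10 consecutive failures
    period=60,         # wait 60 seconds between attempts
    repeats=1,         # one successful run is enough
)
db.commit()
```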



Yarin

unread,
Aug 20, 2012, 5:41:15 PM8/20/12
to web...@googlegroups.com
OK, I've come around; I agree this is the right setup. Let's just make sure it's clear in the eventual documentation, as it wasn't obvious to me (not much is, these days..): both retries and repeats respect the period. Cool, I like it.

Niphlod

unread,
Aug 20, 2012, 5:44:49 PM8/20/12
to web...@googlegroups.com
Hey, more points of view and more eyes on the code = fewer errors in the code, more understandable docs, etc.
That's the foundation of open-source development.
And I like smart questions :-P

Daniel Haag

unread,
Aug 24, 2012, 11:08:23 AM8/24/12
to web...@googlegroups.com
Just a small thing: is it possible to have the -g option (the groups to be picked by the worker) when calling the worker with the -K arg from the main web2py.py?

Maybe something like
python web2py.py -K appname(group1,group2,...)

Andrew

unread,
Aug 24, 2012, 12:07:10 PM8/24/12
to web...@googlegroups.com
Hi Niphlod, what drawing tool did you use for your diagrams in the instructions?
Still think your explanations and doco are great.

Niphlod

unread,
Aug 24, 2012, 12:10:54 PM8/24/12
to web...@googlegroups.com
it's yuml.me, a webapp.

Niphlod

unread,
Aug 24, 2012, 12:13:02 PM8/24/12
to web...@googlegroups.com
Hi, what use would that have, given that in your app you already have the possibility to declare Scheduler(db, group_names=['group1'])?
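For context, the group_names declaration referred to here lives in a model file (a sketch; the group name is arbitrary):

```python
# In a model file, e.g. models/scheduler.py:
# workers started for this app will only pick up tasks
# whose group_name is in this list.
from gluon.scheduler import Scheduler
scheduler = Scheduler(db, group_names=['group1'])
```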

Daniel Haag

unread,
Aug 24, 2012, 2:35:23 PM8/24/12
to web...@googlegroups.com
You could run different task groups with different privileges/priority

Niphlod

unread,
Aug 24, 2012, 2:51:58 PM8/24/12
to web...@googlegroups.com
Uhm. You can already change those dynamically just by altering group_names in the scheduler_worker table. But I'll work something out.

Niphlod

unread,
Aug 24, 2012, 4:12:44 PM8/24/12
to web...@googlegroups.com
@Daniel: Ok, I worked out a patch that allows -K app1:group1,app2:group1:group2 (the old syntax still works). Sent it to you privately; can you check it?

Niphlod

unread,
Feb 23, 2013, 1:21:18 PM2/23/13
to web...@googlegroups.com
Resuming this historic thread.
The latest commits added a few features and changed the schema a little (my fault, sorry).
The db schema now complies with check_reserved=['all'], so it should work on any RDBMS out there:
- scheduler_run.output --> scheduler_run.run_output
- scheduler_run.result --> scheduler_run.run_result
- scheduler_run.scheduler_task --> scheduler_run.task_id

New features:
- API available: .queue_task(), .task_status(), .resume(), .disable(), .terminate(), .kill()
- W2P_TASK variable injected into tasks
- a new "immediate=True" parameter to queue_task in order to wake up a "nothing to do..." looping worker

The app on GitHub has been updated with all the new features (use web2py's trunk, not the stable release).

I'm also planning to include common patterns for using the scheduler. As always, feel free (or, should I say, compelled? :P) to propose your "most wanted" patterns (e.g. managing an email queue)
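A quick sketch of the new API from a web2py model or controller (the task name and arguments are made up; the method names are the ones listed above):

```python
# Queue a task through the new API instead of a raw db insert.
# immediate=True pokes the ticker worker so assignment doesn't
# wait for the usual "check every 5 loops" cycle.
task = scheduler.queue_task(
    'my_function',          # hypothetical function known to the workers
    pvars={'arg': 1},       # keyword arguments passed to the function
    immediate=True,
)

# Later, inspect how it went; output=True also fetches the
# scheduler_run record with run_output / run_result.
status = scheduler.task_status(task.id, output=True)
```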

 

Yarin Kessler

unread,
Feb 25, 2013, 4:59:28 PM2/25/13
to web...@googlegroups.com
Sweet; looking forward to using the API. The schema changes are a pain, but done for the right reasons. Can you give more explanation of the immediate=True param?

As for patterns, a basic event calendar would be a good demo.

Thanks for the great work, Niphlod



 

--
 
---
You received this message because you are subscribed to the Google Groups "web2py-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to web2py+un...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Niphlod

unread,
Feb 26, 2013, 7:59:00 AM2/26/13
to web...@googlegroups.com


On Monday, February 25, 2013 10:59:28 PM UTC+1, Yarin wrote:
Sweet; looking forward to using the API. The schema changes are a pain, but done for the right reasons. Can you give more explanation of the immediate=True param?

The app has some docs about it, but the tl;dr is: the scheduler checks for new tasks every 5 loops. So, in the worst-case scenario, you can have a timeframe of heartbeat*5 seconds from the moment you queue the task to the moment it gets ASSIGNED.
If you set the worker status to PICK (only on the ticker, the worker with is_active=True), then as soon as it reads its status (i.e. every heartbeat seconds) and sees "PICK", it will try to assign tasks without waiting for the 5 loops. immediate=True in queue_task just coordinates all the bits: it inserts the task and sets the ticker to PICK.
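To make the worst case concrete (assuming, purely for illustration, a heartbeat of 3 seconds; the actual value depends on your configuration):

```python
# Worst-case delay between queueing a task and it becoming ASSIGNED,
# without immediate=True: the ticker only looks for new tasks
# every 5 heartbeats.
heartbeat = 3                  # seconds (illustrative value)
worst_case = heartbeat * 5     # 15 seconds
print(worst_case)

# With immediate=True the ticker is set to PICK, so assignment
# happens on its next heartbeat: at most `heartbeat` seconds.
print(heartbeat)
```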


As for patterns, a basic event calendar would be a good demo.

Can you elaborate on that a bit?
 
