Scheduler repeat time drift


Brian M

Jan 8, 2014, 11:10:38 AM
to web...@googlegroups.com
Is there any way to keep the time at which recurring tasks run from drifting? I've got several daily tasks that over time go from running at, say, 10am to 10:30am. It seems like each consecutive run is 20 seconds or so behind the previous day's. Is this simply a matter of the next execution time being set based on the ending time of the current run rather than the starting time? If I want to better enforce running at a certain time daily, do I need to resort to having a maintenance task run, say, once a week and reset the next run time of the daily tasks so they don't drift too far? I suppose this isn't much of an issue for tasks that run more frequently, like once a minute to process constantly updated queues, but for things that you want to run at a set time it's a bit annoying.

Thanks
Brian

Niphlod

Jan 8, 2014, 2:52:31 PM
to web...@googlegroups.com
I don't quite get how you scheduled the tasks and how you are expecting them to run.
If you schedule a task to start at 10am and set a period of 24*60*60, the task will be "requeued" to be executed at 10am.
As the book says,

The time period is not calculated between the END of the first round and the START of the next, but from the START time of the first round to the START time of the next cycle.

Brian M

Jan 8, 2014, 4:14:45 PM
to web...@googlegroups.com
I've got it scheduled to repeat every 86400 seconds (24*60*60), so once a day. I thought it was supposed to keep the same start time, like the book says, but it has definitely been drifting. Part of the queued task logs to a DB, and I can see the timestamps there consistently drifting ~20 seconds later each day. I had another task originally set up to send out a daily email at 9AM, and it has drifted to going out around 9:32AM over the last several months.

Niphlod

Jan 9, 2014, 3:33:14 PM
to web...@googlegroups.com
Okay, then let me check the code and report back.

Niphlod

Jan 9, 2014, 4:07:16 PM
to web...@googlegroups.com
OK, got the "issue". We take last_run_time and add period seconds to it. That is different from what you'd like, i.e. adding period * times_run seconds to start_time.

The wording in the book is correct: you get your task queued to be executed n period seconds after the first execution.

The concept is subtle, however. In practice it means, for example:
- today the task was queued to be executed at 9:00am but was picked up at 9:01am (e.g. because the worker was sleeping or busy processing other tasks) --> the task will be requeued for tomorrow at 9:01am.
- tomorrow, at 9:01am, the scheduler "sees" the task that has to be executed, but once again it is processing another queued task, so the task actually gets executed at 9:02am and is rescheduled for the next day at 9:02am.
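
The two rules can be compared with a small simulation. This is only a sketch in plain Python, not the scheduler's actual code: the function names and the constant one-minute pickup delay are assumptions made for illustration.

```python
from datetime import datetime, timedelta

PERIOD = timedelta(seconds=24 * 60 * 60)  # a daily task


def next_run_from_last_run(last_run_time, period=PERIOD):
    """Rule described above: requeue relative to when the run actually started."""
    return last_run_time + period


def next_run_from_start(start_time, times_run, period=PERIOD):
    """Drift-free rule: requeue relative to the original start_time."""
    return start_time + period * times_run


def simulate(start_time, days, pickup_delay):
    """Return the actual start of the last run under each rule.

    pickup_delay is a hypothetical constant lag between the moment a task
    becomes due and the moment a worker picks it up.
    """
    due_drifting = due_fixed = start_time
    actual_drifting = actual_fixed = start_time
    for times_run in range(1, days + 1):
        actual_drifting = due_drifting + pickup_delay
        actual_fixed = due_fixed + pickup_delay
        due_drifting = next_run_from_last_run(actual_drifting)
        due_fixed = next_run_from_start(start_time, times_run)
    return actual_drifting, actual_fixed


start = datetime(2014, 1, 8, 9, 0)
drifting, fixed = simulate(start, days=3, pickup_delay=timedelta(minutes=1))
print(drifting.time(), fixed.time())
```

Under the first rule every pickup delay is added to all later runs, while under the second rule each run is late only by that day's own delay.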

This behaviour was designed with a slightly different logic than the one you have in mind (yours is valid too, actually, and it's making me think about how to support it). Basically, with the scheduler we "drifted" from a "millisecond-perfect cron replacement" to a different approach. To cut the story short, here is a summary of the current logic:
- I want to schedule a task that polls a database/external service once every 5*60 seconds (5 minutes)
- I want to be sure that two consecutive executions will never take place unless at least 5*60 seconds have passed between them

This basically assures you that if your task was scheduled at 09:00 but the worker was caught up doing something else, you won't have a first execution at 09:03 and a second at 09:05 because it would mean "breaking" the assumption that at least 5*60 seconds pass before each execution, and that's good for a lot of cases, but not yours.

Yours is different: you have a task that you want to start at 09:00 every day, not merely some time after 09:00.
The scheduler instead interprets your wishes as "you want your emails sent at 24-hour intervals", hence the delay you observe after a few days.

To support your wishes, at the moment you would need a "cleanup" task that sets the future next_run_time(s) to start_time(s) + period seconds * times_run, or a task that queues a single execution at 09:00 once a day.
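
As a rough sketch of what such a cleanup task could compute: the helper below is hypothetical and not part of web2py. It is also a variant of the formula above, since it derives the number of elapsed periods from the clock instead of times_run, on the assumption that this stays correct even if some runs were missed.

```python
from datetime import datetime, timedelta


def realigned_next_run(start_time, period_seconds, now):
    """Return the first start_time + k * period that is strictly after now."""
    period = timedelta(seconds=period_seconds)
    if now < start_time:
        # The task has not had its first run yet, so nothing has drifted.
        return start_time
    # Whole periods elapsed since the original start (integer division).
    elapsed_periods = (now - start_time) // period
    return start_time + period * (elapsed_periods + 1)


# Example: a daily 09:00 task that has drifted to 09:32 months later.
start = datetime(2014, 1, 8, 9, 0)
print(realigned_next_run(start, 24 * 60 * 60, datetime(2014, 9, 1, 9, 32)))
```

A cleanup task would then write the returned value into next_run_time of the matching scheduler_task row. That update is left out here because it depends on the application's own db object.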

We could add another boolean column to the scheduler_task table, named something like "cron_style" (default=False), that when set to True calculates next_run_time the way you want instead of the original way. Let me check with the developers and we'll see what we can do.

Yi Liu

Sep 1, 2014, 2:11:18 AM
to web...@googlegroups.com
Hi, any updates on the cron_style scheduler?

Scheduler is definitely more powerful than cron. Unfortunately, it cannot easily do the simple thing that cron does :( I am developing a web app that scrapes a page at 9am every day. I really wish the scheduler could do this easily.

I also read your comments here: http://comments.gmane.org/gmane.comp.python.web2py/112088

Even this scheduler-in-a-scheduler (schedu-ception!) workaround will eventually need a cleanup, because the parent scheduler will drift one minute a day and eventually, on some day, the accumulated drift will jump past 9:00am and scheduling will be skipped for that day.

Btw, you web2py developers are awesome. I love web2py so much!

Best, Yi

Niphlod

Sep 1, 2014, 3:53:29 AM
to web...@googlegroups.com
prevent_drift has been in core since 2.8.2.
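
A minimal sketch of how a daily 09:00 task might be queued with it. The helper names are hypothetical; the only scheduler call used is queue_task with its start_time, period, repeats and prevent_drift arguments, and the sketch assumes the scheduler works in local time.

```python
from datetime import datetime, timedelta

DAY_SECONDS = 24 * 60 * 60


def next_nine_am(now):
    """Return the next 09:00 strictly after now."""
    candidate = now.replace(hour=9, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def queue_daily_at_nine(scheduler, task, now=None):
    """Queue task to repeat daily at 09:00 with prevent_drift=True.

    scheduler is expected to be a web2py Scheduler instance; this helper
    itself is hypothetical and not part of web2py.
    """
    now = now or datetime.now()
    return scheduler.queue_task(
        task,
        start_time=next_nine_am(now),
        period=DAY_SECONDS,
        repeats=0,  # 0 means repeat indefinitely
        prevent_drift=True,
    )
```

In a web2py model this would be called once, e.g. queue_daily_at_nine(scheduler, scrape_page), where scrape_page stands in for your own task function.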

Yi Liu

Sep 1, 2014, 10:14:48 AM
to web...@googlegroups.com
Just read carefully and found that in the book. Thanks!