softcron oddity


Jonathan Lundell

Jul 6, 2012, 5:35:13 PM
to web2py-developers
I've been experimenting with soft cron, and either there's something I don't understand, or something's wrong.

My crontab:

* * * * * user *tasks/cron


That means: run it once a minute (within the capability of softcron), right?

(I'm running under mod_wsgi.)

But I see it run on every request. Looking at main, I see this at the end of a request:

if global_settings.web2py_crontype == 'soft':
    newcron.softcron(global_settings.applications_parent).start()
return http_response.to(responder)

So a new softcron object is getting created on every request. How can that be right? And what's the lifetime of the softcron object: until its thread exits?

Looking at the code, I'm getting a bad feeling about the way the scheduling works. Somebody reassure me, please.

Massimo DiPierro

Jul 6, 2012, 7:37:03 PM
to web2py-d...@googlegroups.com
When you say "scheduling" you still refer to softcron, right? I will double check but I do not swear by softcron.

massimo
> -- mail from: GoogleGroups "web2py-developers" mailing list
> post: web2py-d...@googlegroups.com
> unsubscribe: web2py-develop...@googlegroups.com
> details: http://groups.google.com/group/web2py-developers
> the project: http://code.google.com/p/web2py/
> official: http://www.web2py.com/

Jonathan Lundell

Jul 6, 2012, 7:50:28 PM
to web2py-d...@googlegroups.com
On 6 Jul 2012, at 4:37 PM, Massimo DiPierro wrote:
> When you say "scheduling" you still refer to softcron, right?

Right.

> I will double check but I do not swear by softcron.

The logic appears to be something like this. Suppose you have a crontab entry:

0 * * * * user task

The logic appears to be: if the current datetime minute value is 0 when any request arrives, run the task; otherwise don't.

As a consequence, if multiple requests come in during minute 0, the task will be run for each of them. On the other hand, if no request comes in until minute 1, then the task will have to wait an entire hour to run (and then only if a request comes in during the next minute 0).

Similarly,

* * * * * user task

...runs the task on every request.

I think.

If that's right, then it's also the case (is it not?) that we go through the portalocker for every request.
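If I've read it right, the per-request check amounts to something like the sketch below. This is a hypothetical illustration of the logic described above, not web2py's actual code; `softcron_should_run` is an invented name.

```python
import datetime

def softcron_should_run(minute_field, now=None):
    """Hypothetical helper mirroring the logic described above: a task is
    "due" iff the crontab minute field matches the minute of whatever
    request happens to be in flight."""
    now = now or datetime.datetime.now()
    return minute_field == '*' or int(minute_field) == now.minute

# Every request arriving during minute 0 matches "0 * * * *":
assert softcron_should_run('0', datetime.datetime(2012, 7, 6, 17, 0))
# ...and if no request arrives during minute 0, the run is simply missed:
assert not softcron_should_run('0', datetime.datetime(2012, 7, 6, 17, 1))
```

Under that reading, a `* * * * *` field matches every request, which is exactly the behavior I'm seeing.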

Massimo DiPierro

Jul 6, 2012, 8:31:38 PM
to web2py-d...@googlegroups.com
It is true that softcron makes a new instance at every request. The instance runs in its own thread and locks the crontab file (so it serializes the picking of cron tasks, but not the original request).

That is how it was originally designed. It was for systems running, for example, CGI, which are not capable of running a background task.
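Roughly, the behavior just described can be sketched as follows. The class and names here are illustrative assumptions, not web2py's actual implementation, and `fcntl.flock` stands in for web2py's portalocker.

```python
import fcntl
import threading

class SoftCronSketch(threading.Thread):
    """Sketch of the per-request design described above: each request
    spawns one of these threads, and the thread takes an exclusive lock
    on the crontab file before picking tasks, so task selection is
    serialized across concurrent requests."""
    def __init__(self, crontab_path):
        super().__init__(daemon=True)  # runs alongside the request, not blocking it
        self.crontab_path = crontab_path

    def run(self):
        with open(self.crontab_path, 'a+') as f:
            fcntl.flock(f, fcntl.LOCK_EX)  # serialize with other request threads
            # ... read the crontab, pick due tasks, launch them ...
            fcntl.flock(f, fcntl.LOCK_UN)

# per request: SoftCronSketch('/path/to/crontab').start()
```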

If you were to change it, how would you change it? What use case do you have in mind that is not covered by hardcron?

Massimo

Jonathan Lundell

Jul 6, 2012, 8:52:00 PM
to web2py-d...@googlegroups.com
On 6 Jul 2012, at 5:31 PM, Massimo DiPierro wrote:
> It is true that softcron makes a new instance at every request. The instance runs in its own thread and locks the crontab file (so it serializes the picking of cron tasks, but not the original request).
>
> That is how it was originally designed. It was for systems running, for example, CGI, which are not capable of running a background task.
>
> If you would change it, how would you change?

At the very least, some way to ensure that my first example below would run even if no request comes in during minute 0, and that only one request would trigger it during that minute.

> What use case do you have in mind that is not covered by hardcron?

mod_wsgi.

I suppose I'll go to extcron, or something like that. I was hoping for a more self-contained solution, though.

I like the idea of the scheduler, but I'd prefer that it be self-starting. And the existing scheduler code looks way too complicated to debug.

My use case: I'm using a database to keep a history of video feeds, and also as a cache. Client requests are satisfied from the (cached) db table, and a background task periodically polls the feed sources and updates the table. The "homemade task queue" in the book would do, but I was looking for a simple way to guarantee that there was always exactly one copy running (except during an update), and I was having trouble doing that—it keeps getting overcomplicated.
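The "exactly one copy running" guarantee I keep reaching for can be done with an exclusive, non-blocking file lock. This is a minimal sketch for POSIX systems; the lock path and function name are invented, and `fcntl.flock` is one of several possible mechanisms.

```python
import fcntl
import os

def acquire_singleton_lock(path='/tmp/feed_poller.lock'):
    """Return a held file descriptor if we are the only running copy,
    else None. (Sketch for POSIX systems; path and name are invented.)"""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # non-blocking exclusive lock
    except OSError:
        os.close(fd)
        return None  # another copy already holds the lock
    return fd  # keep the fd open for the lifetime of the process

# Usage: the poller calls acquire_singleton_lock() at startup and exits
# immediately if it gets None back; the lock vanishes when the holder dies.
```

The appeal is that the kernel releases the lock automatically if the holding process crashes, so there is no stale-lockfile cleanup to get wrong.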

Seems to me that kind of simple requirement must be a common one, and it'd be nice to have a built-in mechanism to satisfy it.

I'm also looking at coordinating it through memcached, so as to be able to work with multiple servers. But that's another story.