Cron jobs auto repeat on failure option?


Marcel Manz

Jul 19, 2013, 12:06:03 PM
to google-a...@googlegroups.com
Sometimes cron jobs aren't executed correctly and get marked as 'failed' in the cron overview. Is there an option to tell App Engine to retry automatically instead of marking the job as 'failed'?

We use simple cron jobs to distribute workloads via the taskqueue to worker processes. If the cron fails to execute, the workload doesn't get distributed.

Before we implement our own cron-failure checks, by recording each successful execution in the datastore and checking for it later, I'd like to ask whether there's any way to have App Engine automatically repeat the cron job instead of marking it 'failed'.

Thanks
Marcel
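
For reference, the "mark and check" idea mentioned above could look roughly like the sketch below. It is only an illustration: `SuccessLog` is a hypothetical in-memory stand-in for a datastore entity, and the job name and the 600-second threshold are made-up values.

```python
import time


class SuccessLog:
    """Hypothetical stand-in for a datastore record of the last successful run."""

    def __init__(self):
        self._last_success = {}

    def mark_success(self, job_name, now=None):
        # Called by the cron handler once its work has completed.
        self._last_success[job_name] = time.time() if now is None else now

    def is_overdue(self, job_name, max_age_seconds, now=None):
        # True if the job never succeeded or its last success is too old.
        now = time.time() if now is None else now
        last = self._last_success.get(job_name)
        return last is None or now - last > max_age_seconds
```

A separate, frequently scheduled check could call `is_overdue("distribute", 600)` and re-trigger the work when it returns `True`.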

Jason Collins

Jul 19, 2013, 12:27:06 PM
to google-a...@googlegroups.com
We always use the (self-named) "cron-task" pattern.

That is, our cron jobs do nothing more than enqueue a task to do the actual work, and the tasks have their own retry policy. Enqueuing a task is an operation with a very high success rate.

j
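
A minimal sketch of the "cron-task" pattern described above, assuming a hypothetical in-memory queue in place of the real task queue API. The URLs and function names are invented for illustration; the point is only that the cron handler enqueues one task and returns, while the task does the actual work.

```python
from collections import deque


class InMemoryQueue:
    """Hypothetical stand-in for a task queue; not the App Engine API."""

    def __init__(self):
        self._tasks = deque()

    def add(self, url, params=None):
        self._tasks.append((url, dict(params or {})))

    def pop(self):
        return self._tasks.popleft()

    def __len__(self):
        return len(self._tasks)


def cron_handler(queue):
    """Cron entry point: enqueue exactly one task, then return."""
    queue.add("/tasks/distribute-workload")
    return 200


def distribute_workload_task(queue, keys):
    """Task entry point: does the real work, here one worker task per key."""
    for key in keys:
        queue.add("/tasks/process", {"key": key})
    return 200
```

Because the heavier step runs as a task, a failure there falls under the queue's retry policy rather than leaving a cron run marked as failed.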

Marcel Manz

Jul 19, 2013, 12:34:45 PM
to google-a...@googlegroups.com
That is likely what our crons are already doing: they just do some key scanning and enqueue tasks on the task queue, and those tasks have their own retry policy. The cron job itself only runs for a few seconds, until all tasks are dispatched.

Unfortunately, from time to time this simple operation fails. When it happens, we can see that the cron request was aborted after 10 minutes. For some reason the handler doesn't start correctly, and App Engine times out after 10 minutes and marks the cron as failed.

Joshua Smith

Jul 19, 2013, 1:14:01 PM
to google-a...@googlegroups.com
Have your crons simply schedule a task. Tasks automatically retry when they fail, with geometrically increasing backoff intervals.
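
To illustrate what "geometrically increasing backoff" means, here is a generic sketch. It is not App Engine's actual retry schedule, which is configured per queue; the function name and all parameter values are assumptions chosen for the example.

```python
def backoff_intervals(base_seconds, factor, cap_seconds, attempts):
    """Return the wait before each retry: base * factor**n, capped at cap_seconds."""
    return [min(base_seconds * factor ** n, cap_seconds) for n in range(attempts)]


# Waits for 8 retries, starting at 1 second, doubling, capped at 60 seconds.
print(backoff_intervals(1, 2, 60, 8))  # [1, 2, 4, 8, 16, 32, 60, 60]
```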

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-appengi...@googlegroups.com.
To post to this group, send email to google-a...@googlegroups.com.
Visit this group at http://groups.google.com/group/google-appengine.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

pdknsk

Jul 19, 2013, 8:51:14 PM
to google-a...@googlegroups.com
As you may notice, cron itself is a task that runs on queue __cron, so it seems trivial to implement this functionality. There may or may not be a good reason Google doesn't. I had asked about it before.

Jeff Schnitzer

Jul 19, 2013, 10:51:59 PM
to Google App Engine
Can you have the cron job do one and only one thing: enqueue a single task that does the work?

Jeff




Jeff Schnitzer

Jul 19, 2013, 10:54:27 PM
to Google App Engine
Weird, Gmail wasn't showing me the prior replies. Sorry about repeating what everyone else already said.

That said, I'd still cut out the key scanning or any other operation other than "enqueue this one task".

Jeff

Jacob Taylor

Jul 21, 2013, 2:49:37 AM
to google-a...@googlegroups.com
A fun little gotcha about this: you may not really know why that 10-minute timeout was triggered, and it is likely destructive.
The timeout can be triggered by something other than the cron request itself. If any thread on the instance takes too long, the entire instance is killed immediately, along with all active requests. We ran into an issue with a huge job that made it look like logging.info was taking way too long. In fact, logging.info was just where the context switching was occurring.

We have also had our fair share of problems adding items to the queue ("transient error") when load surges.
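
One way to soften such transient enqueue failures is a small bounded retry around the enqueue call. The sketch below is an assumption-laden illustration: `TransientQueueError` is a hypothetical exception standing in for whatever the real queue API raises, and the attempt count and delay are arbitrary.

```python
import time


class TransientQueueError(Exception):
    """Hypothetical error standing in for a transient enqueue failure."""


def enqueue_with_retry(enqueue, max_attempts=3, delay_seconds=0.5, sleep=time.sleep):
    """Call enqueue(); on a transient error, wait and retry up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return enqueue()
        except TransientQueueError:
            if attempt == max_attempts:
                raise  # Give up and let the caller see the failure.
            sleep(delay_seconds)
```

The `sleep` parameter is injectable so the wrapper can be exercised without real delays.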

