Some tasks are never executed and the dynamic backend gets stuck

Luis


Oct 9, 2013, 3:55:40 AM

to google-a...@googlegroups.com

Hi all,

We are experiencing an intermittent issue affecting tasks that should be executed on a dynamic backend: for some reason, at certain moments, they are never executed.

Basically, we have a cron job that identifies certain heavy activities that need to be executed. The cron job sends these activities as tasks to a queue, as a means of having them executed in order on a dynamic backend.

The problem is that sometimes the tasks are never executed and the backend gets stuck (becomes a zombie). In the backend logs we see no evidence at all of the tasks (the actual requests are not recorded), even though, according to the Task Queues console, some of them should be running.

As this has a huge impact on the reliability of our application, we have managed to detect programmatically when this situation occurs, so that we can try to resolve it. Unfortunately, even after we purge the queues (we also tried deleting the tasks one by one), the tasks that are supposedly running keep the backend in the zombie state.
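For readers hitting the same problem, one way such a check could look is sketched below. This is a minimal sketch under stated assumptions, not the detection code described in this post: the class name `StuckTaskDetector` and its methods are hypothetical, and it assumes the caller can learn when the queue reports a task as running and when the backend logs something for that task. It flags any task that has been "running" longer than a threshold with no log activity since.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical watchdog: flags tasks the queue reports as "running" that have
// shown no backend log activity for longer than a threshold.
final class StuckTaskDetector {
    private final Duration threshold;
    private final Map<String, Instant> runningSince = new HashMap<>();
    private final Map<String, Instant> lastLogSeen = new HashMap<>();

    StuckTaskDetector(Duration threshold) {
        this.threshold = threshold;
    }

    // Call when the queue first reports the task as running (assumed input).
    void markRunning(String taskName, Instant when) {
        runningSince.putIfAbsent(taskName, when);
    }

    // Call whenever the backend logs something for the task (assumed input).
    void markLogActivity(String taskName, Instant when) {
        lastLogSeen.put(taskName, when);
    }

    // Call when the task completes or is deleted from the queue.
    void markFinished(String taskName) {
        runningSince.remove(taskName);
        lastLogSeen.remove(taskName);
    }

    // Returns the names of tasks whose latest evidence of progress (start time
    // or last log line, whichever is later) is older than the threshold.
    List<String> stuckTasks(Instant now) {
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<String, Instant> entry : runningSince.entrySet()) {
            Instant lastEvidence = entry.getValue();
            Instant lastLog = lastLogSeen.get(entry.getKey());
            if (lastLog != null && lastLog.isAfter(lastEvidence)) {
                lastEvidence = lastLog;
            }
            if (Duration.between(lastEvidence, now).compareTo(threshold) > 0) {
                stuck.add(entry.getKey());
            }
        }
        Collections.sort(stuck);
        return stuck;
    }
}
```

The threshold would need to exceed the longest legitimate report run; otherwise slow but healthy reports would be flagged as stuck.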

The only solution we have found is to manually stop and start the backend through the console. Only when we do this do we see, in the backend logs, error messages related to the tasks that were supposed to be running (please see the attached image): "Process terminated because the backend took too long to shutdown."

As you can understand, this has a huge impact on us. We need 100% certainty that the tasks sent to the queue are going to be executed. We can't afford to monitor the system all day and manually restart the backend whenever this issue comes up.

We are on Java and the Master/Slave (M/S) datastore. Have you ever faced something similar? Any ideas or suggestions for getting rid of this issue? We have already opened a ticket for this (https://code.google.com/p/googleappengine/issues/detail?id=10011), but there has been no answer so far.


Hope you can help us. Many thanks,
Luis

Log messages.png

Vinny P


Oct 10, 2013, 2:13:47 AM

to google-a...@googlegroups.com

On Wed, Oct 9, 2013 at 2:55 AM, Luis <l.pereira...@gmail.com> wrote:

Basically, we have a cron job that identifies certain heavy activities that need to be executed. The cron job sends these activities as tasks to a queue, as a means of having them executed in order on a dynamic backend.


The problem is that sometimes the tasks are never executed and the backend gets stuck (becomes a zombie). In the backend logs we see no evidence at all of the tasks (the actual requests are not recorded), even though, according to the Task Queues console, some of them should be running.

It's odd that you're not seeing any error logs from the tasks. Are you sure you're seeing all the logs? If you download your logs, do you see any error logs pertaining to the backends there?

Can you try routing the task queues to a frontend instance and see if that works? What settings is each task configured with (for instance, its RetryOptions)?

On Wed, Oct 9, 2013 at 2:55 AM, Luis <l.pereira...@gmail.com> wrote:

We are on Java and the Master/Slave (M/S) datastore. Have you ever faced something similar? Any ideas or suggestions for getting rid of this issue?

Try to migrate off M/S to the HRD datastore: https://developers.google.com/appengine/docs/adminconsole/migration

M/S occasionally has weird, difficult-to-trace errors.

-----------------

-Vinny P

Technology & Media Advisor

Chicago, IL

App Engine Code Samples: http://www.learntogoogleit.com

Luis Pereira


Oct 13, 2013, 12:48:26 PM

to google-a...@googlegroups.com

Hi Vinny,

We really appreciate you taking the time to go through our issue.

We haven't had the chance to download the log files yet, but in the console there is no sign at all of these tasks in the backend logs. Only when we shut down the backend do we see the errors mentioned earlier.

We can't move the task execution to regular frontend instances, as some of the tasks take more than 10 minutes to execute, which exceeds the 10-minute deadline for task requests on frontend instances (these tasks generate reports and can take a very long time). This is one of the queue configurations we are having trouble with:

<queue>
  <name>scheduledReportsQueue</name>
  <bucket-size>100</bucket-size>
  <max-concurrent-requests>2</max-concurrent-requests>
  <retry-parameters>
    <task-retry-limit>2</task-retry-limit>
    <task-age-limit>1h</task-age-limit>
  </retry-parameters>
</queue>
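To make the interaction of these settings concrete, the following sketch models two of them in plain Java. It is an illustration, not App Engine code: `QueueSettingsModel` is a hypothetical name, and the retry rule encodes the queue configuration reference's description that, when both `task-retry-limit` and `task-age-limit` are set, a failed task is retried until both limits are reached. One consequence worth noting: with `max-concurrent-requests` set to 2, two tasks stuck in the "running" state are enough to stop the queue from dispatching anything else.

```java
import java.time.Duration;

// Hypothetical plain-Java model of two settings from the queue definition above.
// It is not App Engine code; it only illustrates how the settings are described to interact.
final class QueueSettingsModel {
    private final int maxConcurrentRequests;
    private final int taskRetryLimit;
    private final Duration taskAgeLimit;

    QueueSettingsModel(int maxConcurrentRequests, int taskRetryLimit, Duration taskAgeLimit) {
        this.maxConcurrentRequests = maxConcurrentRequests;
        this.taskRetryLimit = taskRetryLimit;
        this.taskAgeLimit = taskAgeLimit;
    }

    // A new task is dispatched only while fewer than maxConcurrentRequests are in
    // flight, so tasks that never leave the "running" state keep occupying the slots.
    boolean canDispatch(int tasksInFlight) {
        return tasksInFlight < maxConcurrentRequests;
    }

    // Assumed rule: with both limits set, a failed task is retried until BOTH are reached.
    boolean shouldRetry(int retriesSoFar, Duration ageSinceFirstAttempt) {
        boolean retryLimitReached = retriesSoFar >= taskRetryLimit;
        boolean ageLimitReached = ageSinceFirstAttempt.compareTo(taskAgeLimit) >= 0;
        return !(retryLimitReached && ageLimitReached);
    }
}
```

With `new QueueSettingsModel(2, 2, Duration.ofHours(1))`, `canDispatch(2)` is false, and a report that fails quickly is still retried during its first hour even after two retries.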

We are moving to HRD in the coming weeks. We hope the issue disappears after the migration.

The backend with the issue is configured as private. We are going to make it public to see whether that makes any difference, and to see what happens if we send a regular HTTP request to it after it gets stuck.

Many thanks,

Luis

2013/10/10 Vinny P <vinn...@gmail.com>


Vinny P


Oct 14, 2013, 8:00:09 PM

to google-a...@googlegroups.com

On Sun, Oct 13, 2013 at 11:48 AM, Luis Pereira <l.pereira...@gmail.com> wrote:

We haven't had the chance to download the log files yet, but in the console there is no sign at all of these tasks in the backend logs. Only when we shut down the backend do we see the errors mentioned earlier.

Backends flush log data on a periodic basis, see https://developers.google.com/appengine/docs/java/backends/#Java_Periodic_logging for documentation. If you're not seeing any logs, try forcibly flushing logs by calling ApiProxy's flushLogs() method: https://developers.google.com/appengine/docs/java/javadoc/com/google/apphosting/api/ApiProxy

What's odd is that there should still be logs of the request, even if the application itself is not printing any log data. Hopefully the missing logs will be recorded in the downloadable logs service.
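As an illustration of periodic flushing inside a long-running backend request, here is a minimal sketch. The class `PeriodicFlushLoop` and its parameters are hypothetical; on App Engine the injected hook would be `ApiProxy::flushLogs`, passed in as a `Runnable` so the example also runs outside App Engine. It logs progress for each item and flushes after every `flushEvery` items, plus once at the end if a partial batch remains.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.logging.Logger;

// Hypothetical long-running loop that flushes logs periodically.
// On App Engine the flush hook would be ApiProxy::flushLogs; it is injected
// as a Runnable here so the sketch also runs outside App Engine.
final class PeriodicFlushLoop {
    private static final Logger LOG = Logger.getLogger(PeriodicFlushLoop.class.getName());

    private final int flushEvery;
    private final Runnable flushLogs;

    PeriodicFlushLoop(int flushEvery, Runnable flushLogs) {
        if (flushEvery <= 0) {
            throw new IllegalArgumentException("flushEvery must be positive");
        }
        this.flushEvery = flushEvery;
        this.flushLogs = flushLogs;
    }

    // Processes every item, logging progress, and flushes after each full batch
    // plus once more for a trailing partial batch. Returns the number of flushes.
    int run(List<String> items, Consumer<String> work) {
        int flushes = 0;
        for (int i = 0; i < items.size(); i++) {
            work.accept(items.get(i));
            LOG.info("processed item " + (i + 1) + " of " + items.size());
            if ((i + 1) % flushEvery == 0) {
                flushLogs.run();
                flushes++;
            }
        }
        if (items.size() % flushEvery != 0) {
            flushLogs.run();
            flushes++;
        }
        return flushes;
    }
}
```

Flushing in batches rather than after every line keeps the overhead of the flush call bounded while still making progress visible in the console during a multi-minute report.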

On Sun, Oct 13, 2013 at 11:48 AM, Luis Pereira <l.pereira...@gmail.com> wrote:

We are moving to HRD in the coming weeks. We hope the issue disappears after the migration.

The HRD migration is the best chance to fix this problem - M/S has unusual issues, and it wouldn't surprise me at all if that was the problem.

Ezequiel Muns


Oct 16, 2013, 9:36:01 PM

to google-a...@googlegroups.com

I'm experiencing something similar: tasks are dispatched to a backend module. The backend instance logs the call to /_ah/start, and then the task is shown as 'running' (in the Task Queue section of the console), but the logs show no evidence that the task is running.

If I manually send another request to the same backend instance, even to a different URL handler, the task 'returns' and its log entry appears as expected, but with a very large processing time (equal to the time between when the task was scheduled to run and when I manually invoked the handler). Appstats shows that the task actually ran in the time I expected (< 1 s), so it seems the backend just got stuck after the handler finished.

Subsequent calls to the same instance do not exhibit the same behaviour, only fresh instances.

Ezequiel Muns


Oct 16, 2013, 9:36:34 PM

to google-a...@googlegroups.com

Oh, and I should add that I'm not using the datastore in any way.
