Tasks are not running


Marc Dugger

Oct 21, 2016, 8:41:00 AM
to Google App Engine
When I have more than 1,000 tasks queued, they do not execute. They sit in a waiting state and only run hours later (see attached screenshot).

Any insight into why this would happen?
Screen Shot 2016-10-20 at 10.06.00 AM.png

Adam (Cloud Platform Support)

Oct 22, 2016, 4:38:46 PM
to Google App Engine
This issue sometimes comes up when a single queue holds a large number of tasks, or when many tasks are enqueued at once. Sometimes the cause is many tasks failing in a short time, which pushes the queue's retry delay up to the default maximum backoff of one hour and stalls the queue. In other cases, simply having more than 1,000 tasks in a single queue can cause transient contention in the underlying scheduler.

There are some ways to mitigate this:
  1. Shard your queues: instead of adding all your tasks to a single queue, distribute them among several queues.
  2. Try to avoid adding many tasks simultaneously, especially tasks scheduled to execute immediately. If possible, schedule tasks to execute in the future, at least 5 minutes after they are added.
  3. Building on the previous point, when adding large numbers of tasks that start at or near the same time, make sure all the add operations have completed before the scheduled leasing or execution time.
  4. Add tasks from a single thread. If all task add calls are made sequentially, the risk of contention between calls is minimized.
  5. Give scheduled tasks different scheduled times. The interval between them does not matter; for example, 8:00:00 PM, 8:00:01 PM, and 8:00:02 PM will create 3 different 'buckets' for the tasks.
  6. Back off on errors when adding tasks. If a task add call fails, or specific tasks within a request fail, wait before retrying (preferably using exponential backoff).
  7. If you're following these guidelines and still see the queue stall from time to time, you can adjust max_backoff_seconds in your queue configuration (for example, lower it from the one-hour default).
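
As a rough illustration of tips 1, 4, 5 and 6 above, here is a minimal Python sketch. It is not App Engine code: add_task is a hypothetical callable standing in for whatever Task Queue add call your app uses, the "work-shard-N" queue names are made up and would have to exist in your own queue configuration, and the delay values are arbitrary assumptions.

```python
import time
import zlib
from datetime import datetime, timedelta


def pick_queue(task_key, shard_count, prefix="work-shard"):
    """Map a task key to one of `shard_count` queue names (tip 1).

    The queue names are hypothetical; they must match queues you define.
    """
    shard = zlib.crc32(task_key.encode("utf-8")) % shard_count
    return "%s-%d" % (prefix, shard)


def staggered_eta(start, index, step_seconds=1):
    """Give each task its own scheduled time, one step apart (tip 5)."""
    return start + timedelta(seconds=index * step_seconds)


def add_with_backoff(add_task, queue_name, payload, eta,
                     max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call add_task, waiting exponentially longer after each failure (tip 6)."""
    for attempt in range(max_attempts):
        try:
            return add_task(queue_name, payload, eta)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...


def enqueue_all(add_task, payloads, shard_count, start, sleep=time.sleep):
    """Add tasks one at a time from the calling thread (tip 4)."""
    for index, (key, payload) in enumerate(payloads):
        add_with_backoff(add_task, pick_queue(key, shard_count), payload,
                         staggered_eta(start, index), sleep=sleep)


if __name__ == "__main__":
    # Stand-in for the real add call: just print what would be enqueued.
    def print_add(queue_name, payload, eta):
        print(queue_name, payload, eta)

    # Schedule the first task 5 minutes out, as suggested in tip 2.
    first_eta = datetime.now() + timedelta(minutes=5)
    enqueue_all(print_add, [("user-1", "a"), ("user-2", "b")], 4, first_eta)
```

Hashing the task key keeps a given key on the same shard across runs; a round-robin counter would spread tasks just as well if that stability does not matter to you.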