Task queue not scaling up!


Waleed Abdulla

Jun 18, 2011, 9:14:57 PM
to google-appengine
My app is mostly a backend app driven by the task queue (a cron job inserts a task, that task inserts the next task, and so on). Two days ago, performance dropped considerably: the app used to use 70 instances and is now down to 7, with no changes on my part. After investigating, it turns out that the task queue simply won't run tasks as fast as I want it to. My main task queue is running at 5 tasks/minute, even though I've set the rate to 5/s and the bucket size to 5, with no value for max concurrency. I even tried higher rates and bucket sizes, but it's not helping.

Anyone else noticing this? Can you think of any reason why the task queue doesn't go as fast as I want it to?


Chiguireitor

Jun 18, 2011, 9:26:13 PM
to google-a...@googlegroups.com
There's a daily maximum of 100,000 tasks under the free quota. Aren't you exceeding it?

Branko Vukelic

Jun 18, 2011, 9:30:39 PM
to google-a...@googlegroups.com
I doubt it would slow down if he were exceeding the quota. It'd
probably just stop in that case.


--
Branko Vukelić
bra...@herdhound.com

Lead Developer
Herd Hound (tm) - Travel that doesn't bite
www.herdhound.com

Chiguireitor

Jun 18, 2011, 9:44:21 PM
to google-a...@googlegroups.com
Remember that most quotas also have a rate limit. At 5 tasks/s (18,000 tasks/h) the queue will probably get throttled to avoid exceeding the quota; at that rate he would consume his quota in less than 6 hours.
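
As a rough check of that estimate, here is the arithmetic as a small Python sketch. It assumes the configured rate of 5 tasks/s mentioned earlier in the thread and the 100,000 tasks/day free quota; the variable names are only illustrative.

```python
# Assumptions taken from the thread: a queue rate of 5 tasks/s and a
# free daily quota of 100,000 tasks. The names are illustrative only.
RATE_PER_SECOND = 5
DAILY_TASK_QUOTA = 100000

tasks_per_hour = RATE_PER_SECOND * 3600
hours_to_exhaust_quota = DAILY_TASK_QUOTA / float(tasks_per_hour)

print("tasks per hour: %d" % tasks_per_hour)
print("hours until quota is exhausted: %.2f" % hours_to_exhaust_quota)
```

A sustained 5 tasks/s works out to 18,000 tasks per hour, so the free quota would last a little over five and a half hours.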

Waleed Abdulla

Jun 18, 2011, 10:40:18 PM
to google-a...@googlegroups.com
Billing is enabled, so I don't believe it's a quota issue. And I checked the dashboard and the quota details page, and there is nothing to indicate it might be quota related.

Also, another separate chain of tasks seems to be working fine and runs at 300/minute (I've set that queue's rate to 5/s, so that's about right). I'm not sure why the main task queue doesn't work faster; it's now at 8 tasks per minute. The only difference between the two queues is that the tasks on the slow queue are long (~20 seconds), while the fast queue runs small tasks (< 1 s).

I understand that lengthy tasks mean that more tasks are running at the same time, but I've set the max concurrency to 20, yet I can only run 6 to 8 tasks per minute. What else could cause a task queue to slow down?
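
For reference, the concurrency setting by itself caps throughput when tasks are long. A minimal sketch of that ceiling, assuming the figures quoted in this thread (about 20 seconds per task, max concurrency of 20); `max_tasks_per_minute` is a hypothetical helper, not part of the App Engine API:

```python
def max_tasks_per_minute(max_concurrent, task_seconds):
    """Upper bound on throughput when at most `max_concurrent` tasks
    run at once and each one takes `task_seconds` to finish."""
    if task_seconds <= 0:
        raise ValueError("task_seconds must be positive")
    return max_concurrent * 60.0 / task_seconds


# Long tasks from the slow queue: ~20 s each, max concurrency of 20.
slow_ceiling = max_tasks_per_minute(20, 20.0)

# Concurrency needed to sustain the configured 5 tasks/s with 20 s tasks.
needed_concurrency = 5 * 20.0

print(slow_ceiling)        # ceiling in tasks per minute
print(needed_concurrency)  # concurrent tasks required for 5/s
```

With these numbers the ceiling is 60 tasks per minute. That is below the 300 per minute a 5/s rate would allow, but still far above the 6 to 8 per minute observed, so the concurrency limit alone would not explain the slowdown.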






Chiguireitor

Jun 18, 2011, 10:45:43 PM
to google-a...@googlegroups.com
The other issue is that each task spawns a new instance. Could you be hitting a max instance limit there? I really don't know if there's an instance limit, but it could happen.

Waleed Abdulla

Jun 19, 2011, 1:52:27 AM
to google-a...@googlegroups.com
I think I've narrowed the issue down a bit. I added a lot of logging.info() statements and noticed the following:

1. The time between inserting a task and when it starts running is too long: 5 - 20 seconds. Typically, tasks should execute right after insertion if you're below the queue rate. That's issue #1.

2. I put a timer in my code to measure how long each task takes to execute. When I compare that time with the time App Engine reports in the logs, there is a big difference: 5 - 20 seconds. For example, processing a task takes 25 seconds, of which only 10 seconds is actually spent inside my code. So the infrastructure is keeping requests on hold for a while before they get executed. That's issue #2.

When I compare the delay in #1 to the time difference calculated in #2, they're very close and they correlate (i.e. they go up and down together). So issue #1 is probably caused by issue #2 (i.e. tasks do get picked up for execution right away, but the actual execution doesn't start until 5 - 20 seconds later).

There seems to be an infrastructure problem in GAE. Maybe GAE is experiencing high load, too much network traffic, etc. Whatever the reason, it seems that requests are waiting in a queue for too long before they run.

Does anyone notice any issues that confirm or deny the above? Any ideas how to work around it?
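
A minimal sketch of this kind of measurement, for anyone who wants to reproduce it. This is not the code actually used in the app: `run_timed`, `queue_delay_seconds`, and the `enqueued_at` payload field are hypothetical names, and the inserting code is assumed to store `time.time()` in the payload when it adds the task.

```python
import logging
import time


def queue_delay_seconds(enqueued_at, started_at):
    """Seconds a task waited between insertion and the start of the handler."""
    return max(0.0, started_at - enqueued_at)


def run_timed(handler, payload, now=time.time):
    """Run `handler(payload)` and log queue delay and in-code time.

    `payload["enqueued_at"]` is a hypothetical field that the inserting
    code would set to time.time() when it adds the task.
    """
    started_at = now()
    delay = queue_delay_seconds(payload["enqueued_at"], started_at)
    result = handler(payload)
    in_code = now() - started_at
    logging.info("queue delay %.1fs, in-code time %.1fs", delay, in_code)
    return result, delay, in_code


# Example with a fake clock so the numbers are deterministic.
ticks = iter([115.0, 125.0])
result, delay, in_code = run_timed(
    lambda p: p["job"].upper(),
    {"enqueued_at": 100.0, "job": "crawl"},
    now=lambda: next(ticks),
)
```

Comparing the logged in-code time with the request time shown in the App Engine logs gives the gap described in #2. Because the inserting and executing requests may run on different machines, clock skew could add a small error to the queue delay figure.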

Waleed







Robert Kluin

Jun 19, 2011, 11:08:01 AM
to google-a...@googlegroups.com
Slow tasks don't get run nearly as aggressively, at least not in my experience. I've found that I can pump far, far more tasks through if I keep tasks fast (under 1000 ms).

Are you seeing pending ms values in your logs?

Robert


Kaan Soral

Jun 19, 2011, 3:22:24 PM
to Google App Engine
I have the same problem.

I haven't checked the situation in weeks, but right after
development a mapreduce job doing 800k operations was showing
results in ~30 minutes; that time has since increased to 3 hours.

I guess this is because they are redesigning App Engine in the wrong
direction.

In the future we will be charged by instance usage, so you wouldn't
want 70 instances, and I guess the modifications are headed that way.

I don't like this, but I wouldn't want to pay a lot either.

Sad...

Waleed Abdulla

Jun 19, 2011, 10:01:35 PM
to google-a...@googlegroups.com
Thank you, Robert. I think you nailed it. After a lot of experimentation I reached the same conclusion. GAE seems to be punishing long-running tasks (my tasks take about 20 seconds each). The longer the tasks run, the fewer of them you can execute.

This seems odd to me because doing more work per task is more efficient. If I split my 20 seconds of work into 20 tasks (1 second each), then I'm duplicating the overhead 20 times. And the App Engine team just recently increased the task timeout to 10 minutes, which tells me that they want to make it possible to do more work per task.


Kaan,
    Mapreduce is designed to do as much work per task as possible. So, yes, I believe your tasks are being punished as well, and that explains the 6x increase in time. But it doesn't add up. When I changed my tasks to do less work, the app suddenly started scaling better and more instances were created. If the goal is to use fewer instances, then long-running tasks should be encouraged because they do more work with fewer instances. Maybe it's a bug in their scaling algorithm?
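
For anyone wanting to try the same thing, one way to do less work per task is to split the workload into fixed-size batches and insert one task per batch. A minimal sketch under that assumption; `split_into_batches` is a hypothetical helper, and the actual task insertion is left out because it depends on the app:

```python
def split_into_batches(items, batch_size):
    """Yield consecutive slices of `items`, each with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical workload: 45 units of work, at most 20 units per task.
work = list(range(45))
batches = list(split_into_batches(work, 20))

for batch in batches:
    # In a real app, each batch would be handed to its own task here.
    print("would enqueue a task for %d items" % len(batch))
```

The batch size is the knob to tune: smaller batches keep each task short at the cost of more per-task overhead, which is the trade-off described above.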

stevep

Jun 20, 2011, 2:33:49 PM
to Google App Engine
It is vitally important that developers get a better sense of what
affects task queue delays**. TQs are extremely important IMHO for
maintaining low latency of the on-line handler. Yet there *must* be
some predictability regarding task delays for at least one queue. If
this already exists, please point me to the docs!

cheers,
stevep

** - I've given up any hope of seeing a separate, high-reliability,
low-volume queue. Now just hoping to gain some specifics from Google
about how to predict TQ performance. Until then, we have Robert who I
can't thank enough for his on-going help in these forums.

Waleed Abdulla

Jun 21, 2011, 6:35:17 PM
to google-a...@googlegroups.com
So far, doing less work per task has been helping the app get more work done faster. But it's still not fast enough. I still see tasks execute at a lower rate than I'd like. It seems to fluctuate throughout the day. Sometimes I'm running 140 tasks per minute (which is good), and other times it runs 10 tasks per minute (very bad).

Any other tips to help get GAE to run my tasks faster? This all started a few days ago; before that all was good and work was done really fast. Does anyone know if something changed in GAE a few days ago?



 

