Wouldn't it be great if pull queues supported arbitrary tagging

Jason Collins

May 16, 2011, 9:53:03 PM
to Google App Engine
As pointed out by all-around-smart-dude Robert Kluin, it would be
great if you could place arbitrary tags on a task when submitting to a
pull queue, and then lease the tasks back out with that tag. This
would provide a great, built-in mechanism to do groupings, etc.

This would yield:

from google.appengine.api import taskqueue

q = taskqueue.Queue('pull0')
q.add(taskqueue.Task(payload=payload_str, method='PULL', tag='mytag'))

and

q = taskqueue.Queue('pull0')
sometasks = q.lease_tasks(3600, 100, tag='mytag')

where "tag" is an optional kwarg in both cases.

Star http://code.google.com/p/googleappengine/issues/detail?id=5061,
or discuss there.

Jeff Schnitzer

May 17, 2011, 12:16:40 AM
to google-a...@googlegroups.com
I (and, from the sound of it, a million other people) asked Nick this
exact question at I/O, and it sounds like this is near the top of the
list for the Task Queue team.

(but yeah, I starred it too)

Jeff

> --
> You received this message because you are subscribed to the Google Groups "Google App Engine" group.
> To post to this group, send email to google-a...@googlegroups.com.
> To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
>
>

Robert Kluin

May 17, 2011, 12:34:06 AM
to google-a...@googlegroups.com
I offered to buy Nick a shot/glass of whatever he likes to drink if
he can make this happen. He looked pretty tempted. Maybe if a few
other people volunteer to contribute a shot/glass we can help this
along... Of course starring the issue is a good start too.

Pull queues make a lot of very neat things possible. Having some type
of grouping will make some truly awesome things possible.

Robert

Michael Hermus

May 1, 2012, 5:41:56 PM
to google-a...@googlegroups.com
It looks like you recently got your wish for this excellent feature. Am I right in thinking that this can (and should) be used to address the challenges with high throughput updates that still exist? I know the 2010 Google I/O presentation by Brett Slatkin has been referenced by many folks, but there seem to be some issues with that, specifically the Eventual Consistency of the HRD. A push/pull queue based implementation might look like this:

For each unit of work:
-Write work to Pull queue with tag=BatchID
-Write named 'fan-in' task to Push Queue for execution <batch period ms> in the future, containing the BatchID

When the named task executes:
-Lease all available tasks from the Pull queue using the tag=BatchID
-Aggregate the work
-Apply the update to the Datastore

Is this a feasible/appropriate/recommended use of the Pull Queue tag mechanism?
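
A minimal sketch of the flow outlined above, using in-memory stand-ins rather than the App Engine taskqueue API. The class names, `BATCH_PERIOD_S`, and deriving the BatchID from a time window are all assumptions made for illustration; the post does not say how the BatchID is chosen.

```python
from collections import defaultdict

BATCH_PERIOD_S = 1.0  # hypothetical batch window, in seconds


class InMemoryPullQueue:
    """Toy stand-in for a tagged pull queue (not the App Engine API)."""

    def __init__(self):
        self._tasks = []  # (tag, payload) pairs in insertion order

    def add(self, payload, tag):
        self._tasks.append((tag, payload))

    def lease_by_tag(self, tag, max_tasks=100):
        # Hand out up to max_tasks payloads carrying this tag; keep the rest.
        leased, kept = [], []
        for task in self._tasks:
            if task[0] == tag and len(leased) < max_tasks:
                leased.append(task[1])
            else:
                kept.append(task)
        self._tasks = kept
        return leased


class InMemoryPushQueue:
    """Toy stand-in for a push queue that rejects duplicate task names."""

    def __init__(self):
        self._names = set()
        self.scheduled = []  # (eta, batch_id) pairs

    def add_named(self, name, eta, batch_id):
        if name in self._names:
            return False  # a fan-in task for this batch already exists
        self._names.add(name)
        self.scheduled.append((eta, batch_id))
        return True


def batch_id_for(key, now):
    # Assumption: all work for one key in the same time window shares a batch.
    return "%s-%d" % (key, int(now // BATCH_PERIOD_S))


def submit_work(pull, push, key, delta, now):
    batch_id = batch_id_for(key, now)
    pull.add(delta, tag=batch_id)
    push.add_named("fanin-" + batch_id, now + BATCH_PERIOD_S, batch_id)
    return batch_id


def run_fan_in(pull, datastore, key, batch_id):
    # Lease everything tagged with this batch, aggregate, apply one update.
    deltas = pull.lease_by_tag(batch_id)
    if deltas:
        datastore[key] += sum(deltas)
    return len(deltas)
```

The named push task is what keeps the fan-in to one execution per batch: every unit of work tries to add it, but only the first add succeeds.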

Brett Slatkin

May 1, 2012, 7:48:28 PM
to google-a...@googlegroups.com
Indeed!

 
Example:
http://code.google.com/p/8-bits/source/browse/trunk/backend/main.py#202

I haven't load tested this yet, but I think it should work?

-Brett

Michael Hermus

May 2, 2012, 4:34:55 PM
to Google App Engine
Excellent, thanks! One question though: isn't there an issue similar
to the HRD 'Eventual Consistency' with the Task Queue? In other words,
there is a variable latency between queue insert and lease
availability that could potentially spike high enough so that the fan-
in task misses some work.

If this is true, we still need some sort of cleanup mechanism for a
robust implementation. I have several ideas for this, but wanted to
make sure I wasn't missing something.

Brett Slatkin

May 2, 2012, 5:46:27 PM
to google-a...@googlegroups.com
I think having a cron once a minute or so to fetch all tasks on the pull queue (regardless of tag) and re-insert corresponding push tasks is a good idea to make it robust.
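
A rough sketch of that sweep, written against plain Python data rather than the taskqueue API. `sweep_pull_queue` and `enqueue_fan_in` are hypothetical names; in a real app the callback would add a push task for the tag, probably unnamed or with a fresh name, since a task name that has already run likely cannot be reused right away.

```python
def sweep_pull_queue(pending_tasks, enqueue_fan_in):
    """Re-enqueue one fan-in task per tag still present in the pull queue.

    pending_tasks: iterable of (tag, payload) pairs, regardless of tag.
    enqueue_fan_in: callable taking a tag; stands in for adding a push task.
    Returns the tags that were re-enqueued, in first-seen order.
    """
    seen = set()
    order = []
    for tag, _payload in pending_tasks:
        if tag not in seen:
            seen.add(tag)
            order.append(tag)
            enqueue_fan_in(tag)
    return order


# Example: two batches left work behind, so two fan-in tasks are re-inserted.
requeued = []
leftover = [("batch-7", 1), ("batch-7", 4), ("batch-9", 2)]
sweep_pull_queue(leftover, requeued.append)
```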

Michael Hermus

May 3, 2012, 10:56:33 AM
to Google App Engine
Definitely, assuming the queue maintains FIFO ordering (which the
documentation seems to indicate).

However, I was concerned about determining how deep to go into the
pull queue during cleanup. In other words, you don't want to lease
work tasks that are part of active batches, because you won't be able
to aggregate them effectively. You can guarantee the work will get
done by adding it to a push queue (even if it's not aggregated), but if
you process too many like that, it will defeat the primary purpose of
the fan-in task. You could end up with write contention as the
'cleaned up' tasks come in at rates greater than a few per second.

I suppose you could simply timestamp the work tasks, and stop pulling
from the queue once the timestamps pass a certain threshold. For
example, only clean up tasks that are older than 10 minutes. I was a
bit wary of using timestamps for anything after you mentioned the
potential lack of time synchronization, but in this case it wouldn't
have to be perfect, just good enough.
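
A small sketch of the age cutoff, assuming the tasks arrive oldest first and each carries a timestamp. The `enqueued_at` field and the function name are made up for illustration; the 600-second default is the "older than 10 minutes" example above.

```python
from itertools import takewhile

CLEANUP_MIN_AGE_S = 600  # "older than 10 minutes"


def stale_prefix(tasks_fifo, now, min_age_s=CLEANUP_MIN_AGE_S):
    """Return the leading run of tasks that are at least min_age_s old.

    tasks_fifo: list of dicts with an 'enqueued_at' timestamp in seconds,
    oldest first. Scanning stops at the first task that is too young, so
    work belonging to active batches further back is left alone.
    """
    return list(
        takewhile(lambda t: now - t["enqueued_at"] >= min_age_s, tasks_fifo)
    )
```

Because the scan stops at the first young task, slightly skewed clocks only shift where the cutoff lands; they do not cause recent work to be pulled from the middle of the queue.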

Feature Idea: It would be pretty slick if you could assign a timeout
value to a pull queue task, and set a URL value such that once the
timeout passes, the task would be PUSHED to the specified URL for
handling (assuming it has not already been successfully processed).
This would make the process clean, simple, and efficient, and I
imagine there are a number of other cool uses for such a feature as
well.


Michael Hermus

May 3, 2012, 6:28:37 PM
to Google App Engine
I added a feature request for the Pull task 'timeout' concept. If
anyone should agree that it is useful, please star the issue:
http://code.google.com/p/googleappengine/issues/detail?id=7454

Maxim Lacrima

May 4, 2012, 2:02:05 AM
to google-a...@googlegroups.com
Hi Brett, Michael,

Thank you for pointing to a new approach to do fan-in. This is excellent.

Referring to Brett Slatkin's 2010 I/O talk, I would also like to ask about transactional sequences. Is there now another approach to building materialized views, given the arrival of backends, cross-group transactions, and other new features?



--
with regards,
Maxim

Michael Hermus

May 4, 2012, 8:42:35 PM
to google-a...@googlegroups.com
I just looked at the Javadoc for com.google.appengine.api.taskqueue.Queue while trying to implement a version of this, and it looks like the App Engine team was one step ahead on the challenges with task cleanup: if you filter by tag but pass in a NULL tag value, you get all the tasks that match the oldest task's tag. This means you can a) process any missed work in aggregate batches, and b) if you happen to pull an active batch, still aggregate the work and minimize the impact on throughput. Pretty cool!

I think you would still want to check the time of 'cleaned up' tasks to control how deep you go, but I also see that the API provides access to the task ETA timestamp, so you don't need to add your own.
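
To make that behavior concrete, here is a toy model of "lease by tag with no tag given", written in Python over plain dicts. It is only a model of the behavior described above, not the Java or Python taskqueue API; the `tag` and `eta` field names and the function name are assumptions.

```python
def lease_oldest_tag_group(tasks, max_tasks=100):
    """Return tasks sharing the tag of the task with the oldest ETA.

    tasks: list of dicts with 'tag' and 'eta' keys (eta in seconds).
    The result is ordered by ETA and capped at max_tasks.
    """
    if not tasks:
        return []
    oldest = min(tasks, key=lambda t: t["eta"])
    group = [t for t in tasks if t["tag"] == oldest["tag"]]
    group.sort(key=lambda t: t["eta"])
    return group[:max_tasks]
```

A cleanup job built on this can compare the first returned ETA against a cutoff before processing, which covers the "how deep to go" check without adding a separate timestamp to the payload.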

Michael Hermus

May 4, 2012, 8:45:48 PM
to google-a...@googlegroups.com
Hi Lacrima: I haven't done any thinking about materialized views since I don't have any need yet, but if I come across anything useful I will certainly let you know! I wouldn't be surprised if someone else in the community has, though; you could try posting a new topic.

