Maximum number of tasks that can be added in a batch


Trez Ertzzer

Nov 17, 2015, 1:13:57 PM
to Google App Engine
Hello.
On this page:
https://cloud.google.com/appengine/docs/java/taskqueue/overview-push

it says:
Maximum number of tasks that can be added in a batch: 100

What does it mean? (i.e., what is a "batch"?)

What I plan to do:
I want to have a cron job that runs every month.
This job will create one task for each of my users to compute the bill and send an email.
Do you think that's possible (if I have 100 000 users)?

How would you do this?


Patrice (Cloud Platform Support)

Nov 17, 2015, 2:14:35 PM
to Google App Engine
Hi,

As can be seen here, when you add tasks to a queue, you can add a LIST of tasks in a single call, which is what the batch refers to.

What you describe for your application is definitely feasible, but you won't be able to add 100 000 tasks to your queue at once. Depending on the system, I would personally prefer a datastore entry to keep track of the bill and then a job that just sends the 100 000 emails. I don't know enough about your system, though, so this may not be feasible.
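
To make the limit concrete, here is a rough sketch of the arithmetic, assuming the limit of 100 tasks per batch quoted in the question. BatchMath and batchesNeeded are illustrative names only, not part of the App Engine SDK.

```java
/** Illustrative helper (hypothetical, not part of the App Engine SDK). */
final class BatchMath {

    /** Number of batch add calls needed for taskCount tasks (ceiling division). */
    static long batchesNeeded(long taskCount, int maxTasksPerAdd) {
        if (taskCount < 0 || maxTasksPerAdd <= 0) {
            throw new IllegalArgumentException("taskCount must be >= 0 and maxTasksPerAdd > 0");
        }
        return (taskCount + maxTasksPerAdd - 1) / maxTasksPerAdd;
    }

    public static void main(String[] args) {
        // 100 is the per-batch limit quoted from the documentation in the question.
        System.out.println(batchesNeeded(100_000, 100)); // prints 1000
    }
}
```

So 100 000 users means roughly 1 000 separate add calls rather than one.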

Cheers!

Jeff Schnitzer

Nov 17, 2015, 4:29:39 PM
to Google App Engine
You are on the right track, but there are a couple tricks to it. I do similar things all the time, often with millions of records/tasks. Since you pointed at java documentation, I assume you're using Java.

The simplest way to do what you want is to perform a keys-only query for your users and batch add the tasks. Guava collections transformation helps a lot here. Run a query that gives you an Iterable<Key<User>>, then transform that into an Iterable<OneUserDeferredTask>, and pass that to a method like this:

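import com.google.appengine.api.taskqueue.DeferredTask;
import com.google.appengine.api.taskqueue.QueueConstants;
import com.google.appengine.api.taskqueue.TaskOptions;
import com.google.common.base.Function;
import com.google.common.collect.Iterables;
import java.util.List;

/** Allows any number of tasks; automatically partitions as necessary */
public void add(Iterable<? extends DeferredTask> payloads) {
    // Wrap each DeferredTask in a TaskOptions payload (lazy Guava transform).
    Iterable<TaskOptions> opts = Iterables.transform(payloads, new Function<DeferredTask, TaskOptions>() {
        @Override
        public TaskOptions apply(DeferredTask task) {
            return TaskOptions.Builder.withPayload(task);
        }
    });

    // Split into lists no larger than the per-add batch limit.
    Iterable<List<TaskOptions>> partitioned = Iterables.partition(opts, QueueConstants.maxTasksPerAdd());

    // queue() and addDeferredTask() are my own wrapper, not SDK methods;
    // with the stock API the equivalent call is Queue.add(Iterable<TaskOptions>).
    for (List<TaskOptions> piece : partitioned)
        queue().addDeferredTask(piece);
}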
Guava partition() makes life easy for you here. This naive approach will stand up to enqueueing tens of thousands of tasks easily. However, there's one limit you may run into somewhere before 100k - any single query to the datastore times out after 60s, even if you're running from a cron job (which otherwise gives you 10m). So as your userbase grows you may start to exceed this limit.

The easiest solution is to simply run this in a loop and checkpoint the query every 10k rows or so. Get a cursor, and rerun the query from that cursor. This will keep each individual query under the 60s deadline and you will easily be able to process 100k+ records in 10m.
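
A minimal sketch of that checkpointing loop, with an in-memory list standing in for the datastore. Everything here (CheckpointedScan, Page, fetchPage, the integer "cursor") is a hypothetical stand-in to show the control flow; a real datastore cursor is an opaque token returned by the query, not an index.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of cursor-style checkpointing; an in-memory list stands in for the datastore. */
final class CheckpointedScan {

    /** One page of keys plus the position to resume from (stand-in for a datastore cursor). */
    static final class Page {
        final List<Long> keys;
        final int nextCursor;
        final boolean done;

        Page(List<Long> keys, int nextCursor, boolean done) {
            this.keys = keys;
            this.nextCursor = nextCursor;
            this.done = done;
        }
    }

    /** Hypothetical stand-in for "rerun the keys-only query from this cursor, limited to pageSize". */
    static Page fetchPage(List<Long> allKeys, int cursor, int pageSize) {
        int end = Math.min(cursor + pageSize, allKeys.size());
        return new Page(new ArrayList<>(allKeys.subList(cursor, end)), end, end >= allKeys.size());
    }

    /** Walks all keys page by page and returns how many separate queries were issued. */
    static int scanAll(List<Long> allKeys, int pageSize, List<Long> sink) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be > 0");
        }
        int cursor = 0;
        int queries = 0;
        while (true) {
            Page page = fetchPage(allKeys, cursor, pageSize);
            queries++;
            sink.addAll(page.keys); // real code would enqueue one task per key here
            if (page.done) {
                break;
            }
            cursor = page.nextCursor; // checkpoint: the next query resumes from here
        }
        return queries;
    }
}
```

Each pass through the loop is a fresh, short query, so no single query has to run long enough to hit the deadline.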

If you're talking about considerably more records, or you want to process this data in a hurry, you can use the __scatter__ property to create partitions that you can transform into tasks in parallel. The map/reduce framework works this way. Here's some sample code I posted recently that should give you a head start:


However, start with the most naive approach and grow from there (assuming you don't already have 100k users).

Suerte,
Jeff


Trez Ertzzer

Nov 18, 2015, 5:29:57 AM
to Google App Engine, je...@infohazard.org
Hello.
Thank you very much for your answer. It's very clear now!

However, in your code, where you write:
"queue().addDeferredTask(piece);"

I suppose you meant to write:
"getQueue().add(piece)"

(I could not find any method called "addDeferredTask", so please just confirm.)

Is this operation slow (I mean, does it take long)?

I have no real idea of what's behind this "add" method, so I guessed that I could call it 1 000 000 times in less than 60 seconds (like I would with a normal Java queue, where each call would take less than a millisecond).

Thank you

Patrice (Cloud Platform Support)

Nov 23, 2015, 10:36:58 AM
to Google App Engine, je...@infohazard.org
Hi Trez,

The operation is generally quick. The only issue I can see with adding 1 000 000 tasks in a single minute is that, if this rate is sustained throughout the day, you may eventually hit a quota limit. You can find more information on the related quotas here.

Cheers!

Jeff Schnitzer

Nov 24, 2015, 5:13:28 AM
to Trez Ertzzer, Google App Engine
Right, there's no addDeferredTask() method in the library... I wrap the queue myself with a more convenient abstraction, but you get the idea.
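
For illustration only, a wrapper along these lines might look roughly like the sketch below. This is not my actual class: BatchingQueue and TaskSink are hypothetical names, and TaskSink stands in for the real queue so the example has no App Engine dependency.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of a queue wrapper; TaskSink is a hypothetical stand-in for the real queue. */
final class BatchingQueue<T> {

    /** Hypothetical stand-in for a queue that accepts a bounded batch per call. */
    interface TaskSink<E> {
        void add(List<E> batch);
    }

    private final TaskSink<T> sink;
    private final int maxPerAdd;

    BatchingQueue(TaskSink<T> sink, int maxPerAdd) {
        if (maxPerAdd <= 0) {
            throw new IllegalArgumentException("maxPerAdd must be > 0");
        }
        this.sink = sink;
        this.maxPerAdd = maxPerAdd;
    }

    /** Accepts any number of tasks and forwards them in batches of at most maxPerAdd. */
    void addAll(Iterable<? extends T> tasks) {
        List<T> batch = new ArrayList<>(maxPerAdd);
        for (T task : tasks) {
            batch.add(task);
            if (batch.size() == maxPerAdd) {
                sink.add(batch);
                batch = new ArrayList<>(maxPerAdd); // start a fresh batch
            }
        }
        if (!batch.isEmpty()) {
            sink.add(batch); // flush the final partial batch
        }
    }
}
```

Callers hand over the whole Iterable and never see the per-add limit; in a real version the sink would delegate to the SDK queue's add method.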

Jeff