[google-appengine] Idempotence & multiple task execution

hawkett

unread,
Apr 23, 2010, 3:51:44 AM4/23/10
to Google App Engine
Hi,

I understand that it is possible for a single task to be executed more
than once, but is it safe to assume that only one instance of a
specific task will be executing at any one time? It is much more
difficult (and time consuming) to implement idempotent behaviour if it
is possible for a subsequent execution of a task to begin before the
first has completed - i.e. for the same task to be executing
concurrently. I can think of ways of using db locking to recognise
multiple concurrent executions (memcache is not reliable - especially
since this scenario is most likely to occur during system failures),
but it would be great to know that this scenario cannot
occur. Thanks,

Colin

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.

Tim Hoffman

unread,
Apr 23, 2010, 6:45:22 AM4/23/10
to Google App Engine
Probably the best way to guard would be to have the task name specific
to the operation.
You can't add another task with the same name for about a week.
T
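Tim's suggestion can be sketched with a toy queue that remembers names it has seen. This is a self-contained illustration, not the real API: in the actual Python SDK you would pass `name=` to `taskqueue.add()`, and the class and error names below are hypothetical stand-ins.

```python
# Toy sketch of operation-derived task names: a duplicate enqueue with the
# same name is rejected. The ToyQueue class is hypothetical; the real SDK
# tracks used names for roughly a week (the "tombstone" window).

class TaskAlreadyExistsError(Exception):
    """Raised when a task with the same name was already enqueued."""

class ToyQueue:
    def __init__(self):
        self._names = set()   # stands in for the ~1 week tombstone window

    def add(self, name, payload):
        if name in self._names:
            raise TaskAlreadyExistsError(name)
        self._names.add(name)
        return payload

queue = ToyQueue()
# Name derived from the operation, e.g. incrementing a specific counter.
queue.add("increment-counter-42", {"counter": 42})
try:
    queue.add("increment-counter-42", {"counter": 42})  # duplicate enqueue
    duplicate_enqueued = True
except TaskAlreadyExistsError:
    duplicate_enqueued = False

print(duplicate_enqueued)  # False: the second enqueue was rejected
```

Note this only deduplicates *enqueuing*; as discussed below, it does not by itself rule out a named task *executing* more than once.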

hawkett

unread,
Apr 23, 2010, 10:14:29 AM4/23/10
to Google App Engine
Hi Tim - there are a couple of reasons why this won't work: firstly, it
is my understanding that named tasks are also subject to the
possibility of being executed twice (the name only prevents the same
name being added to the queue twice), and secondly, tasks raised
transactionally cannot have a task name.

hawkett

unread,
Apr 25, 2010, 12:44:24 PM4/25/10
to Google App Engine
Wondering if I haven't asked the question clearly enough. Regarding
the statement that we need to assume tasks may be executed multiple
times (i.e. ensure idempotence): is that multiple times serially, or
possibly multiple times concurrently?

I've gone ahead and coded my idempotence solution to assume that they
cannot be running concurrently, just because it's a bit easier, and a
bit less work inside a transaction. I'm guessing that the reason they
may be run multiple times is that GAE won't know what to do if it
doesn't get a response from a task it executes - it can't be sure that
the task was received by the application, or that the application was
given the opportunity to correctly react to the task - in fact it has
to assume that it didn't, and therefore runs it again to be sure. I'm
assuming that GAE always knows for certain that a task has been fired,
just not whether it was fired successfully - and it will only fire
again if it hasn't correctly processed a response from the previous
execution. If this is true, then as long as GAE guarantees that it
waits > 30s before firing the task a second time (rather than just
reacting to the loss of the http connection, for example), we can know
it is not executing in parallel, because the first execution cannot
still be running due to the request limit.

Am I looking at this correctly? Is it fair to assume that the same
task cannot be running in parallel? Cheers,

Colin

djidjadji

unread,
Apr 29, 2010, 12:36:54 PM4/29/10
to google-a...@googlegroups.com
The decision to rerun a task is made based on the HTTP response code.
There is always a response code, even when the connection is lost.

When the code is 200, the task is considered complete and will not be rerun.
Any other code means the task needs a rerun.
The time between reruns increases with each retry.

This means a certain task is never retried in parallel.

But it could be that a task created later will finish first because it
did not need to retry.
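The retry rule described above can be sketched as a small decision function: a 200 retires the task, anything else schedules a rerun after a delay that grows with each attempt. The delay schedule below is purely illustrative, not App Engine's actual timings.

```python
# Sketch of the retry decision: 200 means done; any other status code means
# rerun, with an increasing delay. base_delay and factor are illustrative
# assumptions, not App Engine's real backoff parameters.

def next_action(status_code, retry_count, base_delay=1.0, factor=2.0):
    """Return ('done', None) or ('retry', delay_in_seconds)."""
    if status_code == 200:
        return ("done", None)
    return ("retry", base_delay * (factor ** retry_count))

assert next_action(200, 0) == ("done", None)
assert next_action(500, 0) == ("retry", 1.0)
assert next_action(500, 3) == ("retry", 8.0)   # delay grows with each retry
```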


hawkett

unread,
Apr 29, 2010, 7:28:14 PM4/29/10
to Google App Engine
Thanks for the response - it's good to know that the multiple
executions cannot occur in parallel, although I'm not sure I
completely understand the reasons. Take the following example -

1. task queue executes a task for the first time (T1E1)
2. application receives task, and begins processing
3. the http connection is lost soon after, and the task queue registers
an HTTP response code
4. task queue backs off (e.g. waits 2s)
5. task queue executes the task a second time (T1E2)
6. application receives task and begins processing

Why is it that T1E1 cannot still be running at step 5/6? Are there no
conditions at step 3 where a response (of any status) is received
while the processing at step 2 is still underway?

There is also another situation, where the HTTP client crashes, which
is also unclear -

1. task queue executes a task for the first time (T1E1)
2. application receives task, and begins processing
3. the task queue crashes (i.e. the HTTP client), so no response can
be received
4. task queue recovers, or another node takes over - (how does it
determine the state of T1E1?)
5. task queue executes the task a second time, since it cannot know
whether T1E1 completed successfully? (T1E2)
6. application receives task and begins processing

Is it possible in this scenario that it will re-execute the task
(T1E2) prior to the completion of the first (T1E1)?

Thanks,

Colin



Eli Jones

unread,
Apr 29, 2010, 9:31:57 PM4/29/10
to google-a...@googlegroups.com
In my opinion, the case you are asking about is pretty much the reason they state that tasks must be idempotent.. even with named tasks.

They cannot guarantee 100% that some transient error will not occur when a scheduled task is executed (even if you are naming tasks and are guaranteed 100% that your task will not be added to the queue more than once).

So, it is possible to have more than one version of the "same" task executing at the same time.  You just need to construct your tasks so they aren't doing too much at once (e.g. reading some data, then updating or inserting.. then reading other data... and updating some more), or you need to make sure to do all that inside a big transaction.. and, even then, you still need to ensure idempotence.

I sort of prefer a poor man's version of idempotence for my chained tasks.  Mainly, if the "same" task runs more than once.. each version will have a potentially different result, but I am perfectly happy getting the result from the task that ran last.  But, I can easily accept this since my tasks are not doing multiple updates at once.. and they are not reading from the same entities that they are updating.

What is your exact use case?

hawkett

unread,
Apr 30, 2010, 8:08:21 PM4/30/10
to Google App Engine
My use case is as follows -

1. tasks which do not support idempotence inherently (such as deletes,
and some puts) carry a unique identifier, which is written as a
receipt in an attribute of an entity that is updated in the
transaction.
2. When a task arrives carrying a receipt, I check that it does not
already exist - so receipted tasks incur an additional, key only, db
read

This is essentially my algorithm for ensuring idempotence (in
situations where it is not inherent) - ignore subsequent executions.

If the same task *cannot* be running in parallel, then the check for
the receipt can be done outside the transaction that writes the
receipt - which has a couple of advantages -

a. It can be done up front in the task handler, so I don't have to go
all the way through to the transactional write before discovering it
already executed
b. More importantly, I can reduce the work done inside the transaction
- every extra millisecond spent in the transaction locks the entity
group, and at scale, those milliseconds can add up - especially on
entity groups that are somewhat write intensive.

If the same task *can* be running in parallel, then I need to do the
receipt read inside the transaction that writes it. It would be a pity
to do that extra work in every transaction for a very rare scenario.

As stated earlier, it seems that it might be possible for GAE to
guarantee that it does not execute the same task in parallel - by
ensuring that, for error scenarios like those above (408, client
crash, perhaps others), the 2nd execution waits more than 30 seconds.
That has some obvious downsides, but given how rarely it occurs, and
given that an app shouldn't be relying on the speed with which a task
is executed, it seems like a reasonable trade-off to get a reduction
in transactional work for the vast majority of the time - less
contention, less CPU, less datastore activity.

A simple example is a task which increments a counter - we don't want
to increment the counter twice.

The problem is the same whether one or many entities are being updated
during handling of the task.

Do you have many situations where you perform a read that does not
result in some sort of update - a db update, another task raised, an
email sent, an external system notified, etc.? There's a subset of
most of these that we want to avoid doing twice. It's the multiple
writes, rather than the multiple reads, that cause issues.

Anyone from google able to end the speculation? :)

hawkett

unread,
May 12, 2010, 6:49:08 AM5/12/10
to Google App Engine
Bump - it's still not clear whether the same task can be executing
multiple times concurrently. I noticed that failed tasks seem to back
off for significantly longer recently - perhaps this has helped the
situation? Appreciate any clarification - cheers,

Colin

Ikai L (Google)

unread,
May 13, 2010, 12:35:38 PM5/13/10
to google-a...@googlegroups.com
The same task should not be executed multiple times concurrently. If it fails, we will retry it in the future (could be back to back, but this is not guaranteed).

Are you seeing evidence of the contrary?
--
Ikai Lan 
Developer Relations, Google App Engine

----------------
Google App Engine links:
Blog: http://googleappengine.blogspot.com 

hawkett

unread,
May 13, 2010, 7:36:38 PM5/13/10
to Google App Engine
Thanks Ikai - that's good to know - I just want to check that you're
not referring to the 'normal' failure behaviour where an exception is
thrown by the server - I understand that they won't be executed
concurrently in this situation. I'm talking about what happens (for
example) if the http client executing the task crashes, and GAE cannot
detect whether the first task execution is still running in the
system. Does it wait sufficient time before re-executing to let it
finish, so as to guarantee that there are not concurrent executions?

I understand it's not likely to happen often, but if it can possibly
occur, then I need to program for it to avoid data corruption. Cheers,

Colin

hawkett

unread,
May 22, 2010, 4:46:55 AM5/22/10
to Google App Engine
Apologies for repeatedly bumping this thread, but the advice seems to
be that the same task-id *cannot* execute concurrently (100%
guaranteed), but no response asserting this has addressed the failure
scenario I've raised, where it would appear that the same task *may*
execute concurrently unless app engine has implemented something
specifically to prevent it occurring. I know the task queue is very
reliable, but not 100% so - http://groups.google.com/group/google-appengine/browse_thread/thread/31f6069ed4ee3389.

So - in the scenario where the HTTP client (i.e. the task queue) drops
the HTTP connection in an initial task execution - how does app engine
prevent the recovery mechanism from executing the task a second time
while the first is still running?

The possibility of the same task running concurrently has significant
architectural implications for my app. Does app engine handle the
scenario I've outlined and prevent concurrent execution of the same
task-id?

Thanks for the clarification,

Colin


hawkett

unread,
May 28, 2010, 7:50:15 PM5/28/10
to Google App Engine
Just my weekly bump on this thread. The advice from google appears to
be to trust that tasks with the same id cannot be running
concurrently. However, there are clear edge scenarios documented in
this thread that are not accounted for. It would be a pity if people
made architectural decisions based on the advice from google, and
discovered down the track that their data was corrupted as a result of
the occasional concurrent execution of the same task id. Are the edge
cases handled, and tasks *never* run concurrently, or is it only the
case that they don't run concurrently 'under normal conditions'? If
there could ever be concurrent execution then it is a whole different
architectural scenario. Can it happen or not? By all means, if the
answer is that task queue is an experimental feature, 'anything's
possible', that would be better than tumbleweed, and infinitely better
than advising that concurrent execution cannot occur, when in fact
you're not sure that's true. Thanks,

Colin


Stephen

unread,
May 29, 2010, 6:37:58 AM5/29/10
to Google App Engine


On May 22, 9:46 am, hawkett <hawk...@gmail.com> wrote:
>
> So - in the scenario where the HTTP client (i.e. the task queue) drops
> the HTTP connection in an initial task execution - how does app engine
> prevent the recovery mechanism from executing the task a second time
> while the first is still running?


No idea if it does this or not -- but one way it could prevent
concurrent execution is by only re-executing a failed task 31 seconds
or more after the failure. By this time, the original task will have
been killed off, even if it did not manage to return a failure/success
code.

hawkett

unread,
May 29, 2010, 7:40:43 AM5/29/10
to Google App Engine
Agreed - I also nominated this solution in my April 25th & May 1
posts :) - cheers,

Colin

Eli Jones

unread,
Sep 7, 2010, 12:04:11 PM9/7/10
to google-a...@googlegroups.com
Just in case anyone comes across this thread and is wondering about the potential for concurrent execution of a named task.

This is documented:


The important quote is:

"When implementing the code for Tasks (as worker URLs within your app), it is important that you consider whether the task is idempotent. App Engine's Task Queue API is designed to only invoke a given task once, however it is possible in exceptional circumstances that a Task may execute multiple times (e.g. in the unlikely case of major system failure). Thus, your code must ensure that there are no harmful side-effects of repeated execution."

So.. again, a named task should not run more than once.. and probably will not run more than once.. But, there could be a major system failure that might result in the named task running more than once.

The "concurrent execution" problem should only come up if an error occurs in the system at the moment the task is executed.. and somehow two versions are started at the same time.

I don't know that this issue would/could come up for failed tasks that are then re-executed.  (I guess there could be an error that somehow indicates the task has failed when it really is still running... and thus the re-executed task begins while the old task is still running.)  But, re-executed tasks already seem to start well over 30 seconds after the purported failed task has finished.

So.. you need to figure out how idempotent you need your tasks to be.. no matter what.. there is no way to guarantee that a large, geographically distributed system like this is 110% exact at all moments.. and assuming (or requesting) that there is no way an exception can happen that might result in concurrent task execution is the wrong approach.

For my chained tasks.. I just relax my requirements and have named tasks that insert, update based on key_name.. and if two happen to run concurrently... I just get the data from the most recent insert, update.. since earlier insert, updates get overwritten, and life goes on.


hawkett

unread,
Sep 8, 2010, 4:18:07 AM9/8/10
to Google App Engine
Hi Eli,

Thanks for the info - the question was definitely trying to get a
specific statement about whether app engine could run the same task id
at the same time. Ikai's post seems to suggest that google did not
think this is possible, but did not seem to address the failure
scenarios I outlined.

It was about the time that I queried Ikai's response that re-executed
tasks started backing off for a significant period (over 30s) - they
used to go immediately, and then get slower and slower, e.g. 1s, 2s,
4s, 8s type behaviour. Probably a coincidence, but the fact it started
happening meant that I chose to assume that concurrent tasks with the
same id could not occur. As you can see in the above thread, I had
suggested backing off for more than 30s as a solution.

I agree that the problem is making sure you know how idempotent your
operations need to be, which is specifically why it is important to
have a definitive statement from google as to whether this concurrent
execution can occur or not. Without that information, I don't know how
idempotent my operations need to be, and I should probably be assuming
concurrent execution *can* occur - but I'm taking a risk because the
overhead is so high (in my application).

So from my perspective, it would be a reasonable courtesy for google
to comment on this thread - it is a reasonable question with some fair
effort spent on articulating it, and it appears they may have fixed it
in response to this thread without taking the time to say so.

Thanks,

Colin

Eli Jones

unread,
Sep 8, 2010, 11:14:32 AM9/8/10
to google-a...@googlegroups.com
Well, I've been doing named, chained tasks since November 2009, and I can point out three things:

1.  I've had concurrent tasks execute at least once (that I noticed) when only one was supposed to run.. And, this appeared to happen when the subsystem first fired off the task (after it had already been added to the queue.. since TombstonedTaskError and TaskAlreadyExistsError seem to work nicely.).

2.  The GAE doc that I linked to explicitly states "it is possible in exceptional circumstances that a Task may execute multiple times".  I believe that this covers both cases of the same task running concurrently or sequentially.

3.  For my failed tasks, I'm pretty sure the backoff has always been more than 30 seconds (if the task failed in the middle of running).  Generally, if a task failed in the middle of running, it would run again 60 seconds - 120 seconds later.

I can see how one would like the doc to explicitly address the potential for concurrent execution.. but you should presume that it is possible since the doc infers it.. and the doc doesn't say it can't happen.. and (less importantly) some guy on an internet news group is telling you that it has occurred in the past.

I personally cannot imagine how one could guarantee that this would never happen without bogging down the entire taskqueue subsystem with triple and quadruple checks and adding in random (1-3 second) wait times for exactly when any task would execute.. (but, I have a limited imagination).. and it seems like even then.. you cannot guarantee 100% that a task would not execute twice at once if a drastic system error occurred.


hawkett

unread,
Sep 9, 2010, 5:26:28 AM9/9/10
to Google App Engine
Hi Eli, notes below -

On Sep 8, 4:14 pm, Eli Jones <eli.jo...@gmail.com> wrote:
> Well, I've been doing named, chained tasks since November 2009, and I can
> point out three things:
>

Task names aren't especially relevant to the question - names stop the
same task being raised twice, not executed twice. I have been using
the task queue since it was released, and definitely noticed tasks
being executed more than once, but never concurrently.

> 1.  I've had concurrent tasks execute at least once (that I noticed) when
> only one was supposed to run.. And, this appeared to happen when the
> subsystem first fired off the task (after it had already been added to the
> queue.. since TombstonedTaskError and TaskAlreadyExistsError seem to work
> nicely.).
>

Well, from Ikai's comment it would sound like google does not expect
this behaviour. I raised this thread through hypothetical analysis of
the technology, but if you have seen it happen, then that is
especially interesting. I personally can't see how it could
legitimately happen if it backs off for more than 30s - it would be a
bug in the system for the task to fire duplicates when it is first
raised, IMO. How did you determine the execution was concurrent?

> 2.  The GAE doc that I linked to explicitly states "it is possible in
> exceptional circumstances that a Task may execute multiple times".  I
> believe that this covers both cases of the same task running concurrently or
> sequentially.

I don't think it does, but this is specifically the point of this
thread - it is not clear. I don't want to engineer significant
overhead into my application based on interpretation of unclear
documentation. To me, the same task id executing at the same time in
app engine, if it is possible, is something that needs to be
explicitly documented, because it has significant impact on app
architecture. Again, Ikai's comment above seems to imply Google does
not expect this to happen. So if the documentation is unclear, and
google seems to suggest the opposite of your interpretation, that's a
good reason to be wary of the assumption you are making.

>
> 3.  For my failed tasks, I'm pretty sure the backoff has always been more
> than 30 seconds (if the task failed in the middle of running).  Generally,
> if a task failed in the middle of running, it would run again 60 seconds -
> 120 seconds later.
>

It hasn't. Absolutely, definitely used to retry immediately and back
off at incrementally larger intervals that were initially < 30s.
Worked like this for quite a long while. Indeed, people other than me
suggested this behaviour should be changed to 30s plus to deal with
the issue in this thread. I had many, many situations where I had a
bug in a task, and the work it generated straight after failure would
fill up the error logs almost instantly. It was a real hassle for a
while there, and one of the reasons why I raised this issue in June
last year - http://code.google.com/p/googleappengine/issues/detail?id=1771
(among a bunch of others). I wouldn't have suggested backoff should be
changed to > 30s if it was already the case.

> I can see how one would like the doc to explicitly address the potential for
> concurrent execution.. but you should presume that it is possible since the
> doc infers it.. and the doc doesn't say it can't happen.. and (less
> importantly) some guy on an internet news group is telling you that it has
> occurred in the past.
>

I don't think the docs infer it. I think it is ambiguous, especially
in relation to Ikai's comment.

> I personally cannot imagine how one could guarantee that this would never
> happen without bogging down the entire taskqueue subsystem with triple and
> quadruple checks and adding in random (1-3 second) wait times for exactly
> when any task would execute.. (but, I have a limited imagination).. and it
> seems like even then.. you cannot guarantee 100% that a task would not
> execute twice at once if a drastic system error occurred.

Executing twice is fine, I get that. Executing the same task id
concurrently seems to be something that can be avoided - I don't see
anything other than the 30s+ backoff being required to achieve this.
Maybe that's wrong, but its sufficient for me, and was the suggestion
I made to address it. Unless someone highlights another reason why it
could occur, I'm glad to avoid the additional architecture.

Eli Jones

unread,
Sep 9, 2010, 12:41:11 PM9/9/10
to google-a...@googlegroups.com
How did I determine concurrent execution?

I determined that I had concurrent task execution because you can see the task_name in the logs, and a named task successfully ran twice.  And, the one that ran last threw a TaskAlreadyExists error when trying to add the next chained task to the queue since each named task has a specifically defined name for the next task in the chain and the version that finished first had already added the next named task to the queue. (This is why it is absolutely important to use named tasks when chaining.. some sort of random error can fork your tasks).

Why do I suggest tasks do not just retry immediately (or in less than 30 seconds after failure).. and have done so in the time before your April 23rd e-mail.

Here are some logs showing a task retry on Feb 22nd (it's hard to find many examples since Appengine Logs only keep error logs after a few months.. so I need to find two errors in a row for a task to see the retry).

The task's first run was at 12:20:00.026 PM.  It ran for 29 seconds and failed at 12:20:29.275 PM with Deadline Exceeded.. then it retried at 12:21:07.596 PM (about 38 seconds after failure):

02-22 12:21PM 07.596 /myTask 500 28548ms 306cpu_ms 160api_cpu_ms 2kb AppEngine-Google; (+http://code.google.com/appengine)
E 02-22 12:21PM 36.140 <class 'google.appengine.runtime.DeadlineExceededError'>: Traceback (most recent call last): File "/base/data/home/apps/myApp/1.34005759049070

02-22 12:20PM 00.026 /myTask 500 29255ms 2777cpu_ms 193api_cpu_ms 2kb AppEngine-Google; (+http://code.google.com/appengine)
E 02-22 12:20PM 29.275 <class 'google.appengine.runtime.DeadlineExceededError'>: Traceback (most recent call last): File "/base/data/home/apps/myApp/1.34005759049070 

The general behaviour for my app is more like.. the task will fail, and then it will retry in 120 seconds (I have error logs showing this occurring back in February as well.)

Maybe non-named tasks that are set to run immediately have retried on a different timeframe in the past.. but the retry time has not just been some generic sub-30 second time.

As for Ikai's comment, it says what it says: "The same task should not be executed multiple times concurrently."

It does not say that the same task cannot be executed multiple times concurrently.

Again, my money is on the reality that one cannot guarantee 100% that an error will never occur that could lead to concurrent task execution... you would cripple the task queue subsystem if you put in a bunch of preventative checks.  Though, one can state with reasonable confidence that it is highly improbable that a task will execute concurrently.  But, good luck getting a literal answer to your question.



hawkett

unread,
Sep 9, 2010, 5:07:16 PM9/9/10
to Google App Engine
Thanks Eli - couple more comments below

On Sep 9, 5:41 pm, Eli Jones <eli.jo...@gmail.com> wrote:
> How did I determine concurrent execution?
>
> I determined that I had concurrent task execution because you can see the
> task_name in the logs, and a named task successfully ran twice.  And, the
> one that ran last threw a TaskAlreadyExists error when trying to add the
> next chained task to the queue since each named task has a specifically
> defined name for the next task in the chain and the version that finished
> first had already added the next named task to the queue. (This is why it is
> absolutely important to use named tasks when chaining.. some sort of random
> error can fork your tasks).

I might have misunderstood something, but this example seems to show
only multiple execution, not concurrent execution?

> Why do I suggest tasks do not just retry immediately (or in less than 30
> seconds after failure).. and have done so in the time before your April 23rd
> e-mail.

Well I guess our experience differs then. For me, it sometimes backed
off for longer, but generally retried immediately, and frequently,
until the backoff time reached a decent level. Way too much energy
spent on it to forget it :)

vlad

unread,
Nov 16, 2010, 3:06:06 AM11/16/10
to google-a...@googlegroups.com
Appreciate the discussion here. The motivation behind it is not just curiosity - it is very hard to build a long-running "pipeline" without correctly handling all error cases.

For me, failed tasks retry in 20 seconds. I've seen up to 3 retries with a flat 20-second backoff each time. I would have preferred this to be an option of the task queue. My app is a user-facing game, so a 20-second delay is annoying.

Robert Kluin

unread,
Nov 16, 2010, 12:12:54 PM11/16/10
to google-a...@googlegroups.com
I like the idea of configurable task-queue retry (or back-off) rates.
If you submit an issue for it let us know so we can star it.

Robert
