Hi Eli, notes below -
On Sep 8, 4:14 pm, Eli Jones <eli.jo...@gmail.com> wrote:
> Well, I've been doing named, chained tasks since November 2009, and I can
> point out three things:
>
Task names aren't especially relevant to the question - a name prevents
the same task from being enqueued twice, not from being executed twice.
I've been using the task queue since it was released, and I've
definitely seen tasks executed more than once, but never concurrently.
> 1. I've had concurrent tasks execute at least once (that I noticed) when
> only one was supposed to run.. And, this appeared to happen when the
> subsystem first fired off the task (after it had already been added to the
> queue.. since TombstonedTaskError and TaskAlreadyExistsError seem to work
> nicely.).
>
Well, from Ikai's comment it sounds like Google does not expect this
behaviour. I started this thread from a hypothetical analysis of the
technology, but if you have actually seen it happen, that is
especially interesting. I personally can't see how it could
legitimately happen if the retry backs off for more than 30s - if a
task fires duplicates when it is first enqueued, that's a bug in the
system, IMO. How did you determine the executions were concurrent?
> 2. The GAE doc that I linked to explicitly states "it is possible in
> exceptional circumstances that a Task may execute multiple times". I
> believe that this covers both cases of the same task running concurrently or
> sequentially.
I don't think it does, but that is exactly the point of this
thread - it is not clear. I don't want to engineer significant
overhead into my application based on an interpretation of unclear
documentation. If the same task id can execute concurrently in App
Engine, that needs to be explicitly documented, because it has a
significant impact on app architecture. Again, Ikai's comment above
seems to imply Google does not expect it to happen. So if the
documentation is unclear, and Google seems to suggest the opposite of
your interpretation, that's a good reason to be wary of the
assumption you are making.
>
> 3. For my failed tasks, I'm pretty sure the backoff has always been more
> than 30 seconds (if the task failed in the middle of running). Generally,
> if a task failed in the middle of running, it would run again 60 seconds -
> 120 seconds later.
>
It hasn't always. The queue absolutely, definitely used to retry
immediately and back off at incrementally larger intervals that were
initially well under 30s. It worked like this for quite a long while;
in fact, people other than me suggested that behaviour be changed to
30s-plus precisely to deal with the issue in this thread. I had many,
many situations where a bug in a task meant the retries it generated
straight after failure filled up the error logs almost instantly. It
was a real hassle for a while there, and one of the reasons I raised
this issue in June last year -
http://code.google.com/p/googleappengine/issues/detail?id=1771
(among a bunch of others). I wouldn't have suggested backoff be
changed to > 30s if that was already the case.
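The old behaviour I'm describing looked roughly like exponential backoff starting near zero. A sketch of that kind of schedule - the exact parameters are my guesses, not documented values:

```python
def backoff_schedule(retries, base=1.0, factor=2.0, cap=3600.0):
    """Hypothetical exponential backoff: 1s, 2s, 4s, 8s, ...
    The base/factor/cap values are illustrative, not App Engine's."""
    delay = base
    out = []
    for _ in range(retries):
        out.append(min(delay, cap))
        delay *= factor
    return out

# The first several retries fall well under 30s, so a retry could start
# while a hung original attempt was still inside its 30s request window.
```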
> I can see how one would like the doc to explicitly address the potential for
> concurrent execution.. but you should presume that it is possible since the
> doc infers it.. and the doc doesn't say it can't happen.. and (less
> importantly) some guy on an internet news group is telling you that it has
> occurred in the past.
>
I don't think the docs imply it. I think the wording is ambiguous,
especially in light of Ikai's comment.
> I personally cannot imagine how one could guarantee that this would never
> happen without bogging down the entire taskqueue subsystem with triple and
> quadruple checks and adding in random (1-3 second) wait times for exactly
> when any task would execute.. (but, I have a limited imagination).. and it
> seems like even then.. you cannot guarantee 100% that a task would not
> execute twice at once if a drastic system error occurred.
Executing twice is fine, I get that. Executing the same task id
concurrently seems to be something that can be avoided - I don't see
anything other than the 30s+ backoff being required to achieve it.
Maybe that's wrong, but it's sufficient for me, and it was the
suggestion I made to address it. Unless someone highlights another
reason it could occur, I'm glad to avoid the additional architecture.