ndb, deadlock waiting for flush, and maximum recursion in app_logging

ckhan

Oct 29, 2012, 3:31:55 PM
to google-a...@googlegroups.com
Hello. I'm having trouble reading an exception log on my app.

Quick background:
- request was made by taskqueue
- request creates several dozen tasklets that have this code path:
  (module/function names simplified):

      mycode3.py:234 myFunc3()
          --> mycode2.py:64 myFunc2()
                  --> mycode1.py:37 myFunc1

  myFunc1 is yielding on async urlfetch; line 37 is the
  @ndb.tasklet decorator

- taskqueue was configured to retry 2 more times, which it
  did, each attempt failing in exactly the same way

Actual log:

2012-10-29 08:36:59.649 initial generator myFunc1(mycode1.py:37) raised RuntimeError(maximum recursion depth exceeded)
W 2012-10-29 08:36:59.654 initial generator myFunc1(mycode1.py:37) raised RuntimeError(maximum recursion depth exceeded)
W 2012-10-29 08:36:59.655 suspended generator myFunc2(mycode2.py:64) raised RuntimeError(maximum recursion depth exceeded)
W 2012-10-29 08:36:59.665 initial generator myFunc1(mycode1.py:37) raised RuntimeError(maximum recursion depth exceeded)
W 2012-10-29 08:36:59.669 initial generator myFunc1(mycode1.py:37) raised RuntimeError(maximum recursion depth exceeded)
W 2012-10-29 08:36:59.670 suspended generator myFunc2(mycode2.py:64) raised RuntimeError(maximum recursion depth exceeded)
W 2012-10-29 08:36:59.674 initial generator myFunc1(mycode1.py:37) raised RuntimeError(maximum recursion depth exceeded)
W 2012-10-29 08:36:59.674 suspended generator myFunc2(mycode2.py:64) raised RuntimeError(maximum recursion depth exceeded)
W 2012-10-29 08:37:00.658 suspended generator myFunc3(mycode3:234) raised RuntimeError(maximum recursion depth exceeded)
E 2012-10-29 08:37:04.055
Exception: Deadlock waiting for <Future b36ce4ec0b0cf118 created by __call__(_webapp25.py:712) for tasklet flush(context.py:239); pending>
Traceback (most recent call last):
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/_webapp25.py", line 712, in __call__
    handler.post(*groups)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 1050, in add_context_wrapper
    ctx.flush().check_success()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 322, in check_success
    raise self._exception.__class__, self._exception, self._traceback
RuntimeError: Deadlock waiting for <Future b36ce4ec0b0cf118 created by __call__(_webapp25.py:712) for tasklet flush(context.py:239); pending>
E 2012-10-29 08:37:04.086 Traceback (most recent call last):
E 2012-10-29 08:37:04.086   File "/base/python_runtime/python_lib/versions/1/google/appengine/api/app_logging.py", line 98, in emit
E 2012-10-29 08:37:04.086 Traceback (most recent call last):
E 2012-10-29 08:37:04.086   File "/base/python_runtime/python_lib/versions/1/google/appengine/api/app_logging.py", line 91, in emit
E 2012-10-29 08:37:04.086 Traceback (most recent call last):
E 2012-10-29 08:37:04.086   File "/base/python_runtime/python_lib/versions/1/google/appengine/api/app_logging.py", line 98, in emit
E 2012-10-29 08:37:04.086 Traceback (most recent call last):
E 2012-10-29 08:37:04.086   File "/base/python_runtime/python_lib/versions/1/google/appengine/api/app_logging.py", line 91, in emit
E 2012-10-29 08:37:04.086 Traceback (most recent call last):
E 2012-10-29 08:37:04.086   File "/base/python_runtime/python_lib/versions/1/google/appengine/api/app_logging.py", line 98, in emit

The warnings for the generators would appear to be from 9 separate,
otherwise unrelated tasklets that are all executing concurrently and
all happen to be at different points in the code path. Is that
correct?

The recursion depth error seems to come from app_logging. But is
that the underlying cause, or a symptom of the failure? If the
latter, how do I narrow down what got me into this situation?

Any help much appreciated.
-ckhan

Guido van Rossum

Oct 30, 2012, 3:07:42 PM
to google-a...@googlegroups.com
Hi ckhan,

I have debugged a similar problem for an internal app once. The root cause was that they were mixing synchronous and async API calls. In particular, inside a tasklet they were making a synchronous call. The synchronous call is run by invoking the event loop recursively; what may happen in this case is that another tasklet is spawned that does the same thing, and so on, until you have many recursive event loop invocations stacked on top of each other.
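To make the failure mode concrete, here is a toy model of the recursion. This is pure Python, not ndb's actual code -- ToyLoop, run_until, and tasklet are made-up names. Each "synchronous" wait re-enters the loop, so a chain of tasklets that each block on a child stacks one loop invocation per tasklet; with enough tasklets you exceed the recursion limit:

```python
class ToyLoop(object):
    """A drastically simplified stand-in for ndb's event loop."""

    def __init__(self):
        self.runnables = []   # generators ready to be resumed
        self.depth = 0        # current re-entrancy depth of run_until()
        self.max_depth = 0

    def add(self, gen):
        self.runnables.append(gen)

    def run_until(self, done):
        # A *synchronous* wait: re-enter the loop until done() is true.
        # Every synchronous call made from inside a running tasklet adds
        # another run_until() frame to the Python stack.
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        while not done():
            if not self.runnables:
                raise RuntimeError("Deadlock waiting for result")
            gen = self.runnables.pop(0)
            try:
                next(gen)
                self.runnables.append(gen)
            except StopIteration:
                pass
        self.depth -= 1


loop = ToyLoop()
done_flags = {}


def tasklet(n):
    # Each tasklet spawns a child and then "waits synchronously" for it
    # by calling straight back into the event loop -- the recursive
    # invocation described above.
    if n > 0:
        loop.add(tasklet(n - 1))
        loop.run_until(lambda: done_flags.get(n - 1, False))
    done_flags[n] = True
    yield


loop.add(tasklet(5))
loop.run_until(lambda: done_flags.get(5, False))
print(loop.max_depth)  # prints 6: the waits nest, one stack level per waiting tasklet
```

In real ndb the stacked frames belong to the SDK's event loop internals rather than your own code, which is part of why the resulting traceback is so hard to read.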

The solution is to find the synchronous call(s) in your tasklet(s) and replace them with "yield <async call>". If this is indirect, e.g. your tasklet calls one of your own non-tasklet helper functions which makes the synchronous call, you will have to convert the helper into a tasklet too.

Good luck! It shouldn't be too hard to track this down.

--Guido

ckhan

Oct 30, 2012, 5:54:12 PM
to google-a...@googlegroups.com

Hi Guido -

Thanks, this was exactly the case! 
I was indeed mixing sync/async calls in a tasklet. 

A few follow-up questions:

1. Is there anything I could have/should have done to make it easier for
   me to realize this was the problem? Even just showing more frames from
   the recursive stack might have clued me in.

2. Just to clarify my understanding: is the issue specifically making
   synchronous calls in a tasklet, or is any mixing of sync/async in the
   same function dangerous?

   More concretely, is this pattern safe:

      def munch_stuff():
          # synchronously fetch some entities
          stuff = ndb.get_multi(...)

          # process each concurrently with munch tasklet
          result = yield [munch(s) for s in stuff]

2b. If the above is *not* safe, and I need to change the first line to:

       stuff = yield ndb.get_multi_async(...)

   then how do I use munch_stuff in conjunction with 'deferred'?

   It would seem that marking it with a toplevel/tasklet decorator isn't
   enough (deferred complains about failing to pickle the function).

   Or, would this configuration be unsupported by deferred?

   
Thanks again!
-ckhan

Guido van Rossum

Oct 30, 2012, 7:05:06 PM
to google-a...@googlegroups.com
On Tue, Oct 30, 2012 at 2:54 PM, ckhan <charl...@gmail.com> wrote:
>
> Hi Guido -
>
> Thanks, this was exactly the case!
> I was indeed mixing sync/async calls in a tasklet.

Glad it's been diagnosed.

> A few follow-up questions:
>
> 1. Is there anything I could have/should have done to make it easier for
> me realize this was the problem? Even just showing more frames from
> the recursive stack might have clued me.

Hardly. This is an extremely esoteric problem -- I have only ever seen
it reported once before, in a super-high-performance internal app. I
am still struggling with what would be the best way to diagnose this
automatically. We have
http://code.google.com/p/appengine-ndb-experiment/issues/detail?id=167
to track this -- maybe you could star it.

> 2. Just to clarify my understanding: is the issue specifically making
> synchronous calls in a tasklet;

Yes.

> or is any mixing sync/async in the
> same function dangerous?

No.

> More concretely, is this pattern safe:
>
>       def munch_stuff():
>           # synchronously fetch some entities
>           stuff = ndb.get_multi(...)
>
>           # process each concurrently with munch tasklet
>           result = yield [munch(s) for s in stuff]

That's not safe, assuming you forgot to show the @ndb.tasklet
decorator (otherwise, how could the yield have worked).

> 2b. If the above is *not* safe, and I need to change the first line to:
>
> stuff = yield ndb.get_multi_async(...)
>
> then how do I use munch_stuff in conjunction with 'deferred'?
>
> It would seem that marking it with a toplevel/tasklet decorator isn't
> enough (deferred complains about failing to pickle the function).
>
> Or, would this configuration be unsupported by deferred?

Oh, deferred. It has too many arbitrary limitations. The best thing to
do is to only ever use it with simple global functions that take only
simple types (e.g. ints and strings, maybe lists of those) as
arguments.
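For what it's worth, the pickling limitation is easy to see outside App Engine: deferred serializes the callable and its arguments with pickle, and pickle stores plain functions by importable name, which is why only module-level functions (with picklable arguments) survive the round trip. A quick illustration -- make_task, task, and key_ids below are made-up names, not anything from the thread:

```python
import pickle

# A named, importable callable pickles fine: pickle records a reference
# (module + name), not the function's code.
blob = pickle.dumps(len)
assert pickle.loads(blob) is len


def make_task():
    # A nested (local) function has no importable name, so pickling it
    # fails -- the same class of failure deferred reports when handed a
    # non-trivial callable.
    def task(key_ids):
        return len(key_ids)
    return task


try:
    pickle.dumps(make_task())
    nested_pickles = True
except (pickle.PicklingError, AttributeError, TypeError):
    nested_pickles = False

print(nested_pickles)  # False
```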



--
--Guido van Rossum (python.org/~guido)

ckhan

Nov 1, 2012, 12:38:57 PM
to google-a...@googlegroups.com
Hi Guido -

So sorry, I botched my example. Let me try again: munch_stuff is
explicitly *not* a tasklet (no decorator), but 'munch' is,
and I call it with Future's wait_all function:

      def munch_stuff():
          # synchronously fetch some entities
          stuff = ndb.get_multi(...)
  
          # process each concurrently with munch tasklet
          ndb.Future.wait_all([munch(s) for s in stuff])

Would this construct still be safe with respect to the rule against
mixing sync/async, while still letting me "munch" concurrently?

Thanks so much for taking the time to answer these questions;
extremely helpful and greatly appreciated.

-ckhan

Guido van Rossum

Nov 1, 2012, 12:49:44 PM
to google-a...@googlegroups.com
On Thu, Nov 1, 2012 at 9:38 AM, ckhan <charl...@gmail.com> wrote:
> So sorry, I botched my example. Let me try again: munch_stuff is
> explicitly *not* a tasklet (no decorator), but 'munch' is,
> and I call it with Future's wait_all function:
>
>       def munch_stuff():
>           # synchronously fetch some entities
>           stuff = ndb.get_multi(...)
>
>           # process each concurrently with munch tasklet
>           ndb.Future.wait_all([munch(s) for s in stuff])
>
> Would this construct still be safe re the rule against mixing
> sync/async, but let me "munch" concurrently?

This example looks fine -- however, if you were to call munch_stuff()
from a tasklet, you'd be in trouble again.

> Thanks so much taking the time to answer these questions;
> extremely helpful and greatly appreciated.

You're welcome.