threading, caching, and the yield in query.py

bo

Sep 29, 2008, 1:37:47 PM9/29/08
to Django developers

This little issue is really hard to replicate. I've yet to find out
how to reproduce it programmatically, because it certainly revolves
around threading, object caching, and the yield in query.py (iteritems).

I just wanted to post this here to see if anyone with more experience
than I have knows how to replicate this in a test case.

On to the description....

The setup: Apache 2.2.6 + Linux + threaded (_not_ forked) +
mod_python + Django 1.X

Suppose I have a two-level caching object that basically overloads the
store/get functions for objects: a Memcached overlord cache (say a
60-second expiry) and a 'local' cache (with a 1-2 second expiry). The
basic purpose is to keep a highly requested object local in RAM per
HTTP request, without having to go back to the Memcached cache.
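
For illustration, a rough sketch of the kind of two-level cache
described above (written against a current Python; every name here is
hypothetical, and the memcached client is assumed to expose simple
get()/set() methods):

import time
import threading

class TwoLevelCache(object):
    """Hypothetical sketch of the two-stage cache described above."""
    LOCAL_TTL = 2        # seconds an object lives in the in-process cache
    MEMCACHED_TTL = 60   # seconds an object lives in memcached

    def __init__(self, memcached_client):
        self.memcached = memcached_client   # assumed get()/set() interface
        self._local = {}                    # key -> (expires_at, value)
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            entry = self._local.get(key)
            if entry is not None and entry[0] > time.time():
                # Served straight from process-local RAM.  Every caller gets
                # the *same object*, which is where a shared, half-consumed
                # generator becomes a problem once the lock is released.
                return entry[1]
        value = self.memcached.get(key)
        if value is not None:
            with self._lock:
                self._local[key] = (time.time() + self.LOCAL_TTL, value)
        return value

    def set(self, key, value):
        self.memcached.set(key, value, self.MEMCACHED_TTL)
        with self._lock:
            self._local[key] = (time.time() + self.LOCAL_TTL, value)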

The problem arises because various iterators base themselves on "yield"
statements in db/models/query.py. Should two threads access the same
local RAM cache object and try to iterate it (yes, the reads from the
cache are read/write locked, but the issue appears after the read lock
is released and the object is being used), the usual "ValueError:
generator already executing" exception is thrown:

File "mything/models.py", line 1072, in _set_data
for p in self.data:
File "/usr/lib/python2.5/site-packages/django/db/models/query.py",
line 179, in _result_iter
self._fill_cache()
File "/usr/lib/python2.5/site-packages/django/db/models/query.py",
line 612, in _fill_cache
self._result_cache.append(self._iter.next())
ValueError: generator already executing
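
For what it's worth, the error itself is plain Python behaviour rather
than anything Django-specific. The standalone sketch below (hypothetical
names, no Django involved, modern "except ... as" syntax, though the
behaviour is the same on the Python 2.5 in the traceback) usually
reproduces the same ValueError by letting two threads pull from one
shared generator; being a race, it may take a couple of runs:

import threading

def fake_rows():
    # Stand-in for the lazy generator behind a QuerySet.
    for i in range(2000000):
        yield i

shared = fake_rows()
errors = []

def consume():
    try:
        for row in shared:      # both threads advance the *same* generator
            pass
    except ValueError as e:     # "generator already executing"
        errors.append(e)

threads = [threading.Thread(target=consume) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(errors)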

So I'm aware this may not be a bug, but rather my own ignorance in not
doing something right.

This does not happen very often on the servers I am running (about 10
times a day against 100k+ Django views per day), which is why it's
really hard to track down.


Malcolm Tredinnick

Sep 29, 2008, 8:45:04 PM9/29/08
to django-d...@googlegroups.com

I don't understand from your description what you're actually doing, but
it sounds a lot like you're trying to read from the same QuerySet in
multiple threads whilst it's still retrieving results from the database
cursor. Don't do that. Firstly, database cursor result sets aren't
necessarily safe to be shared across threads. QuerySet and Query objects
probably are once the result set is populated, since every non-trivial
operation on them creates a copy and parallel iteration is supported,
but that's more by accident than design, since it's not worth the extra
overhead: if you want to share QuerySets via caching, they contain the
results (the result_cache is already fully primed).

Nothing in Django will cache a connection to the database or a cursor
result set, so can you break down your problem a bit more to describe
where the simultaneous access is happening. You say "the usual
ValueError", but I have never seen that raised by anything in Django. So
I'm wondering if you're doing something fairly unusual here.

That particular block of code is *designed* to be used in parallel
iterators in the same thread, so it's safe in that respect. But if
you're sharing a partially-read database cursor iterator across multiple
threads, it might be a case of "if it breaks, you get to keep all the pieces".
I can't see why that would be necessary (if you want to cache the
results, just cache the queryset and it will cache the results; if you
want to cache the query, cache queryset.query and it will just cache the
query; both of those cases are designed in and documented).
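
For reference, the two documented options look roughly like this (the
Entry model and the cache keys are hypothetical; the standard cache
framework and pickle are assumed):

import pickle
from django.core.cache import cache
from myapp.models import Entry    # hypothetical model

qs = Entry.objects.filter(published=True)

# Cache the results: pickling a QuerySet evaluates it, so the stored copy
# carries a fully primed result cache and no live cursor or generator.
cache.set('entries:published', pickle.dumps(qs), 60)
restored = pickle.loads(cache.get('entries:published'))
for entry in restored:             # iterates already-loaded rows, no DB hit
    print(entry)

# Cache only the query: queryset.query pickles without any results and can
# be reattached to a fresh QuerySet, which re-runs the SQL when iterated.
cache.set('entries:published:query', pickle.dumps(qs.query), 60)
fresh = Entry.objects.all()
fresh.query = pickle.loads(cache.get('entries:published:query'))
for entry in fresh:                # executes the cached query now
    print(entry)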

Regards,
Malcolm


bo

Sep 30, 2008, 1:05:31 AM9/30/08
to Django developers

I am doing some mildly weird things ..

All of it relates to the fact that the first iterator (i.e. for data in
myob.data) is _really_ database heavy, so the entire thing is a very
lazy iterator (with other lazy sub-iterators). In caching the
sub-objects (myob), the QuerySet may already have been evaluated, but
sometimes not, and when that data is finally evaluated it is sub-cached
so that it is shared across all the other outstanding objects. But
unlike the Memcached store, this two-stage cacher does not need to
pickle anything to hold on to the data in its first, fast (local) cache
stage (it's just a Python singleton). And yes, if I force the thing to
get pickled into the local cache, these errors do not occur (directly
related to your statement that QuerySets are flattened in this
process). I imagine the issue is somehow outside of Django; I was more
curious whether anyone else has come across this before, as I am
stumped.
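
In other words, the pickling (memcached) stage flattens the QuerySet as
a side effect, while the fast local stage hands out the live lazy
object. A sketch of the difference (Entry and local_cache are
hypothetical stand-ins for the model and the non-pickling local stage):

from myapp.models import Entry    # hypothetical model

local_cache = {}                  # stand-in for the non-pickling local stage
qs = Entry.objects.filter(published=True)

# Storing the live, lazy QuerySet means every request that pulls it out
# of local RAM shares the same half-consumed iterator:
local_cache['entries'] = qs            # risky under threaded mod_python

# Forcing evaluation first stores plain rows, mirroring what the pickling
# memcached stage does implicitly:
local_cache['entries'] = list(qs)      # safe to hand to multiple threads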

What seems strange to me is that all these yield ValueErrors occur
during template rendering, never in any other part of the mix. That
implies the thread churning out the template is somehow connected to
another thread using the same data object to churn out another template
(or even the same template at a different stage), which is the part I
find hard to explain.

bo
