Python data structures and the GIL

Tobia Conforto

Nov 6, 2011, 4:07:03 PM
to Django users
I have a Django app that does some heavy calculations for the user.

At the start of a user's session it parses some source data and builds
a complex data structure, in the form of big trees of cElementTree
nodes (by themselves fast and small, being written in C). Then it
allows the user to query and manipulate the data structure, using an
AJAX interface.

I can't store the cElementTree data structure in Django's session,
because it's made of unserializable C objects. I also guessed that
serializing and deserializing a complex data structure on every AJAX
request would seriously hurt performance. So it needs to stay in RAM,
in the form of Python and C objects.

The way I'm doing it now, one of my modules has a global variable
holding a dict that maps each user's session ID to that user's data
structure. A cleanup routine purges entries that have not been
accessed within a given timeout.
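
A minimal sketch of the kind of cache described above, not the actual
code from the app. The class name SessionCache, the 30-minute default
timeout and the injectable clock are all assumptions made for
illustration. A lock guards the dict because, in a threaded
deployment, several requests may touch it at once.

```python
import threading
import time


class SessionCache:
    """Maps session IDs to in-memory objects and purges stale entries.

    Hypothetical helper: the name, the default timeout and the clock
    parameter are illustrative assumptions.
    """

    def __init__(self, timeout_seconds=1800, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries = {}  # session_id -> (last_access, value)

    def put(self, session_id, value):
        """Store a value and record the current time as its last access."""
        with self._lock:
            self._entries[session_id] = (self._clock(), value)

    def get(self, session_id):
        """Return the cached value and refresh its timestamp, or None."""
        with self._lock:
            entry = self._entries.get(session_id)
            if entry is None:
                return None
            _, value = entry
            self._entries[session_id] = (self._clock(), value)
            return value

    def purge(self):
        """Drop entries not accessed within the timeout; return the count."""
        now = self._clock()
        with self._lock:
            stale = [sid for sid, (last_access, _) in self._entries.items()
                     if now - last_access > self._timeout]
            for sid in stale:
                del self._entries[sid]
            return len(stale)
```

A view would call get() with the request's session key and rebuild the
tree on a miss, while purge() could run from a timer or at the start
of each request. The cache only works if every request is served by
the same process.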

This forced me to deploy the project with method=threaded, so that
all requests are served by a single process and there is only one of
those cache dicts.

I've given some thought to Python's GIL and how much of a bottleneck
it must be in a method=threaded deployment. It would only matter in a
high-load scenario, with many users waiting for an answer and many
CPU cores available; otherwise it makes no difference.

The only reasonable alternative I see is to use pure Python
ElementTree objects (instead of the unserializable C ones) and store
them in Django sessions backed by memcached. This would allow for a
method=prefork deployment. Whether the lack of inter-user Python
locking would offset the cost of using a somewhat larger and slower
data structure, and of pickling and unpickling it at every request,
remains to be seen.
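
A rough sketch of the pickle round trip this alternative implies. It
assumes a modern Python 3, where xml.etree.ElementTree elements can be
pickled; the setup described in this post is older and may behave
differently. The helper names save_tree and load_tree are
hypothetical, and the pickled bytes stand in for whatever a session or
cache backend would actually store.

```python
import pickle
import xml.etree.ElementTree as ET


def save_tree(root):
    """Serialize an element tree to bytes (hypothetical helper)."""
    return pickle.dumps(root, protocol=pickle.HIGHEST_PROTOCOL)


def load_tree(blob):
    """Rebuild the element tree from pickled bytes (hypothetical helper)."""
    return pickle.loads(blob)


# Small stand-in document; the real trees would be far larger.
root = ET.fromstring(
    "<data><item id='1'>a</item><item id='2'>b</item></data>"
)

blob = save_tree(root)      # what would be written at the end of a request
restored = load_tree(blob)  # what would be read at the start of the next

print("pickled size: %d bytes" % len(blob))
print([item.get("id") for item in restored.iter("item")])
```

Both steps would run on every AJAX request, so their cost on
realistically sized trees is the number to measure before choosing
between the two deployments.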

Any thoughts?

Tobia

Martin J. Laubach

Nov 6, 2011, 4:27:44 PM
to django...@googlegroups.com
One possibility that springs to mind is shared memory, either System V shared memory or memory-mapped files.
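
A minimal sketch of the memory-mapped file variant, under some loud
assumptions: the file holds the serialized XML rather than live
objects, since Python objects cannot simply be placed in a mapping,
and the function names publish and load are hypothetical. Each worker
process would still have to parse the bytes it reads.

```python
import mmap
import os
import tempfile
import xml.etree.ElementTree as ET


def publish(xml_bytes, path):
    """Write serialized XML where other worker processes can map it."""
    with open(path, "wb") as f:
        f.write(xml_bytes)


def load(path):
    """Map the file read-only and parse its contents into a tree."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing copies the mapped bytes so they can be parsed
            # after the mapping is closed.
            return ET.fromstring(mm[:])


# Illustrative usage with a throwaway file; the path is an assumption.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "session-abc.xml")
publish(b"<data><item id='1'>a</item></data>", path)
tree = load(path)
print(tree.find("item").get("id"))
```

This shares the raw data between prefork workers without a cache
server, but it does not avoid the per-request parsing cost.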

        mjl

Masklinn

Nov 6, 2011, 4:33:36 PM
to django...@googlegroups.com
On 2011-11-06, at 22:07 , Tobia Conforto wrote:
>
> I also guessed that deserializing a complex data structure at every AJAX request would seriously impact performance.
Have you *tested* this assumption?

Because, to me and without any hard numbers, wanting to keep trees in memory kind of sounds like premature optimization. FWIW, on my machine, reading a 60k XML document from memory (string or StringIO) takes cElementTree ~30ms, and serializing it back takes roughly the same. While 60ms/request is not insignificant, I'm not sure it's worth the complexity you're trying to introduce.
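
A quick way to get such numbers on your own machine might look like
the sketch below. It uses xml.etree.ElementTree from a current Python
3 rather than cElementTree, and the synthetic document built by
make_document is an assumption standing in for real data, so the
timings it prints are only indicative.

```python
import timeit
import xml.etree.ElementTree as ET


def make_document(n_items=2000):
    """Build a synthetic XML document (a stand-in for real source data)."""
    root = ET.Element("data")
    for i in range(n_items):
        item = ET.SubElement(root, "item", id=str(i))
        item.text = "value %d" % i
    return ET.tostring(root)


def measure(xml_bytes, repeat=20):
    """Return average seconds per parse and per serialization."""
    root = ET.fromstring(xml_bytes)
    parse = timeit.timeit(lambda: ET.fromstring(xml_bytes),
                          number=repeat) / repeat
    dump = timeit.timeit(lambda: ET.tostring(root),
                         number=repeat) / repeat
    return parse, dump


if __name__ == "__main__":
    doc = make_document()
    parse_s, dump_s = measure(doc)
    print("document size: %d bytes" % len(doc))
    print("parse:     %.2f ms" % (parse_s * 1000))
    print("serialize: %.2f ms" % (dump_s * 1000))
```

Running it against a real document from the app, instead of the
synthetic one, would show whether the per-request cost is actually a
problem.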
