Django 100% threadsafe with DB?
PyMan  
From: PyMan <c.playma...@tiscali.it>
Date: Tue, 25 Sep 2007 02:56:48 -0700
Local: Tues, Sep 25 2007 5:56 am
Subject: Django 100% threadsafe with DB?
Hi all :)

I have the following problem.

I have a function that calls get_or_create on model X, passing fields
Y and Z as parameters. The same function runs in several threads, so
several get_or_create calls on model X with fields Y/Z can happen at
the same time. The values of Y/Z can also be the same in the different
threads, and here comes the problem: get_or_create then complains that
the lookup returned more than one row for X-Y/Z.

Now Y/Z are not unique in my model and I have not overridden the save
function, but I think I would have the same problem if Y/Z were unique
too.

Looking at the DB I have more than one row with the same values...how
can this happen? Here is get_or_create...

    def get_or_create(self, **kwargs):
        """
        Looks up an object with the given kwargs, creating one if necessary.
        Returns a tuple of (object, created), where created is a boolean
        specifying whether an object was created.
        """
        assert len(kwargs), 'get_or_create() must be passed at least one keyword argument'
        defaults = kwargs.pop('defaults', {})
        try:
            return self.get(**kwargs), False
        except self.model.DoesNotExist:
            params = dict([(k, v) for k, v in kwargs.items() if '__' not in k])
            params.update(defaults)
            obj = self.model(**params)
            obj.save()
            return obj, True

Well...looking at the function, I can say it happens like this (and I
confirmed it in the debugger too):

time 1 : T1 (thread 1) calls get_or_create, runs self.get, and gets a
DoesNotExist exception
time 2 : T2 does the same and also gets DoesNotExist, for the same reason
time 3 : T1 handles the exception, creates the object, and returns it
time 4 : T2 does the same (creating another one!)
time 5 : any thread (T1, T2 or T3) that calls get_or_create again with
the same X-Y/Z finds that self.get fails, because it now matches more
than one row.
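
The interleaving in the timeline above can be reproduced without Django. What follows is only a sketch under stated assumptions: the list `rows` stands in for table X, and `naive_get_or_create` and the `Barrier` are illustrative names that are not part of Django. The barrier forces both threads to finish the lookup (times 1 and 2) before either one inserts (times 3 and 4).

```python
import threading

rows = []                        # hypothetical stand-in for table X; each row is (y, z)
barrier = threading.Barrier(2)   # demo only: hold both threads after their lookup


def naive_get_or_create(y, z):
    """Check-then-insert with no locking, mirroring the get()/save() sequence."""
    matches = [row for row in rows if row == (y, z)]   # the "self.get" step
    barrier.wait()                                     # both threads have now seen "no row"
    if matches:
        return matches[0], False
    rows.append((y, z))                                # the "obj.save()" step
    return (y, z), True


results = []


def worker():
    results.append(naive_get_or_create("y1", "z1"))


threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(rows))   # 2: each thread created its own row
```

Both calls report `created == True`, which is exactly the state in which a later lookup on the same Y/Z matches more than one row.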

If Y/Z were unique, I think the whole thing would simply fail at time
4...a little bit different, but still a problem.

I could solve the whole thing this way, but I hope there is a better
solution that I'm missing :

from threading import Lock
lock = Lock()

    def get_or_create(self, **kwargs):
        """
        Looks up an object with the given kwargs, creating one if necessary.
        Returns a tuple of (object, created), where created is a boolean
        specifying whether an object was created.
        """
        assert len(kwargs), 'get_or_create() must be passed at least one keyword argument'
        defaults = kwargs.pop('defaults', {})
        try:
            return self.get(**kwargs), False
        except self.model.DoesNotExist:
            # Serialize creation, then look again: another thread may have
            # created the object while this one was waiting for the lock.
            lock.acquire()
            try:
                try:
                    return self.get(**kwargs), False
                except self.model.DoesNotExist:
                    params = dict([(k, v) for k, v in kwargs.items() if '__' not in k])
                    params.update(defaults)
                    obj = self.model(**params)
                    obj.save()
                    return obj, True
            finally:
                # Always release the lock, whether we return or raise.
                lock.release()
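
The same double-checked pattern can be tried outside Django. This is a minimal sketch, not Django code: `rows`, `lookup`, and `locked_get_or_create` are hypothetical names, and the barrier exists only to make both threads miss the first, unlocked lookup. A `threading.Lock` protects only the threads of one process, so it would not help when several server processes share the database.

```python
import threading

rows = []                        # hypothetical stand-in for table X
create_lock = threading.Lock()
barrier = threading.Barrier(2)   # demo only: let both threads miss the first lookup


def lookup(y, z):
    """Return the stored row matching (y, z), or None."""
    for row in rows:
        if row == (y, z):
            return row
    return None


def locked_get_or_create(y, z):
    row = lookup(y, z)           # fast path without the lock
    barrier.wait()
    if row is not None:
        return row, False
    with create_lock:            # released even if the body raises
        row = lookup(y, z)       # look again: another thread may have created it
        if row is not None:
            return row, False
        row = (y, z)
        rows.append(row)
        return row, True


results = []


def worker():
    results.append(locked_get_or_create("y1", "z1"))


threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(rows), sorted(created for _, created in results))   # 1 [False, True]
```

Whichever thread takes the lock first creates the row; the other finds it on the second lookup and reports `created == False`.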


Joe Holloway  
From: "Joe Holloway" <jhollow...@gmail.com>
Date: Tue, 25 Sep 2007 11:00:19 -0500
Local: Tues, Sep 25 2007 12:00 pm
Subject: Re: Django 100% threadsafe with DB?
Do you have the transaction middleware enabled?  Given that you do not
have unique constraints on field Y/Z, I believe this would be the
expected behavior with transactions enabled.

Both threads conceptually have their own "picture" of what the
database looked like when the transaction was started.  That isolation
exists until the transaction is ultimately committed.

At the risk of over-explaining and over-simplifying, if the record did
not exist at the time each thread started its transaction, then it
doesn't matter that T1 hit the save operation first, T2 will not see
it.
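
The "picture" idea can be observed with the `sqlite3` module from the standard library. This is only a sketch under assumptions: SQLite in WAL mode stands in for whatever database engine is actually in use, the table `x` and the connection names are made up, and other engines or isolation levels may let a later statement in the same transaction see the committed row.

```python
import os
import sqlite3
import tempfile


def count_rows(conn):
    # fetchall() runs the statement to completion so it does not stay active.
    return conn.execute("SELECT COUNT(*) FROM x").fetchall()[0][0]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")

    # isolation_level=None means autocommit: transactions start only at an explicit BEGIN.
    t1 = sqlite3.connect(path, isolation_level=None)
    t1.execute("PRAGMA journal_mode=WAL")      # WAL lets a reader keep a stable snapshot
    t1.execute("CREATE TABLE x (y TEXT, z TEXT)")
    t2 = sqlite3.connect(path, isolation_level=None)

    t2.execute("BEGIN")
    before = count_rows(t2)                    # T2's snapshot is taken at this first read

    t1.execute("INSERT INTO x VALUES ('y1', 'z1')")   # T1 saves; committed immediately

    during = count_rows(t2)                    # T2 still sees its old picture
    t2.execute("COMMIT")
    after = count_rows(t2)                     # a new transaction sees T1's row

    t1.close()
    t2.close()

print(before, during, after)   # 0 0 1
```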

If you're not using transactions then ignore my explanation, but it
might help to know which database engine you are using.



Istvan Albert  
From: Istvan Albert <istvan.alb...@gmail.com>
Date: Tue, 25 Sep 2007 19:32:19 -0000
Local: Tues, Sep 25 2007 3:32 pm
Subject: Re: Django 100% threadsafe with DB?

Django is 0% threadsafe (as in nada, null or zilch)

it is not supposed to be run that way, but if you must, keep locking
around every operation.

i.


Mike Scott  
From: "Mike Scott" <mic...@gmail.com>
Date: Wed, 26 Sep 2007 10:43:08 +1200
Local: Tues, Sep 25 2007 6:43 pm
Subject: Re: Django 100% threadsafe with DB?

Istvan,

It should be threadsafe. The way web applications and web loads work means
that lots of simultaneous connections pretty much turn it into a threaded
application, and for that reason I think more research should be done into
this sort of operation.



Benjamin Slavin  
From: "Benjamin Slavin" <benjamin.sla...@gmail.com>
Date: Tue, 25 Sep 2007 20:27:52 -0400
Local: Tues, Sep 25 2007 8:27 pm
Subject: Re: Django 100% threadsafe with DB?
On 9/25/07, Mike Scott <mic...@gmail.com> wrote:

> It should be threadsafe - [... ] web applications [...] pretty much
> [become] a threaded application

Mike,

There are two issues here: thread safety and concurrent operation.
They are very different issues (though there is overlap).

Django DOES support concurrent operation (separate processes on the
same or multiple servers).

Django DOES NOT support threaded operation (and from what I've
gathered in past discussions on this list, is not likely to).

This is why Apache must be configured to use the prefork model instead
of the worker model.

In practice this doesn't tend to pose a problem for web deployments.
Both FCGI and Apache are designed so that they can work with non
thread-safe applications.

Where you may run into some difficulty is if you want to make a
multi-threaded backend application.

If you'd like to discuss this issue further, please bring it up on django-users.

> I think more research should be done into this sort of operation

Database locking has been discussed previously on this list and in a
number of related tickets in Trac.  This is currently the recommended
approach for ensuring data integrity in a Django app.  It is, however,
true that there are still some race conditions that require special
attention (get_or_create is one of them).
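
One way to let the database arbitrate the get_or_create race is a unique constraint plus a fallback lookup. This is a different technique from explicit locking, and the sketch below is only an illustration under assumptions: it uses the standard-library `sqlite3` module rather than the Django ORM, and the table `x` and the function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE x (id INTEGER PRIMARY KEY, y TEXT, z TEXT, UNIQUE (y, z))"
)


def get_or_create(conn, y, z):
    """Return (row_id, created); the UNIQUE constraint settles any race."""
    select = "SELECT id FROM x WHERE y = ? AND z = ?"
    row = conn.execute(select, (y, z)).fetchone()
    if row is not None:
        return row[0], False
    try:
        with conn:   # commits on success, rolls back if the insert fails
            cur = conn.execute("INSERT INTO x (y, z) VALUES (?, ?)", (y, z))
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # Another writer inserted the same (y, z) after our lookup: use its row.
        row = conn.execute(select, (y, z)).fetchone()
        return row[0], False


first = get_or_create(conn, "y1", "z1")
second = get_or_create(conn, "y1", "z1")

# A second raw insert of the same pair is rejected by the database itself.
try:
    conn.execute("INSERT INTO x (y, z) VALUES ('y1', 'z1')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(first, second, duplicate_rejected)   # (1, True) (1, False) True
```

The point of the design is that the duplicate can never be stored, no matter how many threads or processes race, so the worst case is one failed insert followed by a successful lookup.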

I agree that there should probably be more information about how to
handle massively parallel web applications, but that's not a
Django-specific concern.  Many web developers don't understand how to
handle these issues, and I haven't found a good resource to point
people toward (suggestions welcome).

I hope this helps to clarify things a bit.

 - Ben


Graham Dumpleton  
From: Graham Dumpleton <Graham.Dumple...@gmail.com>
Date: Wed, 26 Sep 2007 02:58:08 -0000
Local: Tues, Sep 25 2007 10:58 pm
Subject: Re: Django 100% threadsafe with DB?
On Sep 26, 10:27 am, "Benjamin Slavin" <benjamin.sla...@gmail.com>
wrote:

> On 9/25/07, Mike Scott <mic...@gmail.com> wrote:

> > It should be threadsafe - [... ] web applications [...] pretty much
> > [become] a threaded application

> Mike,

> There are two issues here.  Thread safe and concurrent operation, and
> they are very different issues (though there is overlap).

> Django DOES support concurrent operation (separate processes on the
> same or multiple servers).

> Django DOES NOT support threaded operation (and from what I've
> gathered in past discussions on this list, is not likely to).

Can you find the discussions on Google Groups and post references to
them?

> This is why Apache must be configured to use the prefork model instead
> of the worker model.

In which case there should also be a warning that Django cannot be
used with Apache/mod_python on Windows, as the Apache winnt MPM is also
multithreaded. Also, why are there instructions posted for running a
FASTCGI process in multithreaded mode, when that wouldn't be safe
either?

I have pointed out the Apache inconsistency before. At the same time,
there seem to be various people who have no problem running Django on
the winnt and worker MPMs. That the FASTCGI instructions show a threaded
example must also mean that is okay as well.

The most recent response I got was that any threading problems were
related to database backends and were fixed a long time ago and that
besides those issues, there weren't any specific things known of that
would be a problem in a multithreaded web server. There were also some
multithreading issues in mod_python <3.2.7 as well which may have been
making people think there were problems where there weren't.

http://groups.google.com/group/django-developers/browse_frm/thread/bf...
http://groups.google.com/group/django-developers/browse_frm/thread/c7...
... plus other posts I can't find right now.

Thus any issues with multithreading are perhaps more to do with how
people implement an application on top of Django. It would be nice
though to get some sort of official statement from the Django
developers on this one way or the other and document on the Django web
site what the issues are and what parts of Django if any do have
multithreading issues.

You state that 'prefork' must be used; do you say that as one of the
developers?

> In practice this doesn't tend to pose a problem for web deployments.
> Both FCGI and Apache are designed so that they can work with non
> thread-safe applications.

Although Apache/mod_python can be set up for the prefork MPM, that is
not ideal for Python web applications due to the generally large memory
requirements of the web frameworks. The worker MPM is much preferable,
as it cuts down on the number of Apache child processes. If you ever
want Django to be taken up and offered as an option by commodity web
hosters then you must be able to support a multithreaded server, as
they cannot afford the memory requirements of mod_python, mod_wsgi or
fastcgi solutions used in a multiprocess/single threaded mode.

Can we please somehow settle this issue once and for all? I have tried
to get discussions going on this issue in the past but have got
minimal feedback. I thought that to a degree it had been determined
that multithreaded servers were okay, although users should ensure
their own code is multithread safe, but now again someone is saying
that Django itself is not multithread safe. :-(

Graham


Joseph Kocherhans  
From: "Joseph Kocherhans" <jkocherh...@gmail.com>
Date: Tue, 25 Sep 2007 23:17:34 -0500
Local: Wed, Sep 26 2007 12:17 am
Subject: Re: Django 100% threadsafe with DB?
On 9/25/07, Graham Dumpleton <Graham.Dumple...@gmail.com> wrote:

> Can we please somehow settle this issue once and for all? I have tried
> to get discussions going on this issue in the past but have got
> minimal feedback. I thought that to a degree it had been determined
> that multithreaded servers were okay, although users should ensure
> their own code is multithread safe, but now again someone is saying
> that Django itself is not multithread safe. :-(

I talked with Jacob about this quite a while ago and he told me that
Django was not originally written to be threadsafe. The only threading
problems I remember hearing about were with the database connections,
and those issues were fixed in #1442 [1]. To my knowledge, there has
never been any review of the code to check for other possible sticky
spots. I used to deploy Django on Windows and never had any threading
problems, but the sites were mostly low traffic, internal, and
probably not good candidates for exposing problems.

In short, Django was not *designed* to be threadsafe, but any obvious
problems that I'm aware of have been fixed. YMMV.

Joseph

[1] http://code.djangoproject.com/ticket/1442


Derek Anderson  
From: Derek Anderson <pub...@kered.org>
Date: Tue, 25 Sep 2007 23:55:58 -0500
Local: Wed, Sep 26 2007 12:55 am
Subject: Re: Django 100% threadsafe with DB?
 > In short, Django was not *designed* to be threadsafe, but any obvious
 > problems that I'm aware of have been fixed. YMMV.

that's scary.

but then again, python itself isn't multi-threaded.  (all threading is
faked - google "global interpreter lock". lazy s.o.b. python devs)  so
all your really hairy "c=c+1" type issues are already nixed.

so not so scary.

derek
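
A caution on the "c=c+1" point: in CPython the global interpreter lock serializes individual bytecodes, but `c = c + 1` compiles to several of them (a load, an add, a store), so a thread switch can still fall between the read and the write. The sketch below, with made-up names `counter` and `counter_lock`, lists the bytecodes and then shows the usual remedy of guarding the read-modify-write with a lock.

```python
import dis
import threading

counter = 0
counter_lock = threading.Lock()


def unsafe_increment():
    global counter
    counter = counter + 1        # separate load, add and store bytecodes


def safe_increment():
    global counter
    with counter_lock:           # the whole read-modify-write is one critical section
        counter = counter + 1


# Exact opcode names vary between Python versions, but the load and the
# store of the global are always separate instructions.
opnames = [ins.opname for ins in dis.get_instructions(unsafe_increment)]
print(opnames)


def worker():
    for _ in range(10000):
        safe_increment()


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)   # 40000
```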


Nicola Larosa  
From: Nicola Larosa <nicola.lar...@gmail.com>
Date: Wed, 26 Sep 2007 09:20:08 +0200
Local: Wed, Sep 26 2007 3:20 am
Subject: Re: Django 100% threadsafe with DB?

Derek Anderson wrote:
> but then again, python itself isn't multi-threaded.  (all threading is
> faked - google "global interpreter lock". lazy s.o.b. python devs)  so
> all your really hairy "c=c+1" type issues are already nixed.

> so not so scary.

Right. What *is* scary is how much people cling to the horrible hack
that preemptive multithreading is.

--
Nicola Larosa - http://www.tekNico.net/

Love is hate
War is peace
No is yes
And we're all free
 -- Tracy Chapman, Why?, Tracy Chapman, 1988


Derek Anderson  
From: Derek Anderson <pub...@kered.org>
Date: Wed, 26 Sep 2007 02:54:28 -0500
Local: Wed, Sep 26 2007 3:54 am
Subject: Re: Django 100% threadsafe with DB?
you mean to say cooperative multithreading, right?

if so, heck yeah.  dear lord in heaven yeah.


Fredrik Lundh  
From: Fredrik Lundh <fred...@pythonware.com>
Date: Wed, 26 Sep 2007 10:37:22 +0200
Local: Wed, Sep 26 2007 4:37 am
Subject: Re: Django 100% threadsafe with DB?

Derek Anderson wrote:
> but then again, python itself isn't multi-threaded.  (all threading is
> faked - google "global interpreter lock". lazy s.o.b. python devs)

given that a stock CPython interpreter releases the lock in a few
hundred places, primarily around potentially long-running or blocking C
operations, claiming that "all threading is faked" is a bit misleading.
  maybe you should do a bit more research before you start calling
people names?

</F>


Derek Anderson  
 More options Sep 26 2007, 5:01 am
From: Derek Anderson <pub