I have a function that calls get_or_create on model X, passing fields Y and Z as parameters. The same function runs in several threads, so several get_or_create calls on model X with fields Y/Z can happen at the same time. The values of Y/Z can also be the same in the different threads, and here is the problem: get_or_create sometimes complains that the lookup returned more than one row for X-Y/Z.
Y/Z are not unique in my model and I do not override the save function, but I think I would have the same problem if Y/Z were unique too.
Looking at the DB, I have more than one row with the same values. How can this happen? Here is get_or_create:
def get_or_create(self, **kwargs):
    """
    Looks up an object with the given kwargs, creating one if necessary.
    Returns a tuple of (object, created), where created is a boolean
    specifying whether an object was created.
    """
    assert len(kwargs), 'get_or_create() must be passed at least one keyword argument'
    defaults = kwargs.pop('defaults', {})
    try:
        return self.get(**kwargs), False
    except self.model.DoesNotExist:
        params = dict([(k, v) for k, v in kwargs.items() if '__' not in k])
        params.update(defaults)
        obj = self.model(**params)
        obj.save()
        return obj, True
Looking at the function (and confirming it in the debugger), I can say it happens like this:
time 1 : T1 (thread 1) calls get_or_create, runs self.get and gets a DoesNotExist exception
time 2 : T2 does the same and gets the exception too, for the same reason
time 3 : T1 handles the exception, creates the object and returns it
time 4 : T2 does the same (creating another one!)
time 5 : any thread (T1, T2 or T3) that calls get_or_create again with the same X-Y/Z makes self.get fail, because it now finds more than one row.
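The interleaving above can be reproduced without a database. This is only a sketch under stated assumptions: the in-memory list `rows` stands in for the table behind model X, and `naive_get_or_create`, `barrier` and `run_race` are hypothetical names that are not part of Django. The barrier forces both threads past the lookup before either one creates the row, which is exactly the time 1 to time 4 sequence.

```python
import threading

rows = []                        # stands in for the table behind model X
barrier = threading.Barrier(2)   # makes both threads finish the lookup first


def naive_get_or_create(y, z):
    """Look up (y, z); create it if missing. Not safe under concurrency."""
    matches = [r for r in rows if r == (y, z)]
    if matches:
        return matches[0], False
    # Both threads wait here, so both have already seen "no such row".
    barrier.wait()
    rows.append((y, z))
    return (y, z), True


def run_race():
    """Run two threads through naive_get_or_create with the same values."""
    rows.clear()
    results = []

    def worker():
        results.append(naive_get_or_create("y", "z"))

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run_race()
print(len(rows), "rows for the same Y/Z")
```

Both calls report that they created the object, and a later lookup for the same values finds two rows.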
If Y/Z were unique, I think the whole thing would simply fail at time 4: a slightly different failure, but still a problem.
I could solve the whole thing this way, but I hope there is a better solution that I'm missing:
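For the unique case, one possible approach is to let the database reject the second insert and then fetch the row that won. The sketch below uses the standard-library sqlite3 module rather than the Django ORM, and the table `x`, its columns, and the helpers `find`, `insert_or_fetch` and `get_or_create` are illustrative assumptions, not Django's implementation.

```python
import sqlite3


def find(conn, y, z):
    """Return the id of the row with the given y and z, or None."""
    row = conn.execute(
        "SELECT id FROM x WHERE y = ? AND z = ?", (y, z)
    ).fetchone()
    return None if row is None else row[0]


def insert_or_fetch(conn, y, z):
    """Try to insert; if the unique constraint fires, return the existing row."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute("INSERT INTO x (y, z) VALUES (?, ?)", (y, z))
        return cur.lastrowid, True
    except sqlite3.IntegrityError:
        # Someone else inserted the same (y, z) first: this is "time 4".
        return find(conn, y, z), False


def get_or_create(conn, y, z):
    """Return (row_id, created) for the given y and z."""
    row_id = find(conn, y, z)
    if row_id is not None:
        return row_id, False
    return insert_or_fetch(conn, y, z)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE x (id INTEGER PRIMARY KEY, y TEXT, z TEXT, UNIQUE (y, z))"
)
```

Calling `insert_or_fetch` directly for values that already exist simulates the thread that lost the race: the insert fails, and the caller still gets the existing row back instead of an error.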
from threading import Lock

lock = Lock()

def get_or_create(self, **kwargs):
    """
    Looks up an object with the given kwargs, creating one if necessary.
    Returns a tuple of (object, created), where created is a boolean
    specifying whether an object was created.
    """
    assert len(kwargs), 'get_or_create() must be passed at least one keyword argument'
    defaults = kwargs.pop('defaults', {})
    try:
        return self.get(**kwargs), False
    except self.model.DoesNotExist:
        # Serialize creation so only one thread at a time can insert.
        lock.acquire()
        try:
            # Look again under the lock: another thread may have created it.
            try:
                res = self.get(**kwargs), False
            except self.model.DoesNotExist:
                params = dict([(k, v) for k, v in kwargs.items() if '__' not in k])
                params.update(defaults)
                obj = self.model(**params)
                obj.save()
                res = obj, True
        finally:
            # Always release the lock, even if get() or save() raises.
            lock.release()
        return res
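The look-again-under-the-lock idea can be exercised without a database. This is a minimal sketch, assuming an in-memory list `rows` in place of the table; `locked_get_or_create` and `run_threads` are hypothetical names. Note that a `threading.Lock` only protects threads inside one process, so it would not help when the same code runs in several server processes.

```python
import threading

rows = []                # stands in for the table behind model X
lock = threading.Lock()  # process-local: guards threads, not other processes


def locked_get_or_create(y, z):
    """Look up (y, z); create it at most once across threads."""
    key = (y, z)
    if key in rows:       # fast path, no lock needed when the row exists
        return key, False
    with lock:
        if key in rows:   # re-check: another thread may have just created it
            return key, False
        rows.append(key)
        return key, True


def run_threads(n=8):
    """Call locked_get_or_create with the same values from n threads."""
    rows.clear()
    results = []

    def worker():
        results.append(locked_get_or_create("y", "z"))

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run_threads()
print(len(rows), "row,", sum(created for _, created in results), "creation")
```

However the threads interleave, exactly one of them reports `created=True` and the list ends up with a single row.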
Do you have the transaction middleware enabled? Given that you do not have unique constraints on field Y/Z, I believe this would be the expected behavior with transactions enabled.
Both threads conceptually have their own "picture" of what the database looked like when the transaction was started. That isolation exists until the transaction is ultimately committed.
At the risk of over-explaining and over-simplifying: if the record did not exist when each thread started its transaction, then it doesn't matter that T1 hit the save operation first; T2 will not see it.
If you're not using transactions then ignore my explanation, but it might help to know which database engine you are using.
> time 1 : T1 (thread 1) call get_or_create and does the self.get and it
> goes in exception for DoesNotExists
> time 2 : T2 do the same and goes in exception too for the same reason
> time 3 : T1 goes on with the exception and creates the object and
> gives it back
> time 4 : T2 the same (creating another one!)
> time 5 : any T (T1 or T2 or T3) who calls get_or_create again with
> same X-Y/Z...the self.get gets crazy.
It should be threadsafe. The way web applications and web loads work means that lots of simultaneous connections pretty much turn it into a threaded application, and for that reason I think more research should be done into this sort of operation.
On 9/26/07, Istvan Albert <istvan.alb...@gmail.com> wrote:
> It should be threadsafe - [... ] web applications [...] pretty much
> [become] a threaded application
Mike,
There are two issues here: thread safety and concurrent operation. They are very different issues (though there is overlap).
Django DOES support concurrent operation (separate processes on the same or multiple servers).
Django DOES NOT support threaded operation (and from what I've gathered in past discussions on this list, is not likely to).
This is why Apache must be configured to use the prefork model instead of the worker model.
In practice this doesn't tend to pose a problem for web deployments. Both FCGI and Apache are designed so that they can work with non thread-safe applications.
Where you may run into some difficulty is if you want to make a multi-threaded backend application.
If you'd like to discuss this issue further, please bring it up on django-users.
> I think more research should be done into this sort of operation
Database locking has been discussed previously on this list and in a number of related tickets in Trac. This is currently the recommended approach for ensuring data integrity in a Django app. It is, however, true that there are still some race conditions that require special attention (get_or_create is one of them).
I agree that there should probably be more information about how to handle massively parallel web applications, but that's not a Django-specific concern. Many web developers don't understand how to handle these issues, and I haven't found a good resource to point people toward (suggestions welcome).
On Sep 26, 10:27 am, "Benjamin Slavin" <benjamin.sla...@gmail.com> wrote:
> On 9/25/07, Mike Scott <mic...@gmail.com> wrote:
> > It should be threadsafe - [... ] web applications [...] pretty much
> > [become] a threaded application
> Mike,
> There are two issues here: thread safety and concurrent operation. They
> are very different issues (though there is overlap).
> Django DOES support concurrent operation (separate processes on the
> same or multiple servers).
> Django DOES NOT support threaded operation (and from what I've
> gathered in past discussions on this list, is not likely to).
Can you find the discussions on Google Groups and post references to them?
> This is why Apache must be configured to use the prefork model instead
> of the worker model.
In which case there should also be a warning that Django cannot be used with Apache/mod_python on Windows, as the Apache winnt MPM is also multithreaded. Also, why are there instructions posted for running a FASTCGI process in multithreaded mode, when that wouldn't be safe either?
I have pointed out the Apache inconsistency before. At the same time, there seem to be various people who have no problem running Django on the winnt and worker MPMs. That the FASTCGI example shows a threaded setup must also mean that it is okay as well.
The most recent response I got was that any threading problems were related to database backends and were fixed a long time ago and that besides those issues, there weren't any specific things known of that would be a problem in a multithreaded web server. There were also some multithreading issues in mod_python <3.2.7 as well which may have been making people think there were problems where there weren't.
Thus any issues with multithreading are perhaps more to do with how people implement an application on top of Django. It would be nice though to get some sort of official statement from the Django developers on this one way or the other and document on the Django web site what the issues are and what parts of Django if any do have multithreading issues.
Since you have stated that 'prefork' must be used, are you saying that as one of the developers?
> In practice this doesn't tend to pose a problem for web deployments.
> Both FCGI and Apache are designed so that they can work with non
> thread-safe applications.
Although Apache/mod_python can be set up for the prefork MPM, that is not ideal for Python web applications because of the generally large memory requirements of the web frameworks. It is much preferable to use the worker MPM, as it cuts down on the number of Apache child processes. If you ever want Django to be taken up and offered as an option by commodity web hosters, then you must be able to support a multithreaded server, as they cannot afford the memory requirements of mod_python, mod_wsgi or fastcgi solutions used in a multiprocess/single-threaded mode.
Can we please somehow settle this issue once and for all? I have tried to get discussions going on this issue in the past but have got minimal feedback. I thought that, to a degree, it had been determined that multithreaded servers were okay, although users should ensure their own code is multithread safe, but now again someone is saying that Django itself is not multithread safe. :-(
On 9/25/07, Graham Dumpleton <Graham.Dumple...@gmail.com> wrote:
> Can we please somehow settle this issue once and for all. I have tried
> to get discussions going on this issue in the past but have got
> minimal feedback. I thought that too a degree it had been determined
> that multithreaded servers were okay, although users should though
> ensure there own code is multithread safe, but now again someone is
> saying that Django itself is not multithread safe. :-(
I talked with Jacob about this quite a while ago and he told me that Django was not originally written to be threadsafe. The only threading problems I remember hearing about were with the database connections, and those issues were fixed in #1442 [1]. To my knowledge, there has never been any review of the code to check for other possible sticky spots. I used to deploy Django on Windows and never had any threading problems, but the sites were mostly low traffic, internal, and probably not good candidates for exposing problems.
In short, Django was not *designed* to be threadsafe, but any obvious problems that I'm aware of have been fixed. YMMV.
> In short, Django was not *designed* to be threadsafe, but any obvious
> problems that I'm aware of have been fixed. YMMV.
that's scary.
but then again, python itself isn't multi-threaded. (all threading is faked - google "global interpreter lock". lazy s.o.b. python devs) so all your really hairy "c=c+1" type issues are already nixed.
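Whether the global interpreter lock really removes "c=c+1" issues can be checked directly. The sketch below assumes CPython; `counter`, `unsafe_increment`, `safe_increment` and `run` are illustrative names. It lists the bytecode steps of the increment, which shows a separate load and store, so a thread switch can land between them, and then shows one common way to make the update safe with an explicit lock.

```python
import dis
import threading

counter = 0


def unsafe_increment():
    global counter
    counter = counter + 1   # compiles to a load, an add and a store


# The GIL serializes individual bytecode steps, not whole statements.
steps = [ins.opname for ins in dis.get_instructions(unsafe_increment)]

lock = threading.Lock()


def safe_increment():
    global counter
    with lock:              # the load and the store now happen as one unit
        counter = counter + 1


def run(n_threads=4, n_iter=10000):
    """Increment the counter from several threads and return the total."""
    global counter
    counter = 0

    def worker():
        for _ in range(n_iter):
            safe_increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(steps)
```

With the lock in place the total always equals `n_threads * n_iter`; without it, lost updates are possible even though only one thread runs Python bytecode at a time.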
Derek Anderson wrote:
> but then again, python itself isn't multi-threaded. (all threading is
> faked - google "global interpreter lock". lazy s.o.b. python devs) so
> all your really hairy "c=c+1" type issues are already nixed.
> so not so scary.
Right. What *is* scary is how much people cling to the horrible hack that preemptive multithreading is.
Derek Anderson wrote:
> but then again, python itself isn't multi-threaded. (all threading is
> faked - google "global interpreter lock". lazy s.o.b. python devs)
given that a stock CPython interpreter releases the lock in a few hundred places, primarily around potentially long-running or blocking C operations, claiming that "all threading is faked" is a bit misleading. maybe you should do a bit more research before you start calling people names?