I was recently using the get_or_create method and noticed that it added
an entry twice into my database in very rare cases, because it
complained later that "get() returned more than one ___ -- it returned
2!". This problem didn't occur when using the development server.
My question is: Is there a way to avoid these race conditions by e.g.
locking the database? How can I do that?
This is my setup:
- Django (SVN) running with "python manage.py runfcgi method=prefork"
(there are 6 processes running)
- Lighttpd
- SQLite
To prove the race condition, I used the following model:
class Test(models.Model):
    str = models.CharField(blank=True, null=True, maxlength=100)
The relevant part of my urls.conf is:
# Testing
(r'^test/$', 'test.main'),
(r'^test/process/$', 'test.process'),
And here's the view:
--- CUT HERE ---
from django.http import HttpResponse
import models
from json import load
def main(request):
    return HttpResponse("""
<html>
<head>
<script src="http://prototype.conio.net/dist/prototype-1.4.0.js"
        type="text/javascript"></script>
<script src="http://www.json.org/json.js"
        type="text/javascript"></script>
<script type="text/javascript">
function send_request(callback, text)
{
    new Ajax.Request('process/' /* URL */, {
        postBody: 'text=' + encodeURIComponent(text),
        onComplete: callback
    });
}

function debug(str)
{
    var date = new Date();
    str = date.toTimeString().substr(0, 8) + ' ' + str + '<br />';
    $('debug').innerHTML = str + $('debug').innerHTML;
}

function cb(req)
{
    if (req.status == 200)
        debug('Received response: ' + req.responseText);
    else
        debug('error ' + req.status + ' ' + req.responseText);
}

function clicked()
{
    // fire ten concurrent requests to provoke the race
    for (var i = 0; i < 10; i++)
        send_request(cb, $('text').value);
}
</script>
</head>
<body>
<h1>test</h1>
<input type="text" id="text" value="x" />
<input type="button" value="Test" onclick="clicked();" />
<div id="debug"></div>
</body>
</html>
""")

def process(request):
    if request.POST.get('text'):
        t = models.Test.objects.get_or_create(str=request.POST.get('text'))[0]
        return HttpResponse('%s %s' % (request.POST.get('text'), t.id))
    else:
        return HttpResponse('Missing argument')
--- CUT HERE ---
When I play around with it, I sometimes get this debug output followed
by an AssertionError (notice the different IDs):
20:07:41 Received response: blah 21
20:07:41 Received response: blah 19
20:07:41 Received response: blah 20
20:07:41 Received response: blah 19
I know that in this specific case it can be enforced at database level
using the unique and unique_together properties, but how would you
handle e.g. this scenario?
def transfer_money(from_id, to_id, amount):
    src = models.BankAccount.objects.get(id=from_id)
    dst = models.BankAccount.objects.get(id=to_id)
    src.balance -= amount
    dst.balance += amount
    src.save()
    dst.save()
What if two Django processes transfer money from account A to account
B simultaneously?
tom
> I know that in this specific case it can be enforced at database level
> using the unique and unique_together properties, but how would you
> handle e.g. this scenario?
>
> def transfer_money(from_id, to_id, amount):
>     src = models.BankAccount.objects.get(id=from_id)
>     dst = models.BankAccount.objects.get(id=to_id)
>     src.balance -= amount
>     dst.balance += amount
>     src.save()
>     dst.save()
>
> What if two Django processes transfer money from account A to account
> B simultaneously?
>
these situations are (afaik) handled using transactions... simply start
a transaction at the beginning, and commit/rollback at the end...
gabor
SQLite is *not* something to be using if proper locking and safe
concurrent access will be important; SQLite's own docs point out a
couple of cases where its locking just won't work, and they recommend
moving to a client-server RDBMS like PostgreSQL or MySQL if safe
concurrent access is a major issue -- those systems implement much
finer-grained control, for example, where SQLite has to lock the
entire DB file for writing.
--
"May the forces of evil become confused on the way to your house."
-- George Carlin
you're right, Postgres doesn't lock the entire db, but if two
processes want to write, they're going to do it anyway; which one
finishes first doesn't matter. So, IMVHO, that isn't the problem.
>
I already included
"django.middleware.transaction.TransactionMiddleware" in my
MIDDLEWARE_CLASSES (as described on
http://www.djangoproject.com/documentation/transactions/#tying-transactions-to-http-requests)
and it didn't help.
tom
you mean it did not help with the money-transfer code?
gabor
It did not help with the code I posted that was using get_or_create(),
because I didn't use unique_together... I don't think it would help
with the money-transfer code.
The documentation says:
"It works like this: When a request starts, Django starts a
transaction. If the response is produced without problems, Django
commits any pending transactions. If the view function produces an
exception, Django rolls back any pending transactions."
The problem is that the code doesn't produce an exception, so Django
will not roll back anything...
tom
ok, then maybe this is a misunderstanding, so let's restart the discussion:
what is your question? :)
gabor
This is a tricky problem, not least because each of the DBs we support
has a slightly different transaction setup.
If it were me doing this, I'd isolate the critical areas of the code
where it's possible for a race condition to occur, and manually
execute the appropriate statements to secure whatever I was doing.
For example, if I were using PostgreSQL, I'd use the 'commit_manually'
decorator on my view, and manually set the transaction isolation level
to SERIALIZABLE before doing anything else. For truly critical
operations, I'd probably just go ahead and grab an exclusive lock on
whatever I needed, and force other transactions to wait.
Because the requirements of different use cases vary so much, and
because no two DBs handle it exactly the same way, I don't think we
could ever have a successful automated solution.
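A sketch of the SERIALIZABLE approach described above, assuming
PostgreSQL: raise the isolation level as the first statement of the
transaction, run the racy read-then-write, and accept that one of two
conflicting transactions will fail at commit and must retry. The helper
and its names are illustrative; the SQL is standard PostgreSQL.

```python
def serialized(cursor, critical_section):
    """Run critical_section() inside a SERIALIZABLE transaction."""
    # Must be executed before any other statement in the transaction.
    cursor.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
    try:
        result = critical_section()
        cursor.execute("COMMIT")     # may raise a serialization failure
        return result
    except Exception:
        cursor.execute("ROLLBACK")   # caller decides whether to retry
        raise
```

The retry-on-failure loop is the price of serializable isolation, but
unlike a table lock it never deadlocks.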
how would you that?
would you use something in the db? (from what i know about transactions
(very little :), they "solve" the potential conflicts by simply
reporting an error-condition to one of the "writers", so then he has to
retry. but is there some simply (multiprocess-usable) way to lock (as in
"wait until this guy is finished")?
or would you rather do it with something completely different? a
temporary lock-file in a temp-directory perhaps?
gabor
http://www.postgresql.org/docs/8.1/static/sql-lock.html
thanks a lot
(and again something that would be relatively easy to find with a
google("postgresql lock"). next time i will do the googling before
asking, i promise (/me makes a mental note of it AGAINNNNNNN) :)
gabor
This is PostgreSQL specific. There seems also to be something like that
for MySQL: http://dev.mysql.com/doc/refman/5.0/en/lock-tables.html (it
looks like you have to do an UNLOCK TABLES on MySQL, whereas PostgreSQL
releases the lock at transaction end).
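A hedged sketch of the difference just described: with MySQL the unlock
is explicit, so it belongs in a try/finally, whereas with PostgreSQL the
COMMIT itself releases the lock. The table name and helper are
illustrative.

```python
def with_mysql_table_lock(cursor, work):
    """Run work() while holding a MySQL write lock on the table."""
    cursor.execute("LOCK TABLES testapp_test WRITE")
    try:
        return work()
    finally:
        cursor.execute("UNLOCK TABLES")   # MySQL never unlocks for you
```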
Shouldn't there be a function built into Django that locks the table
(if the database supports it)? IMHO, functions like get_or_create()
should try to lock and unlock the table automatically.
I am not doing critical operations like transferring money, but I
noticed that the race condition happens quite often when using
get_or_create and a lot of AJAX requests, so I need to find a solution.
I will try using PostgreSQL instead of SQLite; maybe this will reduce
the probability of the race condition happening ;-)
tom
Speaking of this, I also miss a way to do SELECT ... FOR UPDATE.
> Shouldn't there be a function built into Django that locks the table
> (if the database supports it)? IMHO, functions like get_or_create()
> should try to lock and unlock the table automatically.
Oh no! Are you aware of the impact you can get when you lock a
table under heavy load? This is really very specific to your
application. Maybe for your case it's fine to lock the entire
table. Others might specify more rigid unique constraints and
deal with the exceptions; sometimes a SELECT ... FOR UPDATE is
the right thing.
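A sketch of SELECT ... FOR UPDATE applied to the earlier transfer_money
example, via a raw cursor (assuming PostgreSQL; there was no queryset
support for it at the time). This takes row locks, not a whole-table
lock, and the table name is illustrative.

```python
def transfer_money(cursor, from_id, to_id, amount):
    # Lock both rows in a fixed (ascending id) order so that two
    # opposite transfers can never deadlock on each other.
    for acct_id in sorted((from_id, to_id)):
        cursor.execute(
            "SELECT balance FROM bankaccount WHERE id = %s FOR UPDATE",
            [acct_id])
    cursor.execute(
        "UPDATE bankaccount SET balance = balance - %s WHERE id = %s",
        [amount, from_id])
    cursor.execute(
        "UPDATE bankaccount SET balance = balance + %s WHERE id = %s",
        [amount, to_id])
```

The locks are released when the surrounding transaction commits.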
> I am not doing critical operations like transferring money, but I
> noticed that the race condition happens quite often when using
> get_or_create and a lot of AJAX requests, so I need to find a solution.
It seems you need to spend more thought on multi-user issues
within your application. It's easy to overlook them in the
beginning, but you need to deal with concurrent access: for every
transaction, you must be aware of what impact other transactions
can have. There is no silver bullet ... (please repeat ;-)
> I will try using PostgreSQL instead of SQLite; maybe this will reduce
> the probability of the race condition happening ;-)
Hmm, this might be a first step, but it won't magically solve all
multi-user issues. This *is* hard stuff.
Michael
Simply switching to PostgreSQL didn't solve anything. The race
condition was still there.
I am not sure but I think that a lock would be okay for my application,
as I don't want to deal with exceptions. What do you think about the
following code?
from django.db import connection, transaction

def process(request):
    if request.POST.get('text'):
        cursor = connection.cursor()
        # Acquire a lock
        cursor.execute('LOCK TABLE testapp_test IN ACCESS EXCLUSIVE MODE')
        t = models.Test.objects.get_or_create(str=request.POST.get('text'))[0]
        # Release the lock
        transaction.commit()
        return HttpResponse('%s %s' % (request.POST.get('text'), t.id))
    else:
        return HttpResponse('Missing argument')

process = transaction.commit_manually(process)
tom
No, no, a thousand times no :)
Even though modern database engines do a few things to help avoid the
worst of the problems, automated locking is fraught with peril. Let's
take a simple example... assume you have a UserProfile model tied
one-to-one to User, and the following happens:
Request A wants to create a new user, so it locks the User table for
writing and, because it knows it needs to create a UserProfile as
well, tries to lock that table.
At the same time, request B also wants to create a new user, but it
gets to the UserProfile table first and locks it, then tries to lock
the User table.
And... your application dies. Request A won't release the User table
until it can get a lock on the UserProfile table, but request B won't
release the UserProfile table until it can get a lock on the User
table. You are now in the delightful situation known as "deadlock",
where nothing can go forward because everyone is waiting for a lock
that can never be granted.
There are ways to reduce the likelihood of a deadlock (see any of the
solutions to the "dining philosophers problem", for example), but
automatic locking by the framework introduces so much additional
complexity and so many additional pitfalls that I don't think it's
worth the payoff. If you're writing an application which has parts
that engage in heavily concurrent writing, you're going to need to
learn your database's locking and isolation features, and take
advantage of them (incidentally, transaction isolation is often
simpler to work with than locking, because it doesn't raise the
possibility of deadlocks).
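The classic ordering fix hinted at above (one of the standard solutions
to the dining philosophers problem) can be sketched without any
database at all: if every request acquires its locks in one agreed
global order, the A/B deadlock described in the text cannot occur. The
lock names mirror the User/UserProfile example and are illustrative.

```python
import threading

user_lock = threading.Lock()
profile_lock = threading.Lock()
LOCK_ORDER = [user_lock, profile_lock]   # the agreed global order

def create_user_and_profile(results, name):
    # Every "request" takes the locks in LOCK_ORDER, never reversed,
    # so no two requests can each hold what the other is waiting for.
    for lock in LOCK_ORDER:
        lock.acquire()
    try:
        results.append(name)             # the critical section
    finally:
        for lock in reversed(LOCK_ORDER):
            lock.release()

results = []
threads = [threading.Thread(target=create_user_and_profile,
                            args=(results, name))
           for name in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()                             # both finish; no deadlock
```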
I bet you mean that get_or_create shouldn't lock the table.
That's really not a good idea.
But what about a database-independent command for locking the
table? I'm not sure if it's really worth it, but it's certainly not
a "thousand times no", or did I miss anything? (And, by the way,
<commercial>
What about
SELECT ... FOR UPDATE
</commercial>)
Michael
I meant Django should never implicitly/automatically lock a table; if
you decide you need to lock, you should have to explicitly call
something. I'm ambivalent on whether we should provide a syntax for a
"database independent" lock/unlock, but I think it could be
problematic to design -- for one thing, I'm not sure where it would
fit, and for another, I'm not sure how best to implement it (since,
for example, PostgreSQL automatically releases the lock on COMMIT or
ROLLBACK, but MySQL requires an explicit UNLOCK TABLES).
Michael, I know you're already aware of this (heck, you're CCed on the
ticket), but for others...
I created a patch in ticket http://code.djangoproject.com/ticket/2705
to allow a .for_update() modifier to be applied to a QuerySet, so you
can do:
something = models.Something.objects.filter(id=333).for_update().get()
This wouldn't solve the get_or_create() problem, but it is helpful for
a lot of other use cases.
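For the get_or_create() case itself, the usual fix (given a unique
constraint on the column) is a retry: if a concurrent request wins the
race, our INSERT violates the constraint and we simply fetch the row it
created. This is a generic sketch, not Django API; the three callables
stand in for .get(), .create(), and the backend's integrity-error check.

```python
def safe_get_or_create(get, create, is_integrity_error):
    """Return (object, created); safe against a concurrent creator."""
    try:
        return get(), False
    except LookupError:                  # stand-in for DoesNotExist
        pass
    try:
        return create(), True
    except Exception as exc:
        if not is_integrity_error(exc):
            raise
        return get(), False              # the other request won; reuse its row
```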
--Ben