Database race conditions when using multiple processes

1,344 views
Skip to first unread message

Thomas Steinacher

unread,
Oct 16, 2006, 2:33:19 PM10/16/06
to Django developers
Hello,

I was recently using the get_or_create method and noticed that it added
an entry twice into my database in very rare cases, because it
complained later that "get() returned more than one ___ -- it returned
2!". This problem didn't occur when using the development server.

My question is: Is there a way to avoid these race conditions by e.g.
locking the database? How can I do that?

This is my setup:

- Django (SVN) running with "python manage.py runfcgi method=prefork"
(there are 6 processes running)
- Lighttpd
- SQLite

To prove the race condition, I used the following model:

class Test(models.Model):
str = models.CharField(blank=True, null=True, maxlength=100)

The relevant part of my urls.conf is:

# Testing
(r'^test/$', 'test.main'),
(r'^test/process/$', 'test.process'),

And here's the view:

--- CUT HERE ---

from django.http import HttpResponse

import models
from json import load

def main(request):
return HttpResponse("""
<html>
<head>

<script src="http://prototype.conio.net/dist/prototype-1.4.0.js"
type="text/javascript"></script>
<script src="http://www.json.org/json.js"
type="text/javascript"></script>
<script type="text/javascript">

function send_request(callback, text)
{
new Ajax.Request('process/' /* URL */, {
postBody:'text='+encodeURIComponent(text),
onComplete:callback
});
}

function debug(str)
{
var date = new Date();
str = date.toTimeString().substr(0, 8)+' '+str+'<br />';
$('debug').innerHTML = str+$('debug').innerHTML;
}

function cb(req)
{
if (req.status == 200)
debug('Received response: '+req.responseText);
else
debug('error '+req.status+' '+req.responseText);
}

function clicked()
{
send_request(cb, $('text').value);
send_request(cb, $('text').value);
send_request(cb, $('text').value);
send_request(cb, $('text').value);
send_request(cb, $('text').value);
send_request(cb, $('text').value);
send_request(cb, $('text').value);
send_request(cb, $('text').value);
send_request(cb, $('text').value);
send_request(cb, $('text').value);
}
</script>
</head>
<body>
<h1>test</h1>
<input type="text" id="text" value="x" />
<input type="button" value="Test" onclick="clicked();" />
<div id="debug">
</div>
</body>
</html>
""")

def process(request):
if request.POST.get('text'):
t =
models.Test.objects.get_or_create(str=request.POST.get('text'))[0]
return HttpResponse('%s %s' % (request.POST.get('text'), t.id))
else:
return HttpResponse('Missing argument')

--- CUT HERE ---

When I play around with it, I sometimes get this debug output followed
by an AssertionError (notice the different IDs):

20:07:41 Received response: blah 21
20:07:41 Received response: blah 19
20:07:41 Received response: blah 20
20:07:41 Received response: blah 19

I know that in this specific case it can be enforced at database level
using the unique and unique_together properties, but how would you
handle e.g. this scenario?

1 def transfer_money(from_id, to_id, amount)
2 from = models.BankAccount.objects.get(id=from_id)
3 to = models.BankAccount.objects.get(id=to_id)
4 from.balance -= amount
5 to.balance += amount
6 from.save()
7 to.save()

What, if two Django processes transfer money from account A to account
B simultaneously?

tom

gabor

unread,
Oct 16, 2006, 2:46:33 PM10/16/06
to django-d...@googlegroups.com
Thomas Steinacher wrote:

> I know that in this specific case it can be enforced at database level
> using the unique and unique_together properties, but how would you
> handle e.g. this scenario?
>
> 1 def transfer_money(from_id, to_id, amount)
> 2 from = models.BankAccount.objects.get(id=from_id)
> 3 to = models.BankAccount.objects.get(id=to_id)
> 4 from.balance -= amount
> 5 to.balance += amount
> 6 from.save()
> 7 to.save()
>
> What, if two Django processes transfer money from account A to account
> B simultaneously?
>

these situations are (afaik) handled using transactions....simply start
a transaction at the beginning, and commit/rollback at the end...

gabor

James Bennett

unread,
Oct 16, 2006, 2:50:42 PM10/16/06
to django-d...@googlegroups.com
On 10/16/06, Thomas Steinacher <thom...@gmail.com> wrote:
> This is my setup:
>
> - Django (SVN) running with "python manage.py runfcgi method=prefork"
> (there are 6 processes running)
> - Lighttpd
> - SQLite

SQLite is *not* something to be using if proper locking and safe
concurrent access will be important; SQLite's own docs point out a
couple of cases where its locking just won't work, and they recommend
moving to a client-server RDBMS like PostgreSQL or MySQL if safe
concurrent access is a major issue -- those systems implement much
finer-grained control, for example, where SQLite has to lock the
entire DB file for writing.

--
"May the forces of evil become confused on the way to your house."
-- George Carlin

Mario Gonzalez ( mario__ )

unread,
Oct 16, 2006, 2:56:25 PM10/16/06
to django-d...@googlegroups.com
On 16/10/06, James Bennett <ubern...@gmail.com> wrote:
>
> SQLite is *not* something to be using if proper locking and safe
> concurrent access will be important; SQLite's own docs point out a
> couple of cases where its locking just won't work, and they recommend
> moving to a client-server RDBMS like PostgreSQL or MySQL if safe

you're right, Postgres don't do a block in the entire db but if two
proccess want to write, they going to do it anyway, which one finish
first? it doesn't matter. So, IMVHO, that isn't the problem.

>

--
http://www.advogato.org/person/mgonzalez/

Thomas Steinacher

unread,
Oct 16, 2006, 3:09:48 PM10/16/06
to Django developers
On Oct 16, 2006, at 8:46 PM, gabor wrote:
> > What, if two Django processes transfer money from account A to account
> > B simultaneously?
>
> these situations are (afaik) handled using transactions....simply start
> a transaction at the beginning, and commit/rollback at the end...

I already included
"django.middleware.transaction.TransactionMiddleware" in my
MIDDLEWARE_CLASSES (as described on
http://www.djangoproject.com/documentation/transactions/#tying-transactions-to-http-requests)
and it didn't help.

tom

gabor

unread,
Oct 16, 2006, 3:31:11 PM10/16/06
to django-d...@googlegroups.com

you mean it did not help with the money-transfer code?

gabor

Thomas Steinacher

unread,
Oct 16, 2006, 3:38:54 PM10/16/06
to Django developers
gabor wrote:

> Thomas Steinacher wrote:
> > I already included
> > "django.middleware.transaction.TransactionMiddleware" in my
> > MIDDLEWARE_CLASSES (as described on
> > http://www.djangoproject.com/documentation/transactions/#tying-transactions-to-http-requests)
> > and it didn't help.
> >
>
> you mean it did not help with the money-transfer code?
>
> gabor

It did not help with the code I posted that was using get_or_create(),
because I didn't use unique_together... I don't think it would help
with the money-transfer code.

The documentation says:

"It works like this: When a request starts, Django starts a
transaction. If the response is produced without problems, Django
commits any pending transactions. If the view function produces an
exception, Django rolls back any pending transactions."

The problem is that the code doesn't produce an exception, so Django
will not roll back anything...

tom

gabor

unread,
Oct 16, 2006, 4:12:53 PM10/16/06
to django-d...@googlegroups.com

ok, then maybe this is a misunderstanding, so let's restart the discussion:

what is your question? :)

gabor

James Bennett

unread,
Oct 16, 2006, 4:36:33 PM10/16/06
to django-d...@googlegroups.com
On 10/16/06, Thomas Steinacher <thom...@gmail.com> wrote:
> It did not help with the code I posted that was using get_or_create(),
> because I didn't use unique_together... I don't think it would help
> with the money-transfer code.

This is a tricky problem, not least because each of the DBs we support
has a slightly different transaction setup.

If it were me doing this, I'd isolate the critical areas of the code
where it's possible for a race condition to occur, and manually
execute the appropriate statements to secure whatever I was doing.

For example, if I were using PostgreSQL, I'd use the 'commit_manually'
decorator on my view, and manually set the transaction isolation level
to SERIALIZABLE before doing anything else. For truly critical
operations, I'd probably just go ahead and grab an exclusive lock on
whatever I needed, and force other transactions to wait.

Because the requirements of different use cases vary so much, and
because no two DBs handle it exactly the same way, I don't think we
could ever have a successful automated solution.

gabor

unread,
Oct 16, 2006, 5:07:59 PM10/16/06
to django-d...@googlegroups.com
James Bennett wrote:
> For truly critical
> operations, I'd probably just go ahead and grab an exclusive lock on
> whatever I needed, and force other transactions to wait.

how would you that?

would you use something in the db? (from what i know about transactions
(very little :), they "solve" the potential conflicts by simply
reporting an error-condition to one of the "writers", so then he has to
retry. but is there some simply (multiprocess-usable) way to lock (as in
"wait until this guy is finished")?

or would you rather do it with something completely different? a
temporary lock-file in a temp-directory perhaps?

gabor

James Bennett

unread,
Oct 16, 2006, 5:18:44 PM10/16/06
to django-d...@googlegroups.com
On 10/16/06, gabor <ga...@nekomancer.net> wrote:
> would you use something in the db? (from what i know about transactions
> (very little :), they "solve" the potential conflicts by simply
> reporting an error-condition to one of the "writers", so then he has to
> retry. but is there some simply (multiprocess-usable) way to lock (as in
> "wait until this guy is finished")?

http://www.postgresql.org/docs/8.1/static/sql-lock.html

gabor

unread,
Oct 16, 2006, 5:29:10 PM10/16/06
to django-d...@googlegroups.com
James Bennett wrote:
> On 10/16/06, gabor <ga...@nekomancer.net> wrote:
>> would you use something in the db? (from what i know about transactions
>> (very little :), they "solve" the potential conflicts by simply
>> reporting an error-condition to one of the "writers", so then he has to
>> retry. but is there some simply (multiprocess-usable) way to lock (as in
>> "wait until this guy is finished")?
>
> http://www.postgresql.org/docs/8.1/static/sql-lock.html
>

thanks a lot


(and again something that would be relatively easy to find with a
google("postgresql lock"). next time i will do the googling before
asking, i promise (/me makes a mental note of it AGAINNNNNNN) :)

gabor

Thomas Steinacher

unread,
Oct 17, 2006, 3:59:36 AM10/17/06
to Django developers
James Bennett wrote:
> On 10/16/06, gabor <ga...@nekomancer.net> wrote:
> > would you use something in the db? (from what i know about transactions
> > (very little :), they "solve" the potential conflicts by simply
> > reporting an error-condition to one of the "writers", so then he has to
> > retry. but is there some simply (multiprocess-usable) way to lock (as in
> > "wait until this guy is finished")?
>
> http://www.postgresql.org/docs/8.1/static/sql-lock.html

This is PostgreSQL specific. There seems also to be something like that
for MySQL: http://dev.mysql.com/doc/refman/5.0/en/lock-tables.html (it
looks like you have to do an UNLOCK TABLES on MySQL, whereas PostgreSQL
releases the lock at transaction end).

Shouldn't there be a function built-in into Django that locks the table
(if the database supports it)? IMHO functions like get_or_create()
should try to lock and unlock the table automatically.

I am not doing critical operations like transferring money, but I
noticed that the race condition happens quite often when using
get_or_create and a lot of AJAX requests, so I need to find a solution.

I will try using PostgreSQL instead of SQLite, maybe this will reduce
the probability of the race condition to happen ;-)

tom

Michael Radziej

unread,
Oct 17, 2006, 5:11:43 AM10/17/06
to django-d...@googlegroups.com
Thomas Steinacher schrieb:

> James Bennett wrote:
>> On 10/16/06, gabor <ga...@nekomancer.net> wrote:
>>> would you use something in the db? (from what i know about transactions
>>> (very little :), they "solve" the potential conflicts by simply
>>> reporting an error-condition to one of the "writers", so then he has to
>>> retry. but is there some simply (multiprocess-usable) way to lock (as in
>>> "wait until this guy is finished")?
>> http://www.postgresql.org/docs/8.1/static/sql-lock.html
>
> This is PostgreSQL specific. There seems also to be something like that
> for MySQL: http://dev.mysql.com/doc/refman/5.0/en/lock-tables.html (it
> looks like you have to do an UNLOCK TABLES on MySQL, whereas PostgreSQL
> releases the lock at transaction end).

Speaking of this, I also miss a way to do SELECT ... FOR UPDATE.

> Shouldn't there be a function built-in into Django that locks the table
> (if the database supports it)? IMHO functions like get_or_create()
> should try to lock and unlock the table automatically.

Oh no! Are you aware of the impact you can get when you lock a
table under heavy load? This is really very specific of your
application. Maybe for your case it's fine to lock the entire
table. Others might specify more rigid unique constraints and
deal with the exceptions; sometimes a SELECT ... FOR UPDATE is
the right thing.

> I am not doing critical operations like transferring money, but I
> noticed that the race condition happens quite often when using
> get_or_create and a lot of AJAX requests, so I need to find a solution.

It seems you need to spend more thoughts on multi user issues
within your application. It's easy to fall for that in the
beginning, but you need to deal with concurrent access. You must
be aware for all transactions what impact other transactions can
have. There is no silver bullet ... (please repeat ;-)

> I will try using PostgreSQL instead of SQLite, maybe this will reduce
> the probability of the race condition to happen ;-)

Hmm, this might be a first step, but it won't magically solve all
multi user issues. This *is* hard stuff.

Michael

Thomas Steinacher

unread,
Oct 17, 2006, 7:24:32 AM10/17/06
to Django developers
Michael Radziej wrote:
> It seems you need to spend more thoughts on multi user issues
> within your application. It's easy to fall for that in the
> beginning, but you need to deal with concurrent access. You must
> be aware for all transactions what impact other transactions can
> have. There is no silver bullet ... (please repeat ;-)
>
> > I will try using PostgreSQL instead of SQLite, maybe this will reduce
> > the probability of the race condition to happen ;-)
>
> Hmm, this might be a first step, but it won't magically solve all
> multi user issues. This *is* hard stuff.

Simply switching to PostgreSQL didn't solve anything. The race
condition was still there.

I am not sure but I think that a lock would be okay for my application,
as I don't want to deal with exceptions. What do you think about the
following code?


def process(request):
if request.POST.get('text'):

cursor = connection.cursor()

# Acquire a lock
cursor.execute('LOCK TABLE testapp_test IN ACCESS EXCLUSIVE
MODE')


t =
models.Test.objects.get_or_create(str=request.POST.get('text'))[0]

# Release the lock
transaction.commit()

return HttpResponse('%s %s' % (request.POST.get('text'), t.id))
else:
return HttpResponse('Missing argument')

process = transaction.commit_manually(process)


tom

James Bennett

unread,
Oct 17, 2006, 7:27:14 AM10/17/06
to django-d...@googlegroups.com
On 10/17/06, Thomas Steinacher <thom...@gmail.com> wrote:
> Shouldn't there be a function built-in into Django that locks the table
> (if the database supports it)? IMHO functions like get_or_create()
> should try to lock and unlock the table automatically.

No, no, a thousand times no :)

Even though modern database engines do a few things to help avoid the
worst of the problems, automated locking is fraught with peril. Let's
take a simple example... assume you have a UserProfile model tied
one-to-one to User, and the following happens:

Request A wants to create a new user, so it locks the User table for
writing and, because it knows it needs to create a UserProfile as
well, tries to lock that table.

At the same time, request B also wants to create a new user, but it
gets to the UserProfile table first and locks it, then tries to lock
the User table.

And... your application dies. Request A won't release the User table
until it can get a lock on the UserProfile table, but request B won't
release the UserProfile table until it can get a lock on the User
table. You are now in the delightful situation known as "deadlock",
where nothing can go forward because everybody's waiting for locks
that can never happen.

There are ways to reduce the likelihood of a deadlock (see any of the
solutions to the "dining philosophers problem", for example), but
automatic locking by the framework introduces so much additional
complexity and so many additional pitfalls that I don't think it's
worth the payoff. If you're writing an application which has parts
that engage in heavily concurrent writing, you're going to need to
learn your database's locking and isolation features, and take
advantage of them (incidentally, transaction isolation is often
simpler to work with than locking, because it doesn't raise the
possibility of deadlocks).

Michael Radziej

unread,
Oct 17, 2006, 7:59:47 AM10/17/06
to django-d...@googlegroups.com
James Bennett schrieb:

> On 10/17/06, Thomas Steinacher <thom...@gmail.com> wrote:
>> Shouldn't there be a function built-in into Django that locks the table
>> (if the database supports it)? IMHO functions like get_or_create()
>> should try to lock and unlock the table automatically.
>
> No, no, a thousand times no :)

I bet you mean that get_or_create shouldn't lock the table.
That's really not a good idea.

But what about a database independent command for locking the
table? I'm not sure if it's really worth, but it's certainly not
a "thousand times no", or did I miss anything? (And, by the way,

<commercial>
What about
SELECT ... FOR UPDATE
</commercial>)

Michael

James Bennett

unread,
Oct 17, 2006, 8:27:14 AM10/17/06
to django-d...@googlegroups.com
On 10/17/06, Michael Radziej <m...@noris.de> wrote:
> I bet you mean that get_or_create shouldn't lock the table.
> That's really not a good idea.

I meant Django should never implicitly/automatically lock a table; if
you decide you need to lock, you should have to explicitly call
something. I'm ambivalent on whether we should provide a syntax for a
"database independent" lock/unlock, but I think it could be
problematic to design -- for one thing, I'm not sure where it would
fit, and for another I'm not sure how best to implement it (since, for
example Postgres automatically releases the lock on COMMIT or
ROLLBACK, but MySQL requires you to explicitly UNLOCK)

Hawkeye

unread,
Oct 17, 2006, 8:56:48 AM10/17/06
to Django developers
Michael Radziej wrote:
> <commercial>
> What about
> SELECT ... FOR UPDATE
> </commercial>)

Michael, I know you're already aware of this (heck, you're CCed on the
ticket), but for others...

I created a patch in ticket http://code.djangoproject.com/ticket/2705
to allow a .for_update() modifier to be applied to a QuerySet, so you
can do:

something = models.Something.objects.filter(id=333).for_update().get()

This wouldn't solve the get_or_create() problem, but it is helpful for
a lot of other use cases.

--Ben

Reply all
Reply to author
Forward
0 new messages