Django in a Load-Balanced environment (ORM issues) ...


ZebZiggle

Feb 19, 2006, 12:17:40 AM
to Django users
I'm sure I must be missing something obvious here folks, but perhaps
someone could shed some light for me.

Could someone please point me to some material on using Django in a
load-balanced environment?

The specific problem I'm facing is model ORM objects going "stale" (the
in-memory representation is different than the database due to a change
from another machine/thread). Typically ORM layers deal with this by
supporting some form of Versioning on the records so the ORM can
freshen the cache automatically.

I find I can't keep around objects that may change and always have to
query for the latest version. This doesn't solve the problem, it just
lessens the likelihood of it occurring. I know I'm going to get
bitten by race conditions. From what I can see, this staleness will
even happen with concurrent sessions on the same web server.

For example, if I do this I seem to get in trouble:

# Assume objects[id].data = 99

obj1 = objects.get_object(id__exact = id)
obj2 = objects.get_object(id__exact = id)
obj1.data = 1
obj1.save()
assert (obj1.data == obj2.data) # fails: obj2 still holds the stale 99

So, I seem to have to "freshen up" any object before I look at its
member data, and that's going to make the code horrific and fragile
(and still broken).

Is there any documentation on best practices or gotchas for using
Django in these concurrent situations?

I've got to be missing something here, so I apologize in advance for
the silly question.

Thanks
-Zeb

ZebZiggle

Feb 27, 2006, 11:11:28 AM
to Django users
Does anyone from the Django team have any thoughts on this problem?

Am I missing something here?

-Z

ZebZiggle

Feb 28, 2006, 9:14:54 AM
to Django users
Here is a blog article about what core functionality an ORM should
support:

http://pythonnotes.blogspot.com/2004/09/python-orm-tools.html

Specifically, take note of the paragraph that reads:

To implement true object persistence, each object needs a unique
identifier. Each ORM will use a different approach here. However, one
issue that is common to all ORMs is how to handle multiple copies of
the same object. For the sake of consistence, objects should be
singletons. If the ORM tool doesn't do it automatically, then it's up
to the programmer to make sure that no multiple copies of the same
object exist at the same time, avoiding concurrency and consistency
issues.

Ijonas Kisselbach

Feb 28, 2006, 9:32:50 AM
to django...@googlegroups.com
First of all, I'm not a Django Dev Team member, but I have used ORM technology for years.

Singleton objects within an ORM context do not scale across multiple machines/VMs. It's pipe-dream material. I saw a demo of this kind of thing from an OODB vendor in the late nineties and it was impressive, but look how far OODBs have progressed since (answer: not very).

Real-life ORMs typically provide different ways of handling stale, transient, and persistent objects. Hibernate, probably the best-known ORM tool of today, supports some versioning features by adding "system columns" to tables to prevent old data clobbering fresh data.

What you're looking for is for the RDBMS to push change events to an ORM-maintained cache in the application-server layer, e.g. Django. Microsoft SQL Server 2005 is, AFAIK, the only RDBMS which dispatches these "cache update events" to the application layer. I don't know how successful this new technology is, but it sounds expensive in terms of resources.

The problem you describe is a classic ORM problem, and typically you have to assume the worst-case scenario and code around it by using transactions and refreshing your objects.

Have you considered Python's SQLObject library? You might get better mileage.

Hope it helps,
Ijonas

hugo

Feb 28, 2006, 10:05:42 AM
to Django users
>The specific problem I'm facing is model ORM objects going "stale" (the
>in-memory representation is different than the database due to a change
>from another machine/thread). Typically ORM layers deal with this by
>supporting some form of Versioning on the records so the ORM can
>freshen the cache automatically.

This is a standard problem of any database environment - as soon as you
pull stuff out of the database, it's not connected to it any more (at
least after you close the transaction it was connected to - PostgreSQL
will guarantee the validity of selected data as long as the transaction
is still open). If your code is heavy on updates, you will have to
provide a viable solution for that yourself. For example, you can do the
versioning mentioned above yourself by just adding an auto_now=True
field to your record and comparing those timestamps (auto_now=True
fields are automatically set at update time).

Of course you need to check the timestamp from the database to make
sure that your cached object is still valid and yes, as long as Django
doesn't have transaction control, this is not 100% reliable. Once
Django gets transaction control, you will be able to do the full
versioning thingy in your code and might even hook that up in the
manager objects that you use to access the database (read up on the
manager stuff in the magic-removal branch if you are interested).
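For what it's worth, the timestamp comparison Georg describes can be sketched in plain Python, with a dict standing in for the database table and last_modified standing in for the auto_now=True column (all names here are illustrative, not Django API):

```python
import datetime

# A dict pretending to be a database table: pk -> row.
db = {1: {"data": 99, "last_modified": datetime.datetime(2006, 2, 28, 10, 0)}}

class CachedRow:
    """An in-memory copy of a row, remembering the timestamp it was read at."""
    def __init__(self, pk):
        row = db[pk]
        self.pk = pk
        self.data = row["data"]
        self.last_modified = row["last_modified"]

    def is_stale(self):
        # Compare our cached timestamp with the one now in the "database".
        return db[self.pk]["last_modified"] != self.last_modified

    def refresh(self):
        # Re-read the row only if someone else has touched it.
        if self.is_stale():
            row = db[self.pk]
            self.data = row["data"]
            self.last_modified = row["last_modified"]

obj = CachedRow(1)
# Another process/machine updates the row behind our back:
db[1] = {"data": 1, "last_modified": datetime.datetime(2006, 2, 28, 10, 5)}
assert obj.is_stale()
obj.refresh()
assert obj.data == 1
```

The same check would sit behind a real save()/accessor in Django code; the dict just makes the idea runnable here.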

bye, Georg

ZebZiggle

Feb 28, 2006, 2:27:59 PM
to Django users
Thx for the feedback guys.

I agree that this problem is pretty fundamental. I don't know what
worries me more: the fact that the problem exists, or that the
architects don't see it.

I think in my case I can isolate the offending code and use some form
of "double-buffering" technique to solve it. But, it's not going to be
pretty.

I've considered SQLObject, but then I would perhaps just cut over to
TurboGears. A colleague has recommended SQLAlchemy. I'll look at both.

Cheers,
Zeb

Max Battcher

Feb 28, 2006, 3:25:02 PM
to django...@googlegroups.com
On 2/28/06, ZebZiggle <zebz...@yahoo.com> wrote:
> I agree that this problem is pretty fundamental. I don't know what
> worries me more, the fact that the problem exists or the architects
> don't see the problem.

I wouldn't put it that way. It is a problem, and things like transaction
support do show up on TODO lists, but right now it is simply not the
priority that you would like it to be. Keep in mind that Django
encompasses a lot of uses and, as has been pointed out, for the
majority of uses (the majority of website types),
non-"last-write-wins" transaction support is not a priority. Feel
free to come up with your own solution and submit a patch if you'd
like to get it done sooner, or just wait for the TODO items to move up
the priority list.

> I think in my case I can isolate the offending code and use some form
> of "double-buffering" technique to solve it. But, it's not going to be
> pretty.

I would again suggest you take a look at the magic-removal branch
<http://code.djangoproject.com/wiki/RemovingTheMagic>. You could
encapsulate such code in the actual save() call and fake the Django
ORM having that support, and then when Django does provide something
that you are happy with you can pull out your custom save() code and
the rest of your code probably won't notice.

You can even write your own object "Manager" to have fuller control of
object caching and to implement said save()-overload code for several
model classes.

--
--Max Battcher--
http://www.worldmaker.net/
All progress is based upon a universal innate desire on the part of
every organism to live beyond its income. --Samuel Butler

Adrian Holovaty

Feb 28, 2006, 3:36:41 PM
to django...@googlegroups.com
On 2/28/06, ZebZiggle <zebz...@yahoo.com> wrote:

Max did a good job (in his e-mail) of explaining our thoughts on this.
We don't implement this level of super object-instance consistency at
this point because we aim for an 80% solution. The vast majority of
Web sites don't require it.

With that said, I think it would be a fantastic improvement to
Django's database layer. You could solve it in the interim by creating
a "status" field on your model and overriding save() so that it checks
the status field against the status field of the newest object in the
database and throws an exception if the numbers aren't in sync. We
could roll this into Django proper in a generic way if there's
demand.
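A rough sketch of that interim fix, with a plain dict standing in for the table and a status counter bumped on every save (the Row class and StaleObjectError are invented for illustration, not Django API):

```python
class StaleObjectError(Exception):
    """Raised when the row changed in the database after we read it."""
    pass

# In-memory stand-in for a table: pk -> {data, status}.
table = {1: {"data": 99, "status": 0}}

class Row:
    def __init__(self, pk):
        row = table[pk]
        self.pk, self.data, self.status = pk, row["data"], row["status"]

    def save(self):
        # Refuse to write if someone else bumped status since we read it.
        if table[self.pk]["status"] != self.status:
            raise StaleObjectError("row %d changed underneath us" % self.pk)
        self.status += 1
        table[self.pk] = {"data": self.data, "status": self.status}

a, b = Row(1), Row(1)   # two copies of the same row, as in Zeb's example
a.data = 1
a.save()                # fine: a saw status 0, table still had status 0

b.data = 2
conflict = False
try:
    b.save()            # b still holds status 0, table is now at 1
except StaleObjectError:
    conflict = True
assert conflict
assert table[1]["data"] == 1   # a's write survived; b's was rejected
```

In real code the check would of course be a query in an overridden Model.save(), and the raise gives the caller a chance to re-read and retry.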

Alternatively, SQLAlchemy solves this problem by implementing an
identity map, which is quite cool. You should be able to use
SQLAlchemy with the rest of Django with no problems at all -- the only
thing you won't be able to take advantage of is the admin framework.

Adrian

--
Adrian Holovaty
holovaty.com | djangoproject.com

ZebZiggle

Feb 28, 2006, 4:21:30 PM
to Django users
Hey guys ... that's fantastic ... just the sort of response I was
hoping to see from the start of this conversation. Like they say in AA
"the first step is admitting you have a problem" ;-)

I will definitely explore all those options and make my contributions
to the project.

All the best,
Zeb

Peter Harkins

Feb 28, 2006, 4:40:23 PM
to django...@googlegroups.com

On Tue, Feb 28, 2006 at 02:36:41PM -0600, Adrian Holovaty wrote:
> With that said, I think it would be a fantastic improvement to
> Django's database layer. You could solve it in the interim by creating
> a "status" field on your model, override save() so that it checks the
> status field against the status field of the newest object in the
> database, and throw an exception if the numbers aren't in sync. We
> could roll this into Django proper in a generic way if there's a
> demand.

I can see why folks would want this, but I'd like to toss in a vote
against it being on by default. It strikes me as spooky action at a
distance -- because another process, possibly on another machine, saved
the state of one of its objects, the objects in my current script start
tossing exceptions or quietly changing their state. This would be bad
for an application I'm writing that has both short-lived pages and a
longer-lived process that wants to have possibly out-of-sync data to
keep track of state and state changes.

I also think it'd be really surprising (in a bad way) to newbies. It
feels a little like thread programming, where getting variables into a
known state requires some kind of locking.

Note that I'm not saying "don't do it" or "it's always bad", I'm just
asking "please don't do it by default". And to touch on an earlier
topic, I'm all for transaction support.


--
Peter Harkins - http://push.cx



Ivan Sagalaev

Mar 1, 2006, 12:42:11 AM
to django...@googlegroups.com
Peter Harkins wrote:

>I can see why folks would want this, but I'd like to toss in a vote
>against it being on by default.
>

I just started to write the same thing :-)

> It strikes me as spooky action at a
>distance -- because another process possibly on another machine saved
>the state of one of its objects, the objects in my current script start
>tossing exceptions or quietly changing their state.
>

I know three common ways of dealing with this problem:
- Last write wins (which we have now)
- Validating on save. You get an object, process it, try to save and -
bang - "the data was modified by someone since you last read it, would
you like to lose your changes?" This is a cheap way, but it works only
with user interaction, which is unacceptable in most cases since users
usually don't like to debug your system.
- Locking for writing. If you plan to modify an object you lock it
exclusively and other processes wait for you. This hurts performance
both on the "is it locked" checks and on the waits, since the update
time now includes the updater's whole read-process-update cycle.

So the first option is not only viable, it's actually better in many
cases (because it's the fastest).

DavidA

Mar 1, 2006, 6:50:37 AM
to Django users
I just want to echo Peter and Ivan's sentiments on NOT making this
default behavior. What attracted me to Django was a simple, fast,
elegant framework for quickly building common types of web
applications. I've wandered into the tarpits of J2EE in my past life
and I was looking for something at the other end of the
complexity/usability spectrum. Django is just that. I'd hate to see the
project bog down focusing on features that consume 80% of the dev
resources and are useful to only 20% of the community. And certainly if
this type of functionality is ever added it should be completely
separate and off by default.

Sometimes less is more.

Adrian Holovaty

Mar 1, 2006, 8:43:37 AM
to django...@googlegroups.com
On 3/1/06, DavidA <david.av...@gmail.com> wrote:
> I just want to echo Peter and Ivan's sentiments on NOT making this
> default behavior.

Thanks for the feedback (to all of you)! Rest assured this will not be
made default behavior. Generally our philosophy in building Django is
to activate potentially complex/performance-heavy features only on an
opt-in basis.

Adrian

--
Adrian Holovaty
holovaty.com | djangoproject.com | chicagocrime.org

Leeuw van der, Tim

Mar 1, 2006, 8:46:49 AM
to django...@googlegroups.com
The way I would envisage adding this functionality to Django would be
that in your model you specify, for each Model object (=table), that it
is to be versioned. Versioning would automatically add a '_version'
column to the generated DDL, and the standard Django save() routines
would take care of atomically checking and incrementing the row-version
and throwing exceptions if updates fail because of concurrent changes.

This is similar to how Hibernate works.
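A sketch of what that atomic check-and-increment could look like at the SQL level, using sqlite3 instead of Django so it stands alone (the doc table, _version column, and save() helper are invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE doc (id INTEGER PRIMARY KEY, body TEXT, _version INTEGER)")
conn.execute("INSERT INTO doc VALUES (1, 'first draft', 0)")

def save(conn, pk, body, seen_version):
    # One UPDATE both checks the version we read and increments it;
    # zero affected rows means someone else saved in the meantime.
    cur = conn.execute(
        "UPDATE doc SET body = ?, _version = _version + 1 "
        "WHERE id = ? AND _version = ?",
        (body, pk, seen_version))
    if cur.rowcount == 0:
        raise RuntimeError("doc %d was updated concurrently" % pk)

save(conn, 1, "second draft", 0)           # we read version 0: accepted
try:
    save(conn, 1, "conflicting draft", 0)  # version is now 1: rejected
except RuntimeError:
    pass
```

The single UPDATE statement is what makes the check atomic without any explicit locking, which is presumably why the Hibernate-style _version column works well in practice.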

Because developers have to manually add the 'versioned' attribute to
their Model classes (in whatever form is appropriate), it automatically
implies that such a feature is off by default.

I think it wouldn't even be very hard to do, but it would be best to do
so on the magic-removal branch and I have never looked at that yet.

The really tricky bit is making it work together with the Forms and
Manipulators, since you need a way to store and retrieve the row-version
field to carry it over from the current HTTP request to the next. I
haven't yet figured out the most satisfying way to do that (hidden form
variables are rather ugly for this purpose, and inherently unsafe).


But I'm not one of the Django devs and I have no idea what thoughts they
might have on the implementation of row-versioning or row-locking
features.


Cheers,

--Tim


ZebZiggle

Mar 1, 2006, 10:25:43 AM
to Django users
That sounds like a great suggestion Tim. That's the way I would expect
it to work as well.

hugo

Mar 1, 2006, 11:34:41 AM
to Django users
>The way I would envisage adding this functionality to Django would be
>that in your model you specify, for each Model object (=table), that it
>is to be versioned. Versioning would automatically add a '_version'
>column to the generated DDL, and the standard Django save() routines
>would take care of atomically checking and incrementing the row-version
>and throwing exceptions if updates fail because of concurrent changes.

The right way to add this is to wait for magic-removal and then just
use a mixin class to add that functionality to all models where you
need it. And in that case this mixin class might even be something we
would want to include in django.contrib somehow, so that it's a
readily available solution for others.

bye, Georg

Jonathan Ellis

Mar 2, 2006, 6:29:38 PM
to django...@googlegroups.com
On 2/28/06, Adrian Holovaty <holo...@gmail.com> wrote:

Alternatively, SQLAlchemy solves this problem by implementing an
identity map, which is quite cool. You should be able to use
SQLAlchemy with the rest of Django with no problems at all -- the only
thing you won't be able to take advantage of is the admin framework.

That's a pretty big "only," isn't it? :)

(Mike's doing a fantastic job with SQLAlchemy.  I'd love to see a way to really integrate that into Django.)

--
Jonathan Ellis
http://spyced.blogspot.com

ChaosKCW

Mar 4, 2006, 5:56:37 PM
to Django users
>The way I would envisage adding this functionality to Django would be
>that in your model you specify, for each Model object (=table), that it
>is to be versioned. Versioning would automatically add a '_version'
>column to the generated DDL, and the standard Django save() routines
>would take care of atomically checking and incrementing the row-version
>and throwing exceptions if updates fail because of concurrent changes.

Another approach is not to change the schema at all. Unnecessary and
unrelated fields in the schema are messy. Instead you can use
'optimistic locking'. All you need to do to perform optimistic locking
is add every column you are changing to the WHERE clause with its old
value. The only requirement is to keep the old values around. A version
field might save some memory, but optimistic locking has the advantage
of not needing to alter the schema.

So if you load a model up, e.g.:

Model.ID = 123
Model.Field1 = 'ABC'

and then you change it:

Model.Field1 = 'DEF'
Model.OldValues['Field1'] = 'ABC'

When you save the object, you add the old value to the WHERE clause:

update model.tablename set Field1 = 'DEF' where ID = 123 and Field1 =
'ABC';

This achieves the same effect as a version field. An exception can be
thrown if the update fails, or it can be handled in any number of ways.
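That WHERE-clause trick can be demonstrated end to end with sqlite3 (the model table, column names, and RuntimeError are just for illustration; a real implementation would keep the old values inside the ORM object and do this in save()):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE model (id INTEGER PRIMARY KEY, field1 TEXT)")
conn.execute("INSERT INTO model VALUES (123, 'ABC')")

def optimistic_update(conn, pk, old_value, new_value):
    # The old value rides along in the WHERE clause, so the UPDATE only
    # matches if nobody changed the column since we read it.
    cur = conn.execute(
        "UPDATE model SET field1 = ? WHERE id = ? AND field1 = ?",
        (new_value, pk, old_value))
    if cur.rowcount == 0:
        raise RuntimeError("concurrent modification of row %d" % pk)

optimistic_update(conn, 123, "ABC", "DEF")       # matches: succeeds
clobbered = False
try:
    optimistic_update(conn, 123, "ABC", "XYZ")   # 'ABC' is stale: rejected
except RuntimeError:
    clobbered = True
assert clobbered
```

Compared with a version column, this needs no schema change, at the cost of remembering every changed column's old value and a longer WHERE clause.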
