Could someone please point me to some material on using Django in a
load-balanced environment?
The specific problem I'm facing is model ORM objects going "stale" (the
in-memory representation differs from the database because of a change
made on another machine or thread). Typically, ORM layers deal with this
by supporting some form of versioning on the records so the ORM can
refresh the cache automatically.
I find I can't keep around objects that may change and always have to
query for the latest version. This still doesn't solve the problem, it
just lessens the likelihood of it occurring. I know I'm going to get
bitten by race conditions. This staleness will even happen with
concurrent sessions on the same web server, from what I can see.
For example, if I do this I seem to get into trouble:
# Assume the row with this id currently has data == 99
obj1 = objects.get_object(id__exact=id)
obj2 = objects.get_object(id__exact=id)
obj1.data = 1
obj1.save()
assert obj1.data == obj2.data  # fails: obj2 still holds the stale 99
So I seem to have to "freshen up" any object every time I want to look
at its member data, and that's going to make the code horrific and
fragile (and still broken).
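For illustration, here is what that "freshen-up" step looks like in
today's Django syntax (refresh_from_db() postdates this thread; at the
time the only option was to re-run the query):

obj2.refresh_from_db()  # re-read the row, discarding the stale copy
assert obj1.data == obj2.data  # now passes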
Is there any documentation on best practices or gotchas for using
Django in these concurrent situations?
I've got to be missing something here, so I apologize in advance for
the silly question.
Thanks
-Zeb
Am I missing something here?
-Z
http://pythonnotes.blogspot.com/2004/09/python-orm-tools.html
Specifically, take note of the paragraph that reads:
To implement true object persistence, each object needs a unique
identifier. Each ORM will use a different approach here. However, one
issue that is common to all ORMs is how to handle multiple copies of
the same object. For the sake of consistency, objects should be
singletons. If the ORM tool doesn't do it automatically, then it's up
to the programmer to make sure that multiple copies of the same object
never exist at the same time, avoiding concurrency and consistency
issues.
This is a standard problem in any database environment - as soon as you
pull data out of the database, it's not connected to it any more (at
least once you close the transaction it was read in - PostgreSQL
will guarantee the validity of selected data as long as the transaction
is still open). If your code is heavy on updates, you will have to
provide a viable solution for that yourself. For example, you can do
the versioning mentioned above yourself by adding a DateTimeField with
auto_now=True to your record and comparing those timestamps
(auto_now=True fields are set automatically every time the record is saved).
Of course you need to check the timestamp from the database to make
sure that your cached object is still valid, and yes, as long as Django
doesn't have transaction control, this is not 100% reliable. After
Django gets transaction control, you will be able to do the full
versioning thingy in your code and might even hook it up in the
manager objects that you use to access the database (read up on the
manager stuff in the magic-removal branch if you are interested).
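As a minimal sketch of that idea in post-magic-removal syntax (the
Record model and the is_stale() helper are hypothetical, not an
existing Django API):

from django.db import models

class Record(models.Model):
    data = models.IntegerField()
    # auto_now=True makes Django stamp this field on every save()
    updated_at = models.DateTimeField(auto_now=True)

def is_stale(obj):
    # Compare the cached copy's timestamp with the database's current one.
    # As noted above, without transaction control this is not 100% reliable.
    return Record.objects.get(pk=obj.pk).updated_at != obj.updated_at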
bye, Georg
I agree that this problem is pretty fundamental. I don't know what
worries me more: the fact that the problem exists, or that the
architects don't see it.
I think in my case I can isolate the offending code and use some form
of "double-buffering" technique to solve it. But, it's not going to be
pretty.
I've considered SQLObject, but then I would perhaps just cut over to
TurboGears. A colleague has recommended SQLAlchemy. I'll look at both.
Cheers,
Zeb
I wouldn't put it that way. It is a problem, and things like transaction
support do show up on TODO lists, but right now it is simply not the
priority that you would like it to be. Keep in mind that Django
encompasses a lot of uses and, as has been pointed out, for the
majority of uses (the majority of website types),
non-"last-write-wins" transaction support is not a priority. Feel
free to come up with your own solution and submit a patch if you'd
like to get it done sooner, or just wait for the TODO items to move up
the priority list.
> I think in my case I can isolate the offending code and use some form
> of "double-buffering" technique to solve it. But, it's not going to be
> pretty.
I would again suggest you take a look at the magic-removal branch
<http://code.djangoproject.com/wiki/RemovingTheMagic>. You could
encapsulate such code in the actual save() call and fake the Django
ORM having that support; then, when Django does provide something
you are happy with, you can pull out your custom save() code and
the rest of your code probably won't notice.
You can even write your own object "Manager" to have fuller control of
object caching and to implement said save()-overload code for several
model classes.
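As a rough sketch of that encapsulation (hypothetical names, written in
post-magic-removal/current syntax; none of this is an existing Django
API):

from django.db import models

class StaleObjectError(Exception):
    pass

class VersionedModel(models.Model):
    # Abstract base (mixin-style): subclasses get a version check in save().
    version = models.IntegerField(default=0)

    class Meta:
        abstract = True

    def save(self, *args, **kwargs):
        if self.pk is not None:
            # Re-read the stored version; note the check-and-save pair is
            # itself racy without a transaction, as noted earlier in the thread.
            stored = type(self).objects.get(pk=self.pk)
            if stored.version != self.version:
                raise StaleObjectError('row changed since it was read')
        self.version += 1
        super().save(*args, **kwargs)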
--
--Max Battcher--
http://www.worldmaker.net/
All progress is based upon a universal innate desire on the part of
every organism to live beyond its income. --Samuel Butler
Max did a good job (in his e-mail) of explaining our thoughts on this.
We don't implement this level of super object-instance consistency at
this point because we aim for an 80% solution. The vast majority of
Web sites don't require it.
With that said, I think it would be a fantastic improvement to
Django's database layer. You could solve it in the interim by creating
a "status" field on your model, overriding save() so that it checks the
status field against that of the newest object in the database, and
throwing an exception if the numbers aren't in sync. We could roll this
into Django proper in a generic way if there's demand.
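From the calling code's point of view that might look like this
(reusing the hypothetical VersionedModel/StaleObjectError sketch from
earlier in the thread; Article is a hypothetical subclass):

article = Article.objects.get(pk=1)
article.data = 1
try:
    article.save()
except StaleObjectError:
    # someone else saved first: reload and retry, or report the conflict
    pass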
Alternatively, SQLAlchemy solves this problem by implementing an
identity map, which is quite cool. You should be able to use
SQLAlchemy with the rest of Django with no problems at all -- the only
thing you won't be able to take advantage of is the admin framework.
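To illustrate what the identity map buys you (this uses today's
SQLAlchemy API, which postdates this thread; the Record table is
hypothetical): two lookups of the same primary key within one session
return the very same Python object, so the stale-copy problem from the
top of the thread cannot arise within a session.

from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Record(Base):  # hypothetical table
    __tablename__ = 'record'
    id = Column(Integer, primary_key=True)
    data = Column(Integer)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Record(id=1, data=99))
    session.commit()
    a = session.get(Record, 1)
    b = session.get(Record, 1)
    assert a is b       # identity map: both names point at one object
    a.data = 1
    assert b.data == 1  # so there is no second, stale copy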
Adrian
--
Adrian Holovaty
holovaty.com | djangoproject.com
I will definitely explore all those options and make my contributions
to the project.
All the best,
Zeb
On Tue, Feb 28, 2006 at 02:36:41PM -0600, Adrian Holovaty wrote:
> With that said, I think it would be a fantastic improvement to
> Django's database layer. You could solve it in the interim by creating
> a "status" field on your model, overriding save() so that it checks the
> status field against that of the newest object in the database, and
> throwing an exception if the numbers aren't in sync. We could roll this
> into Django proper in a generic way if there's demand.
I can see why folks would want this, but I'd like to toss in a vote
against it being on by default. It strikes me as spooky action at a
distance -- because another process possibly on another machine saved
the state of one of its objects, the objects in my current script start
tossing exceptions or quietly changing their state. This would be bad
for an app I'm writing that has both short-lived pages and a
longer-lived process that wants to keep possibly out-of-sync data
around to track state and state changes.
I also think it'd be really surprising (in a bad way) to newbies. It
feels a little like thread programming, where getting variables into a
known state requires some kind of locking.
Note that I'm not saying "don't do it" or "it's always bad", I'm just
asking "please don't do it by default". And to touch on an earlier
topic, I'm all for transaction support.
--
Peter Harkins - http://push.cx
>I can see why folks would want this, but I'd like to toss in a vote
>against it being on by default.
>
I just started to write the same thing :-)
> It strikes me as spooky action at a
>distance -- because another process possibly on another machine saved
>the state of one of its objects, the objects in my current script start
>tossing exceptions or quietly changing their state.
>
I know three common ways of dealing with this problem:
- Last write wins (which we have now).
- Validating on save. You get an object, process it, try to save and -
bang - "data was modified by someone since you last read it, would you
like to lose your changes?" This is cheap, but it only works with user
interaction, which is unacceptable in most cases since users usually
don't like to debug your system.
- Locking for writing. If you plan to modify an object you lock it
exclusively and other processes wait for you. This hurts performance,
both on the "is it locked" checks and on the waits, which now span the
updater's whole read-process-update cycle (a sketch in today's terms
follows below).
So the first option is not only viable, it's actually better in many
cases (because it's the fastest).
Sometimes less is more.
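The sketch promised above: the locking option in today's Django terms
(select_for_update() and transaction.atomic() both postdate this
thread; the Record model is hypothetical):

from django.db import transaction

with transaction.atomic():
    # The row stays locked until the transaction commits; other writers
    # using select_for_update() on it block and then see this write.
    obj = Record.objects.select_for_update().get(pk=123)
    obj.data = 1
    obj.save()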
Thanks for the feedback (to all of you)! Rest assured this will not be
made default behavior. Generally our philosophy in building Django is
to activate potentially complex/performance-heavy features only on an
opt-in basis.
Adrian
--
Adrian Holovaty
holovaty.com | djangoproject.com | chicagocrime.org
This is similar to how Hibernate works.
Because developers have to add the 'versioned' attribute to their model
classes manually (in whatever form is appropriate), it automatically
follows that such a feature is off by default.
I think it wouldn't even be very hard to do, but it would be best to do
it on the magic-removal branch, and I have never looked at that yet.
The real tricky bit is to make it work together with the Forms and
Manipulators, since you need a way to store and retrieve the row-version
field to carry it over from the current HTTP request to the next. I
haven't yet figured out the most satisfying way to do that (hidden form
variables are rather ugly for this purpose, and inherently unsafe).
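As a sketch of the hidden-field variant in today's forms API (which
replaced Manipulators; all names are hypothetical, and the ugly/unsafe
caveat above stands):

from django import forms

class RecordForm(forms.ModelForm):
    class Meta:
        model = Record            # a model carrying a 'version' field
        fields = ['data', 'version']
        # the row version rides along as a hidden input between requests
        widgets = {'version': forms.HiddenInput()}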
But I'm not one of the Django devs and I have no idea what thoughts they
might have on the implementation of row-versioning or row-locking
features.
Cheers,
--Tim
-----Original Message-----
From: django...@googlegroups.com
[mailto:django...@googlegroups.com] On Behalf Of DavidA
Sent: Wednesday, March 1, 2006 12:51
To: Django users
Subject: Re: Django in a Load-Balanced environment (ORM issues) ...
I just want to echo Peter and Ivan's sentiments on NOT making this the
default behavior.
The right way to add this is to wait for magic-removal and then just
use a mixin class to add that functionality to all models where you
need it. And in that case this mixin class might even be something we
would want to include in django.contrib somehow, so that it's a
readily available solution for others.
bye, Georg
> Alternatively, SQLAlchemy solves this problem by implementing an
> identity map, which is quite cool. You should be able to use
> SQLAlchemy with the rest of Django with no problems at all -- the only
> thing you won't be able to take advantage of is the admin framework.
Another approach is not to change the schema at all - unnecessary and
unrelated fields in the schema are messy. Instead you can use
"optimistic locking". All you need to do to perform optimistic locking
is add every column you are changing to the WHERE clause with its old
value. The only requirement is to keep the old values around. A version
field might save some memory, but optimistic locking has the advantage
of not needing to alter the schema.
So if you load a model up, e.g.:

Model.ID = 123
Model.Field1 = 'ABC'

and then you change it:

Model.Field1 = 'DEF'
Model.OldValues['Field1'] = 'ABC'

then when you save the object you add the old value to the WHERE clause:

UPDATE model.tablename SET Field1 = 'DEF' WHERE ID = 123 AND Field1 = 'ABC';
This achieves the same effect as a version field. An exception can be
thrown if the update matches no rows, or the failure can be handled in
any number of other ways.
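In today's Django ORM this maps naturally onto QuerySet.update(), which
returns the number of rows matched (a sketch with hypothetical names,
reusing the StaleObjectError from earlier in the thread):

# The old value goes into the WHERE clause via filter(); if another
# writer got there first, nothing matches and updated comes back as 0.
updated = Record.objects.filter(pk=123, field1='ABC').update(field1='DEF')
if updated == 0:
    raise StaleObjectError('row changed since it was read')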