Best practice for immutable object structures

Nebur

Jun 2, 2008, 3:07:08 AM
to sqlalchemy
Our web app needs to use an "immutable" database object structure,
i.e. some mapped classes (and their tables) will change during
administrative runs only. These objects are mostly referenced by other
(mutable) persistent objects.
The structure fits well into memory and can remain in each process for
its lifetime.

Loading these objects with every new or clear()ed session would
drastically impact performance. Even more important, these immutable
objects should be easily usable as dict keys, which requires them to
reliably keep their identity. (These objects are nice candidates for
memcached, but that is not about instance identity.)

I'm currently using this recipe:
Provide a MapperExtension with create_instance and maintain a custom
cache for the immutable objects (in addition to the short-lived
identity maps of the sessions).

    from sqlalchemy.orm import MapperExtension

    class UniqueByPrikeyMapper(MapperExtension):
        def create_instance(self, mapper, selectcontext, row, class_):
            key = row[class_.c.id]  # there's always a primary key "id"
            try:
                return get_instance_from_somewhere(class_, key=key)
            except KeyError:  # instance not in the cache: put it there
                res = class_.__new__(class_)
                # pass the new instance along so the cache can store it
                register_instance_somewhere(class_, key=key, instance=res)
                return res
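
A minimal sketch of what the two placeholder helpers might look like, assuming the "custom cache" is nothing more than a module-level dict keyed by (class, primary key). The `instance` parameter and the `Country` stand-in class are assumptions made for illustration; neither comes from SQLAlchemy.

```python
# Hypothetical process-wide cache behind the two placeholder helpers.
# Keyed by (class, primary key); not thread-safe.
_instance_cache = {}


def get_instance_from_somewhere(class_, key):
    """Return the cached instance; raise KeyError on a cache miss."""
    return _instance_cache[(class_, key)]


def register_instance_somewhere(class_, key, instance):
    """Remember `instance` as the one object for (class_, key)."""
    _instance_cache[(class_, key)] = instance


class Country(object):
    """Stand-in for a mapped, read-only class."""


first = Country.__new__(Country)
register_instance_somewhere(Country, key=1, instance=first)
same = get_instance_from_somewhere(Country, key=1)
```

A lookup for a key that was never registered raises KeyError, which is what the `except KeyError` branch in the recipe relies on.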

Now, any persistent instance gets cached fine when loaded. But freshly
created objects need separate treatment.
Since we use the auto-generated "id" as key, any new instance needs to
be "manually" put into the cache after it was flush()ed. I've fiddled
with a SessionExtension/after_flush(), where the new instances can be
found in flush_context.uow.new. This could be a place to put new
instances into the cache, although it seems somewhat hackish.
Alternatively, we can re-load any instance after flush() to let
create_instance() run, and throw away the original reference.
An attempt to use populate_instance() for caching instances there (and
only there) failed. populate_instance() is *not* run when the auto-
generated id is populated after flush(). What would be required is the
event semantics "always called when instance got sync'ed with the DB".
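
The "put new instances into the cache after flush" step could be sketched independently of whichever hook ends up calling it. The function below is an assumption-laden illustration: `cache_flushed_instances`, the plain-dict cache and the `Currency` stand-in are hypothetical names, and the collection of new objects is simply passed in as a list.

```python
def cache_flushed_instances(new_instances, cache):
    """Register freshly flushed objects under (class, id).

    `new_instances` is whatever collection of pending objects the hook
    captured; `cache` is a plain dict. Objects whose id is still None
    (not flushed yet) are skipped.
    """
    for obj in new_instances:
        if obj.id is None:
            continue
        # setdefault keeps an already cached instance authoritative.
        cache.setdefault((type(obj), obj.id), obj)


class Currency(object):
    """Stand-in for a mapped class with an auto-generated id."""

    def __init__(self, id=None):
        self.id = id


cache = {}
flushed = Currency(id=7)   # id assigned by the database on flush
unflushed = Currency()     # still has no primary key
cache_flushed_instances([flushed, unflushed], cache)
```

Only the object that already carries a primary key ends up in the cache; the other one would have to be picked up by a later call.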


Does anyone have a better practice, comments, or see shortcomings?

Regards,
Ruben

SA Version: 0.4.x
Related Links:
a recipe for unique objects, but about instance creation, not about
instance loading:
http://www.sqlalchemy.org/trac/wiki/UsageRecipes/UniqueObject
query caching thread but about performance with complex joins:
http://groups.google.de/group/sqlalchemy/browse_thread/thread/e7294628673aad12/6b65276bcd9e8e5a?lnk=gst&q=create_instance#

a...@svilendobrev.com

Jun 2, 2008, 9:19:25 AM
to sqlal...@googlegroups.com
how would u ensure that no one changes these objects?
it should be considered a programming error, so throwing an
exception might be appropriate. i have such a concept in dbcook, and it
is implemented (so far) via an assert in
MapperExtension.before_insert().

u may think of such a "defensive programming" approach.
ciao
svilen

Michael Bayer

Jun 2, 2008, 11:04:35 AM
to sqlal...@googlegroups.com

On Jun 2, 2008, at 3:07 AM, Nebur wrote:

>
> Loading these objects with any new or clear()ed session would
> drastically impact performance.

I'm assuming the performance increase is due to the reduced overhead in
creating objects? From what I can see, you are still issuing SQL,
still processing result rows, still populating state in the given
instances....all of which is very time consuming. Plus, your scheme
actually won't even save object creation overhead since you can't
exactly do it that way...

>
> I'm currently using this recipe:
> Provide a MapperExtension with create_instance and maintain a custom
> cache for the immutable objects (additionally to the short-living
> identity maps of the sessions).
>
> class UniqueByPrikeyMapper(MapperExtension):
>     def create_instance(self, mapper, selectcontext, row, class_):
>         key = row[class_.c.id]  # there's always a primary key "id"
>         try:
>             return get_instance_from_somewhere(class_, key=key)
>         except KeyError:  # instance not in the cache: put it there
>             res = class_.__new__(class_)
>             register_instance_somewhere(class_, key=key, instance=res)
>             return res
>
> Now, any persistent instance gets cached fine when loaded. But freshly
> created objects need separate treatment.

this won't work out of the box because a particular object instance can
only be in one Session at a time. For this kind of use case we
provide a method called "merge()" which takes a dont_load=True
argument to create local copies of objects in sessions.

>
> Since we use the auto generated "id" as key, any new instance needs to
> be "manually" put into the cache after it was flush()ed. I've fiddled
> with a SessionExtension/after_flush(), where the new instances can be
> found in flush_context.uow.new. This could be a place to put new
> instances into the cache, although it seems somewhat hackish.

why not session.new ? "context.uow.new" is gone in 0.5.

> Alternatively, we can re-load any instance after flush() to let
> create_instance() run, and throw away the original reference.
> An attempt to use populate_instance() for caching instances there (and
> only there) failed. populate_instance() is *not* run when the auto-
> generated id is populated after flush().

after_flush() is a decent place to do things like this (or even
after_insert()). populate_instance() corresponds to object state
loaded from the database but it would be wasteful and complex to rely
upon an expunge-reload scheme just for state management.

> Required would be the event
> semantics "always called when instance got sync'ed with the DB".

that is the populate_instance() method on the load side. On the save
side it's after_insert()/after_update().

> Someone has a better practice / comments / can see shortcomings ?

what is the specific overhead you are looking to reduce? I don't yet
see much savings here (well, I don't see any, but it's only 10:30 AM
for me). An ORM-external approach could reduce SQL and result
processing overhead far more.


Nebur

Jun 2, 2008, 11:37:01 AM
to sqlalchemy


On 2 Jun., 15:19, a...@svilendobrev.com wrote:
> how would u ensure that no one changes these objects?
> it should be considered a programming error, so throwing an
> exception might be appropriate. i have such a concept in dbcook, and it
> is implemented (so far) via an assert in
> MapperExtension.before_insert().

That's right (although another topic). It is surely good style to
ensure this.

Nebur

Jun 2, 2008, 11:55:03 AM
to sqlalchemy

> I'm assuming the performance increase is due to the reduced overhead in
> creating objects ? From what I can see, you are still issuing SQL,
> still processing result rows, still populating state in the given
> instances....all of which is very time consuming. Plus, your scheme
> actually won't even save object creation overhead since you can't
> exactly do it that way...

That's right: The queries are the same when no further optimization is
applied. Object creation surely is very fast when compared with
queries.
But there have been two intentions: Performance, and object identity.
Performance is not significantly improved by saving object creation.
But the results of some performance-critical methods are cached, thus
saving a huge number of queries. This caching in turn relies on the
constant object identities and would be tricky if the objects
changed their identity.
So, the performance improvement is an indirect effect. The object
identity simplifies (at least) this task.
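
To illustrate why stable identity matters for this kind of result caching, here is a small sketch. It assumes the results are memoized in a plain dict keyed by the object itself; `expensive_lookup`, `cached_lookup` and `Category` are hypothetical stand-ins, not code from the app.

```python
_result_cache = {}
calls = []


def expensive_lookup(obj):
    """Stand-in for a method that would issue many queries."""
    calls.append(obj)
    return "result for %r" % (obj.name,)


def cached_lookup(obj):
    # Default object hashing is identity-based, so the cache only hits
    # when the very same instance comes back.
    try:
        return _result_cache[obj]
    except KeyError:
        result = _result_cache[obj] = expensive_lookup(obj)
        return result


class Category(object):
    def __init__(self, name):
        self.name = name


books = Category("books")
cached_lookup(books)
cached_lookup(books)               # served from the cache
cached_lookup(Category("books"))   # equal data, different identity: a miss
```

The third call misses because a freshly loaded copy is a different key, which is the situation the recipe tries to avoid.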


> this won't work out of the box because a particular object instance can
> only be in one Session at a time. For this kind of use case we
> provide a method called "merge()" which takes a dont_load=True
> argument to create local copies of objects in sessions.

Yes, the "custom identity map" cannot be a session's identity map, it
is a plain dict. Otherwise, the objects indeed would have to be merged
into any new session. But copies of the objects, anyway, are not desired
since identity is to be kept.
>
>
>
> why not session.new ? "context.uow.new" is gone in 0.5.
Thanks. uow.new indeed did not look adequate.

Nebur

Jun 2, 2008, 12:05:20 PM
to sqlalchemy

> is implemented (so far) via an assert in
> MapperExtension.before_insert().
And before_update() and before_delete(), probably?
Indeed, when the recipe is applied, ensuring that nobody does such
changes is mandatory.
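
A rough sketch of such a guard covering all three hooks. To stay runnable without SQLAlchemy it is written as a plain class whose methods take the same (mapper, connection, instance) arguments as the MapperExtension hooks; in a real mapper it would presumably subclass MapperExtension instead. `ImmutableWriteError`, `RejectWrites` and `Country` are hypothetical names.

```python
class ImmutableWriteError(Exception):
    """Raised when a read-only object is about to be written."""


class RejectWrites(object):
    """Hook methods taking (mapper, connection, instance) that refuse writes."""

    def _reject(self, mapper, connection, instance):
        raise ImmutableWriteError(
            "%s is read-only" % type(instance).__name__)

    # The same check serves all three write hooks.
    before_insert = _reject
    before_update = _reject
    before_delete = _reject


class Country(object):
    """Stand-in for a mapped, read-only class."""


guard = RejectWrites()
try:
    guard.before_update(None, None, Country())
    rejected = False
except ImmutableWriteError as exc:
    rejected = True
    message = str(exc)
```

The administrative runs that are allowed to change these tables would need a mapper setup without this guard, or a switch to disable it.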

Michael Bayer

Jun 2, 2008, 12:22:22 PM
to sqlal...@googlegroups.com

On Jun 2, 2008, at 11:55 AM, Nebur wrote:

>
>> this won't work out of the box because a particular object instance
>> can
>> only be in one Session at a time. For this kind of use case we
>> provide a method called "merge()" which takes a dont_load=True
>> argument to create local copies of objects in sessions.
>
> Yes, the "custom identity map" cannot be a sessions identity map, it
> is a plain dict. Otherwise, the objects indeed would have to be merged
> in any new session. But copies of the objects, anyway, are not desired
> since identity is to be kept.
>>

so....since these objects are returned by create_instance(), that
means they are getting sent straight into a Session. I'm assuming you
didn't write your own Session, so what happens when the
populate_instance() step gets run on the objects, and several
concurrent threads are all issuing populate_instance() on those items
at the same time? There is no guarantee within the load of objects
that there are no state changes - including internal state variables
like "state.runid" which definitely will not work with concurrent
modifications. Similarly, I don't see how very basic functions
necessary for SQLA's operation, such as object_session(), can possibly
work correctly here, since you are attempting to place the same object
in multiple sessions - an object's session is identified by a single
attribute placed upon the object's state (in 0.4 it's on the object
itself).

Unless your app is using only one Session. Then it could work,
however it would be extremely difficult to take advantage of multiple
threads since you'd have to mutex virtually all access to that single
Session.

If there is some central core of "state" that you don't want to
replicate, it's still possible to have many copies of an object all
reference that same state using a proxying pattern. As far as hash
identity, the __hash__() and __cmp__() methods work fine in that
regard.
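
One way to read the proxying suggestion, as a sketch only: each copy holds a reference to a single shared state object and defines hashing and equality on the primary key. `SharedState` and `Proxy` are hypothetical names, and the sketch uses __eq__() rather than __cmp__() so that it also runs on current Python versions.

```python
class SharedState(object):
    """The single, process-wide copy of the read-only data."""

    def __init__(self, id, name):
        self.id = id
        self.name = name


class Proxy(object):
    """Per-session copy that forwards attribute reads to the shared state."""

    def __init__(self, state):
        self._state = state

    def __getattr__(self, attr):
        # Only called for attributes not found on the proxy itself.
        if attr == "_state":
            raise AttributeError(attr)
        return getattr(self._state, attr)

    def __eq__(self, other):
        return isinstance(other, Proxy) and self._state.id == other._state.id

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        return hash((Proxy, self._state.id))


state = SharedState(1, "EUR")
a, b = Proxy(state), Proxy(state)
lookup = {a: "cached value"}
```

Here `a` and `b` are distinct objects, yet either one finds the entry in `lookup`, so dict keys keep working even when each session gets its own copy.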

a...@svilendobrev.com

Jun 2, 2008, 11:16:32 AM
to sqlal...@googlegroups.com
On Monday 02 June 2008 19:05:20 Nebur wrote:
> > is implemented (so far) via an assert in
> > MapperExtension.before_insert().
>
> And before_update() and before_delete(), probably?
in your case yes. in my case these objects should not have instances
nor be in a db at all (e.g. non-leafs in a class-hierarchy tree).

a...@svilendobrev.com

Jun 2, 2008, 11:22:06 AM
to sqlal...@googlegroups.com
if i get it right, u make some "map" in memory of that terra incognita
of those untouchable readonly objects, and then use it ?
so why not just build all that _once_ into some structure of non-DB
objs, and then throw away the DB-related? or maybe even incrementally?
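
A small sketch of the "build it once into non-DB objects" idea, assuming the read-only rows can be fetched as plain tuples at process start. The `Country` fields, `build_snapshot` and the sample rows are made up for illustration.

```python
from collections import namedtuple

# Plain, DB-free value object; the field names are illustrative.
Country = namedtuple("Country", ["id", "code", "name"])


def build_snapshot(rows):
    """Copy (id, code, name) rows into an in-memory lookup, built once."""
    return dict((row[0], Country(*row)) for row in rows)


# In practice the rows would come from one query at process start.
rows = [(1, "DE", "Germany"), (2, "BG", "Bulgaria")]
countries = build_snapshot(rows)
```

namedtuple instances reject attribute assignment and are hashable, so they are read-only by construction and usable as dict keys, but they are not mapped objects and cannot be referenced through ORM relations.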

Nebur

Jun 2, 2008, 5:05:18 PM
to sqlalchemy


On 2 Jun., 18:22, Michael Bayer <mike...@zzzcomputing.com> wrote:
> On Jun 2, 2008, at 11:55 AM, Nebur wrote:
>
>
> so....since these objects are returned by create_instance(), that
> means they are getting sent straight into a Session. I'm assuming you
> didn't write your own Session, so what happens when the
> populate_instance() step gets run on the objects , and several
> concurrent threads are all issuing populate_instance() on those items

There are more places where multiple threads would fail. E.g. caching
the instances.
The concerned web app will not use multiple threads (but single-
threaded processes only.) So, a threadsafe pattern was simply not
intended.
I should have placed this prerequisite prominently above the recipe.

> modifications. Similarly, I don't see how very basic functions
> necessary for SQLA's operation, such as object_session(), can possibly
> work correctly here, since you are attempting to place the same object
> in multiple sessions - an object's session is identified by a single
> attribute placed upon the object's state (in 0.4 it's on the object
> itself).
>
> Unless your app is using only one Session. Then it could work,

It's ensured that there won't be two or more sessions at the same time.
(Besides that, I do _not_ want to force a session to live long; I
think that's not the intention of an ORM session.
I've assumed that generally, an application had better not depend
on very long-lived sessions, although it might be technically
possible. So, the sessions will change, but strictly sequentially and
not overlapping.)

> If there is some central core of "state" that you don't want to
> replicate, it's still possible to have many copies of an object all
> reference that same state using a proxying pattern. As far as hash
> identity, the __hash__() and __cmp__() methods work fine in that
> regard.

Yes, another approach would be accepting copies of the immutable
objects with appropriate hash values.
The question was: which version represents the "real world" best?
I think in the case of immutable (readonly) objects, representing
these as long-living instances does look adequate. These objects
would change state between persistent and detached when bound to and
unbound from subsequent sessions.
Of course, what I want to clarify is: Are long-living objects a
design "against the ORM"? Is the repeated change between persistent
and detached state an "anti pattern" (even if we can make it work)?
Obviously, there's nothing wrong with detached instances that get
persistent again, as the docs say. But here, I'm aware that the
objects do not take the usual "session.update()" way.
If this is against the ORM intention, or in some unspecified "grey
area", I'd clearly withdraw the pattern. Otherwise, I'd prefer the
pattern since it seems to picture the "real world" well (and even
substantiates this by simplifying tasks in different places.)

Nebur

Jun 2, 2008, 5:31:58 PM
to sqlalchemy


On 2 Jun., 17:22, a...@svilendobrev.com wrote:
> if i get it right, u make some "map" in memory of that terra incognita
> of those untouchable readonly objects, and then use it ?
> so why not just build all that _once_ into some structure of non-DB
> objs, and then throw away the DB-related? or maybe even incrementally?
>
The readonly objects are heavily referenced by mutable ones (which are
written in 1..n transactions per web request). So, things get simple
when the mutable ones can directly reference the readonly ones (as
mapped attributes.) Using a structure of non-DB (i.e. not even
detached) objects would be possible but more elaborate. (Well, I'm
trying to find out whether the approach above bears even more
complexity...)
Now I need to follow my concentration that already has gone
sleeping :-) Good night ...