Caching


Anton V. Belyaev

Dec 19, 2007, 5:19:43 PM12/19/07
to sqlalchemy
Hello,

Several people have already written about memcached + SQLAlchemy.

Remember, Mike Nelson wrote a mapper extension; it is available at:
http://www.ajaxlive.com/repo/mcmapper.py
http://www.ajaxlive.com/repo/mcache.py

I've rewritten it a bit to fit the 0.4 release of SA.

Any responses and comments are welcome, since I am not sure I am
doing the right things in the code :) I don't like the dirty tricks
with deleting _state, etc. Maybe it could be done better?

But it works somehow. It manages to cache query get operations.

It has some problems with a deferred fetch on an inherited mapper
because of some SA issues (I've found them in Trac).

import logging

import memcache as mc
from sqlalchemy.orm import MapperExtension, EXT_PASS
from sqlalchemy.orm.attributes import InstanceState

log = logging.getLogger(__name__)

class MCachedMapper(MapperExtension):
    def get(self, query, ident, *args, **kwargs):
        key = query.mapper.identity_key_from_primary_key(ident)
        obj = query.session.identity_map.get(key)
        if not obj:
            # gen_cache_key comes from mcache.py (not shown here)
            mkey = gen_cache_key(key)
            log.debug("Checking cache for %s", mkey)
            obj = mc.get(mkey)
            if obj is not None:
                obj.__dict__["_state"] = InstanceState(obj)
                obj.__dict__["_entity_name"] = None
                log.debug("Found in cache for %s : %s", mkey, obj)
                query.session.update(obj)
            else:
                obj = query._get(key, ident, **kwargs)
                if obj is None:
                    return None
                # temporarily strip SA's per-instance state so it is
                # not serialized into the cache
                _state = obj._state
                del obj.__dict__["_state"]
                del obj.__dict__["_entity_name"]
                mc.set(mkey, obj)
                obj.__dict__["_state"] = _state
                obj.__dict__["_entity_name"] = None
        return obj

    def before_update(self, mapper, connection, instance):
        mkey = gen_cache_key(mapper.identity_key_from_instance(instance))
        log.debug("Clearing cache for %s because of update", mkey)
        mc.delete(mkey)
        return EXT_PASS

    def before_delete(self, mapper, connection, instance):
        mkey = gen_cache_key(mapper.identity_key_from_instance(instance))
        log.debug("Clearing cache for %s because of delete", mkey)
        mc.delete(mkey)
        return EXT_PASS

The mapper can be used like this:

mapper(User, users_table, extension=MCachedMapper())
session = create_session()
user_1234 = session.query(User).get(1234) # this one loads from the DB
session.clear()
user_1234 = session.query(User).get(1234) # this one fetches from Memcached
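gen_cache_key used above comes from mcache.py and is not reproduced in this message. As a purely hypothetical sketch, a minimal version only needs to flatten SA's identity key, a tuple like (User, (1234,)), into a memcached-safe string (no spaces allowed in keys):

```python
# Hypothetical minimal gen_cache_key (the real helper lives in
# mcache.py): flatten an identity key tuple like (User, (1234,))
# into a space-free string suitable as a memcached key.
def gen_cache_key(identity_key):
    cls, ident = identity_key[0], identity_key[1]
    return "%s:%s" % (cls.__name__, "_".join(str(v) for v in ident))

class User(object):
    pass  # stand-in for the mapped class

print(gen_cache_key((User, (1234,))))  # -> User:1234
```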

Michael Bayer

Dec 19, 2007, 9:41:34 PM12/19/07
to sqlal...@googlegroups.com

On Dec 19, 2007, at 5:19 PM, Anton V. Belyaev wrote:

>
> Hello,
>
> Several people have already written about memcached + SQLAlchemy.
>
> Remember, Mike Nelson wrote a mapper extension; it is available at:
> http://www.ajaxlive.com/repo/mcmapper.py
> http://www.ajaxlive.com/repo/mcache.py
>
> I've rewritten it a bit to fit the 0.4 release of SA.
>
> Any responses and comments are welcome, since I am not sure I am
> doing the right things in the code :) I don't like the dirty tricks
> with deleting _state, etc. Maybe it could be done better?

What happens if you just leave "_state" alone? There shouldn't be
any need to mess with _state (nor _entity_name). The only attribute
worth deleting for the cache operation is "_sa_session_id", so that
the instance isn't associated with any particular session when it
gets cached. I'd also consider using session.merge(dont_load=True),
which is designed for use with caches (and also watch out for that
log.debug(); debug() calls using the standard logging module are
notoriously slow).
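On the log.debug() point: a common stdlib-only mitigation is to guard the call with Logger.isEnabledFor(), so the hot path skips argument formatting and the logging call chain entirely when DEBUG is off. A small sketch (cache_lookup is a made-up stand-in, not part of the extension above):

```python
import logging

log = logging.getLogger("mcache")
log.setLevel(logging.WARNING)  # DEBUG messages are disabled in production

def cache_lookup(mkey):
    # The guard skips the whole logging call when DEBUG is disabled;
    # this runs once per get(), so the savings add up on a hot path.
    if log.isEnabledFor(logging.DEBUG):
        log.debug("Checking cache for %s", mkey)
    return None  # stand-in for mc.get(mkey)

cache_lookup("User:1234")
```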

> It has some problems with a deferred fetch on an inherited mapper
> because of some SA issues (I've found them in Trac).

The only Trac ticket for this is #490, which with our current
extension architecture is pretty easy to fix, so it's resolved in
r3967: MapperExtensions are now fully inherited. If you apply the
same MapperExtension explicitly to a base mapper and a subclass
mapper, using the same ME instance will have the effect of it being
applied only once (and using two different ME instances will have
the effect of both being applied to the subclass separately).

Anton V. Belyaev

Dec 20, 2007, 5:06:49 AM12/20/07
to sqlalchemy
Mike, thanks for your reply.

> What happens if you just leave "_state" alone? There shouldn't be
> any need to mess with _state (nor _entity_name). The only attribute
> worth deleting for the cache operation is "_sa_session_id", so that
> the instance isn't associated with any particular session when it
> gets cached. I'd also consider using session.merge(dont_load=True),
> which is designed for use with caches (and also watch out for that
> log.debug(); debug() calls using the standard logging module are
> notoriously slow).

The reason for deleting _state is to save some space in the cache. I
save instances to the cache on the "get" operation, so they are
unmodified. But of course it is an internal thing, so the final
decision is yours :)

I gave up trying merge(dont_load=True) after running this sample:

import pickle

from sqlalchemy import Table, Column, Integer, String, MetaData
from sqlalchemy.orm import mapper, create_session, deferred

class User(object):
    pass

metadata = MetaData()  # assumed to be bound to an engine elsewhere

users = Table('users', metadata,
              Column('id', Integer, primary_key=True),
              Column('name', String(100)),
              Column('surname', String(100)))

mapper(User, users,
       properties={
           'surname': deferred(users.c.surname)
       })

s = create_session()
u = User()
u.name = 'anton'
u.surname = 'belyaev'
s.save(u)
s.flush()

# now we need an instance with an unloaded surname (because it is deferred)
s = create_session()
u = s.query(User).get(1)

# cache it
cache = pickle.dumps(u)

# try to restore in a new session
s = create_session()
u = pickle.loads(cache)
u = s.merge(u, dont_load=True)

The last statement fails with:

  File "/home/anton/eggs/lib/python2.5/site-packages/SQLAlchemy-0.4.1-py2.5.egg/sqlalchemy/orm/session.py", line 1136, in object_session
    if obj in sess:
TypeError: argument of type 'NoneType' is not iterable

Some notes on this test case:
1) If "surname" were a simple (not deferred) column property, merge
would work fine.
2) session.update(u) instead of merge would work fine, even with the
deferred column property, and the property itself would work fine
(it would load on first reference).

> The only Trac ticket for this is #490, which with our current
> extension architecture is pretty easy to fix, so it's resolved in
> r3967: MapperExtensions are now fully inherited. If you apply the
> same MapperExtension explicitly to a base mapper and a subclass
> mapper, using the same ME instance will have the effect of it being
> applied only once (and using two different ME instances will have
> the effect of both being applied to the subclass separately).

I meant #870 (sorry, I should have provided the reference in the
first message).

Back again to the test case and note 2:

If, let's say, I had some inheritance:

class Teacher(User):
    pass

with polymorphic_fetch='deferred' (this is important), even
session.update(u) would not work, because in this case deferred
attributes work through callables in _state, and a callable does not
survive pickling.
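That last point, that a callable does not survive pickling, is easy to see with plain pickle, independent of SQLAlchemy. Record below is a made-up stand-in for a mapped instance:

```python
import pickle

class Record(object):
    pass  # made-up stand-in for a mapped instance

r = Record()
# Model a per-instance deferred loader as a closure stored on the
# instance, roughly what lived in _state at the time.
r.loader = lambda: "value loaded from DB"

try:
    pickle.dumps(r)
    survived = True
except Exception:
    survived = False

print(survived)  # -> False: the closure makes the instance unpicklable
```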

Thanks.

Michael Bayer

Dec 20, 2007, 10:51:49 AM12/20/07
to sqlal...@googlegroups.com

Pickle isn't going to work with deferred columns unless you
implement __getstate__ and __setstate__. So the issue with
session.merge() is just an extension of that issue, correct? I.e.,
without deferreds, merge has no issue.

Is it not reasonable to ask that objects which are to be serialized
and cached not have any deferred columns? (Or that they are
explicitly loaded before caching?)
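In plain Python, that suggestion looks roughly like this; Document and its fake loader are made up for illustration (SQLAlchemy's real deferred machinery is more involved). The idea is to force the lazy attribute in __getstate__ so the cached copy is complete:

```python
import pickle

class Document(object):
    """Hypothetical cached object with one lazily loaded attribute."""
    def __init__(self):
        self.title = "report"
        self._body = None            # deferred: not loaded yet

    def _load_body(self):
        return "body text from DB"   # stand-in for a real DB fetch

    @property
    def body(self):
        if self._body is None:
            self._body = self._load_body()
        return self._body

    def __getstate__(self):
        # force the deferred attribute so the pickled state is complete
        self.body
        return self.__dict__.copy()

    def __setstate__(self, state):
        self.__dict__.update(state)

d = Document()
restored = pickle.loads(pickle.dumps(d))
print(restored.body)  # -> body text from DB
```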


Anton V. Belyaev

Dec 20, 2007, 1:04:00 PM12/20/07
to sqlalchemy
Sorry, I don't understand clearly.

I do understand that pickle saves only __dict__ when no __getstate__
is defined.

So, to be cached, an object should fetch all its deferred columns
(if any) and provide all of them in __getstate__. Right?

And if an instance from the cache has nothing for one of its
deferred column values, then referencing these properties after
merge won't load them from the DB, but just fail?

Thanks.

Michael Bayer

Dec 20, 2007, 3:57:50 PM12/20/07
to sqlal...@googlegroups.com

On Dec 20, 2007, at 1:04 PM, Anton V. Belyaev wrote:

> So, to be cached, an object should fetch all its deferred columns
> (if any) and provide all of them in __getstate__. Right?

that would work.

> And if an instance from the cache has nothing for one of its
> deferred column values, then referencing these properties after
> merge won't load them from the DB, but just fail?

As far as merge failing goes, I need to see what the exact mechanics
of that error message are. For a polymorphic "deferred" in
particular, it's a major chunk of an object's state that is
deferred, i.e. everything corresponding to the joined tables, and
the callables are currently established at the per-instance level.
So it may be necessary for now for merge to still "fail" if
unloadable deferreds are detected, although we can and should
provide a nicer error message.

Some longer-term solutions to the "pickled" issue include trying to
be more aggressive about placing class-level attribute loaders which
don't need to be serialized, placing "hints" in the _state which
could help the _state reconstruct the per-instance deferred
callables, or we might even be able to get the _state to call
deferreds during serialization without the need for an explicit
__getstate__, but then you are caching all that additional state.

Michael Bayer

Dec 21, 2007, 2:15:33 AM12/21/07
to sqlal...@googlegroups.com

On Dec 20, 2007, at 1:04 PM, Anton V. Belyaev wrote:

>
> And if an instance from cache has nothing for one of its deferred
> column values, then referencing these properties after merge wont load
> them from DB, but just fail?
>

I rearranged instance-level deferred loaders to be serializable
instances in r3968. You can now pickle an instance + its _state and
restore it, and all deferred/lazy loaders will be restored as well.
I didn't yet test it specifically with merge(), but give it a try;
you shouldn't be getting that error anymore. The pickling issue from
ticket #870 is also no longer present.

Anton V. Belyaev

Dec 21, 2007, 3:54:14 PM12/21/07
to sqlalchemy
> I rearranged instance-level deferred loaders to be serializable
> instances in r3968. You can now pickle an instance + its _state
> and restore it, and all deferred/lazy loaders will be restored as
> well. I didn't yet test it specifically with merge(), but give it
> a try; you shouldn't be getting that error anymore. The pickling
> issue from ticket #870 is also no longer present.

Unfortunately, it does not work (I am now at r3973).

1) I created an object with a deferred property (not None).
2) Reloaded it in a new session (to "erase" the deferred property).
3) Pickled/unpickled it.
4) Removed everything but properties and _state.
5) obj = s.merge(obj, dont_load=True) (with a fresh session s)
6) obj.deferred_ppty => None

merge worked without an exception this time.

Thanks.

PS. Special thanks for #871 (overhead in backrefs); that overhead
was blocking me from using SQLAlchemy's full feature set while
staying as efficient as raw SQL :)

Michael Bayer

Dec 21, 2007, 4:42:35 PM12/21/07
to sqlal...@googlegroups.com

On Dec 21, 2007, at 3:54 PM, Anton V. Belyaev wrote:
>
> 1) I created an object with a deferred property (not None).
> 2) Reloaded it in a new session (to "erase" the deferred property).
> 3) Pickled/unpickled it.
> 4) Removed everything but properties and _state.

What did you remove, exactly? There are some attributes on the
instance, such as _instance_key and _entity_name, which should not
be erased. Also, any attribute which doesn't have a deferred or
expired flag on it shouldn't be erased either. If you want to remove
attributes, use session.expire(instance, ['key1', 'key2', ...]). A
test script illustrating pickling/unpickling, which uses update(),
is attached.

>
> 5) obj = s.merge(obj, dont_load=True) (with a fresh session s)

Merge is still not working; it raises an exception in this case.
Will have a fix soon.


test.py

Michael Bayer

Dec 21, 2007, 4:58:40 PM12/21/07
to sqlal...@googlegroups.com

On Dec 21, 2007, at 3:54 PM, Anton V. Belyaev wrote:

>
> merge worked without an exception this time.

Merge is working rudimentarily for objects with unloaded scalar/
instance/collection attributes in r3974. What's not yet happening is
the merging of the various query.options() that may be present on
the original deferred loader, which means the merged instance won't
necessarily maintain the exact eager/lazy/deferred loading of the
original, but this is not especially critical for the basic idea to
work.

example script using merge attached.

test.py

Anton V. Belyaev

Dec 22, 2007, 5:25:57 PM12/22/07
to sqlalchemy
> Merge is working rudimentarily for objects with unloaded scalar/
> instance/collection attributes in r3974. What's not yet happening
> is the merging of the various query.options() that may be present
> on the original deferred loader, which means the merged instance
> won't necessarily maintain the exact eager/lazy/deferred loading
> of the original, but this is not especially critical for the basic
> idea to work.
>
> example script using merge attached.

Michael, thanks a lot for your support!