Query caching and garbage collection of related instances

33 views
Skip to first unread message

Adam Dziendziel

unread,
Nov 7, 2009, 6:03:18 PM11/7/09
to sqlalchemy
Hi,

I am trying to use the query caching solution described here:
http://svn.sqlalchemy.org/sqlalchemy/trunk/examples/query_caching/per_session.py

In most cases it works, the returned records are cached, I store them
in a LRU cache modeled after http://code.activestate.com/recipes/498245/

However, when I run a long running operation, which operates on
hundreds of other records, apparently the garbage collection is run on
the session's weak-referencing identity map. The cache keeps the
returned records, but other eagerly loaded related instances of the
returned records are lost. The ORM issues queries to load them again
from the database. I understand that there are no strong references
between an instance and other related instances.

What is the best solution to keep related instances in a session?

If I create a session with weak_identity_map=False, then during my
long running operation I will run out of memory, unless I expunge
unused records, however, it is easy to miss one record and the
identity map will be growing anyway.

Is there possible to get a list of referenced instances of another
instance, so that I could store the list together with the instance in
the MRU cache? Or to make a session with a strong-referencing map and
LRU policy that keeps it below a given size?


Regards,
Adam

Adam Dziendziel

unread,
Nov 8, 2009, 6:21:00 AM11/8/09
to sqlalchemy

After some tests with weak_identity_map=False see that the source of
the ORM queries for attributes is not related to garbage collection.
They aren't issued if query caching is off, so the problem is related
to caching.

When I write:

theme = session.query(Theme).cache_key(....).get(..) # gets from cache
print theme.id # here the 'id' attribute access issues a new SELECT

When the cached object is accessed first, no queries are issued, then
I do a lot of other record creation and flushing (I'm assigning this
'theme' to that records' attributes), then the cache is accessed again
and this is the moment when the unwanted queries are issued. It seems
that my record creation and flushing between cache calls makes that
the ORM expires the cached object. However, the 'theme' object is
itself never modified.

What might be important, they queries aren't issued if the whole long-
running operation is run inside a big transaction.

My caching query implementation is here: http://pastebin.com/mf738cc5
I am using a Session with autoflush=False, autocommit=True.

Do you know what might cause that behavior?

Thanks,
Adam

On 8 Lis, 00:03, Adam Dziendziel <adam.dziendz...@gmail.com> wrote:
> Hi,
>
> I am trying to use the query caching solution described here:http://svn.sqlalchemy.org/sqlalchemy/trunk/examples/query_caching/per...
>
> In most cases it works, the returned records are cached, I store them
> in a LRU cache modeled afterhttp://code.activestate.com/recipes/498245/

Michael Bayer

unread,
Nov 8, 2009, 2:29:24 PM11/8/09
to sqlal...@googlegroups.com

On Nov 7, 2009, at 6:03 PM, Adam Dziendziel wrote:

>
> Hi,
>
> I am trying to use the query caching solution described here:
> http://svn.sqlalchemy.org/sqlalchemy/trunk/examples/query_caching/per_session.py
>
> In most cases it works, the returned records are cached, I store them
> in a LRU cache modeled after http://code.activestate.com/recipes/498245/
>
> However, when I run a long running operation, which operates on
> hundreds of other records, apparently the garbage collection is run on
> the session's weak-referencing identity map. The cache keeps the
> returned records, but other eagerly loaded related instances of the
> returned records are lost.

that's not possible, since those records are strongly referenced by
their cached parents (see below), which in turn are strongly
referenced by the cache. The GC will only collect items that have
a strong reference count of zero.


> The ORM issues queries to load them again
> from the database. I understand that there are no strong references
> between an instance and other related instances.

Assuming you are not using "dynamic" loaders, that's not correct.
When a collection or attribute is "eagerly loaded", it is placed
within the parent's __dict__ during the load operation. Similarly
for "lazy loads", once the lazily loading attribute is referenced, the
then loaded collection or attribute is placed in the parent's __dict__.

In this case it sounds more like the parents have collections or
attributes which are to be lazily loaded upon first access, so the
link between parent/child hasn't yet been established. In the case
of a many-to-one, you can say "parent.child" and not see any SQL, even
though the "lazy load" operation was invoked, because a simple
identity lookup is performed in the session.


>
> What is the best solution to keep related instances in a session?

sounds like you need to ensure your eager loads are working properly
(or another use my other suggestion below).

>
> If I create a session with weak_identity_map=False, then during my
> long running operation I will run out of memory, unless I expunge
> unused records, however, it is easy to miss one record and the
> identity map will be growing anyway.

I recommend against using that option since we're trying to decide if
we should just drop it across the board, its pretty legacy.

>
> Is there possible to get a list of referenced instances of another
> instance, so that I could store the list together with the instance in
> the MRU cache?

What I usually do when i want to ensure what gets cached (since im
usually serializing into memcached), and i dont want to worry what the
particular eager loading configuration is, is to make a method like
"full_load()" which ensures all the important attributes and
collections are present. this will issue lazy loads for anything that
wasn't already loaded:

def full_load(self):
self.collection1
self.some_reference
return self

However, if you are truly eager loading all of those attributes then
this step is unnecessary.


Michael Bayer

unread,
Nov 8, 2009, 2:32:53 PM11/8/09
to sqlal...@googlegroups.com

On Nov 8, 2009, at 6:21 AM, Adam Dziendziel wrote:

>
>
> After some tests with weak_identity_map=False see that the source of
> the ORM queries for attributes is not related to garbage collection.
> They aren't issued if query caching is off, so the problem is related
> to caching.
>
> When I write:
>
> theme = session.query(Theme).cache_key(....).get(..) # gets from cache
> print theme.id # here the 'id' attribute access issues a new SELECT

> When the cached object is accessed first, no queries are issued, then
> I do a lot of other record creation and flushing (I'm assigning this
> 'theme' to that records' attributes), then the cache is accessed again
> and this is the moment when the unwanted queries are issued. It seems
> that my record creation and flushing between cache calls makes that
> the ORM expires the cached object. However, the 'theme' object is
> itself never modified.
>
> What might be important, they queries aren't issued if the whole long-
> running operation is run inside a big transaction.
>
> My caching query implementation is here: http://pastebin.com/mf738cc5
> I am using a Session with autoflush=False, autocommit=True.
>
> Do you know what might cause that behavior?

that behavior occurs if "theme" has been expired. this is
"expire_on_commit" which you should probably disable, since the fact
that you are "caching" the objects means you aren't concerned about
new state coming in from transactions external to your application
thread. http://www.sqlalchemy.org/docs/05/session.html#committing

Adam Dziendziel

unread,
Nov 10, 2009, 9:39:21 AM11/10/09
to sqlalchemy
Indeed, that solved the problem. Thank you for help!

Cheers,
Adam

Adam Dziendziel

unread,
Nov 10, 2009, 9:52:41 AM11/10/09
to sqlalchemy
By the way, is this possible to tell SQLAlchemy to load two lazy
attributes together, using a single query i.e. like it does with
eagerly-loaded attributes, but on demand?
If collection1 and some_reference points to different rows in the
database, the code:
self.collection1
self.some_reference
would issue two SELECTs, two round-trips to the database.

When I know that I need these two, I would write instead something
like:
orm.attributes.load(self, ['collection1', 'some_reference'])

Regards,
Adam

Michael Bayer

unread,
Nov 10, 2009, 11:30:23 AM11/10/09
to sqlal...@googlegroups.com
Adam Dziendziel wrote:
>
> By the way, is this possible to tell SQLAlchemy to load two lazy
> attributes together, using a single query i.e. like it does with
> eagerly-loaded attributes, but on demand?
> If collection1 and some_reference points to different rows in the
> database, the code:
> self.collection1
> self.some_reference
> would issue two SELECTs, two round-trips to the database.
>
> When I know that I need these two, I would write instead something
> like:
> orm.attributes.load(self, ['collection1', 'some_reference'])
>

in fact if you load collection1 eagerly, and some_reference is a many to
one within that collection, referencing "some_reference" will pull the
existing object from the identity map and no SQL is emitted. So the
feature is already "there" from a performance standpoint.

Reply all
Reply to author
Forward
0 new messages