>
> Hi,
>
> I am trying to use the query caching solution described here:
> http://svn.sqlalchemy.org/sqlalchemy/trunk/examples/query_caching/per_session.py
>
> In most cases it works, the returned records are cached, I store them
> in a LRU cache modeled after http://code.activestate.com/recipes/498245/
>
> However, when I run a long running operation, which operates on
> hundreds of other records, apparently the garbage collection is run on
> the session's weak-referencing identity map. The cache keeps the
> returned records, but other eagerly loaded related instances of the
> returned records are lost.
that's not possible, since those records are strongly referenced by
their cached parents (see below), which in turn are strongly
referenced by the cache. The GC will only collect items that have
a strong reference count of zero.
> The ORM issues queries to load them again
> from the database. I understand that there are no strong references
> between an instance and other related instances.
Assuming you are not using "dynamic" loaders, that's not correct.
When a collection or attribute is "eagerly loaded", it is placed
within the parent's __dict__ during the load operation. Similarly
for "lazy loads", once the lazily loading attribute is referenced, the
then loaded collection or attribute is placed in the parent's __dict__.
In this case it sounds more like the parents have collections or
attributes which are to be lazily loaded upon first access, so the
link between parent/child hasn't yet been established. In the case
of a many-to-one, you can say "parent.child" and not see any SQL, even
though the "lazy load" operation was invoked, because a simple
identity lookup is performed in the session.
>
> What is the best solution to keep related instances in a session?
sounds like you need to ensure your eager loads are working properly
(or another use my other suggestion below).
>
> If I create a session with weak_identity_map=False, then during my
> long running operation I will run out of memory, unless I expunge
> unused records, however, it is easy to miss one record and the
> identity map will be growing anyway.
I recommend against using that option since we're trying to decide if
we should just drop it across the board, its pretty legacy.
>
> Is there possible to get a list of referenced instances of another
> instance, so that I could store the list together with the instance in
> the MRU cache?
What I usually do when i want to ensure what gets cached (since im
usually serializing into memcached), and i dont want to worry what the
particular eager loading configuration is, is to make a method like
"full_load()" which ensures all the important attributes and
collections are present. this will issue lazy loads for anything that
wasn't already loaded:
def full_load(self):
self.collection1
self.some_reference
return self
However, if you are truly eager loading all of those attributes then
this step is unnecessary.
>
>
> After some tests with weak_identity_map=False see that the source of
> the ORM queries for attributes is not related to garbage collection.
> They aren't issued if query caching is off, so the problem is related
> to caching.
>
> When I write:
>
> theme = session.query(Theme).cache_key(....).get(..) # gets from cache
> print theme.id # here the 'id' attribute access issues a new SELECT
> When the cached object is accessed first, no queries are issued, then
> I do a lot of other record creation and flushing (I'm assigning this
> 'theme' to that records' attributes), then the cache is accessed again
> and this is the moment when the unwanted queries are issued. It seems
> that my record creation and flushing between cache calls makes that
> the ORM expires the cached object. However, the 'theme' object is
> itself never modified.
>
> What might be important, they queries aren't issued if the whole long-
> running operation is run inside a big transaction.
>
> My caching query implementation is here: http://pastebin.com/mf738cc5
> I am using a Session with autoflush=False, autocommit=True.
>
> Do you know what might cause that behavior?
that behavior occurs if "theme" has been expired. this is
"expire_on_commit" which you should probably disable, since the fact
that you are "caching" the objects means you aren't concerned about
new state coming in from transactions external to your application
thread. http://www.sqlalchemy.org/docs/05/session.html#committing
in fact if you load collection1 eagerly, and some_reference is a many to
one within that collection, referencing "some_reference" will pull the
existing object from the identity map and no SQL is emitted. So the
feature is already "there" from a performance standpoint.