Dangers of setting a session's expire_on_commit=False?


Paul Molodowitch

Jul 25, 2014, 6:23:05 PM
to sqlal...@googlegroups.com
Hi all - so I'm thinking of disabling the expire_on_commit property for my default sessionmaker object, and I was wondering what the potential issues with this were.  Is it simply that the next access of the data on it could be using out-of-date information?  Don't objects potentially have this problem anyway, in the sense that if they are accessed TWICE after a commit, the second access will use the data cached from the first, and could again be out of date?

To give some background - we're in the middle of converting an existing codebase to use sqlalchemy, and there are a number of classes that act both as database wrappers AND "data structures."  That is, when first constructed, they are populated with data from the database; but from then on out, they just keep the cached data.  So they would behave similarly to ORM-mapped objects, if expire_on_commit is False.  The thinking here is that for most of these classes the data changes fairly infrequently, and it's not catastrophic if it's somewhat out of date.  Also, we don't want to keep hitting the database more than necessary... and, finally, we might need access to the cached data for a long time (i.e., as long as the user has a UI window open).
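A minimal runnable sketch of the trade-off being asked about, assuming a made-up User model and an in-memory SQLite database: with expire_on_commit=False, an object's attributes survive commit() and a change committed by another connection goes unseen.

```python
from sqlalchemy import Column, Integer, String, create_engine, text
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.pool import StaticPool

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)

# StaticPool so the in-memory SQLite database is shared across connections
engine = create_engine("sqlite://", poolclass=StaticPool,
                       connect_args={"check_same_thread": False})
Base.metadata.create_all(engine)

Session = sessionmaker(bind=engine, expire_on_commit=False)
session = Session()
user = User(id=1, name="alice")
session.add(user)
session.commit()   # connection is released; `user` keeps its loaded state

# simulate another process committing a change to the same row
with engine.begin() as conn:
    conn.execute(text("UPDATE users SET name = 'bob' WHERE id = 1"))

# nothing marked `user` stale, so no re-SELECT happens on access:
print(user.name)   # "alice" -- the cached value, not the committed "bob"
```

A fresh session (empty identity map) would see "bob", which is exactly the out-of-date-data question raised above.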

Michael Bayer

Jul 25, 2014, 6:55:58 PM
to sqlal...@googlegroups.com
On Jul 25, 2014, at 6:22 PM, Paul Molodowitch <elro...@gmail.com> wrote:

Hi all - so I'm thinking of disabling the expire_on_commit property for my default sessionmaker object, and I was wondering what the potential issues with this were.  Is it simply that the next access of the data on it could be using out-of-date information?  

pretty much, yup


Don't objects potentially have this problem anyway, in the sense that if they are accessed TWICE after a commit, the second access will use the data cached from the first, and could again be out of date?

only if you have a low transaction isolation level set up.  The Session tries to make a choice as to the most reasonable place that concurrent changes should be anticipated.  Transaction demarcations are the best place.  If you are coding your app to specifically expect READ COMMITTED or READ UNCOMMITTED behavior, where your transaction relies upon seeing a row change value from a concurrent transaction, that’s a special use case, in which case you can use expire() for those objects that have this requirement.  The ORM Session obviously cannot otherwise guess when such an expiration is to be detected.
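A short sketch of the targeted expire() approach suggested here, again assuming a hypothetical User model and in-memory SQLite: expire() marks a single object's attributes stale, so the next attribute access re-SELECTs inside a new transaction.

```python
from sqlalchemy import Column, Integer, String, create_engine, text
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.pool import StaticPool

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite://", poolclass=StaticPool,
                       connect_args={"check_same_thread": False})
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine, expire_on_commit=False)

session = Session()
user = User(id=1, name="alice")
session.add(user)
session.commit()          # `user` keeps its loaded state

# a concurrent transaction changes the row behind the Session's back
with engine.begin() as conn:
    conn.execute(text("UPDATE users SET name = 'bob' WHERE id = 1"))

print(user.name)          # still "alice": nothing told the Session to reload

session.expire(user)      # mark just this object's attributes as stale
print(user.name)          # "bob": the access re-SELECTs in a new transaction
```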



To give some background - we're in the middle of converting an existing codebase to use sqlalchemy, and there are number of classes that act both as database wrappers, AND "data structures."  That is, when first constructed, they are populated with data from the database; but from then on out, they just keep the cached data.  So they would behave similarly to ORM-mapped objects, if expire_on_commit is False.  The thinking here is that for most of these classes, the data changes fairly infrequently, and it's not catastrophic if it's somewhat out of date. Also we don't want to keep hitting the database more than necessary...  and, finally, we might need to have access to the cached data for a long time (ie, as long as the user has a ui window open).


Claudio Freire

Jul 25, 2014, 7:16:10 PM
to sqlal...@googlegroups.com
On Fri, Jul 25, 2014 at 7:55 PM, Michael Bayer <mik...@zzzcomputing.com> wrote:
> Don't objects potentially have this problem anyway, in the sense that if
> they are accessed TWICE after a commit, the second access will use the data
> cached from the first, and could again be out of date?
>
>
> only if you have a low transaction isolation level set up. The Session
> tries to make a choice as to the most reasonable place that concurrent
> changes should be anticipated. Transaction demarcations are the best place.
> If you are expecting to code your app to specifically expect READ COMMITTED
> or READ UNCOMMITTED behavior where your transaction relies upon seeing a row
> change value from a concurrent transaction, that’s a special use case, in
> which case you can use expire() for those object that have this requirement.
> The ORM Session can obviously not guess when such an expiration is to be
> detected otherwise.


I don't see how transaction isolation levels relate to this.

The effect of disabling expire_on_commit is that of not seeing
subsequent commits. It would be a fictitious DO NOT READ COMMITTED
level. Having it on somewhat caters to possible SERIALIZABLE settings,
where strict ordering is to be expected, since without serializable
transactions there's no way expiring helps correctness in any way.
None of those seem overly common to me, so I don't see how one can
ignore the isolation level in effect, or possible concurrent
updates happening at the same time, with or without expire_on_commit.

IMHO, expire_on_commit is something that really has no sensible
default. You pick your own; the library authors pick one default
because, well, why not?

For the record, I'm using expire_on_commit=off, because I also use
model instances outside the scope of their originating transaction.
I've had no problems with it, but I did have to be very careful with
the semantics and lifetime of those objects, and of expiring manually
anything I put on concurrently-accessed structures, lest someone
modify it before the session's scope is over and it gets (or tries to
be) committed to the DB.

Michael Bayer

Jul 25, 2014, 9:09:39 PM
to sqlal...@googlegroups.com

On Jul 25, 2014, at 7:16 PM, Claudio Freire <klauss...@gmail.com> wrote:

> On Fri, Jul 25, 2014 at 7:55 PM, Michael Bayer <mik...@zzzcomputing.com> wrote:
>> Don't objects potentially have this problem anyway, in the sense that if
>> they are accessed TWICE after a commit, the second access will use the data
>> cached from the first, and could again be out of date?
>>
>>
>> only if you have a low transaction isolation level set up. The Session
>> tries to make a choice as to the most reasonable place that concurrent
>> changes should be anticipated. Transaction demarcations are the best place.
>> If you are expecting to code your app to specifically expect READ COMMITTED
>> or READ UNCOMMITTED behavior where your transaction relies upon seeing a row
>> change value from a concurrent transaction, that’s a special use case, in
>> which case you can use expire() for those object that have this requirement.
>> The ORM Session can obviously not guess when such an expiration is to be
>> detected otherwise.
>
>
> I don't see how transaction isolation levels relate to this.
>
> The effect of disabling expire_on_commit is that of not seeing
> subsequent commits. It would be a fictious DO NOT READ COMMITTED
> level. Having it on, somewhat caters to possible SERIALIZED settings,
> where strict ordering is to be expected, since without serialized
> transactions there's no way expiring helps correctness in any way.
> None of those seem overly common to me, so I don't see how one can
> ignore the serialization level in effect or possible concurrent
> updates that are happening at the same time, with or without
> expire_on_commit.

yes, it caters the most to SERIALIZABLE settings, a little less so to REPEATABLE READ (works for individual objects but not collections), and still less to READ COMMITTED (works only to the extent that you’re worried about other transactions in progress), etc.


>
> IMHO, expire_on_commit is something that really has no sensible
> default. You pick your own, the library authors pick one default
> because, well, why not?

So, for a long time, all through 0.4, the default was “never, and not even possible”.  There was no expire on commit; at that time I thought it was insane to throw away all that great data you’ve loaded, unless you absolutely want to.

As it turns out, the current defaults are not by accident!  We had a pretty steady parade of users who complained that their data was stale, and for years I scratched my head: how are we to do this?  Just blow away all objects, all the time, on every query?  That seemed so wrong, so wasteful, and of course so complicated, since we would want pending changes to remain around.

As I’ve written many times, it was the Storm ORM that introduced me to the concept of “expire on commit”.  The linkage to the transaction is also kind of where Hibernate and JSR 220 are coming from, though not necessarily wiping out the object on commit…the spec doesn’t make that aspect very clear.  Overall the “expire on commit” idea is very strict and assumes entities are row proxies only.


> For the record, I'm using expire_on_commit=off, because I also use
> model instances outside the scope of their originating transaction.
> I've had no problems with it, but I did have to be very careful with
> the semantics and lifetime of those objects, and of expiring manually
> anything I put on concurrently-accessed structures, lest someone
> modify it before the session's scope is over and it gets (or tries to
> be) committed to the DB.

Maybe “expire on begin” would be useful.  So your data is still there after commit, but if you start a new transaction, then things refresh.  I’m up for it in a 1.0 release, if you think it’s useful.  Though explaining yet another option to everyone…what a PITA….
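No such option exists, but roughly this behavior can be sketched today with the existing SessionEvents "after_begin" hook (hypothetical User model; this is an illustration of the idea, not the proposed feature):

```python
from sqlalchemy import Column, Integer, String, create_engine, event, text
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.pool import StaticPool

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite://", poolclass=StaticPool,
                       connect_args={"check_same_thread": False})
Base.metadata.create_all(engine)

Session = sessionmaker(bind=engine, expire_on_commit=False)

@event.listens_for(Session, "after_begin")
def expire_on_begin(session, transaction, connection):
    # data survives commit(), but starting the next transaction refreshes it
    session.expire_all()

session = Session()
user = User(id=1, name="alice")
session.add(user)
session.commit()
print(user.name)              # "alice": still readable after commit

with engine.begin() as conn:  # a concurrent change
    conn.execute(text("UPDATE users SET name = 'bob' WHERE id = 1"))

session.execute(text("SELECT 1"))  # any new work begins a transaction...
print(user.name)              # ...the listener expired `user`, so now "bob"
```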



>

Claudio Freire

Jul 25, 2014, 10:05:08 PM
to sqlal...@googlegroups.com
> Maybe "expire on begin" might be useful. So your data is still there after commit, but if you start a new transaction, then things refresh. I'm up for it in a 1.0 release, if you think it's useful. Though explaining to everyone another option...what a PITA....

I don't see how that'd work.

On session.add()? When an object is set as a dependent of an object in another (active) session?

Seems very error-prone to make it too automatic.

All in all, the current state isn't bad. It's understandable, and controllable.

I'm just saying there's no one-size-fits-all.

Jonathan Vanasco

Jul 26, 2014, 4:06:39 PM
to sqlal...@googlegroups.com
Paul, I've avoided your approach in the past. 

I wrestled with similar needs a while back and decided it was too messy.  I couldn't figure out a way to make SQLAlchemy objects "read-only", to ensure they're never updated or written/associated.  In your application code, you're never really guaranteed to know whether an object is a live ORM object or a detached/expired item.  It adds a level of complexity and uncertainty that I'm just uncomfortable with.

I'm fine with (potentially) stale data for reading and display, but dislike it for updates. 

I ended up using a handful of techniques to achieve some similar needs:

• data is loaded via the ORM and then translated into data structures in a 'constants' namespace; application code (even ORM objects) deals with that data

• a loose "CachingApi" with objects modeled after SQLAlchemy.  Objects are dicts that allow dotted-attribute notation for keys; column values are raw keys in the dicts, and associations are properties for memoized & lazy-loaded cache-gets to another object.

• a secondary session for long-lived read-only objects.  This session has read-only database permissions, so accidental writes/updates aren't a problem.
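Jonathan's secondary session relies on database-level permissions; as a rough application-level approximation (my assumption, not his actual setup), a "before_flush" listener on a dedicated sessionmaker can reject all writes:

```python
from sqlalchemy import Column, Integer, String, create_engine, event
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

ReadOnlySession = sessionmaker(bind=engine)

@event.listens_for(ReadOnlySession, "before_flush")
def block_writes(session, flush_context, instances):
    # DB permissions would enforce this server-side; here we refuse client-side
    raise RuntimeError("this session is read-only")

session = ReadOnlySession()
session.add(User(id=1, name="alice"))
write_blocked = False
try:
    session.commit()        # commit flushes; before_flush raises
except RuntimeError:
    session.rollback()
    write_blocked = True
print(write_blocked)        # True: queries work, but nothing can be flushed
```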


I looked into extending SQLAlchemy like this, but couldn't find non-messy ways to do it, and gave up:

• Extend objects with a "read-only" mode; they raise errors when an attribute/association is changed.
• Add a "do_not_expire=[objects, attributes]" or "expire=boolean" option to session.commit(), or an unexpire() method on objects; this would make them "read-only" as above.  The practical use case is that I often need object data for logging or success messaging.

Anyway, something I would definitely recommend for your situation (if you're not doing it already) is to add a 'revision_id' integer to every table (or a last-modified timestamp).  Require update/delete operations to factor in the revision identifier, and raise an error if they don't match.  You want to avoid this scenario:

00:00 - App Instance 1 - Loads Foo. Caches Foo.bar as "A"
00:05 - App Instance 2 - Loads Foo.  Updates Foo.bar to "B"
00:10 - App Instance 1 - Loads Foo.  Updates Foo.bar from "A" to "C"
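SQLAlchemy has built-in support for this pattern via the mapper's version_id_col option; here is a sketch mirroring the scenario above, with a hypothetical Foo model and in-memory SQLite:

```python
from sqlalchemy import Column, Integer, String, create_engine, text
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.orm.exc import StaleDataError
from sqlalchemy.pool import StaticPool

Base = declarative_base()

class Foo(Base):
    __tablename__ = "foo"
    id = Column(Integer, primary_key=True)
    bar = Column(String)
    revision_id = Column(Integer, nullable=False)
    # every ORM UPDATE/DELETE adds "WHERE revision_id = <loaded value>" and
    # bumps the counter; matching 0 rows raises StaleDataError
    __mapper_args__ = {"version_id_col": revision_id}

engine = create_engine("sqlite://", poolclass=StaticPool,
                       connect_args={"check_same_thread": False})
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine, expire_on_commit=False)

# 00:00 - App Instance 1 loads Foo, caches Foo.bar as "A" (revision 1)
session = Session()
foo = Foo(id=1, bar="A")
session.add(foo)
session.commit()

# 00:05 - App Instance 2 updates Foo.bar to "B", bumping the revision
with engine.begin() as conn:
    conn.execute(text(
        "UPDATE foo SET bar = 'B', revision_id = revision_id + 1 WHERE id = 1"))

# 00:10 - App Instance 1 tries to write "C" based on its stale copy
foo.bar = "C"
try:
    session.commit()   # UPDATE ... WHERE id = 1 AND revision_id = 1 -> 0 rows
    stale = False
except StaleDataError:
    session.rollback()
    stale = True
print(stale)  # True: the lost update is caught instead of silently applied
```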




 