before_expunge or after_expunge


Kent

15 Dec 2011 11:46:07
to sqlalchemy
I notice no such API events as before_expunge() or after_expunge().
Hopefully without taking much of your time, can you point me in any
direction of how I might go about being notified when an object is
expunged?

Why?

There are certain states a persistent record reaches where further
changes to it are forbidden and it becomes a "historical" record, for
example, for accounting purposes. For example, consider a closed
order, shipped, paid for, done. There are times, however, when I want
to re-run calculations on such objects to calculate a delta for
finances, etc. Suppose a discount were retroactively applied to this
order, such that you want to use existing code to rerun tax
calculations, etc for the closed order and inspect all the changes
without danger of it being persisted to the database.

I am heading down the road of expunging objects that have reached this
"archived, historical" state so that I can still use the object for
all its calculations and be assured that no changes will be persisted
to the database. (Another thought I had was using transient objects,
but I think the end result is similar)

My concern is that once such an object has been expunged, I don't want
any relationship or code or anything automatically reloading or
refreshing it from the database with session.query().get. My plan was
to record expunged objects on the session object so I could look them
up during reconstruct_instance() and raise an error if the same record
is being loaded from the database that I've already expunged.

Do you understand where I am headed and can you think of a better
mechanism to deal with what I'm trying to accomplish?

Thanks!
Kent

Michael Bayer

15 Dec 2011 12:20:48
to sqlal...@googlegroups.com

On Dec 15, 2011, at 11:46 AM, Kent wrote:

> I notice no such API events as before_expunge() or after_expunge().
> Hopefully without taking much of your time, can you point me in any
> direction of how I might go about being notified when an object is
> expunged?
>
> Why?

If we start adding events like that, it suggests we'd be adding a few dozen more hooks for before_/after_ pretty much every operation in the Session, essentially decorating every method - before_add(), after_add(), before_close(), after_close(), before_expire(), after_expire(), before_delete(), after_delete() etc etc right ? This would greatly clutter and clog the Session object and add latency. They'd also be difficult to define as there are cascades which take place - do the event hooks accommodate just the call of the method itself or also all the objects reached by a cascade ?

The existing events are specifically to intercept places in the code that otherwise can't be intercepted - that is, deep within a series of steps. Intercepting public method calls is just as easily done with custom Session subclass, or using wrapper code within your own application.
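The subclassing route can be sketched as below. This is a hedged illustration, not SQLAlchemy source: a minimal stand-in Session base class is used so the snippet runs on its own, and the bookkeeping (the expunged list) is invented for the example.

```python
# Sketch: intercept the public expunge() call by subclassing the Session.
# "Session" here is a minimal stand-in; with real SQLAlchemy you would
# subclass sqlalchemy.orm.Session and hand it to sessionmaker(class_=...).

class Session(object):
    def expunge(self, obj):
        # the real method detaches obj from the session / identity map
        pass

class EventfulSession(Session):
    def __init__(self):
        self.expunged = []          # app-level record of expunged objects

    def expunge(self, obj):
        self.expunged.append(obj)   # "before_expunge" logic goes here
        Session.expunge(self, obj)
        # "after_expunge" logic would go here

session = EventfulSession()
order = object()
session.expunge(order)
print(order in session.expunged)   # True
```

The `class_` argument to sessionmaker() is how such a subclass would get wired in with the real library.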

>
> There are certain states a persistent record reaches where further
> changes to it are forbidden and it becomes a "historical" record, for
> example, for accounting purposes. For example, consider a closed
> order, shipped, paid for, done. There are times, however, when I want
> to re-run calculations on such objects to calculate a delta for
> finances, etc. Suppose a discount were retroactively applied to this
> order, such that you want to use existing code to rerun tax
> calculations, etc for the closed order and inspect all the changes
> without danger of it being persisted to the database.
>
> I am heading down the road of expunging objects that have reached this
> "archived, historical" state so that I can still use the object for
> all it's calculations and be assured that no changes will be persisted
> to the database. (Another thought I had was using transient objects,
> but I think the end result is similar)

why not use a before_flush() extension, or similarly a before_insert/before_update/before_delete extension, and simply reject flush() calls which include such objects in the list of those to be modified ? The fact is they still represent database data and will work a lot better if they have access to their persistent state.

>
> My concern is that once such an object has been expunged, I don't want
> any relationship or code or anything automatically reloading or
> refreshing it from the database with session.query().get.

you mean, because that raises an error on a detached object, or because the objects represent in-memory-only state that should no longer reflect database state?

> My plan was
> to record expunged objects on the session object so I could look them
> up during reconstruct_instance() and raise an error if the same record
> is being loaded from the database that I've already expunged.

if you are going that approach I would just use a library call within your own app called "mark_as_historical()", which does the expunge() and also adds the element to the collection in question. This is functionality in addition to what expunge() does.
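A sketch of that helper might look like the following; `mark_as_historical`, the identity-key helper, and the stand-in session are hypothetical names for illustration, not SQLAlchemy API.

```python
# Hypothetical app-level helper per the suggestion above: expunge the
# object AND record its identity, so a later load of the same row can be
# rejected (e.g. from a reconstruct_instance hook).

class FakeSession(object):
    """Stand-in for a SQLAlchemy Session, plus one extra collection."""
    def __init__(self):
        self.historical_keys = set()

    def expunge(self, obj):
        pass   # the real Session.expunge() detaches obj

def identity_key(obj):
    # stand-in for an identity-key lookup (class plus primary key)
    return (type(obj).__name__, obj.id)

def mark_as_historical(session, obj):
    session.historical_keys.add(identity_key(obj))
    session.expunge(obj)

def check_not_historical(session, obj):
    # would be called when a row comes back from the database
    if identity_key(obj) in session.historical_keys:
        raise RuntimeError("row already marked historical: %r" % (identity_key(obj),))

class Order(object):
    def __init__(self, id):
        self.id = id

session = FakeSession()
mark_as_historical(session, Order(5))
check_not_historical(session, Order(6))   # fine; a different row
```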

Kent

15 Dec 2011 13:16:15
to sqlal...@googlegroups.com
On 12/15/2011 12:20 PM, Michael Bayer wrote:
> On Dec 15, 2011, at 11:46 AM, Kent wrote:
>
>> I notice no such API events as before_expunge() or after_expunge().
>> Hopefully without taking much of your time, can you point me in any
>> direction of how I might go about being notified when an object is
>> expunged?
>>
>> Why?

(By the way, I didn't mean "Why didn't you add these?" I meant "Why?,
You might ask... well, I'll tell you why...")

> If we start adding events like that, it suggests we'd be adding a few dozen more hooks for before_/after_ pretty much every operation in the Session, essentially decorating every method - before_add(), after_add(), before_close(), after_close(), before_expire(), after_expire(), before_delete(), after_delete() etc etc right ? This would greatly clutter and clog the Session object and add latency. They'd also be difficult to define as there are cascades which take place - do the event hooks accommodate just the call of the method itself or also all the objects reached by a cascade ?
>
> The existing events are specifically to intercept places in the code that otherwise can't be intercepted - that is, deep within a series of steps. Intercepting public method calls is just as easily done with custom Session subclass, or using wrapper code within your own application.

I didn't know I could specify a subclassed Session class, but I see the
"class_" parameter to sessionmaker() now, that might be helpful.

>> There are certain states a persistent record reaches where further
>> changes to it are forbidden and it becomes a "historical" record, for
>> example, for accounting purposes. For example, consider a closed
>> order, shipped, paid for, done. There are times, however, when I want
>> to re-run calculations on such objects to calculate a delta for
>> finances, etc. Suppose a discount were retroactively applied to this
>> order, such that you want to use existing code to rerun tax
>> calculations, etc for the closed order and inspect all the changes
>> without danger of it being persisted to the database.
>>
>> I am heading down the road of expunging objects that have reached this
>> "archived, historical" state so that I can still use the object for
>> all it's calculations and be assured that no changes will be persisted
>> to the database. (Another thought I had was using transient objects,
>> but I think the end result is similar)
> why not use a before_flush() extension, or similarly a before_insert/before_update/before_delete extension, and simply reject flush() calls which include such objects in the list of those to be modified ? The fact is they still represent database data and will work a lot better if they have access to their persistent state.

While I like the sound of this idea, it doesn't deal very well with the
notion of being able to expunge(order) and have everything under its
expunge cascade tree automatically excluded, does it? I would need
numerous backrefs to reach the "parent" so I could determine status, and
I'd need to implement this for all related classes along the expunge
path. I've also found in the past that some situations of referencing
an unloaded relationship from within before_update, etc. have caused me
grief.

If I head this direction, how do I "reject the flush()" for certain
instances?

>> My concern is that once such an object has been expunged, I don't want
>> any relationship or code or anything automatically reloading or
>> refreshing it from the database with session.query().get.
> you mean, because that raises an error on a detached , or because the objects represent in-memory-only state that should no longer reflect database state ?

A while ago (with your help), I already dealt with loading relationships
for transient/expunged objects, based on my use case of always using a
single session, which made that framework change workable. So the reason for my concern
is the latter, because it represents a state in memory only. I admit,
leaving the object in the session solves this problem, but creates a new
one where I can't easily utilize the expunge cascade. If I leave it in
the session, I think it also creates a different problem: that I'd need
to worry about a different query overwriting (refreshing) my object with
the database contents.


>> My plan was
>> to record expunged objects on the session object so I could look them
>> up during reconstruct_instance() and raise an error if the same record
>> is being loaded from the database that I've already expunged.
> if you are going that approach I would just use a library call within your own app called "mark_as_historical()", which does the expunge() and also adds the element to the collection in question. This is functionality in addition to what expunge() does.

I had that thought as well, but it doesn't deal with cascaded expunges.
I could subclass session for this in a non public API way. But I'm
becoming fond of your idea of leaving the objects in the session as you
suggested with before_flush() or _update()... any thoughts on how I
could work needing to avoid backrefs for every relationship in the
expunge cascade? Are there reasons one needs to avoid referencing
unloaded relationships from within before_update()? (I can't recall the
exact problem I've had in the past with that at the moment.)

Which does your gut tell you is "safer": the expunged approach or the
leave in session but prevent writing changes approach?

Thanks very much!
(For what it's worth, I've made appeals to my company on more than one
occasion about donating to your project, but with little success so far
(I think because we can't afford to pay all our current employees quite
yet), but we're growing and we'll get there... I would love to have our
company donate to the project, you've always been so helpful. For now, a
sincere "thank you" will have to suffice.. sorry.)


Michael Bayer

15 Dec 2011 13:31:55
to sqlal...@googlegroups.com

On Dec 15, 2011, at 1:16 PM, Kent wrote:

>> why not use a before_flush() extension, or similarly a before_insert/before_update/before_delete extension, and simply reject flush() calls which include such objects in the list of those to be modified ? The fact is they still represent database data and will work a lot better if they have access to their persistent state.
>
> While I like the sound of this idea, it doesn't deal very well with the notion of being able to expunge(order) and have everything under its expunge cascade tree automatically excluded, does it?

I'm suggesting in that particular case that you aren't expunging - your object model would include features such that a flag like "historical" be present on those objects which fall under this category - or some kind of event model outside of SQLAlchemy. If that needs to cascade among collections, you'd have to build that too. This is an object model state/validation issue, not a persistence issue.

> If I head this direction, how do I "reject the flush()" for certain instances?

You'd iterate through session.dirty inside of before_flush(), check for the correct state, otherwise raise an exception.
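Spelled out, that might look like the sketch below. The `historical` flag and the stand-in session are assumptions for illustration; in real code the function body would live in a before_flush() handler registered on the Session.

```python
# Sketch of "rejecting the flush": walk session.dirty inside a
# before_flush() handler and raise if a flagged object has been modified.
# FakeSession stands in for a real Session; on the real thing, .dirty is
# the set of persistent objects with pending changes.

class FakeSession(object):
    def __init__(self, dirty):
        self.dirty = dirty

class Order(object):
    def __init__(self, historical=False):
        self.historical = historical   # hypothetical application flag

def before_flush(session, flush_context=None, instances=None):
    for obj in session.dirty:
        if getattr(obj, 'historical', False):
            raise RuntimeError("flush would modify a historical object")

before_flush(FakeSession([Order()]))        # passes silently

try:
    before_flush(FakeSession([Order(historical=True)]))
except RuntimeError as err:
    print(err)                              # flush would modify a historical object
```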

>
>>> My concern is that once such an object has been expunged, I don't want
>>> any relationship or code or anything automatically reloading or
>>> refreshing it from the database with session.query().get.
>> you mean, because that raises an error on a detached , or because the objects represent in-memory-only state that should no longer reflect database state ?
> A while ago (with your help), I already dealt with loading relationships for transient/expunged based on my use case of always a single session, which made that framework change workable. So the reason for my concern is the latter, because it represents a state in memory only. I admit, leaving the object in the session solves this problem, but creates a new one where I can't easily utilize the expunge cascade. If I leave it in the session, I think it also creates a different problem: that I'd need to worry about a different query overwriting (refreshing) my object with the database contents.

OK so if these objects truly become an in-memory state not represented in the DB then yes you'd want to expunge them. In the session means the object is proxied to an actual row in the DB. Out of it, means it is not.

>>> is being loaded from the database that I've already expunged.
>> if you are going that approach I would just use a library call within your own app called "mark_as_historical()", which does the expunge() and also adds the element to the collection in question. This is functionality in addition to what expunge() does.
> I had that thought as well, but it doesn't deal with cascaded expunges.

I might cascade the operation myself then.....it seems like you have business logic that requires traversal. Piggybacking off persistence operations that happen to do the same kind of traversal (for now) is brittle.
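Cascading the marking yourself, as suggested, reduces to a small traversal over the application's own parent/child links. Everything here (`children()`, the `historical` flag) is a hypothetical application-side convention, not SQLAlchemy API:

```python
# Sketch: cascade a "historical" flag through the object graph without
# relying on the ORM's expunge cascade.  children() is whatever hook the
# application defines for "objects this flag should reach".

def cascade_mark_historical(obj, seen=None):
    if seen is None:
        seen = set()
    if id(obj) in seen:        # guard against cycles from backrefs
        return seen
    seen.add(id(obj))
    obj.historical = True
    for child in obj.children():
        cascade_mark_historical(child, seen)
    return seen

class Node(object):
    def __init__(self, kids=()):
        self.kids = list(kids)
        self.historical = False
    def children(self):
        return self.kids

leaf = Node()
root = Node([Node([leaf]), Node()])
cascade_mark_historical(root)
print(leaf.historical)   # True
```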

> I could subclass session for this in a non public API way. But I'm becoming fond of your idea of leaving the objects in the session as you suggested with before_flush() or _update()... any thoughts on how I could work needing to avoid backrefs for every relationship in the expunge cascade? Are there reasons one need to avoid referencing unloaded relationships from within before_upate()? (I can't recall the exact problem I've had in the past with that at the moment.)

yeah I'm not sure what you mean here, I'd have to look this up in the huge catalog of problems you've had with backrefs.... :)

>
> Which does your gut tell you is "safer": the expunged approach or the leave in session but prevent writing changes approach?

if they really aren't rows anymore then I'd expunge, but I'd consider this a separate concern versus management of application state.

>
> (For what its worth, I've made appeals to my company on more than one occasion about donating to your project, but with little success so far (I think because we can't afford to pay all our current employees quite yet), but we're growing and we'll get there... I would love to have our company donate to the project, you've always been so helpful. For now, a sincere "thank you" will have to suffice.. sorry.)

You know most donations I get are from people I've never heard of. I get donations from active mailing list folks too but not as much, but don't worry just the back and forth on all the backref/Oracle tickets we've worked on have been a great contribution from you, leading to a better product and therefore more contributions from other folks who didn't have to go through all that!


Kent

15 Dec 2011 14:13:53
to sqlal...@googlegroups.com
On 12/15/2011 1:31 PM, Michael Bayer wrote:
> Are there reasons one need to avoid referencing unloaded relationships from within before_upate()? (I can't recall the exact problem I've had in the past with that at the moment.)

Thanks for all the input. Regarding the issue of referencing a
relationship from within before_update()/before_insert(), etc. I looked
up my previous problem and it is related to this code from strategies.py:

class LazyLoader():
    ...
    def _load_for_state(self, state, passive):
        ...
        if self.use_get:
            if session._flushing:
                get_attr = instance_mapper._get_committed_state_attr_by_column
            else:
                get_attr = instance_mapper._get_state_attr_by_column

If we are in the middle of a flush, we don't attempt a query but rather
use the committed state. This caused me problems when the committed
state of the relationship was "None" (for example, not loaded yet?)

For my own understanding, what implications does this have? It kind of
freaked me out because I interpreted that as meaning "I'm not sure if I
can rely upon unloaded relationships at all from within these mid-flush
event methods..." because I certainly don't want to be using stale, old
data. Is there a threat of this, or only of it sometimes being "None"?


Michael Bayer

15 Dec 2011 14:39:16
to sqlal...@googlegroups.com

so this is: we're inside of a flush, and suppose we need to get at someobject.some_related specifically to target what's present in the database - if you've changed the value of someobject.some_related_id, that isn't flushed yet - we need some_related relative to the some_related_id that was loaded from the database.

There's still potentially a query run to get at some_related, just the foreign key value we use from the parent is different - it's looking at the persistent value.

To get more specific, comment out the conditional and just make it do the non-committed, then run the orm unit tests. The tests that fail will be the ones that illustrate the use case.
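The committed-versus-current distinction being described can be illustrated without the ORM; the two dicts below merely stand in for the per-instance state SQLAlchemy keeps:

```python
# During a flush the lazy loader resolves a relationship from the value
# last loaded from the database (the "committed" value), not from the
# pending in-memory change.  Plain dicts stand in for instance state.

committed_state = {'rockid': 1}   # as loaded from the DB
current_state = {'rockid': 0}     # user changed the FK; not yet flushed

def fk_for_lazyload(flushing):
    state = committed_state if flushing else current_state
    return state['rockid']

print(fk_for_lazyload(flushing=True))    # 1 -> targets the existing DB row
print(fk_for_lazyload(flushing=False))   # 0 -> the normal, post-flush view
```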


Kent

15 Dec 2011 17:05:48
to sqlalchemy

On Dec 15, 2:39 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
> On Dec 15, 2011, at 2:13 PM, Kent wrote:
> > On 12/15/2011 1:31 PM, Michael Bayer wrote:
> >> Are there reasons one need to avoid referencing unloaded relationships from within before_upate()?  (I can't recall the exact problem I've had in the past with that at the moment.)
>
> > Thanks for all the input.  Regarding the issue of referencing a relationship from within before_update()/before_insert(), etc. I looked up my previous problem and it is related to this code from strategies.py:
>
> > class LazyLoader()
> > ...
> > def _load_for_state(self, state, passive):
> > ...
> >        if self.use_get:
> >            if session._flushing:
> >                get_attr = instance_mapper._get_committed_state_attr_by_column
> >            else:
> >                get_attr = instance_mapper._get_state_attr_by_column
>
> > If we are in the middle of a flush, we don't attempt a query but rather use the committed state.  This caused me problems when the committed state of the relationship was "None" (for example, not loaded yet?)
>
> > For my own understanding, what implications does this have?  It kind of freaked me out because I interpreted that as meaning "I'm not sure if I can rely upon unloaded relationships at all from within these mid-flush event methods..." because I certainly don't want to be using stale, old data.  Is there a threat of this, or only of it sometimes being "None"?
>
> so this is we're inside of a flush, and suppose we need to get at someobject.some_related specifically to target what's present in the database - if you've changed the value of someobject.some_related_id, that isn't flushed yet - we need some_related relative to the some_related_id that was loaded from the database.
>

wait...that's where you lose me. In this condition where the user has
changed someobject.some_related_id, then as soon as that is flushed,
someobject.some_related by definition is no longer going to be the
object we get with the currently committed id.... This is circling
back to the automatic expiration of relationships dilemma. it seems
we are purposefully giving them the wrong object. Directly after the
flush, were we to query that same relationship, it would return a
different object/collection.

I believe that is the other reason I needed to abandon trusting the
relationship when within a flush. I think I recall this exact problem
occurring to me. The id had changed and anywhere else within
SQLAlchemy land (that isn't in a flush), I could trust the
relationship to bring back the correct object given the current state
of the id. However, in that case (inside a flush) it was bringing
back the wrong object (the one inconsistent with the current id) so
inspecting that related object's values would give me the wrong
answer.

Are you sure this logic (and therefore also the test cases) isn't wrong?

Kent

15 Dec 2011 17:56:03
to sqlal...@googlegroups.com

> wait...that's where you lose me. In this condition where user's
> changed someobject.some_related_id, then as soon as that is flushed,
> someobject.some_related by definition is no longer going to be the
> object we get with the currently committed id.... This is circling
> back to the automatic expiration of relationships dilemma. it seems
> we are purposefully giving them the wrong object. Directly after the
> flush, were we to query that same relationship, it would return a
> different object/collection.
>
> I believe that is the other reason I needed to abandon trusting the
> relationship when within a flush. I think I recall this exact problem
> occurring to me. The id had changed and anywhere else within
> SQLAlchemy land (that isn't in a flush), I could trust the
> relationship to bring back the correct object given the current state
> of the id. However, in that case (inside a flush) it was bringing
> back the wrong object (the one inconsistent with the current id) so
> inspecting that related object's values would give me the wrong
> answer.
>
> You sure this logic (and therefore also test cases) aren't wrong?
Here is the script output for a demonstration. I can't imagine this
being the behavior we want, is it?

******* Before move: rockid=[1] rock object: <Rock: id=[1]>


******* During flush: rockid=[0] rock object: <Rock: id=[1]>


******* After flush: rockid=[0] rock object: <Rock: id=[0]>

=========================================================================
from sqlalchemy import *
from sqlalchemy.orm import *

engine = create_engine('sqlite:///', echo=False)
metadata = MetaData(engine)
Session = sessionmaker(bind=engine)

rocks_table = Table("rocks", metadata,
    Column("id", Integer, primary_key=True),
)

bugs_table = Table("bugs", metadata,
    Column("id", Integer, primary_key=True),
    Column("rockid", Integer, ForeignKey('rocks.id')),
)

class Rock(object):
    def __repr__(self):
        return '<Rock: id=[%s]>' % self.__dict__.get('id')

class Bug(object):
    def __repr__(self):
        return '<Bug: id=[%s]>' % self.__dict__.get('id')

    def printstate(self, msg):
        print "\n******* %s: rockid=[%s] rock object: %r\n" % (
            msg, self.__dict__.get('rockid'), self.rock)


class BugAgent(MapperExtension):
    def before_update(self, mapper, connection, instance):
        instance.printstate("During flush")
        session.expire(instance, ['rock'])

mapper(Rock, rocks_table,
       properties={'bugs': relationship(Bug,
                                        cascade='all,delete-orphan',
                                        single_parent=True,
                                        lazy=False,
                                        backref='rock')})

mapper(Bug, bugs_table, extension=BugAgent(),
       allow_partial_pks=False)

metadata.create_all()

session = Session()

# add a rock and bug
rock = Rock()
rock.id = 0
bug = Bug()
bug.id = 0
rock.bugs.append(bug)
session.add(rock)

# add another rock and bug
rock = Rock()
rock.id = 1
bug = Bug()
bug.id = 1
rock.bugs.append(bug)
session.add(rock)

session.commit()
session.expunge_all()

# later... new session
session = Session()

b1 = session.query(Bug).get(1)

b1.printstate("Before move")

# Move bug to another rock...
b1.rockid = 0
session.expire(b1, ['rock'])

session.flush()

b1.printstate("After flush")
=========================================================================

Michael Bayer

15 Dec 2011 18:16:27
to sqlal...@googlegroups.com

On Dec 15, 2011, at 5:05 PM, Kent wrote:

> wait...that's where you lose me. In this condition where user's
> changed someobject.some_related_id, then as soon as that is flushed,
> someobject.some_related by definition is no longer going to be the
> object we get with the currently committed id.... This is circling
> back to the automatic expiration of relationships dilemma. it seems
> we are purposefully giving them the wrong object. Directly after the
> flush, were we to query that same relationship, it would return a
> different object/collection.
>
> I believe that is the other reason I needed to abandon trusting the
> relationship when within a flush. I think I recall this exact problem
> occurring to me. The id had changed and anywhere else within
> SQLAlchemy land (that isn't in a flush), I could trust the
> relationship to bring back the correct object given the current state
> of the id. However, in that case (inside a flush) it was bringing
> back the wrong object (the one inconsistent with the current id) so
> inspecting that related object's values would give me the wrong
> answer.
>
> You sure this logic (and therefore also test cases) aren't wrong?

The logic and test cases are definitely correct for the one-to-many case, it is attached as test_o2m.py and if you change line 417 of orm/strategies.py to not use committed_state, it fails. We need to use what was loaded from the DB, not what is pending.

For the many-to-one case with use_get here there are actually no tests that cover the case where the two values are different - I can see this by changing get_attr() to call *both* functions and compare the values to see if they're different and there is no such test. I can't really come up with a test that makes any difference here as the mechanics of things tends to "work out" the way m2o's currently work.

But to change it would make it inconsistent. Within a flush, the dependency system needs to load relationships based on the last known database state, not the not-yet-flushed pending changes. When this is called during a flush, we know we haven't yet updated the object since we're still looking at its attributes, which always results from one of the objects it's dependent on, or from itself.

If the issue you're having is that you have a session handler that wants things to act as though we're outside of the flush, the internal APIs need to be changed such that dependency.py passes another flag in to use "committed state" without the need to check session._flushing. Or perhaps it needs to somehow check if the target object has already been handled by the mapper - if your object has already been the subject of an UPDATE by the time your extension does something, then the committed value *is* wrong. Try out the attached patch and if you can produce a test case which is repaired by it, then it's a candidate for inclusion.


test_o2m.py
finer_grained_check.patch

Michael Bayer

15 Dec 2011 18:29:22
to sqlal...@googlegroups.com

On Dec 15, 2011, at 5:56 PM, Kent wrote:

>
>> Here is the script output for a demonstration. I can't imagine this being the behavior we want, is it?
>
> ******* Before move: rockid=[1] rock object: <Rock: id=[1]>
>
>
> ******* During flush: rockid=[0] rock object: <Rock: id=[1]>
>
>
> ******* After flush: rockid=[0] rock object: <Rock: id=[0]>

That's exactly the behavior it wants. Not for you, of course, but the UOW needs it that way. If you try the one-to-many use case, it becomes clearer why it works the way it does right now. For m2o I'd have to dig a little harder to produce a justifying case.

It's probably important to support "normal" behavior within the before_flush() and after_flush() hooks - that's pretty doable. Within before_update/after_update() it means an elaborate internal API change throughout, I'm not even sure it's appropriate, and these methods are pretty clear in their docs that the Session doesn't behave normally. I never use before_update()/before_insert() for anything outside of direct SQL manipulation, which is what they are intended for. That's why they get a Connection but not a Session.

Kent

15 Dec 2011 22:06:50
to sqlal...@googlegroups.com
On 12/15/2011 6:29 PM, Michael Bayer wrote:
> I never use before_update()/before_insert() for anything outside of direct SQL manipulation, which is what they are intended for. That's why they get a Connection but not a Session.

So are you suggesting that if I do this all within before_flush(), the
problem goes away, and that I should stick to only SQL manipulation in
before_update()/before_insert()?

