feature discussion: before_exec() Instance event

25 views
Skip to first unread message

Randy Syring

unread,
Mar 13, 2012, 6:10:03 PM3/13/12
to sqlalche...@googlegroups.com
I'm wondering what the chances are of getting some kind of instance event that would send INSERT/UPDATE values back to it right before sess.exec() is called for an INSERT or UPDATE operation.  I'd imagine something like:

@event.listen_for(Car, 'before_exec'):
def validate_entity(car, values, type):
    assert type in ('insert', 'update')
    if 'active_flag' in values:
        values['active_flag'] = 1 if values['active_flag'] else 0
    if values['make'] not in ('Ford', 'Chevy'):
        raise ValueError('make not recognized')
    if 'mileage' not in values:
        values['mileage'] = 'unknown'
    return values


I currently have a validation library (https://bitbucket.org/rsyring/sqlalchemy-validation/) that uses the before_flush session event to try and do validation, but I run into problems with validating values that come from column defaults and other more "low level" places. 

I realize there are different ways to do validation and there are arguments to be made against ever needing to validate a default value.  However, I'd like to be able to do validation of an entity in as low-level of a place as possible so as to guarauntee catching as many exceptions as I can without relying on form libraries to do input validation on my model.  I want the model to be self-validating so that I can use it in a web application, GUI, or console app and not have to write validation code in three places.

If this is possible, I'm willing to work on providing a patch.  I just want to make sure its possible and would be considered for acceptance before I go down that route.

Thanks.
---------------------------------------------
Randy Syring
Development & Executive Director
Level 12 Technologies (formerly Intelicom)
Direct: 502-276-0459
Office: 502-212-9913

Intelicom is now Level 12 Technologies, learn more about our name change.
Please update your address book with my new email address.

Principled People, Technology that Works

Michael Bayer

unread,
Mar 13, 2012, 9:04:59 PM3/13/12
to sqlalche...@googlegroups.com
On Mar 13, 2012, at 3:10 PM, Randy Syring wrote:

I'm wondering what the chances are of getting some kind of instance event that would send INSERT/UPDATE values back to it right before sess.exec() is called for an INSERT or UPDATE operation.  I'd imagine something like:

@event.listen_for(Car, 'before_exec'):
def validate_entity(car, values, type):
    assert type in ('insert', 'update')
    if 'active_flag' in values:
        values['active_flag'] = 1 if values['active_flag'] else 0
    if values['make'] not in ('Ford', 'Chevy'):
        raise ValueError('make not recognized')
    if 'mileage' not in values:
        values['mileage'] = 'unknown'
    return values


I currently have a validation library (https://bitbucket.org/rsyring/sqlalchemy-validation/) that uses the before_flush session event to try and do validation, but I run into problems with validating values that come from column defaults and other more "low level" places. 

I realize there are different ways to do validation and there are arguments to be made against ever needing to validate a default value.  However, I'd like to be able to do validation of an entity in as low-level of a place as possible so as to guarauntee catching as many exceptions as I can without relying on form libraries to do input validation on my model.  I want the model to be self-validating so that I can use it in a web application, GUI, or console app and not have to write validation code in three places.

If this is possible, I'm willing to work on providing a patch.  I just want to make sure its possible and would be considered for acceptance before I go down that route.

One issue with raising validation errors like that is if the object has multiple validation issues, you only get to see one of them at a time.   If you're flushing 50 objects, and five of them have two validation issues, you get just one validation issue out of ten.    This suggests the better way to handle these validation issues is to aggregate them together, then report the multiple issues as a batch, which could be a single exception that has a collection of all issues found.

But working with validation issues in the aggregate means you need to look for them before the persistence system has drilled down too far, where you're well within the weeds and can only see the ground right in front of you, not anything else.    Invoking a ValueError right at the point of a single SQL statement, for example, would by necessity blow away the whole flush process - if the INSERT fails, it follows that everything has to stop - other objects and states might be dependent on it.   Invoking a ValueError at the point of when the mapper batches up a list of commonly-typed objects together might be able to aggregate validation across that subset, but prevents validation for other mappers to occur once an exception has cancelled the process.

In any case, validation is kind of a model-level concern, such as with the event you have above (which is analogous to before_flush).   Column default values aren't known until after the INSERT occurs - you can of course get at those within the after_flush() event, so if you're looking for post-INSERT validation that's where you'd put that.    Local validations can also be in events like before_insert() and after_insert() though you have limited ability to look beyond the scope of the object immediately at hand.    As far as other "low level" places, I suppose you're talking about triggers, again no way to know their effects in the application layer until after they've fired off, so after_flush()/after_insert() would still be suggested as where this might be.

As far as an event that acts like "before_flush()" but fires off for individual objects, there have always been plans to add helpers to the before_flush() (as well as after_flush(), others) listener whereby the listener can be established with filters that provide individual object instances to hooks, pulled from various collections like .new, .dirty, .identity_map, etc.    These aren't very hard to roll manually for now as I'm sure you've done with before_flush().














Thanks.
---------------------------------------------
Randy Syring
Development & Executive Director
Level 12 Technologies (formerly Intelicom)
Direct: 502-276-0459
Office: 502-212-9913

Intelicom is now Level 12 Technologies, learn more about our name change.
Please update your address book with my new email address.

Principled People, Technology that Works

--
You received this message because you are subscribed to the Google Groups "sqlalchemy-devel" group.
To post to this group, send email to sqlalche...@googlegroups.com.
To unsubscribe from this group, send email to sqlalchemy-dev...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sqlalchemy-devel?hl=en.

Randy Syring

unread,
Mar 14, 2012, 10:16:24 AM3/14/12
to sqlalche...@googlegroups.com

On 03/13/2012 09:04 PM, Michael Bayer wrote:

On Mar 13, 2012, at 3:10 PM, Randy Syring wrote:

I'm wondering what the chances are of getting some kind of instance event that would send INSERT/UPDATE values back to it right before sess.exec() is called for an INSERT or UPDATE operation.  I'd imagine something like:

@event.listen_for(Car, 'before_exec'):
def validate_entity(car, values, type):
    assert type in ('insert', 'update')
    if 'active_flag' in values:
        values['active_flag'] = 1 if values['active_flag'] else 0
    if values['make'] not in ('Ford', 'Chevy'):
        raise ValueError('make not recognized')
    if 'mileage' not in values:
        values['mileage'] = 'unknown'
    return values


I currently have a validation library (https://bitbucket.org/rsyring/sqlalchemy-validation/) that uses the before_flush session event to try and do validation, but I run into problems with validating values that come from column defaults and other more "low level" places. 

I realize there are different ways to do validation and there are arguments to be made against ever needing to validate a default value.  However, I'd like to be able to do validation of an entity in as low-level of a place as possible so as to guarauntee catching as many exceptions as I can without relying on form libraries to do input validation on my model.  I want the model to be self-validating so that I can use it in a web application, GUI, or console app and not have to write validation code in three places.

If this is possible, I'm willing to work on providing a patch.  I just want to make sure its possible and would be considered for acceptance before I go down that route.

One issue with raising validation errors like that is if the object has multiple validation issues, you only get to see one of them at a time.   If you're flushing 50 objects, and five of them have two validation issues, you get just one validation issue out of ten.    This suggests the better way to handle these validation issues is to aggregate them together, then report the multiple issues as a batch, which could be a single exception that has a collection of all issues found.

That is, indeed, how I do it.  My example above was simplified to just show how the event would work.



But working with validation issues in the aggregate means you need to look for them before the persistence system has drilled down too far, where you're well within the weeds and can only see the ground right in front of you, not anything else.    Invoking a ValueError right at the point of a single SQL statement, for example, would by necessity blow away the whole flush process - if the INSERT fails, it follows that everything has to stop - other objects and states might be dependent on it.   Invoking a ValueError at the point of when the mapper batches up a list of commonly-typed objects together might be able to aggregate validation across that subset, but prevents validation for other mappers to occur once an exception has cancelled the process.

Agreed.  However, my intention was not to throw a validation error.  Rather, i would save the errors on the instance itself.  Then, I just need to be able to abort the process once all the mappers have been processed (say in before_flush() or after_flush()).



In any case, validation is kind of a model-level concern, such as with the event you have above (which is analogous to before_flush).   Column default values aren't known until after the INSERT occurs - you can of course get at those within the after_flush() event, so if you're looking for post-INSERT validation that's where you'd put that.

Yes, but if the values you want to validate are invalid in the DB, then you get a DB exception and the problem is worse (more on this below).


   Local validations can also be in events like before_insert() and after_insert() though you have limited ability to look beyond the scope of the object immediately at hand.   

Yes, and its really the "limited ability to look beyond the scope of the object immediately at hand" that gets me.  Other than this, everything works as-is currently.  The main difficulty I have, and the reason for this post, is that some values that actually get sent to the DB are not available in the before_flush() event and I'd like to be able to validate them before they hit the DB (lest they cause an exception and a bigger, uglier mess).

So, maybe my question really is: can we create an event to address the fact that there are values getting sent to the DB that are not available in the before_flush() event?  Just for visuals sake, here is an example of my problem:

http://paste.pocoo.org/show/565484/

Currently, I don't have the ability to test either the default or FK values until the after_flush() event.  The FK issue is an even bigger problem IMO b/c it really is something dependent on the data a user might send.  Not being able to validate that value means I get different DB errors that I have to deal with (not-null error on some, FK on others, different text in the exceptions) instead of a standard error exception from my validation library telling me that 'Commen.blog_id may not be empty'.  Granted, I can and do do validation of values before they get to the model, but currently that has to be repeated for each interface type.  Requiring Comment.blog_id seems like something I should be able to enforce at the model level.

Ok, so I'm starting to understand my problem a little better too.  As long as a value going to the DB isn't going to create a DB exception, it can be validated in the after_flush() context without a problem.  The only time where there are going to be problems is when the invalid value will cause a DB exception that you want to avoid (like a missing FK which is required).  So, how about creating an exception that can be thrown in the event I propose above that would tell SA to not send that SQL to the DB.  Something like:

import sqlalchemy as sa
import sqlalchemy.orm as saorm

# this event should also accept mapper which would indicate all instance types (just like other events)
@event.listen_for(Comment, 'before_exec'):
def validate_entity(comment, values, type):
    if not values.get('blog_id'):
        comment.add_validation_error('blog_id', 'must not be empty')
        # tells SA internals not to send this flush to the DB.  Obviously, it
        # assumes that there is code in a after_flush() event to pick up the
        # validation error, do something with it, and ultimately abort the flush
        raise saorm.ABORT_EXEC
The above would only abort the exec() for this given instance, the flush would proceed for all other mappers as if this mapper had succeeded.

Obviously, aborting the exec() for an instance might throw off some internal state of the session, but I'm assuming the validation errors would be caught in after_flush() and a more general exception indicating a validation error would be raised at that point (negating the flush anyway).

Michael Bayer

unread,
Mar 14, 2012, 2:16:53 PM3/14/12
to sqlalche...@googlegroups.com
On Mar 14, 2012, at 7:16 AM, Randy Syring wrote:



But working with validation issues in the aggregate means you need to look for them before the persistence system has drilled down too far, where you're well within the weeds and can only see the ground right in front of you, not anything else.    Invoking a ValueError right at the point of a single SQL statement, for example, would by necessity blow away the whole flush process - if the INSERT fails, it follows that everything has to stop - other objects and states might be dependent on it.   Invoking a ValueError at the point of when the mapper batches up a list of commonly-typed objects together might be able to aggregate validation across that subset, but prevents validation for other mappers to occur once an exception has cancelled the process.

Agreed.  However, my intention was not to throw a validation error.  Rather, i would save the errors on the instance itself.  Then, I just need to be able to abort the process once all the mappers have been processed (say in before_flush() or after_flush()).


In any case, validation is kind of a model-level concern, such as with the event you have above (which is analogous to before_flush).   Column default values aren't known until after the INSERT occurs - you can of course get at those within the after_flush() event, so if you're looking for post-INSERT validation that's where you'd put that.

Yes, but if the values you want to validate are invalid in the DB, then you get a DB exception and the problem is worse (more on this below).

   Local validations can also be in events like before_insert() and after_insert() though you have limited ability to look beyond the scope of the object immediately at hand.   

Yes, and its really the "limited ability to look beyond the scope of the object immediately at hand" that gets me.  Other than this, everything works as-is currently.  The main difficulty I have, and the reason for this post, is that some values that actually get sent to the DB are not available in the before_flush() event and I'd like to be able to validate them before they hit the DB (lest they cause an exception and a bigger, uglier mess).

The only kind of value that's not available in before_flush(), unless you have other events going on, is that of a newly generated primary key value which is then populated into the foreign key attributes of another instance.    The closest place you can get to these would be the before_insert and after_insert hooks.   The restriction on these methods is that you can't assume relationships are traversable in this state.


So, maybe my question really is: can we create an event to address the fact that there are values getting sent to the DB that are not available in the before_flush() event?  Just for visuals sake, here is an example of my problem:

http://paste.pocoo.org/show/565484/

Currently, I don't have the ability to test either the default or FK values until the after_flush() event. 

right, you'd see it on Comment in the before_insert event and createdts in after_insert.   But note that SQLAlchemy doesn't load the value of "createdts" until this attribute is accessed.   You'll have a bit of a performance hit when your validator requires that this attribute be read unconditionally within the flush.

The FK issue is an even bigger problem IMO b/c it really is something dependent on the data a user might send.  Not being able to validate that value means I get different DB errors that I have to deal with (not-null error on some, FK on others, different text in the exceptions) instead of a standard error exception from my validation library telling me that 'Commen.blog_id may not be empty'.  Granted, I can and do do validation of values before they get to the model, but currently that has to be repeated for each interface type.  Requiring Comment.blog_id seems like something I should be able to enforce at the model level.

this may be obvious but normally I try to consider validations to be against the geometry of the object graph, not of foreign key values.   For example, the validation for Comment/Blog would be that Comment has a parent "Blog" object.   But I assume the issues you're dealing with are more DB-specific than this...


Ok, so I'm starting to understand my problem a little better too.  As long as a value going to the DB isn't going to create a DB exception, it can be validated in the after_flush() context without a problem.  The only time where there are going to be problems is when the invalid value will cause a DB exception that you want to avoid (like a missing FK which is required).  So, how about creating an exception that can be thrown in the event I propose above that would tell SA to not send that SQL to the DB.  Something like:

import sqlalchemy as sa
import sqlalchemy.orm as saorm

# this event should also accept mapper which would indicate all instance types (just like other events)
@event.listen_for(Comment, 'before_exec'):
def validate_entity(comment, values, type):
    if not values.get('blog_id'):
        comment.add_validation_error('blog_id', 'must not be empty')
        # tells SA internals not to send this flush to the DB.  Obviously, it
        # assumes that there is code in a after_flush() event to pick up the
        # validation error, do something with it, and ultimately abort the flush
        raise saorm.ABORT_EXEC

in before_flush(), you can remove the object from the flush plan by expiring or evicting it from the Session.    As I've mentioned before, when the flush plan has been assembled (which happens after that hook) and it's well within emitting SQL for objects, throwing some of the objects out can quickly screw things up - if you omit the INSERT for a row, then the dependency rules continue on to a child row for that parent row, you'll get a NULL constraint violation or similar.    The whole system is designed to get everything exactly as it's going to be, minus the actual PK/FK values, and just whip through it as fast as possible.    

For example, I just tried implementing this, as this is something we had long ago (and it was removed as the flush process became more sophisticated):


immediately, bookkeeping is broken elsewhere:

  File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/orm/unitofwork.py", line 347, in finalize_flush_changes
    self.session._register_newly_persistent(state)
  File "/Users/classic/dev/sqlalchemy/lib/sqlalchemy/orm/session.py", line 1207, in _register_newly_persistent
    "a load() event." % mapperutil.state_str(state)
sqlalchemy.orm.exc.FlushError: Instance <A at 0x101490110> has a NULL identity key.  If this is an auto-generated value, check that the database table allows generation of new primary key values, and that the mapped Column object is configured to expect these generated values.  Ensure also that this flush() is not occurring at an inappropriate time, such as within a load() event.

this error is checking that we don't put a NULL into the identity map, which otherwise creates other issues obviously.   So more logic needs to be added to figure out which objects were "late evicted", then put them back into the "pending" state, which in itself is quite an issue as we've now completed a flush, not raised any errors, but not everything was actually flushed.    In your case, you'll be raising an exception before we get that far, but from my perspective, the feature is awkward - it guarantees that the flush will be broken.  The rationale is not clear and it complicates the codebase.   

I'm still thinking it's awkward that you need to see SQL defaults emitted in order to determine if a validation will succeed.   I really need to see a very specific case.      The case you gave me is a little too contrived - for example why would I need to validate that a column default returned a certain timestamp or not (and if I just wanted to check the database time, why wouldn't I invoke "now" against the database within before_flush()?  much more efficient than waiting for it to be hit on many individual rows when I can just do it en-masse for all objects that would be hitting that default) ?   Why would I need to look at the FK value of Comment.blog_id when I can just validate that the Comment is associated with a proper Blog in before_flush ?   This still seems like an issue that can be solved entirely before a flush proceeds, rather than wishing for the flush process itself to be dual purposed as a validation mechanism.


Randy Syring

unread,
Apr 13, 2012, 9:40:19 AM4/13/12
to sqlalche...@googlegroups.com, Michael Bayer
Mike,

You are right.  Somehow, I missed the before_insert() and before_update() events.  Those allowed me to validate foreign key constraints.  I'm content with that and am not going to worry about validating default constraints anymore.

Thanks for your help.
---------------------------------------------
Randy Syring
Development & Executive Director
Level 12 Technologies
Direct: 502-276-0459
Office: 502-212-9913

Principled People, Technology that Works
Reply all
Reply to author
Forward
0 new messages