Validation of new objects before committing

3,075 views
Skip to first unread message

Kirk Strauser

unread,
Sep 27, 2011, 1:57:15 PM9/27/11
to sqlal...@googlegroups.com
Does SA natively support (or is there a module on PyPI that supports) client-side validation of SQLAlchemy objects? For example, I have this declarative class:

class ImportedPayment(Base):
__tablename__ = 'importedpayment'
__table_args = {'schema': 'public'}
paymentid = Column(Integer, primary_key=True)
externalid = Column(String(16), nullable=False)
line = Column(Integer, nullable=False)
invoicestatus = Column(String(32), nullable=False)
quantity = Column(Numeric(scale=2), nullable=False)
rate = Column(Numeric(scale=2), nullable=False)

I'm reading data from a directory full of spreadsheets and generating millions of these objects. If one contains invalid data, it's OK to just log it and move on. Unfortunately, that pretty much means that I have to commit after every insertion so that I can catch any potential exceptions and this makes the whole process take ages to run.

Now, I've already defined the error conditions I'm likely to encounter and that I can easily handle: they're the constraints I defined in the class above. I'd love for the objects I'm creating to validate themselves through a method call like:

newobject = ImportedPayment(externalid='foo', line=None, [...])
try:
newobject.validate()
except ValueError as e:
print e.column, e.errmsg
else:
session.add(newobject)

yielding

'line', 'None in not-nullable column'

or similar. Granted, I could write these tests easily myself:

if externalid is None or len(externalid) > 16:
return False

but I've already specified them once and I don't want to repeat myself as it's a lot of extra typing and a lot harder to maintain (if I change "invoicestatus" to a String(64), I have to update every module which manually validates that data). Note that I'm not talking about higher-level checks like "the emailaddress Column contains a valid email address", but just the simple data type checks that can be inferred from class definitions (strings of appropriate length, not null, etc.).

Is there an easy way to do this? If not, why? And if the answer to that is "because you haven't written it yet", would anyone be interested in using it if I were to create such a thing?

Kirk

Michael Bayer

unread,
Sep 27, 2011, 2:56:02 PM9/27/11
to sqlal...@googlegroups.com

On Sep 27, 2011, at 1:57 PM, Kirk Strauser wrote:

> Does SA natively support (or is there a module on PyPI that supports) client-side validation of SQLAlchemy objects? For example, I have this declarative class:
>
> class ImportedPayment(Base):
> __tablename__ = 'importedpayment'
> __table_args = {'schema': 'public'}
> paymentid = Column(Integer, primary_key=True)
> externalid = Column(String(16), nullable=False)
> line = Column(Integer, nullable=False)
> invoicestatus = Column(String(32), nullable=False)
> quantity = Column(Numeric(scale=2), nullable=False)
> rate = Column(Numeric(scale=2), nullable=False)
>
>

> Is there an easy way to do this?

OK so the aspect you're looking for here is to define those validations just once, this is easy enough through a recipe like this:

def notnull(key):
def validate(obj):
if getattr(obj, key) is None:
return "Object %s key %s is None" % (obj, key)
else:
return False
return validate

_validators = {}

def get_validators(someobject):
mapper = object_mapper(someobject)
if mapper in _validators:
return _validators[mapper]
_validators[mapper] = v = []
for prop in mapper.iterate_properties():
if hasattr(prop, "columns"):
col = prop.columns[0]
if not col.nullable:
v.append(notnull(prop.key))
# ... ad nauesum, i.e.
# if <something else about the column>:
# v.append(<some other kind of validation function>)
return v

def validate(someobject):
for validator in get_validators(someobject):
msg = validator()
if msg:
log(msg)
# etc., i.e.
# alert_the_authorities()


> If not, why?

So the theme for today is "why does SQLA have recipes", basically when we can provide the core fragment of a feature but not a fully polished, documented, tested, packaged result, something that can just as easily be delivered as a small batch of customizable source code gets the job done pretty well, and would be better suited as a separate library if fully fleshed out.

The above recipe lacks a lot of features one might want, such as customizable ways of defining the validation failure, behavior on the receipt of a failed validation, etc. A full blown "validation" library might use the idea above but expand upon it in a much bigger way. I've had other ad-hoc validation use cases that wouldn't work with the above structure, instead needing a slightly different structure, so having a small thing just as code for now is more flexible than a built in "feature" that only handles a small subset of use cases.


> And if the answer to that is "because you haven't written it yet", would anyone be interested in using it if I were to create such a thing?

You might want to check around if similar things don't exist already, I did find http://pypi.python.org/pypi/SAValidation/ and http://pypi.python.org/pypi/sqlalchemy_elixir_validations/ for example, there might be features there that are of use. But by all means, produce a better validation library for SQLAlchemy, the more the merrier and I'd love to see more.


Kirk Strauser

unread,
Sep 28, 2011, 10:08:08 AM9/28/11
to sqlal...@googlegroups.com
On Sep 27, 2011, at 1:56 PM, Michael Bayer wrote:

> OK so the aspect you're looking for here is to define those validations just once, this is easy enough through a recipe like this:
>

> def notnull(key): [...]
>
> def get_validators(someobject): [...]
>
> def validate(someobject): [...]

Yep, that's pretty much what I had in mind.

> The above recipe lacks a lot of features one might want, such as customizable ways of defining the validation failure, behavior on the receipt of a failed validation, etc. A full blown "validation" library might use the idea above but expand upon it in a much bigger way. I've had other ad-hoc validation use cases that wouldn't work with the above structure, instead needing a slightly different structure, so having a small thing just as code for now is more flexible than a built in "feature" that only handles a small subset of use cases.

I get what you're saying, truly, and agree with the underlying argument: SQLA is an ORM, not a data validator, template parser, or cheese grater. But I would contend that the kind of validations I'm proposing would operate on the level SQLA occupies. What I had in mind wouldn't answer the question "does this column contain the correctly spelled name of a Pantone color?", but "is the data in this column capable of being stored in the underlying database?"

For instance, if Model.foo is Column(Integer), and the client app sets it to "creqcrq", then most databases (ignoring SQLite for the moment) will balk at insertion time. Similarly, PostgreSQL won't let you store "value" in Column(String(1)). If my model has already defined that column as a String(1), it'd be convenient if I could ask SQLA if the object I've just created can even be stored.

> You might want to check around if similar things don't exist already, I did find http://pypi.python.org/pypi/SAValidation/ and http://pypi.python.org/pypi/sqlalchemy_elixir_validations/ for example, there might be features there that are of use. But by all means, produce a better validation library for SQLAlchemy, the more the merrier and I'd love to see more.

The ones I've found are pretty high-level form validation stuff. Which is well and good! I'm glad we have those types of things. I'm looking for something more akin to checking for type safety, though. If I can get something usably working, I'll toss it up on Github for everyone's amusement. :-)

- Kirk

Michael Bayer

unread,
Sep 28, 2011, 10:45:29 AM9/28/11
to sqlal...@googlegroups.com

On Sep 28, 2011, at 10:08 AM, Kirk Strauser wrote:

> I get what you're saying, truly, and agree with the underlying argument: SQLA is an ORM, not a data validator, template parser, or cheese grater. But I would contend that the kind of validations I'm proposing would operate on the level SQLA occupies. What I had in mind wouldn't answer the question "does this column contain the correctly spelled name of a Pantone color?", but "is the data in this column capable of being stored in the underlying database?"
>
> For instance, if Model.foo is Column(Integer), and the client app sets it to "creqcrq", then most databases (ignoring SQLite for the moment) will balk at insertion time. Similarly, PostgreSQL won't let you store "value" in Column(String(1)). If my model has already defined that column as a String(1), it'd be convenient if I could ask SQLA if the object I've just created can even be stored.

SQLA is very loathe to duplicate what the DBAPI and/or database already does. These use cases are all already covered, the issue here is wanting those validations to be duplicated under the specific use case that invalid values should be skipped, instead of halting the operation. This already cuts down the usefulness of such a feature to a small percentage of projects.

The scope of column-based validations we can do is limited. "nullable" is simple, sure.

Also, there are type-based validations, fine, string, numeric, integer, dates, including length of strings. If someone throws on postgresql.INET, not really, unless the contract of TypeEngine objects is extended such that they *all* get in-python validation functions. Which is really, really redundant, as DBAPIs do that most of the time. Very heavyhanded for very little use - we definitely don't want these validations turned on all the time as they'd kill performance unnecessarily. String length in particular, we have to deal with unicode conversions before checking length, some databases store unicode as "number of chars" others as "number of encoded bytes", it's complicated, and entirely redundant vs. what the database already does.

Other validations that can be derived from schema include CheckConstraint. It contains a SQL expression - if it's a string, we can't test it without going to the database. If it's derived from a SQL expression construct, we can use the techniques in orm.evaluator to run an extremely limited subset of that expression, it would be awkward, it would fail all the time due to all kinds of database-specfic SQL functions we don't have a translation for. To get Python-side validation here, the user probably has to supply an in-python validation function separately from the constraint itself (and we have hooks for that).

Another one, UniqueConstraint. How do we check that in Python without hitting the DB ? not really possible. Same for ForeignKeyConstraint.

I didn't mention this last time but we do have almost everything you need to do "validations" except for the validation functionality itself - the @validates decorator and its more generalized version, the attribute set event, receives the attribute, the value, all you need to do is take a look at the Column passed in, associated with the incoming attribute. From there you can look at "nullable", and maybe check within five or six possible types that are built-in. It still seems like a trivial amount of code for anyone to do for a specific subset of validations - to do it for a very wide range of validations, as a built-in feature demands, its an enormous job which would require a ton of testing and would almost never be used as it would be an unnecessary performance hit for a job the database already does much better.

SQLAlchemy itself really cannot implement such a feature in any reasonable way, it would have a huge amount of holes in it. It is definitely best as a third party project, the hooks are there.

Mengu

unread,
Sep 28, 2011, 12:27:30 PM9/28/11
to sqlalchemy
while we are on the subject, i'd like to ask a question if i'm doing
it right or wrong.

i have created a class called Validation with a method called is_valid
and I have mixed it in my SQLAlchemy models. all of my models has an
attribute called validation that consists of model's attributes that
must be valid. it's like this:

class User(DeclarativeBase, Validation):
validation = {
'username': [formencode.validations.NotEmpty],
'email': [formencode.validations.NotEmpty,
formencode.validations.Email]
}

# model definition here..

and then in my controller i check if model_instance.is_valid() and
then add it to my db. if it's not valid, i render the previous page.

let me know if you guys have any recommendations.
> You might want to check around if similar things don't exist already, I did findhttp://pypi.python.org/pypi/SAValidation/andhttp://pypi.python.org/pypi/sqlalchemy_elixir_validations/for example, there might be features there that are of use.   But by all means, produce a better validation library for SQLAlchemy, the more the merrier and I'd love to see more.

Kirk Strauser

unread,
Sep 28, 2011, 1:28:55 PM9/28/11
to sqlal...@googlegroups.com
On Sep 28, 2011, at 9:45 AM, Michael Bayer wrote:

> Also, there are type-based validations, fine, string, numeric, integer, dates, including length of strings. If someone throws on postgresql.INET, not really, unless the contract of TypeEngine objects is extended such that they *all* get in-python validation functions. Which is really, really redundant, as DBAPIs do that most of the time. Very heavyhanded for very little use - we definitely don't want these validations turned on all the time as they'd kill performance unnecessarily. String length in particular, we have to deal with unicode conversions before checking length, some databases store unicode as "number of chars" others as "number of encoded bytes", it's complicated, and entirely redundant vs. what the database already does.

OK, you've convinced me. I hadn't taken those cases into consideration; they don't come up much in the stuff I'm working with. For the record, though (in case anyone ever Googles this and wonders what I was thinking), I never thought of this as a mandatory behavior but as something that would be there if you wanted to use it, like:

>>> from sqlalchemy import getvalidationerrors
>>> newobj = MyClass(column1='foo', column2='bar')
>>> for error in getvalidationerrors(newobj): [...]

Anyway, thanks for the pointers to a workable ad-hoc approach.

- Kirk

Reply all
Reply to author
Forward
0 new messages