Validations and when they are run


myobie

Nov 6, 2009, 4:17:00 PM
to DataMapper
I want to catalog my thoughts on this here and see if anyone else can
chime in with some good ideas.

I am seeing a lot of SQL traffic in my logs dealing with validating
over and over again on the same objects.

It makes sense to me to assume that if a resource has been persisted
and it's not dirty, then it's valid (assuming the validations have not
changed). The question, though, is what do we do if the validations have
changed since the resource was persisted? I thought of #revalidate? as
an explicit way to say don't assume anything, just run the normal
validators.

Talking with dkubb, it is apparent that custom validations should
never be skipped, as they could be anything. However, per-field
validations could be skipped if it can generally be assumed that the
field is valid based on the cleanliness of its resource, its persisted
state, and if it's loaded or not.

I vote that if it's clean and saved, then don't validate. And I also
vote for an explicit way to revalidate the resource. Also, I don't
think attributes on clean, saved resources should be lazy_loaded just
to validate them.
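
A rough plain-Ruby sketch of that rule, for illustration only. `TrustingResource` and its counters are made up here and are not dm-validations API; only the `#revalidate?` name comes from the proposal above. Custom validators always run, per-field validators are skipped while the resource is clean and saved, and `#revalidate?` assumes nothing.

```ruby
# Hypothetical stand-in for a resource; not DataMapper code.
class TrustingResource
  attr_reader :field_checks, :custom_checks

  def initialize(name, saved: false)
    @name = name
    @saved = saved
    @dirty = !saved          # a new, unsaved resource counts as dirty
    @field_checks = 0
    @custom_checks = 0
  end

  def name=(value)
    @dirty ||= (value != @name)
    @name = value
  end

  def clean_and_saved?
    @saved && !@dirty
  end

  # Proposed default: custom validators always run, per-field
  # validators are skipped for a clean, saved resource.
  def valid?
    custom_valid? && (clean_and_saved? || fields_valid?)
  end

  # Explicit escape hatch: run everything, assume nothing.
  def revalidate?
    custom_valid? && fields_valid?
  end

  private

  # Stand-in for per-field checks (length, presence, uniqueness).
  def fields_valid?
    @field_checks += 1
    !@name.to_s.empty?
  end

  # Stand-in for a user-defined validation that could do anything.
  def custom_valid?
    @custom_checks += 1
    true
  end
end

author = TrustingResource.new('Jane', saved: true)
author.valid?          # field checks skipped: clean and saved
author.revalidate?     # field checks forced
author.name = 'Janet'
author.valid?          # field checks run: resource is now dirty

puts "field checks: #{author.field_checks}, custom checks: #{author.custom_checks}"
```
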

My example of this is: Book.create(:author => Author.get(1))
Author has a validation that its :name is unique.

For every Book that is created, I see SQL checking that the
Author's name isn't taken yet (since the valid? on book cascades to the
belongs_to :author). This is a lot of SQL traffic for a resource that
I think can be assumed to be valid in the first place.
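
The pattern can be reproduced without a database. This is a plain-Ruby simulation, not DataMapper code: `QUERY_LOG` stands in for the SQL log, and the classes, column names, and SQL strings are invented for illustration.

```ruby
# Stand-in for the SQL log.
QUERY_LOG = []

class Author
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Stand-in for a uniqueness validation: every call costs one SELECT,
  # even though this author is already saved and unchanged.
  def valid?
    QUERY_LOG << "SELECT id FROM authors WHERE name = '#{name}' LIMIT 1"
    true
  end
end

class Book
  def initialize(title, author)
    @title  = title
    @author = author
  end

  # Validating the book cascades to its parent author.
  def valid?
    !@title.empty? && @author.valid?
  end

  def save
    return false unless valid?
    QUERY_LOG << "INSERT INTO books (title) VALUES ('#{@title}')"
    true
  end
end

author = Author.new('Jane')
3.times { |i| Book.new("Book #{i}", author).save }

selects = QUERY_LOG.count { |sql| sql.start_with?('SELECT') }
inserts = QUERY_LOG.count { |sql| sql.start_with?('INSERT') }
puts "#{selects} SELECTs for #{inserts} INSERTs"
```

Each save pays for one uniqueness SELECT against an author that never changed.
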

Any thoughts?

Jordan Ritter

Nov 6, 2009, 4:47:04 PM
to datam...@googlegroups.com
I watch the generated SQL all the time too, and have seen variants of what you're describing.  

FYI I logged a related bug recently, which is that auto-validations trigger lazy, not-yet-loaded (and thus not dirty) fields to get loaded -- one at a time, per property (and type of validation):


cheers,
--jordan

Dan Kubb (dkubb)

Nov 7, 2009, 10:17:30 PM
to DataMapper
myobie,

> I am seeing a lot of SQL traffic in my logs dealing with validating
> over and over again on the same objects.

This is something I recommend everyone do. Turn on your logs, watch
what happens for specific actions, and make note of anything that seems
unnecessary. While developing DM we always try to ensure minimal work
is performed, but we're not perfect, so if it seems like extra queries
are being issued please create a ticket.

> It makes sense to me to assume that if a resource has been persisted
> and it's not dirty, then it's valid (assuming the validations have not
> changed).

This makes sense. We can be reasonably certain that anything pulled
from the datastore is valid. I think this should be the default
behavior of dm-validations.

One of the common problems we see is that a lazy attribute is
validated, so DM lazy-loads the attribute, and then validates it. If
there are several of these then each is loaded individually and
validated. If we make this change, then the already persisted
attributes will be trusted and not lazy-loaded.

> The question, though, is what do we do if the validations have
> changed since the resource was persisted? I thought of #revalidate? as
> an explicit way to say don't assume anything, just run the normal
> validators.

I think this is a valid concern, but we should also remember it is not
as common as the problem you outline, so any decision we make should
be weighted properly towards reducing unnecessary validation when we
can.

If someone makes their validation rules *more* strict, it really is a
good idea to make a batch script that loads up all the existing
records and ensures they are still valid, flagging those that need
manual intervention to resolve. It's for this case that we should
provide a way to override the default behavior, and force validation
to always occur.
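
Such a batch script might look roughly like the sketch below. It is plain Ruby with invented names: `Record` stands in for a model, the array stands in for loading every persisted row, and `revalidate?` plays the role of the forced-validation override, which does not exist in dm-validations as far as this thread shows.

```ruby
# Hypothetical stand-in for a model whose rule was tightened
# after some rows had already been saved.
Record = Struct.new(:id, :name) do
  # New, stricter rule (an assumption for this example):
  # names must be at least 3 characters long.
  def revalidate?
    name.to_s.length >= 3
  end
end

# Stand-in for loading all existing records from the datastore.
records = [
  Record.new(1, 'Jane'),
  Record.new(2, 'Al'),
  Record.new(3, 'Jordan')
]

# Force validation on every record and flag the ones that fail.
flagged = records.reject(&:revalidate?)

flagged.each do |record|
  puts "record #{record.id} needs manual review: name=#{record.name.inspect}"
end
```
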

> Talking with dkubb, it is apparent that custom validations should
> never be skipped, as they could be anything. However, per-field
> validations could be skipped if it can generally be assumed that the
> field is valid based on the cleanliness of its resource, its persisted
> state, and if it's loaded or not.
>
> I vote that if it's clean and saved, then don't validate. And I also
> vote for an explicit way to revalidate the resource. Also, I don't
> think attributes on clean, saved resources should be lazy_loaded just
> to validate them.

Most of our per-attribute validation is things like testing that a String
is a valid length, or that a required value is not nil. I think we
can skip performing these types of tests when the attribute is not
dirty (which we can tell from Resource#attribute_dirty?).

This will be especially good with uniqueness validation, because
there's no need to test if something is unique if that's what already
exists in the datastore.
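
As a sketch of the idea, assuming a plain-Ruby stand-in rather than real DM internals: `LazyResource` below is hypothetical, and its `attribute_get`, `attribute_set`, and `attribute_dirty?` only mimic the shape of the Resource methods. Only dirty attributes are validated, so a clean lazy attribute is never loaded.

```ruby
# Hypothetical stand-in; not dm-core code.
class LazyResource
  attr_reader :loads, :checked

  def initialize(stored)
    @stored  = stored   # values as they sit in the datastore
    @loaded  = {}       # attributes fetched or assigned so far
    @dirty   = []       # names of attributes changed since load
    @loads   = 0        # counts simulated lazy-load SELECTs
    @checked = []       # names of attributes that were validated
  end

  def attribute_get(name)
    unless @loaded.key?(name)
      @loads += 1                    # one lazy-load query per attribute
      @loaded[name] = @stored[name]
    end
    @loaded[name]
  end

  def attribute_set(name, value)
    @loaded[name] = value
    @dirty << name unless @dirty.include?(name)
  end

  def attribute_dirty?(name)
    @dirty.include?(name)
  end

  # Validate only dirty attributes; clean ones are trusted as persisted.
  def valid?
    @stored.keys.select { |name| attribute_dirty?(name) }.all? do |name|
      @checked << name
      !attribute_get(name).to_s.empty?   # stand-in for a presence check
    end
  end
end

post = LazyResource.new({ title: 'Hello', body: 'Long lazy text' })
post.attribute_set(:title, 'Hello again')
post.valid?

puts "checked=#{post.checked.inspect} lazy loads=#{post.loads}"
```
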

--

Dan
(dkubb)

myobie

Nov 9, 2009, 11:10:00 AM
to DataMapper
So I am testing this brute-force method out right now:
http://github.com/myobie/dm-more/commit/f781627c21f999182bd1625760782b0b9e03f5ed

I am not proposing that this is the correct way to go, but at first
glance it gave me a hook for testing.

The SQL traffic is so different with this patch that I am blown away.
My dm-sweatshop script is now 90% INSERT and UPDATE, with very few
SELECTs.

Anyway, food for thought. Anyone know where the correct place to hook
into the validations might be?