So my question, from a Django design perspective, is how much
validation built-in fields should do. The Finnish localflavor, for
example, validates check digits on its SSN field[1], and I'd be happy
to sit down and work out the logic for "fully" validating a US SSN
(rejecting reserved groups and invalid combinations, etc.), but wanted
to make sure this was the preferred method before going forward.
Additionally, in the case of the US SSN, the "valid" numbers do change
occasionally, since the Social Security Administration can choose to
allocate previously unused blocks of numbers (right now, a number
starting with any group higher than 799 is invalid, and probably will
be for a while, but eventually those numbers will come into use); if
we try to validate as much as possible, should that include rejecting
currently-unused blocks and adding them in if/when the SSA decides to
put them into service?
Another area where this is likely to come up is credit-card numbers --
if we ever ship a validator for those, will it just verify the number
of digits, or should it also know how to examine the number for
tentative validity?
[1] http://code.djangoproject.com/browser/django/trunk/django/contrib/localflavor/fi/forms.py
--
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."
Maybe what we should have is the ability to have progressive levels of
validation:
1. Basic field validation of format (i.e. xxx-yy-zzzz for SSNs, or
xxxxx for ZIP codes)
2. Validation of data based on other fields. For instance, given a ZIP
code and state, we can validate that they match up.
As far as SSNs, we should validate format and maybe some versions
that are obviously wrong (000-00-0000, etc) but otherwise I think if
it is too smart, then it gets to be a pain to manage. If I really
need to collect SSN's or some other data there will probably be some
sort of validation I need to do on my end after I get the data.
I do think Luhn checksums for credit cards would be a nice thing to
add. I do have one in Satchmo here -
http://www.satchmoproject.com/trac/browser/satchmo/trunk/satchmo/shop/views/utils.py
and would be happy to add a patch if folks would like to see it.
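For what it's worth, the core of a Luhn check is only a few lines. Here's a minimal standalone sketch of the algorithm (just an illustration, not the Satchmo code itself):

```python
def luhn_is_valid(number):
    """Return True if the given digit string passes the Luhn (mod 10) check."""
    digits = [int(d) for d in str(number)]
    total = 0
    # Walk the digits from the right; double every second one and
    # subtract 9 when the doubled value exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

The standard Visa test number 4111111111111111 passes, while changing its last digit makes it fail.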
Also, I've created the Barnum project to generate fake but realistic
data. It's sort of the reverse of what you're trying to do, but I do
have a large file of US ZIP codes corresponding to cities and states.
You can see all the code here:
http://barnum.googlecode.com/svn/trunk/
-Chris
> [1]http://code.djangoproject.com/browser/django/trunk/django/contrib/loc...
Yeah. I feel like the best option is sort of a compromise here -- in
the case of SSNs, there are some combinations which will always be
invalid (e.g., any block of all zeroes, any "666" area number, and
anything in the reserved blocks used for advertising/explanation).
I'm just not sure whether we want even that amount of logic, or if we
want to just verify that it's a 9-digit number with optional dashes
and correct grouping.
On Mar 31, 2:56 pm, "James Bennett" <ubernost...@gmail.com> wrote:
> On 3/31/07, chris.moff...@gmail.com <chris.moff...@gmail.com> wrote:
>
> > As far as SSNs, we should validate format and maybe some versions
> > that are obviously wrong (000-00-0000, etc) but otherwise I think if
> > it is too smart, then it gets to be a pain to manage. If I really
> > need to collect SSN's or some other data there will probably be some
> > sort of validation I need to do on my end after I get the data.
>
> Yeah. I feel like the best option is sort of a compromise here -- in
> the case of SSNs, there are some combinations which will always be
> invalid (e.g., any block of all zeroes, any "666" area number, and
> anything in the reserved blocks used for advertising/explanation).
>
> I'm just not sure whether we want even that amount of logic, or if we
> want to just verify that it's a 9-digit number with optional dashes
> and correct grouping.
If you *do* put in that much logic, something like a strict=False
argument to the constructor would be a good idea. False should
possibly even be the default.
Joseph
Can you elaborate on the logic behind this request? These are meant to
validate the fields, right? So you are asking for validation that
doesn't validate.
Given how easy it is to write custom cleaning functions, I'd rather we
shipped reasonably correct versions and if people wanted less strict
constraints, they can write their own.
Regards,
Malcolm
The Norwegian one can tell whether you're male or female. There's some
sort of feature arms race going on between our Nordic contributors. :-)
> and I'd be happy
> to sit down and work out the logic for "fully" validating a US SSN
> (rejecting reserved groups and invalid combinations, etc.), but wanted
> to make sure this was the preferred method before going forward.
>
> Additionally, in the case of the US SSN, the "valid" numbers do change
> occasionally, since the Social Security Administration can choose to
> allocate previously unused blocks of numbers (right now, a number
> starting with any group higher than 799 is invalid, and probably will
> be for a while, but eventually those numbers will come into use); if
> we try to validate as much as possible, should that include rejecting
> currently-unused blocks and adding them in if/when the SSA decides to
> put them into service?
Don't go overboard, would be my suggestion. The pain with having to
retrofit a bunch of existing production products because numbers in the
"for future expansion" range started being used is non-trivial. I've
played this game in banking systems. It is the opposite of fun, because
you have to upgrade quickly and very carefully at the same time. It's
not like you're going to notice the problem much in advance of somebody
complaining that their number doesn't work.
> Another area where this is likely to come up is credit-card numbers --
> if we ever ship a validator for those, will it just verify the number
> of digits, or should it also know how to examine the number for
> tentative validity?
As you can see from the Satchmo code Chris posted, even just "number of
digits" is non-trivial, because it varies by card type. That also gives
an example of why being too prescriptive can hurt: Satchmo's card
validation looks pretty correct to me (I have some domain experience
here), but there's at least one major credit card, used throughout
Australia, that would be needlessly rejected by their system because
it's not in the whitelist. That's always going to be a problem with
being too strict.
I think Django should err on the side of permissiveness a little bit in
cases like this. It takes five minutes to write your own, more
restrictive data cleanser function if you want something stricter.
Cheers,
Malcolm
By the way, this isn't a dig at Satchmo. You can't default to "accept" in
e-payments, so they have to use a whitelist. It was an example of the
drawbacks of the (necessary in this case) whitelist approach.
Cheers,
Malcolm
This is a good question. It seems like the policy should be: "Include
validation as strict as possible, without being so strict that the
validation may have to be frequently loosened in the future."
So, for the case of U.S. ZIP codes, it should validate that it's a
five-digit number, because it's doubtful that the ZIP code system will
change to six digits any time soon (ignoring ZIP+4). But it shouldn't
actually validate the numbers themselves, because new ZIP codes get
added occasionally.
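The ZIP half of that policy is nearly a one-liner; something like this format-only check (a sketch, ignoring ZIP+4 as above):

```python
import re

zip_re = re.compile(r'^\d{5}$')

def is_valid_us_zip_format(value):
    # Format-only: exactly five digits. Deliberately does NOT check that
    # the code is actually assigned, since new ZIP codes appear occasionally.
    return bool(zip_re.match(value))
```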
Another example -- and sorry for being American-centric here -- is
U.S. state abbreviations. That validator checks not only that the
abbreviation is two letters, but that it's one of the valid
state/territory abbreviations. States/territories don't get added very
often, so it's worth the extra level of validation in this case.
How does that sound as a policy?
Adrian
--
Adrian Holovaty
holovaty.com | djangoproject.com
Sounds good to me. Based on that, I'll whip up a US SSN field;
following this policy, it will:
1. Validate the number of digits (9).
2. Validate that the number is not one which is known to be
permanently invalid (there aren't many of these and they're easy to
test for).
And leave it at that. Sound good to everyone?
Also, @Malcolm: in theory, a US SSN validator could tell which
state/territory you were in when you applied for the number (which in
most cases will be the state/territory in which you were born), but I
don't think it needs to go that far; we'll let the Nordic SSN arms
race continue unchallenged ;)
Sounds good. You've probably already thought of this, but it should
accept numbers with or without hyphens, and normalize it to the number
*with* hyphens.
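Something along these lines, perhaps (purely illustrative; the function name and messages are made up, and a real newforms field would raise ValidationError rather than ValueError):

```python
import re

# Accepts XXX-XX-XXXX with the hyphens optional.
ssn_re = re.compile(r'^(\d{3})-?(\d{2})-?(\d{4})$')

def clean_us_ssn(value):
    """Accept an SSN with or without hyphens; normalize to XXX-XX-XXXX."""
    match = ssn_re.match(value.strip())
    if not match:
        raise ValueError('Enter a valid U.S. Social Security number.')
    area, group, serial = match.groups()
    # Reject the small set of permanently invalid combinations:
    # all-zero blocks and the "666" area number.
    if area in ('000', '666') or group == '00' or serial == '0000':
        raise ValueError('Invalid Social Security number.')
    return '%s-%s-%s' % (area, group, serial)
```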
+1
> Sounds good. You've probably already thought of this, but it should
> accept numbers with or without hyphens, and normalize it to the number
> *with* hyphens.
+1
I noticed that the UK postal code field will raise a validation error if you
don't type a space separating the two character groups. Brazilian
phone numbers are similarly affected. IMHO, this is something that the
cleaner should fix, not raise as an error. We should be liberal in
what we accept, conservative in what we save, and all that jazz.
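For instance, the cleaner could simply repair the missing space instead of complaining about it. A rough sketch (the regex here is a simplification of the real outward-code rules):

```python
import re

# Simplified UK postcode shape: a 2-4 character outward code followed by
# a 3 character inward code (digit then two letters), spaces removed.
uk_postcode_re = re.compile(r'^[A-Z]{1,2}\d[A-Z\d]?\d[A-Z]{2}$')

def clean_uk_postcode(value):
    """Be liberal on input: accept any spacing and case, then normalize
    to the canonical 'OUTWARD INWARD' form."""
    compact = value.upper().replace(' ', '')
    if not uk_postcode_re.match(compact):
        raise ValueError('Enter a valid UK postal code.')
    # The inward code is always the last three characters.
    return '%s %s' % (compact[:-3], compact[-3:])
```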
Two other quick localflavor related things -
1) Is there any particular reason that the Brazilian validation
messages are in Portuguese, rather than i18n wrapped English?
2) Is there any reason not to normalize localflavor.usa to
localflavor.us (to match the 2 letter country code scheme used by
other flavors)?
Yours,
Russ Magee %-)
Good call -- those should be in English, I would think.
> 2) Is there any reason not to normalize localflavor.usa to
> localflavor.us (to match the 2 letter country code scheme used by
> other flavors)?
None of this stuff is documented yet, and it's still in flux, so I
think it's fine to rename "usa" to "us" in localflavor. I hadn't
expected such an explosion of, well, great local flavor when I first
created the package. :)
Ha. Uploaded a patch right before I saw this; it's late and I didn't
think of it, but that's easy enough to do.
Agreed.
>
> Two other quick localflavor related things -
>
> 1) Is there any particular reason that the Brazilian validation
> messages are in Portuguese, rather than i18n wrapped English?
Whoops, my bad. :-( I'd noticed that originally and then forgot about it
when I was going through things to commit. We should fix it. I'll get in
touch with the contributor.
> 2) Is there any reason not to normalize localflavor.usa to
> localflavor.us (to match the 2 letter country code scheme used by
> other flavors)?
One other thing we'll want to keep an eye on is how much duplication of
code goes in there. Up to now, I haven't noticed any great opportunity
for factoring out common pieces that will save space, but it might come
up in the future.
Regards,
Malcolm
> I'm going to contribute some of the South African localflavour that I
> have around. South African ID numbers can be checked with a Luhn
> checksum ("the mod10 algorithm"), and I was wondering how that can
> best be packaged.
>
> Should it just live in the forms.py for now?
Try this: create a django/utils/checksums.py file and put it in there.
It may be useful in more than just form validation (otherwise it could
go in newforms/utils.py). I suspect we might see some more weighted
checksums in the future, too (such as the Norwegian social security
number) and we can extract out the general algorithm into that file at
some point.
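As a starting point, the "general algorithm" for the weighted family might look like this (a sketch of what such a checksums.py could grow to contain, with the Luhn variant sitting alongside it):

```python
def weighted_checksum(candidate, weights, modulus=11):
    """Generic weighted checksum: the sum of digit * weight over the
    digits of `candidate`, reduced mod `modulus`. Many national ID
    schemes (e.g. weighted mod-11 systems) are variations on this."""
    digits = [int(d) for d in str(candidate)]
    if len(digits) != len(weights):
        raise ValueError('candidate and weights must be the same length')
    return sum(d * w for d, w in zip(digits, weights)) % modulus
```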
Regards,
Malcolm
> Can you elaborate on the logic behind this request? These are meant to
> validate the fields, right? So you are asking for validation that
> doesn't validate.
Well, I need this for Serbian JMBG validation (something similar to the SSN),
because there are numbers which fail the checksum validation but are still
in use (strange, don't ask me why ;)).
--
Nebojša Đorđević - nesh, ICQ#43799892, http://www.linkedin.com/in/neshdj
Studio Quattro - Niš - Serbia
http://studioquattro.biz/ | http://code.google.com/p/django-utils/
Registered Linux User 282159 [http://counter.li.org]
You could never use the truly strict validation in this case. Remember,
these are things designed to be used on a website. So imagine you are
constructing a website that accepts JMBG entries. You have to accept all
in-use numbers (which I would argue are, by definition, valid). So a
validator that only used some particular algorithm and rejected certain
legally in-use numbers is not a validator at all, since it generates
false negatives. My point is that, in this case, there aren't two
possible settings, there is only one -- the other one doesn't accept
the right numbers.
Regards,
Malcolm
> You could never use the truly strict validation in this case. Remember,
> these are things designed to be used on a website. So imagine you are
> constructing a website that accepts JMBG entries. You have to accept all
> in-use numbers (which I would argue are, by definition, valid). So a
Well, these numbers are valid; the problem is that lots of them were
manually created, so some of them have incorrect checksum fields. :)
> validator that only used some particular algorithm and rejected certain
> legally in-use numbers is not a validator at all, since it generates
> false negatives. My point is that, in this case, there aren't two
> possible settings, there is only one -- the other one doesn't accept
> the right numbers.
(it was a quick response; while I worked on the validator it seemed that a
strict option was a good idea)
I agree -- in my case I must accept invalid data, so strict=True/False is
not an option. I planned just to do some basic checks (without checksum
calculations) and, maybe, show a warning when an incorrect entry is detected.
OTOH, there are fields which can be validated in full, so for them strict
validation is a must.