Value of tightening URLValidator/EmailValidator regular expressions?


Tim Graham

Mar 14, 2016, 2:09:24 PM
to Django developers (Contributions to Django itself)
On a pull request that proposes to tighten the validation of EmailValidator [0], Ned Batchelder questioned the usefulness of this:

"Can I respectfully suggest that continuing to tweak this complex regex to get asymptotically closer to perfection is not worth it? Especially to fix false positives. What real-world problem is happening because "gmail.-com" is accepted? "gmail.ccomm" is also accepted, but is just as useless as an email address."

Collin Anderson proposed:

"I think we should try to just match the standard html <input type="email"> validation. I'd imagine that most use cases would want to match that. We might be able to use the regex verbatim from the standard itself:

https://html.spec.whatwg.org/multipage/forms.html#e-mail-state-(type=email)

If people want to allow things outside of that they could use a custom regex.
Though it gets more complicated when considering Unicode. Unicode needs to get normalized to ascii before running through the official regex."

(Of course, this may be somewhat backwards-incompatible.)

What are your thoughts on this? I don't mind putting a halt to enhancements to the validation as long as we can articulate a sensible policy in the documentation.

[0] https://github.com/django/django/pull/5612
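
For reference, the pattern Collin points to can be tried out directly. The following is a minimal Python sketch, assuming the regular expression published in the e-mail state section of the WHATWG HTML standard. The helper names `looks_like_email` and `to_ascii_address` are hypothetical, and using Python's built-in `idna` codec for the Unicode-to-ASCII step is an assumption rather than anything the thread or Django prescribes.

```python
import re

# Pattern from the WHATWG HTML standard's e-mail state section, written
# without the ^...$ anchors because fullmatch() is used below.
HTML5_EMAIL_RE = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+"
    r"@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)


def looks_like_email(value):
    """Return True if value matches the HTML-standard pattern (hypothetical helper)."""
    return HTML5_EMAIL_RE.fullmatch(value) is not None


def to_ascii_address(value):
    """Convert a Unicode domain to its ASCII form before matching.

    Assumption: Python's built-in "idna" codec is good enough for a sketch.
    The local part is left untouched.
    """
    local, sep, domain = value.rpartition("@")
    if not sep:
        return value
    try:
        ascii_domain = domain.encode("idna").decode("ascii")
    except UnicodeError:
        # Leave the address unchanged if the domain cannot be encoded.
        return value
    return local + "@" + ascii_domain
```

With this pattern, "a@b" and "user@gmail.ccomm" are accepted, while "user@gmail.-com" is rejected because every domain label has to begin and end with a letter or digit.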

James Bennett

Mar 14, 2016, 2:17:15 PM
to django-d...@googlegroups.com
Personally I've long been in favor of drastically simplifying the email regex and essentially telling people that if they want to support triply-nested comments in a bang-path address they can write their own :)

Is there an actual compelling reason to not just pare it down to "word characters and/or some punctuation, followed by an @, followed by some more word characters and/or punctuation"?
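
A pared-down rule of the kind James describes might look like the sketch below. The exact set of punctuation is an assumption (the message does not specify one), and the name `SIMPLE_EMAIL_RE` is hypothetical.

```python
import re

# Assumed reading of "word characters and/or some punctuation, an @,
# then more word characters and/or punctuation". The punctuation sets
# chosen here are illustrative only.
SIMPLE_EMAIL_RE = re.compile(r"[\w.+-]+@[\w.-]+")


def is_plausible_email(value):
    """Return True if value has the rough shape local@domain."""
    return SIMPLE_EMAIL_RE.fullmatch(value) is not None
```

A rule this loose accepts "user@gmail.-com" as readily as "user@example.com"; it only rejects input that lacks an @ or contains characters such as spaces.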

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/eb04034e-ea07-489f-aaf9-a08a5d241c4b%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Aymeric Augustin

Mar 14, 2016, 2:31:22 PM
to django-d...@googlegroups.com
Indeed, for some reason, the URL and email validators get anywhere from 2 to 8 changes in every Django version, and there’s no end in sight. (I contributed to this. Sorry.)

Like James, I’m in favor of making the validation much more simple and documenting it. This seems better than perpetually modifying it at the risk of introducing regressions.

-- 
Aymeric.

Michael Manfre

Mar 14, 2016, 3:08:09 PM
to django-d...@googlegroups.com
Simple is better. Anyone who needs/wants something more complex is not prevented by Django from doing so.

Regards,
Michael Manfre

Florian Apolloner

Mar 14, 2016, 3:34:40 PM
to Django developers (Contributions to Django itself)


On Monday, March 14, 2016 at 8:08:09 PM UTC+1, Michael Manfre wrote:
Simple is better. Anyone who needs/wants something more complex is not prevented by Django from doing so.

+1 to that and what the rest said ;)

Josh Smeaton

Mar 14, 2016, 7:29:42 PM
to Django developers (Contributions to Django itself)
+1. I don't think we need strict email validation. "looks vaguely like an email address" is enough for validation purposes in forms. Are there any security concerns we need to be aware of though?

Florian Apolloner

Mar 14, 2016, 7:39:58 PM
to Django developers (Contributions to Django itself)


On Tuesday, March 15, 2016 at 12:29:42 AM UTC+1, Josh Smeaton wrote:
+1. I don't think we need strict email validation. "looks vaguely like an email address" is enough for validation purposes in forms. Are there any security concerns we need to be aware of though?

Absolutely: the crazier the regex, the higher the chance of catastrophic backtracking. And email addresses are usually confirmed by sending an email anyway; anything beyond that is not useful. Validation is meant to prevent (some) stupid typos imo, not more.
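
To make the backtracking concern concrete, here is a small sketch. The pattern `(a+)+$` is a textbook illustration and is not taken from Django's validators; the function name `seconds_to_reject` is hypothetical. Nested quantifiers like this force a backtracking engine to try a rapidly growing number of ways to split the input before it can report a failed match.

```python
import re
import time

# Textbook example of a pattern prone to catastrophic backtracking.
# It is NOT one of Django's patterns; it only illustrates the failure mode.
NESTED = re.compile(r"(a+)+$")


def seconds_to_reject(n):
    """Time how long the engine needs to reject n 'a' characters followed by '!'."""
    subject = "a" * n + "!"
    start = time.perf_counter()
    matched = NESTED.match(subject)
    elapsed = time.perf_counter() - start
    return matched, elapsed


if __name__ == "__main__":
    # Keep n small: the running time grows very quickly with each step.
    for n in (8, 12, 16, 20):
        matched, elapsed = seconds_to_reject(n)
        print(n, matched, round(elapsed, 4))
```

The simpler a validation pattern is, the easier it is to check by inspection that it contains no construct of this kind.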

Markus Holtermann

Mar 14, 2016, 10:52:12 PM
to django-d...@googlegroups.com
On Mon, Mar 14, 2016 at 12:34:40PM -0700, Florian Apolloner wrote:
>
>
>On Monday, March 14, 2016 at 8:08:09 PM UTC+1, Michael Manfre wrote:
>>
>> Simple is better. Anyone who needs/wants something more complex is not
>> prevented by Django from doing so.
>>
>
>+1 to that and what the rest said ;)

+1

As mentioned already on the PR, I'd go with the HTML5 validator. That
way we at least don't invent another standard.

WRT the backwards compatibility issues:

1) You might be able to submit an email address that passes the new
regex but not the old one. --> not an issue from my perspective

2) You can no longer submit an email address that passes the old
validator but fails the new one. --> Unlikely, but when the new field is of
type="email" your rather modern browser will tell you before Django
anyway

/Markus


Aymeric Augustin

Mar 15, 2016, 4:23:07 AM
to django-d...@googlegroups.com
> On 15 Mar 2016, at 03:51, Markus Holtermann <in...@markusholtermann.eu> wrote:
>
> I'd go with the HTML5 validator.

Indeed, it would be a good idea to align the behavior of <input type="email">
and Django’s validation. Currently a@b passes the former but not the latter.

--
Aymeric.

Joakim Saario

Mar 15, 2016, 12:42:17 PM
to Django developers (Contributions to Django itself)
Is there a reason for having a backend validation at all?

There is no reliable way to validate an email-address without actually sending a message to it.

In my opinion EmailField should use a widget that uses `type="email"` to trigger frontend validation.
It may also set a max length for the database column. Nothing more.

Kevin Grinberg

Mar 15, 2016, 2:36:10 PM
to Django developers (Contributions to Django itself)
Validation that doesn't rely on browser behavior is useful, if only for the (admittedly shrinking, but still non-zero) population of folks using older browsers. Also API clients and so forth.

Very much agreed that it should match the HTML5 spec, though - fewer edge cases and more predictable behavior.

Shai Berger

Mar 26, 2016, 5:25:04 AM
to django-d...@googlegroups.com
On Tuesday 15 March 2016 04:51:50 Markus Holtermann wrote:

>
> WRT the backwards compatibility issues:
>
> 2) You can no longer submit an email address that passes the old
> validator but fails the new one. --> Unlikely, but when the new field is of
> type="email" your rather modern browser will tell you before Django
> anyway
>

While this is correct, in many of your rather modern web applications Django
does not own the front-end, and for web services, a relevant front end doesn't
even necessarily exist.

So the backward-incompatibility is not horrible, but it exists and needs to be
mitigated by a deprecation cycle.

Shai

Tim Graham

Mar 30, 2016, 8:45:23 AM
to Django developers (Contributions to Django itself)
How did you imagine the deprecation cycle working? Do you want Django to raise a warning saying that the regular expression is changing and provide a temporary setting or something to opt-in to the simpler validation?

Shai Berger

Mar 30, 2016, 6:53:25 PM
to django-d...@googlegroups.com
On Wednesday 30 March 2016 15:45:23 Tim Graham wrote:
> How did you imagine the deprecation cycle working? Do you want Django to
> raise a warning saying that the regular expression is changing and provide
> a temporary setting or something to opt-in to the simpler validation?
>

Yes, that's one option; another is to define a HTML5EmailField (or a better
name) that uses the simpler validation, and warn that EmailField is going to
turn into an alias of that -- so that each EmailField in the project is
handled separately.

Also, I'm not sure it's possible to have the warning produced by a check; but
if it is, that would be preferable.

Shai.
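
The first option discussed above (warn now, let projects opt in to the new rules) could be sketched roughly as follows. Everything here is hypothetical: the function name, the `use_new_rules` flag, and both patterns, which are deliberately simple stand-ins rather than Django's current regex or the HTML-standard one.

```python
import re
import warnings

# Stand-ins only. OLD_STYLE requires a dot in the domain, NEW_STYLE does not.
OLD_STYLE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
NEW_STYLE = re.compile(r"[^@\s]+@[^@\s]+")


def validate_email_transitional(value, use_new_rules=False):
    """Apply the old rules by default, warning when the new rules would disagree."""
    old_ok = OLD_STYLE.fullmatch(value) is not None
    new_ok = NEW_STYLE.fullmatch(value) is not None
    if use_new_rules:
        # Opt-in path: behave as the future default would.
        return new_ok
    if old_ok != new_ok:
        warnings.warn(
            "Email validation rules are changing; this value will be "
            "treated differently in a future release.",
            DeprecationWarning,
            stacklevel=2,
        )
    return old_ok
```

A per-call flag is used here only to keep the sketch self-contained; the thread itself mentions a temporary setting or a separate field class as the possible opt-in mechanisms.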

Florian Apolloner

Mar 30, 2016, 7:03:27 PM
to Django developers (Contributions to Django itself)
Having a new field seems overkill to me -- a new validation routine which is less strict is something we should be able to do without backward compat considerations. The reasoning for this is easy: As long as it is not proven that the current regex covers only valid addresses, a less strict validation is not harming anyone. Especially since even if an email address is technically valid, it does not mean that it actually exists -- so you will have to send an email to verify the address anyways…

Shai Berger

Mar 30, 2016, 7:22:11 PM
to django-d...@googlegroups.com
On Thursday 31 March 2016 02:03:26 Florian Apolloner wrote:
> Having a new field seems overkill to me -- a new validation routine which
> is less strict is something we should be able to do without backward compat
> considerations.

Strictly speaking, the new method is not less strict. It does forbid things
the current validation lets through (mostly wrt unicode domains, IIRC).

> The reasoning for this is easy: As long as it is not proven
> that the current regex covers only valid addresses a less strict validation
> is not harming anyone. Especially since even if an email address is
> technically valid, it does not mean that it actually exists -- so you will
> have to send an email to verify the address anyways…
>

But we could, considering this, just call it a "backwards incompatible
change".

Shai.

Zach Borboa

Apr 1, 2016, 2:07:38 PM
to Django developers (Contributions to Django itself)
-1 on less strict validation. Saying we need less strict validation because email addresses are usually confirmed by sending an email to them is akin to saying URLs are only valid if they can be fetched. "Looks vaguely like a URL" would not be enough for validation purposes. I believe we should strive to keep a reasonably strict and correct email validator.

Josh Smeaton

Apr 2, 2016, 2:44:55 AM
to Django developers (Contributions to Django itself)
For what reason, Zach? Without a canonical regex implementation to copy or include, we're stuck poorly reimplementing a bunch of esoteric rules, and to what end? The main purpose of email validation is to provide relevant feedback to the user, and to guard against obviously bad or malicious data. "Looks vaguely like an email address" is probably too loose to be useful, I admit. The proposal to copy the regex from the HTML5 email input widget seems like a fine compromise to me.

We should also err on the side of allowing incorrect addresses rather than rejecting correct addresses. I'd much rather have bad signups that need to be done again rather than users that can't sign up with their valid addresses. 

Shai Berger

Apr 2, 2016, 3:05:14 AM
to django-d...@googlegroups.com
On Saturday 02 April 2016 09:44:54 Josh Smeaton wrote:
> For what reason Zach?

There is only one reason for which a strict and accurate validation is
required, as far as I can see, and that is if your application is not just
using existing email addresses (i.e. sending mail to users) but actually
manages them (i.e. creates mail addresses).

Such applications are few and far between...

> Without a canonical regex implementation to copy or
> include, we're stuck poorly reimplementing a bunch of esoteric rules to
> what end? The main purpose of email validation is to provide relevant
> feedback to the user, and to guard against obviously bad or malicious data.
> "Looks vaguely like an email address" is probably too loose to be useful, I
> admit. The proposal to copy the regex from the html5 email input widget
> seems like a fine compromise to me.
>
> We should also err on the side of allowing incorrect addresses rather than
> rejecting correct addresses. I'd much rather have bad signups that need to
> be done again rather than users that can't sign up with their valid
> addresses.
>

...and their needs should not imply a high burden of maintenance on the rest
of the community; they can and should implement their own validation.

+1 everything Josh said.

Shai.

Tim Graham

Apr 5, 2016, 1:41:14 PM
to Django developers (Contributions to Django itself)
Any thoughts about whether or not to make similar simplifications to URLValidator? There's an old ticket to add a DomainNameValidator [0] which may or may not be worth moving forward with based on the decision.

[0] https://code.djangoproject.com/ticket/18119

Collin Anderson

May 31, 2017, 3:28:32 PM
to Django developers (Contributions to Django itself)
Hi All,

There's a PR [0] to make validation match HTML. Though there's a question about what to do with domain_whitelist.

Here's the background:
- Originally Django didn't allow any dotless (non-FQDN) domain names.

- People wanted to use "localhost", but the SMTP spec said "Local nicknames or unqualified names MUST NOT be used." So domain_whitelist was added to allow more user-specified domains. https://code.djangoproject.com/ticket/4833

- We're proposing to change the behavior of email validation to allow a lot more email addresses including all dotless domains, so you don't need to specify specific domains to allow.

- Though maybe it still makes sense to deny dotless domain names.

I think the options currently are:
1 - Immediately remove domain_whitelist so people get a hard error. It's probably not used in third-party apps where multiple Django version support might be important.

2 - Deprecate domain_whitelist and ignore with a warning.

3 - Keep restricting dotless domains unless they are in domain_whitelist.

Collin
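
Option 3 above can be illustrated with a short sketch. The function name `accepts_domain` is hypothetical, the check on the address shape is deliberately minimal, and the default of `("localhost",)` is an assumption chosen to match the motivating case from the ticket.

```python
def accepts_domain(value, domain_whitelist=("localhost",)):
    """Reject dotless domains unless they are explicitly whitelisted.

    Only the domain rule is shown; any other validation of the address
    would run separately.
    """
    local, sep, domain = value.rpartition("@")
    if not sep or not local or not domain:
        return False
    # A domain with a dot passes; a dotless one must be whitelisted.
    return "." in domain or domain in domain_whitelist
```

Under this rule "admin@localhost" and "user@example.com" pass, while "a@b" is rejected unless "b" is added to the whitelist.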

Claude Paroz

Jun 1, 2017, 3:50:37 AM
to Django developers (Contributions to Django itself)
As for me, I still think the current validator is valid for 99% of use cases. And 99% of the time, an email address with dot-less domain is a user input error.

So I would prefer fixing #25594 (validator propagation from db field to form field), adding a "looser" validator in validators.py and better documenting usage of alternate validators for EmailFields.
But I won't block the boat if I'm in the minority!

Claude

Aymeric Augustin

Jun 1, 2017, 5:27:10 PM
to django-d...@googlegroups.com
I agree with Claude.

-- 
Aymeric.




Tim Graham

Jun 2, 2017, 11:19:23 AM
to Django developers (Contributions to Django itself)
Aymeric, did anything specific change your mind from your March 2016 mail:


"Indeed, for some reason, the URL and email validators get anywhere from 2 to 8 changes in every Django version, and there’s no end in sight. (I contributed to this. Sorry.) Like James, I’m in favor of making the validation much more simple and documenting it. This seems better than perpetually modifying it at the risk of introducing regressions."

How should we make a determination about future Email/URLValidator changes? Put a halt to them completely? I've closed a few tickets about EmailValidator (e.g. [1]) as wontfix under the assumption that the regex will be simplified.

Aymeric Augustin

Jun 3, 2017, 6:38:48 AM
to django-d...@googlegroups.com
Hello Tim,

I got confused and didn't realize Claude was arguing against moving to the HTML validation rules. Oops.

I'm still +0 on copying HTML validation rules strictly so that <input type="email"> and EmailField behave identically by default. (+0 rather than +1 because I mostly care about ending this debate.)

The part I found really interesting in Claude's proposal is the ability to customize validation rules. It looks like we have a reasonable plan. Once that's done I don't care very much about the default rules; they'll be good enough and their definition was already well into bikeshedding territory before I started contributing to Django.

Best regards,

-- 
Aymeric.


