In particular, `validate_domain_part` accepts domains with a hyphen
character in the TLD:
{{{
from django.core.validators import validate_email
validate_email.validate_domain_part('gmail.-com')
True
}}}
Nearly all other special characters are rejected correctly:
{{{
from django.core.validators import validate_email
validate_email.validate_domain_part('gmail._com')
False
}}}
Unless my knowledge of valid TLDs is wrong, I don't think this is correct
:(
--
Ticket URL: <https://code.djangoproject.com/ticket/25452>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* cc: cmawebsite@… (added)
* needs_better_patch: => 0
* needs_tests: => 0
* easy: 1 => 0
* needs_docs: => 0
* stage: Unreviewed => Accepted
Comment:
Confirmed on 1.8 and 1.9. Chrome's email validation rejects this, so I
assume this is unintentional.
--
Ticket URL: <https://code.djangoproject.com/ticket/25452#comment:1>
Comment (by phalt):
We've been investigating this more and it appears that hyphens can be in
TLDs, just not at the start or end:
> Domain names may be formed from the set of alphanumeric ASCII characters
(a-z, A-Z, 0-9), but characters are case-insensitive. In addition the
hyphen is permitted if it is surrounded by characters, digits or hyphens,
although it is not to start or end a label.
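
The rule quoted above can be sketched as a per-label check. This is only an illustration, not Django's implementation: `LABEL_RE` and `labels_ok` are hypothetical names, and the sketch ignores label length limits and non-ASCII domains.

```python
import re

# Hypothetical helper, not Django's code: one label made of ASCII letters,
# digits and hyphens, where a hyphen may not start or end the label.
LABEL_RE = re.compile(r'[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\Z')


def labels_ok(domain):
    """Return True if every dot-separated label follows the rule above."""
    return all(LABEL_RE.match(label) for label in domain.split('.'))


print(labels_ok('my-host.co-op'))  # hyphens inside labels
print(labels_ok('gmail.-com'))     # hyphen starts the last label
print(labels_ok('gmail.com-'))     # hyphen ends the last label
```

Under these assumptions, the first call should report a valid domain and the other two should not, because the same check is applied to the last label (the TLD) as to every other label.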
--
Ticket URL: <https://code.djangoproject.com/ticket/25452#comment:2>
Comment (by timgraham):
Please check `URLValidator` to see if it handles this (if so, maybe you
could borrow from it) or if it requires a similar fix.
--
Ticket URL: <https://code.djangoproject.com/ticket/25452#comment:3>
* status: new => assigned
* owner: nobody => bak1an
Comment:
According to RFC 1035 (https://tools.ietf.org/html/rfc1035), domain labels
can contain hyphens, but not as their first character.
[https://github.com/django/django/blob/71ebcb85b931f43865df5b322b2cf06d3da23f69/django/core/validators.py#L160
EmailValidator.domain_regex] checks this for all labels except the last one
(the TLD), which looks like a bug to me rather than something done
intentionally.
[https://github.com/django/django/blob/71ebcb85b931f43865df5b322b2cf06d3da23f69/django/core/validators.py#L89
URLValidator] uses a more complex set of domain validation regexes
(including Unicode, etc.).
I will double-check whether borrowing those checks into {{{EmailValidator}}}
would violate any standards, and come back with a patch if it's OK.
--
Ticket URL: <https://code.djangoproject.com/ticket/25452#comment:4>
* cc: dheeru.rathor14@… (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/25452#comment:5>
* has_patch: 0 => 1
Comment:
[https://github.com/django/django/pull/5612 The pull request]
Unfortunately, I have not found a good way to merge URLValidator and
EmailValidator, since there are tons of small differences.
So I decided to fix EmailValidator instead. I tried to be as accurate as
possible (the actual regex changes are just a few characters long).
While reading various RFCs and articles, I found some other easily
fixable issues (like allowing quoted '@', space or backslash for a dot-atom
local part, allowing spaces inside a quoted local part, etc.); those are
included in the above PR as well.
Proper RFC-based email validation is a
[http://haacked.com/archive/2007/08/21/i-knew-how-to-validate-an-email-address-until-i.aspx/ surprisingly hard task],
and definitely no one wants to have a
[http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html few-pages-long regex]
in Django.
So the updated validator is still not fully RFC compliant, but a few issues
are fixed now (or covered with test cases).
The commit messages include a detailed description of all the changes that
were made.
--
Ticket URL: <https://code.djangoproject.com/ticket/25452#comment:6>
Comment (by nedbatchelder):
Can I respectfully suggest that continuing to tweak this complex regex to
get asymptotically closer to perfection is not worth it? Especially to
fix false positives. What real-world problem is happening because
"gmail.-com" is accepted? "gmail.ccomm" is also accepted, but is just as
useless as an email address.
--
Ticket URL: <https://code.djangoproject.com/ticket/25452#comment:7>
Comment (by timgraham):
I am open to that if you can get consensus on the DevelopersMailingList on
a set of limitations that we can document so that we have something to
point to when we get requests for enhancements. I imagine this policy
would also include `URLValidator`.
--
Ticket URL: <https://code.djangoproject.com/ticket/25452#comment:8>
Comment (by collinanderson):
I think we should try to match the standard HTML <input type="email">
validation. I'd imagine that most use cases would want to match that. We
use the regex verbatim from the standard itself:
https://html.spec.whatwg.org/multipage/forms.html#e-mail-state-(type=email)
If people want to allow things outside of that they could use a custom
regex.
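
For illustration, here is a rough Python sketch of that approach. The pattern is my transcription of the regex given in the HTML standard, so it should be checked against the spec before use; `HTML5_EMAIL_RE` and `html5_valid` are hypothetical names, and `fullmatch` stands in for the original `^`/`$` anchors.

```python
import re

# Transcribed from the WHATWG HTML standard's "valid e-mail address"
# definition (an assumption: verify against the spec before relying on it).
HTML5_EMAIL_RE = re.compile(
    r"[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@"
    r"[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*"
)


def html5_valid(address):
    """Return True if the whole address matches the pattern above."""
    return HTML5_EMAIL_RE.fullmatch(address) is not None


for candidate in ('user@gmail.com', 'user@gmail.-com', 'user@gmail.ccomm'):
    print(candidate, html5_valid(candidate))
```

With this pattern every domain label has to start and end with a letter or digit, so "gmail.-com" should be rejected, while a syntactically fine but nonexistent domain such as "gmail.ccomm" still passes.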
--
Ticket URL: <https://code.djangoproject.com/ticket/25452#comment:9>
Comment (by collinanderson):
Though it gets more complicated when considering Unicode. Unicode needs to
get normalized to ASCII before running through the official regex.
Here's how Chrome does it:
https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/core/html/forms/EmailInputType.cpp
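
A minimal sketch of the normalization step in Python, assuming the built-in `idna` codec (which implements the older IDNA 2003 rules) is good enough. This is not Chrome's code, and `normalize_email` is a hypothetical helper name.

```python
def normalize_email(address):
    """Return the address with its domain part converted to ASCII.

    Hypothetical helper: only the part after the last '@' is encoded,
    using Python's built-in 'idna' codec; the local part is left as is.
    """
    local, at, domain = address.rpartition('@')
    if not at:
        raise ValueError('address has no "@"')
    ascii_domain = domain.encode('idna').decode('ascii')
    return local + '@' + ascii_domain


print(normalize_email('user@bücher.example'))
print(normalize_email('user@example.com'))
```

The result can then be passed to an ASCII-only regex. Note that the codec raises `UnicodeError` for domains it cannot encode (for example, ones with an empty label), so a real validator would need to treat that as a validation failure.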
--
Ticket URL: <https://code.djangoproject.com/ticket/25452#comment:10>
Comment (by bak1an):
How about resolving this issue with the
[https://github.com/django/django/compare/master...bak1an:ticket_25452_minimal
smallest possible change] and moving future validation regex improvement
discussion into a separate ticket? Would this be legit?
--
Ticket URL: <https://code.djangoproject.com/ticket/25452#comment:11>
Comment (by nedbatchelder):
@bak1an: Can you explain why it's important to reject "gmail.-com"? Why
add *any* more complexity just to reject more bogus email addresses?
--
Ticket URL: <https://code.djangoproject.com/ticket/25452#comment:12>
* needs_better_patch: 0 => 1
Comment:
I started a
[https://groups.google.com/d/topic/django-developers/ASBJ0ge2KYo/discussion thread on django-developers]
to find a way forward.
--
Ticket URL: <https://code.djangoproject.com/ticket/25452#comment:13>
* status: assigned => closed
* resolution: => wontfix
Comment:
The consensus on the mailing list seems to be to simplify the validation,
not make it more comprehensive.
--
Ticket URL: <https://code.djangoproject.com/ticket/25452#comment:14>
Comment (by claudep):
Tim, is there a ticket about the simplification, or should we create one?
--
Ticket URL: <https://code.djangoproject.com/ticket/25452#comment:15>
Comment (by timgraham):
Ticket for simplifying the validation is #26423.
--
Ticket URL: <https://code.djangoproject.com/ticket/25452#comment:16>