Django's EmailValidator accepts this address, which has a soft hyphen (U+00AD) in the domain, but it should not:
{{{#!python
from django.core import validators
validators.validate_email('test@example\xadexample.com')
}}}
Python's formataddr does not accept it:
{{{#!python
from email.utils import formataddr
formataddr(('', 'test@example\xadexample.com'))
}}}
{{{
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/email/utils.py", line 91, in formataddr
    address.encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character '\xad' in position 12: ordinal not in range(128)
}}}
Django's EmailValidator should not accept soft hyphens.
--
Ticket URL: <https://code.djangoproject.com/ticket/31053>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* status: new => closed
* cc: Joachim Jablon (added)
* component: Uncategorized => Core (Mail)
* version: 2.2 => master
* resolution: => needsinfo
Comment:
I'm not sure about this: `email.headerregistry.parser.get_mailbox()`
doesn't raise any exception on soft hyphens. Can you share a link to the RFC
that forbids such characters in domains?
Joachim, what do you think?
--
Ticket URL: <https://code.djangoproject.com/ticket/31053#comment:1>
Comment (by Mogoh Viol):
RFC 1035, section 2.3.1 (https://tools.ietf.org/html/rfc1035#section-2.3.1), says:
> They must start with a letter, end with a letter or digit, and have as
> interior characters only letters, digits, and hyphen.
Of course, there are by now internationalized domain names
(https://en.wikipedia.org/wiki/Internationalized_domain_name) for
non-ASCII characters.
I do not know those specifications in detail.
What I do know is that non-ASCII characters are encoded in ASCII using
Punycode.
But if special characters such as the soft hyphen are accepted, then other
special characters like "äöü" should also be accepted.
If "äöü" are not allowed, soft hyphens should also be forbidden.
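For illustration, here is a minimal sketch of the Punycode conversion using Python's built-in `idna` codec. That codec implements the older IDNA 2003 rules only, and `to_ascii_domain` is a hypothetical helper name, not a Django function. Under those rules the soft hyphen appears to be dropped during normalization rather than rejected:

```python
# Sketch using Python's built-in "idna" codec (IDNA 2003, RFC 3490).
# The newer IDNA 2008 rules are NOT implemented by this codec.


def to_ascii_domain(domain):
    """Return the ASCII (Punycode) form of a domain name (hypothetical helper)."""
    return domain.encode("idna").decode("ascii")


# A label with a non-ASCII character is converted to an "xn--" label.
print(to_ascii_domain("éxample.com"))  # xn--xample-9ua.com

# The soft hyphen (U+00AD) is removed by the Nameprep step before encoding,
# so the result is plain ASCII without any "xn--" prefix.
print(to_ascii_domain("example\u00adexample.com"))  # exampleexample.com
```

So "äöü" and the soft hyphen are treated differently by this codec: the former are encoded, the latter silently disappears.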
--
Ticket URL: <https://code.djangoproject.com/ticket/31053#comment:2>
Comment (by felixxm):
Yes, non-ASCII domains are supported (see
[https://github.com/django/django/blob/3b347a8a00273e9cc2fd9d4a5c61569c08398769/tests/mail/tests.py#L741-L760
tests]).
--
Ticket URL: <https://code.djangoproject.com/ticket/31053#comment:3>
Comment (by Mogoh Viol):
OK, I made a mistake, but I am still not 100% convinced.
Indeed, the EmailValidator accepts non-ASCII domains.
It does not accept non-ASCII local parts, as in the example below.
{{{
In [2]: from django.core import validators
...: validators.validate_email('to@éxample.com')
...: validators.validate_email('tó@example.com')
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
<ipython-input-2-1da70ef004db> in <module>
1 from django.core import validators
2 validators.validate_email('to@éxample.com')
----> 3 validators.validate_email('tó@example.com')
~/.local/share/virtualenvs/website-10TxyhRr/lib/python3.6/site-packages/django/core/validators.py in __call__(self, value)
194
195 if not self.user_regex.match(user_part):
--> 196 raise ValidationError(self.message, code=self.code)
197
198 if (domain_part not in self.domain_whitelist and
ValidationError: ['Bitte gültige E-Mail-Adresse eingeben.']
}}}
'''However, this is a different issue (if this is an issue at all).'''
The question remains: is a domain containing a soft hyphen a valid
domain?
I guess not, but I honestly don't know.
I think it is really complicated to test for a valid domain that includes
only allowed Unicode characters.
So I understand that we only make a simple "sanity test" and, in case of
doubt, let more invalid email addresses through.
If no one else thinks filtering out emails with soft hyphens is a good
idea, we can leave the bug closed.
--
Ticket URL: <https://code.djangoproject.com/ticket/31053#comment:4>
Comment (by Joachim Jablon):
Ok, gonna do my best from a phone.
If I recall correctly, the idea is that as much as possible, emails that
pass the validator should be properly processed.
Given that it’s fairly easy to split the local and domain parts (the last
@ sign is the separator), then it’s feasible to blindly apply punycode if
the domain contains non-ascii characters, which is done in the code. The
same cannot be done for local part.
For the local part, it’s a bit complicated and there are some things to
take into account:
- on the validator side, special chars are accepted if the local part is
enclosed between double quotes: "kéké"@example.com
- The algorithm is a bit different in the validation part and in the
sending part, because validation only accepts emails whereas email sending
accepts both emails and mailboxes (Your Name <youra...@example.com>)
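A rough sketch of the domain handling described above, not Django's actual code: `encode_domain` is a hypothetical helper, and the built-in `idna` codec (IDNA 2003) is assumed for the Punycode step. It splits at the last @ sign, encodes only the domain, and leaves the local part alone:

```python
from email.utils import formataddr


def encode_domain(address):
    """Punycode-encode only the domain of an address (hypothetical helper)."""
    # The last "@" separates the local part from the domain.
    local, sep, domain = address.rpartition("@")
    if not sep:
        raise ValueError("address has no @ sign")
    try:
        domain.encode("ascii")
    except UnicodeEncodeError:
        # Assumption: the built-in codec, which follows IDNA 2003 rules.
        domain = domain.encode("idna").decode("ascii")
    return local + "@" + domain


# A non-ASCII domain becomes ASCII, so formataddr() accepts the result.
print(formataddr(("", encode_domain("to@éxample.com"))))

# A non-ASCII local part is left untouched, so formataddr() still fails.
try:
    formataddr(("", encode_domain("tó@example.com")))
except UnicodeEncodeError:
    print("local part is still non-ASCII")
```

Using `rpartition` rather than `split` keeps any @ signs inside a quoted local part on the local side.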
--
Ticket URL: <https://code.djangoproject.com/ticket/31053#comment:5>
* resolution: needsinfo => invalid
Comment:
Thanks Joachim.
--
Ticket URL: <https://code.djangoproject.com/ticket/31053#comment:6>
Comment (by Mogoh Viol):
Thanks for explaining.
--
Ticket URL: <https://code.djangoproject.com/ticket/31053#comment:7>