As many people know, the whole subject of validating email is really difficult. Some people like a rule that is as general as possible, allowing addresses with no @ and also the possibility of @localhost; others like a rule that focuses on the majority of email addresses, which have a full domain name including a TLD. You can even find validation schemes that run lookups against actually registered domains. Assuming we aren't doing that, our current rule has a problem: it allows a maximum of 4 characters in the TLD, and with the many new TLDs being added this is no longer appropriate. We could just replace 4 with a higher arbitrary number, which would solve the immediate problem. You could also say it is a problem that the rule requires a TLD, so neither @localhost nor an address without an @ validates, because currently we have:
I'm wondering if we would want to consider switching the email validation to the W3C one.
Alternatively, maybe we want to give the option of requiring an address that has a TLD, in case we want to be stricter than the W3C; for many (perhaps most) use cases, I think people would probably like to require a "normal" email address.
I was wondering if people had any thoughts on this?
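To make the TLD-length problem concrete, here is a minimal JavaScript sketch. The capped pattern is a hypothetical, simplified stand-in for a rule whose TLD is limited to 2 to 4 letters; it is not Joomla's actual expression.

```javascript
// Hypothetical simplified rule: local part, "@", one or more dot-terminated
// labels, then a final TLD of 2 to 4 letters. NOT Joomla's actual regex.
const cappedTldRule = /^[\w.+-]+@(?:[\w-]+\.)+[a-zA-Z]{2,4}$/;

// Same shape, but with no upper bound on the TLD length.
const uncappedTldRule = /^[\w.+-]+@(?:[\w-]+\.)+[a-zA-Z]{2,}$/;

const samples = [
  "someone@example.com",    // 3-letter TLD
  "someone@example.info",   // 4-letter TLD
  "someone@example.travel", // 6-letter TLD
  "someone@localhost",      // no TLD at all
];

for (const address of samples) {
  console.log(
    address.padEnd(24),
    "capped:", cappedTldRule.test(address),
    "uncapped:", uncappedTldRule.test(address)
  );
}
```

Raising or removing the cap fixes someone@example.travel, but any rule shaped like this still insists on a TLD, so someone@localhost is rejected either way.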
My thoughts are: why try to reinvent the wheel? The W3C has a
published standard for its email matching regex (which you referenced), so
why not just use that?
I agree with you as well on adding the option to require a TLD. How
to implement that would be a good question.
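For reference, a minimal JavaScript sketch of using the expression the HTML specification gives for a "valid email address" (the rule behind `<input type="email">`). The `isHtml5Email` name is illustrative only.

```javascript
// Regular expression for a "valid email address" as given in the HTML
// specification.
const HTML5_EMAIL =
  /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

function isHtml5Email(address) {
  return HTML5_EMAIL.test(address);
}

console.log(isHtml5Email("someone@example.travel")); // true: long TLDs are fine
console.log(isHtml5Email("someone@localhost"));      // true: no TLD required
console.log(isHtml5Email("someone"));                // false: an @ is still required
```

Note that on its own this expression never requires a TLD; making that optional strictness available is the separate question raised here.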
On Sun, Nov 11, 2012 at 12:44 PM, Elin Waring <elin.war...@gmail.com> wrote:
> I was wondering if people had any thoughts on this?
For development I often use @localhost or @box email addresses. At one
point I had to hack validation on one site I was working on to get it
to work properly with what I'm testing. So I object to putting
arbitrary limitations on the domain.
On Sun, Nov 11, 2012 at 11:05 AM, Donald Gilbert <dilbert4l...@gmail.com> wrote:
> I would agree with you as well on adding the option to require a tld. How to
> implement that would be a good question.
I'm tempted to say you should make a custom rule if you don't want to use the W3C standard, but I usually take the position that it's good to be useful and not force people to write custom rules and fields for common use cases, so I'm a bit torn on this. I guess it would be possible to have two regexes available. But that still leaves the JS in the field (and I know Louis said that the platform is getting rid of all its JS, but that hasn't happened yet). Right now, putting type="email" and validate="email" gives you the same regex.
On Sunday, November 11, 2012 3:03:15 PM UTC-5, Samuel Moffatt wrote:
> For development I often use @localhost or @box email addresses. At one point I had to hack validation on one site I was working on to get it to work properly with what I'm testing. So I object to putting arbitrary limitations on the domain.
Just a clarification of the differences, for those who don't read regexps. I'll call the regular expression now used in Joomla J and the W3C expression W:
- in W, the part in front of the @ may contain the characters !#$%&'*+/=?^`{|} and ~. J is too strict here, taking RFC 5322's definition of 'atext' as the standard.
- in J, a domain name and a TLD can contain an underscore _, which is not correct according to RFC 1123.
- in J, there must be at least one TLD (the last "extension") of between 2 and 4 characters; as mentioned, such a TLD is not necessary in local environments (and the length limit is no longer correct).
Both expressions require an @ in an email address. W is a little more precise, following the standards. But on the internet it could be handy to check for a TLD. That option can be implemented by changing the last * in W into a +.
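As a sketch of the differences listed above, here is W (taken to be the expression from the HTML specification) in JavaScript, together with the strict variant obtained by turning its last `*` into a `+`. The variable names are illustrative only.

```javascript
// W: the HTML5 "valid email address" expression, kept as a source string so
// the stricter variant can be derived from it.
const W_SOURCE =
  "^[a-zA-Z0-9.!#$%&'*+\\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?" +
  "(?:\\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$";

// Strict variant: replace the final "*$" with "+$", so the trailing
// "(\.label)" group must occur at least once, i.e. the domain needs a dot.
if (!W_SOURCE.endsWith("*$")) throw new Error("unexpected pattern shape");
const W_STRICT_SOURCE = W_SOURCE.slice(0, -2) + "+$";

const W = new RegExp(W_SOURCE);
const W_STRICT = new RegExp(W_STRICT_SOURCE);

const cases = [
  "o'brien+news@example.org", // ' and + are allowed 'atext' in the local part
  "someone@my_host.com",      // underscore in a domain label: rejected
  "someone@example.travel",   // TLD longer than 4 characters: accepted
  "someone@localhost",        // no TLD: only the lenient W accepts it
];

for (const address of cases) {
  console.log(address, "W:", W.test(address), "strict:", W_STRICT.test(address));
}
```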
On Sunday, 11 November 2012 21:20:17 UTC, Herman Peeren wrote:
> - in J there must be at least one tld (the last "extension") of length between 2 and 4 characters; as mentioned, such a tld is not necessary in local environments (and the length is not correct anymore).
Not just in local environments is this a problem: with all the newly approved TLDs coming online in the near future, the 2-4 character limit is an issue.
On Sunday, 11 November 2012 22:32:10 UTC+1, brian teeman wrote:
> Not just in local environments is this a problem - with all the newly approved TLD coming online in the near future the 2-4 characters is an issue
Correct. I meant: a TLD is not necessary in local environments, but it is handy on the internet. So that is about the only thing that could be made optional in this expression: in a production environment on the internet, you could make a TLD mandatory (by changing the last * into a +).
On Sunday, November 11, 2012 4:42:43 PM UTC-5, Herman Peeren wrote:
> Correct. I meant: a tld is not necessary in local environments. But is handy on the internet. So, that is about the only thing that could be made optional in this expression: in a production-environment on the internet you could make a tld an obligation (by changing the last * into a +).
I know this doesn't necessarily address the problem directly, but at a
recent conference the point was made several times that regexes, while
powerful, are notorious for being unmaintainable. By that I mean: can you
look at that email regex and tell me immediately what its designer
intended? Maybe you can tell me what the regex does, but in the
event there is a bug, it's very difficult to determine what the correct
behaviour was supposed to be.
The point is, there is a school of thought to move away from regexes and go
back to, shock horror, writing code to parse strings (developers, hear
this: it's ok to write code). So in trying to fix email validation, I
would lean towards ditching the regex altogether and going back to
step-wise PHP/JS. I would also have no problem with adding rule support by
composition. It's a valid use case that you may want some sort of blacklist
or whitelist approach to validation.
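To make this suggestion concrete, here is a minimal step-wise JavaScript sketch that aims to accept the same addresses as the HTML5 expression (exactly one @, a non-empty local part of allowed characters, and dot-separated labels of 1 to 63 letters, digits or hyphens that neither start nor end with a hyphen) without one large regex. The function names and the `requireTld` flag are hypothetical.

```javascript
// Characters allowed in the local part (before the "@") besides ASCII letters
// and digits, following the HTML5 expression.
const LOCAL_PART_SPECIALS = ".!#$%&'*+/=?^_`{|}~-";

function isAsciiAlphanumeric(ch) {
  return (ch >= "a" && ch <= "z") || (ch >= "A" && ch <= "Z") || (ch >= "0" && ch <= "9");
}

// Step 2 helper: a non-empty run of allowed local-part characters.
function isValidLocalPart(local) {
  if (local.length === 0) return false;
  for (const ch of local) {
    if (!isAsciiAlphanumeric(ch) && !LOCAL_PART_SPECIALS.includes(ch)) return false;
  }
  return true;
}

// Step 3 helper: a label is 1-63 letters, digits or hyphens and neither
// starts nor ends with a hyphen (in line with RFC 1123 host names).
function isValidLabel(label) {
  if (label.length < 1 || label.length > 63) return false;
  if (label.startsWith("-") || label.endsWith("-")) return false;
  for (const ch of label) {
    if (!isAsciiAlphanumeric(ch) && ch !== "-") return false;
  }
  return true;
}

function isValidEmailStepwise(address, requireTld = false) {
  // Step 1: exactly one "@" separates the local part from the domain.
  const at = address.indexOf("@");
  if (at === -1 || address.indexOf("@", at + 1) !== -1) return false;
  const local = address.slice(0, at);
  const domain = address.slice(at + 1);

  // Step 2: the local part.
  if (!isValidLocalPart(local)) return false;

  // Step 3: every dot-separated domain label (empty labels fail here too).
  const labels = domain.split(".");
  if (!labels.every(isValidLabel)) return false;

  // Step 4 (optional): insist on at least one dot, i.e. a TLD.
  if (requireTld && labels.length < 2) return false;

  return true;
}

console.log(isValidEmailStepwise("someone@example.travel"));  // true
console.log(isValidEmailStepwise("someone@localhost"));       // true
console.log(isValidEmailStepwise("someone@localhost", true)); // false
```

Each step states one intention, which is the maintainability argument; the cost, as raised later in the thread, is that the same steps would have to be written again in PHP.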
That's fine in principle if you are throwing out JFormRule's basic structure, which starts with the test method that is based on having a regex property.
I personally am way happier writing the code parts of rules, as you could probably guess from the ones I have written, but rewriting JFormRule was not particularly my agenda this week :P (dealing with the .travel TLD is what started me looking at it).
Elin
On Sunday, November 11, 2012 5:00:57 PM UTC-5, Andrew Eddie wrote:
> So in trying to fix email validation, I would lean towards ditching the regex altogether and going back to step-wise PHP/JS.
> On 12 November 2012 09:05, Elin Waring <elin....@gmail.com> wrote:
>> That's fine in principle if you are throwing out JFormRule's basic structure which starts with that test method that is based on having a regex property.
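The JFormRule structure described here can be pictured with a hypothetical JavaScript analogue (the real class is PHP, and these class names are not Joomla's API): a base rule whose default `test()` is driven by a regex property, a subclass that only sets the property, and a subclass that overrides `test()` with plain code, which is roughly what a step-wise rule would have to do within that structure.

```javascript
// Hypothetical base class: the default test() is driven by a regex property,
// loosely mirroring the structure described above (not the real PHP API).
class FormRule {
  constructor() {
    this.regex = null;   // pattern source string, set by subclasses
    this.modifiers = ""; // RegExp flags, e.g. "i"
  }

  test(value) {
    if (this.regex === null) {
      throw new Error(`${this.constructor.name} needs a regex or its own test()`);
    }
    return new RegExp(this.regex, this.modifiers).test(value);
  }
}

// Regex-backed rule: only the property changes (placeholder pattern for brevity).
class SimpleEmailRule extends FormRule {
  constructor() {
    super();
    this.regex = "^[^@\\s]+@[^@\\s]+$";
  }
}

// Code-backed rule: overrides test() and never uses the regex property.
class AllowedDomainRule extends FormRule {
  constructor(allowedDomains) {
    super();
    this.allowed = new Set(allowedDomains.map((d) => d.toLowerCase()));
  }

  test(value) {
    const at = value.lastIndexOf("@");
    return at !== -1 && this.allowed.has(value.slice(at + 1).toLowerCase());
  }
}

console.log(new SimpleEmailRule().test("someone@example.travel"));               // true
console.log(new AllowedDomainRule(["example.org"]).test("someone@Example.ORG")); // true
```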
On Sun, Nov 11, 2012 at 9:36 PM, Elin Waring <elin.war...@gmail.com> wrote:
> I hate to do this to myself but we should also consider adding the
> multiple attribute since it is part of the standard.
> Elin
> On Sunday, November 11, 2012 9:09:24 PM UTC-5, Elin Waring wrote:
>> I know.
>> Also the W3C will take an IP address whereas our current one won't. That
>> is a plus I think.
>> Elin
>> On Sunday, November 11, 2012 6:31:39 PM UTC-5, Andrew Eddie wrote:
>>> I just shared that as a general principle - no more, no less.
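For context on the multiple attribute: in HTML5, an input of type email with `multiple` accepts a comma-separated list, and each entry must be a valid address (surrounding whitespace is ignored). A minimal sketch of that check using the HTML5 expression; `isValidEmailList` is a hypothetical name, and treating an empty value as valid leaves required-ness to a separate rule. The same expression also accepts a dotted IPv4 address as the domain, since a label may consist only of digits.

```javascript
// Regular expression for a "valid email address" from the HTML specification.
const HTML5_EMAIL =
  /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// Validate the value of a field that accepts several comma-separated addresses.
function isValidEmailList(value) {
  // An empty value is an empty list; whether it is allowed is a "required" concern.
  if (value.trim() === "") return true;
  return value
    .split(",")
    .map((entry) => entry.trim()) // ignore whitespace around each entry
    .every((entry) => HTML5_EMAIL.test(entry));
}

console.log(isValidEmailList("a@example.com, b@localhost"));   // true
console.log(isValidEmailList("a@example.com,,b@example.com")); // false (empty entry)
console.log(HTML5_EMAIL.test("admin@192.168.0.1"));            // true
```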
Both J (the current regexp) and W (the W3C regexp) handle IP numbers. I don't see a problem there.
What do you mean by "the multiple attribute" in this context?
+1 for going from regexp to coding rules. Or more general: going to an interface of rules, independent of the implementation. You could still implement some things with a regexp: as long as it passes your tests it is irrelevant how it is implemented.
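A minimal JavaScript sketch of an "interface of rules" combined with composition: any object with a `test(value)` method counts as a rule, two interchangeable implementations (one regex, one plain code) are checked against the same table of expectations, and rules are combined with a hypothetical `allOf` helper. The rule itself is deliberately simplified to "exactly one @ with something on each side", and the deny-list domain is made up; all names are illustrative.

```javascript
// Two interchangeable implementations of the same (deliberately simple) rule:
// exactly one "@" with something on both sides.
const atSignRegexRule = {
  test: (value) => /^[^@]+@[^@]+$/.test(value),
};

const atSignCodeRule = {
  test: (value) => {
    const parts = value.split("@");
    return parts.length === 2 && parts[0] !== "" && parts[1] !== "";
  },
};

// Composition: a rule that passes only if every inner rule passes.
function allOf(...rules) {
  return { test: (value) => rules.every((rule) => rule.test(value)) };
}

// Hypothetical deny-list rule for domains, easier to express in code than in a regex.
function domainNotIn(blockedDomains) {
  const blocked = new Set(blockedDomains.map((d) => d.toLowerCase()));
  return {
    test: (value) => !blocked.has(value.slice(value.lastIndexOf("@") + 1).toLowerCase()),
  };
}

// One shared table of expectations, run against each implementation.
const cases = [
  ["someone@example.org", true],
  ["someone@localhost", true],
  ["no-at-sign", false],
  ["@example.org", false],
  ["a@b@example.org", false],
];

for (const [name, rule] of [["regex", atSignRegexRule], ["code", atSignCodeRule]]) {
  for (const [input, expected] of cases) {
    if (rule.test(input) !== expected) {
      throw new Error(`${name} rule failed for ${JSON.stringify(input)}`);
    }
  }
}

const signupRule = allOf(atSignCodeRule, domainNotIn(["blocked.example"]));
console.log(signupRule.test("someone@example.org"));     // true
console.log(signupRule.test("someone@Blocked.Example")); // false
```

The shared table stands in for a real test suite: the expectations, not the implementation, define the rule.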
On 12 November 2012 17:35, Herman Peeren <herman.pee...@gmail.com> wrote:
> +1 for going from regexp to coding rules. Or more general: going to an
> interface of rules, independent of the implementation. You could still
> implement some things with a regexp: as long as it passes your tests it is
> irrelevant how it is implemented.
I know this will be a shock but I totally agree :)
- If you use code instead of a regex, you will have to write and maintain two
codes: one in PHP, one in JS. With a regex, it's possible to use the same
regex in both, don't you think? But I agree that regexes are hard to
read: each time you need to debug them, you need several minutes to read
them again.
- Another comment, not directly linked to the initial subject:
"For development I often use @localhost or @box email addresses"
With Zend Framework, there is a way to have several configurations based on
a "state" (development, production, etc.).
This is done in the ini configuration file with section keywords like [production]
and [development].
Currently Joomla has a "debug mode", but it would be great to have a
way to switch between different configurations (several configuration.ini files),
and in configuration.ini you could set whether a TLD is required or not, the path to
the installation, the db password, etc.
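A minimal sketch of this idea applied to the email rule, assuming hypothetical per-environment settings (nothing like this exists in Joomla's configuration, and the key `emailRequireTld` is made up): development accepts addresses such as someone@localhost, while production requires a TLD by using the `+` form of the HTML5 expression.

```javascript
// Hypothetical per-environment settings, in the spirit of [production] /
// [development] ini sections. None of these keys exist in Joomla.
const SETTINGS = {
  production: { emailRequireTld: true },
  development: { emailRequireTld: false }, // e.g. allow someone@localhost
};

// Building blocks of the HTML5 email expression.
const LOCAL = "[a-zA-Z0-9.!#$%&'*+\\/=?^_`{|}~-]+";
const LABEL = "[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?";

// Lenient: zero or more ".label" groups; strict: at least one (a TLD is required).
const LENIENT = new RegExp(`^${LOCAL}@${LABEL}(?:\\.${LABEL})*$`);
const STRICT = new RegExp(`^${LOCAL}@${LABEL}(?:\\.${LABEL})+$`);

function emailRuleFor(environment) {
  if (!Object.prototype.hasOwnProperty.call(SETTINGS, environment)) {
    throw new Error(`Unknown environment: ${environment}`);
  }
  return SETTINGS[environment].emailRequireTld ? STRICT : LENIENT;
}

console.log(emailRuleFor("development").test("someone@localhost")); // true
console.log(emailRuleFor("production").test("someone@localhost"));  // false
```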
Hearing no objections to switching to the HTML5 standard, I sent a PR for
that change and some of the related details discussed above. I know that's
tremendously shortsighted of me, but the CMS would like to solve the problem
users are having with newer TLDs, and I would just as soon do it correctly
once rather than do a hack. In this case, since the W3C has specifically
promulgated a regex, I think it's fine to go ahead and use it, pending of
course the rewriting of the class that perhaps will come out of this
thread.
As part of the PR I kept the JS and PHP validation the same by default,
though they won't be if the tld option attribute is used (the JS will be
less strict).
This is a fun discussion; I look forward to seeing the code that comes out
of it. Thanks, Herman, for stepping up to do the rewriting.
Elin
On Mon, Nov 12, 2012 at 8:40 AM, Thomas PAPIN <thomas.pa...@gmail.com> wrote:
> - If using code instead of regex, you will have to write and maintain two
> codes: one in PHP, one in JS. With regex, it's possible to use the same
> regex, don't you think ? But I agree on the fact, that regex are hard to
> read, each time you need to debug them, you need several minutes to read
> them again.
On Sunday, November 11, 2012 5:00:57 PM UTC-5, Andrew Eddie wrote:
> So in trying to fix email validation, I would lean towards ditching the regex altogether and going back to step-wise PHP/JS.
While I strongly dislike regexes, I think for complex cross-language tests they are the best choice. If you store the regex in one place, you can use it for both the PHP and the JavaScript validation, and when you want to change it you change it in one place.
Writing two sets of code to perform the same test opens up the risk of having different validation formulas, or weird edge cases where something passes one test and not the other. [It is very frustrating for the user to submit what the client says is valid input, only to have it rejected by the server. Worse yet is perfectly valid input that the server does accept, but that the JavaScript rejects on some forms. As an example, at various points in time the super admin has been allowed to submit user edits in the backend that the frontend rejects, leaving a user unable to edit their own profile!]
For that reason, I'd suggest using a regex *in this case*, especially since there is a well-known standard regex available.
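To illustrate this drift, here is a JavaScript sketch with two stand-in "server" checks (in Joomla the server side would be PHP): one written separately with a hypothetical 2 to 4 letter TLD cap, and one built from the same shared pattern source as the client. Running a shared list of inputs through both sides shows where they disagree. On the PHP side the same source string would still need delimiters for `preg_match`; the escaped `\/` keeps a `/` delimiter safe.

```javascript
// One shared pattern source (the HTML5 email expression, no delimiters),
// e.g. kept in a single file that both client and server read.
const EMAIL_PATTERN_SOURCE =
  "^[a-zA-Z0-9.!#$%&'*+\\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?" +
  "(?:\\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$";
const SHARED_EMAIL_RE = new RegExp(EMAIL_PATTERN_SOURCE);

const clientCheck = (value) => SHARED_EMAIL_RE.test(value);

// Hypothetical server-side stand-ins, written in JavaScript for the sketch.
const handWrittenServerCheck = (value) => /^[\w.+-]+@(?:[\w-]+\.)+[a-zA-Z]{2,4}$/.test(value);
const sharedSourceServerCheck = (value) => new RegExp(EMAIL_PATTERN_SOURCE).test(value);

// Inputs on which two checks disagree: "client says valid, server rejects" (or vice versa).
function findDisagreements(inputs, checkA, checkB) {
  return inputs.filter((input) => checkA(input) !== checkB(input));
}

const inputs = [
  "someone@example.com",
  "someone@example.travel",
  "someone@localhost",
  "o'brien@example.org",
];

console.log(findDisagreements(inputs, clientCheck, handWrittenServerCheck));
console.log(findDisagreements(inputs, clientCheck, sharedSourceServerCheck)); // []
```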