Email rule

Elin Waring

Nov 11, 2012, 1:44:55 PM
to joomla-de...@googlegroups.com
As many people know, validating email addresses is really difficult. Some people like a rule that is as general as possible, allowing addresses with no @ as well as @localhost. Others like a rule that focuses on the majority of email addresses, which have a full domain name including a tld. You can even find validation schemes that run lookups against actual registered domains. Assuming we aren't doing that, we have a problem with our current rule: it allows a maximum of 4 characters in the tld, and with the many new tlds being added this is no longer appropriate. We could just raise 4 to a higher arbitrary number, which would solve the immediate problem. You could also say it is a problem that the rule requires a tld, so @localhost does not validate, nor does an address without an @. Currently we have:

$regex = '^[\w.-]+(\+[\w.-]+)*@\w+[\w.-]*?\.\w{2,4}$'
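
To make the two complaints concrete, here is a minimal sketch that transcribes the rule above into a JavaScript regex literal and runs it against a few made-up addresses. It assumes the pattern is applied as-is, without extra modifiers, which may not match exactly how the PHP rule class wraps it.

```javascript
// The rule quoted above, transcribed as a JavaScript regex literal.
const currentRule = /^[\w.-]+(\+[\w.-]+)*@\w+[\w.-]*?\.\w{2,4}$/;

// Sample addresses are invented for illustration.
const samples = [
  "user@example.com",     // ordinary address
  "user@example.travel",  // tld longer than 4 characters
  "user@localhost",       // no tld at all
];

for (const address of samples) {
  console.log(address, currentRule.test(address));
}
```

Only the first address passes: the `{2,4}` quantifier rejects `travel`, and the mandatory `\.` rejects `localhost`.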

For HTML5 the W3C has actually published a different regex for its email field type that is much less restrictive:

^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$

http://www.w3.org/TR/html-markup/input.email.html

I'm wondering if we would want to consider switching the email validation to that. 

Alternatively, maybe we want to offer an option that requires the address to have a tld, for cases where we want to be stricter than the W3C. For many (perhaps most) use cases, I think people would probably like to require a "normal" email address.

I was wondering if people had any thoughts on this? 

Elin

Donald Gilbert

Nov 11, 2012, 2:05:35 PM
to joomla-de...@googlegroups.com
My thoughts are, why try to reinvent the wheel? Meaning, the W3C has a published standard for its email matching regex (which you referenced), so why not just use that?

I would agree with you as well on adding the option to require a tld. How to implement that would be a good question.

Elin Waring

Nov 11, 2012, 2:38:29 PM
to joomla-de...@googlegroups.com
I should also say that whatever we do should be reflected in the JS validation of the email field type as well.

Elin

Sam Moffatt

Nov 11, 2012, 3:02:51 PM
to joomla-de...@googlegroups.com
For development I often use @localhost or @box email addresses. At one
point I had to hack validation on one site I was working on to get it
to work properly with what I'm testing. So I object to putting
arbitrary limitations on the domain.

Cheers,


Sam Moffatt
http://pasamio.id.au

Elin Waring

Nov 11, 2012, 3:21:40 PM
to joomla-de...@googlegroups.com
I'm tempted to say you should make a custom rule if you don't want to use the W3C standards, but I usually take the position that it's good to be useful and not force people to write custom rules and fields for common use cases, so I'm a bit torn on this. I guess it would be possible to have two regexes available. But that still leaves the JS in the field (I know Louis said that the platform is getting rid of all its JS, but that hasn't happened at this point). Right now putting type="email" and validate="email" gives you the same regex.

Elin 

Herman Peeren

Nov 11, 2012, 4:20:17 PM
to joomla-de...@googlegroups.com
Just a clarification of the differences for those who don't read regexps. I'll call the regular expression now used in Joomla J and the W3C expression W:
  • in W the part in front of the @ may contain the characters !#$%&'*+/=?^`{|} and ~. J is too strict here, taking the 'atext' definition in RFC 5322 as the standard.
  • in J a domain name and a tld can contain an underscore _, which is not correct according to RFC 1123.
  • in J there must be at least one tld (the last "extension") with a length between 2 and 4 characters; as mentioned, such a tld is not necessary in local environments (and the length limit is no longer correct).

Both expressions require an @ in an email address. W is a little more precise in following the standards. But on the internet it could be handy to check for a tld. That option can be implemented by changing the last * in W into a +.
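
The differences listed above can be checked side by side. This is a sketch only: J and W are transcribed into JavaScript literals (with the apostrophe in W typed as a plain ' and the slash escaped for the literal), `ruleWTld` is a name made up here for W with its last * changed to +, and the sample addresses are invented.

```javascript
// J: the rule currently used in Joomla, as quoted earlier in the thread.
const ruleJ = /^[\w.-]+(\+[\w.-]+)*@\w+[\w.-]*?\.\w{2,4}$/;
// W: the W3C pattern.
const ruleW = /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/;
// W with the last * changed to +, so at least one dot-separated label must follow.
const ruleWTld = /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+$/;

// One made-up address per difference.
const samples = [
  "o'brien@example.com",  // atext character in the local part
  "user@exam_ple.com",    // underscore in the domain
  "user@example.travel",  // tld longer than 4 characters
  "user@localhost",       // no tld
];

for (const address of samples) {
  console.log(address, {
    J: ruleJ.test(address),
    W: ruleW.test(address),
    "W+": ruleWTld.test(address),
  });
}
```

The only address where W and the + variant disagree is `user@localhost`, which is exactly the case the tld option is meant to control.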

brian teeman

Nov 11, 2012, 4:32:10 PM
to joomla-de...@googlegroups.com


On Sunday, 11 November 2012 21:20:17 UTC, Herman Peeren wrote:
  • in J there must be at least one tld (the last "extension") of length between 2 and 4 characters; as mentioned, such a tld is not necessary in local environments (and the length is not correct anymore).


This is not just a problem in local environments: with all the newly approved TLDs coming online in the near future, the 2-4 character limit is an issue.

Elin Waring

Nov 11, 2012, 4:41:58 PM
to joomla-de...@googlegroups.com
That's why I said one option is just to change the 4 to some larger number.

I do think it is a selling point for Joomla to be standards compliant, or at least to offer standards compliance as the default.

Elin

Herman Peeren

Nov 11, 2012, 4:42:43 PM
to joomla-de...@googlegroups.com


On Sunday, 11 November 2012 22:32:10 UTC+1, brian teeman wrote:
This is not just a problem in local environments: with all the newly approved TLDs coming online in the near future, the 2-4 character limit is an issue.

Correct. I meant: a tld is not necessary in local environments, but it is handy on the internet. So that is about the only thing that could be made optional in this expression: in a production environment on the internet you could make a tld mandatory (by changing the last * into a +).

Elin Waring

Nov 11, 2012, 4:48:33 PM
to joomla-de...@googlegroups.com
Well, what we would do in the rule is make two separate regexes depending on the presence of an attribute such as Tld="required".

So the rule could optionally be stricter than the js validation.
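
As a rough sketch of that idea, a rule could build its pattern from an option and only tighten the quantifier when the tld is required. The function name `emailPattern` and the option shape `{ tld: "required" }` are hypothetical; they only mirror the attribute suggested above and are not an existing API.

```javascript
// Hypothetical helper: returns the email pattern for the given options.
// `tld` mirrors the attribute suggested above; its name and values are illustrative.
function emailPattern(options = {}) {
  const local = "[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+";
  const label = "[a-zA-Z0-9-]+";
  const rest = `(?:\\.${label})`;
  // "+" demands at least one further label (a tld); "*" leaves it optional.
  const quantifier = options.tld === "required" ? "+" : "*";
  return new RegExp(`^${local}@${label}${rest}${quantifier}$`);
}

// The server-side rule can be stricter than the default used on the client.
const clientRule = emailPattern();
const serverRule = emailPattern({ tld: "required" });

console.log(clientRule.test("dev@localhost"), serverRule.test("dev@localhost"));
```

With this split, an address such as `dev@localhost` passes the default check but fails once the tld is required, which is the mismatch between client and server described above.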

Elin

Andrew Eddie

Nov 11, 2012, 5:00:36 PM
to JPlatform
I know this doesn't necessarily address the problem directly, but at a recent conference the point was made several times that regexes, while powerful, are notorious for being unmaintainable. By that I mean: can you look at that email regex and tell me immediately what its designer intended? Maybe you can tell me what the regex does, but in the event there is a bug, it's very difficult to determine what the correct behaviour was supposed to be.

The point is, there is a school of thought to move away from regexes and go back to, shock horror, writing code to parse strings (developers, hear this: it's ok to write code). So in trying to fix email validation, I would lean towards ditching the regex altogether and going back to step-wise PHP/JS. I would also have no problem with adding rule support by composition. It's a valid use case that you may want some sort of black or white list approach to validation.
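
For comparison, a step-wise check might look something like the sketch below. It is written to accept the same addresses as the W3C pattern, but without any regex; `checkEmail`, `onlyUses` and the `requireTld` option are names invented for this illustration.

```javascript
const ALNUM = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
// Characters allowed before the @ and inside one domain label.
const LOCAL_CHARS = new Set(ALNUM + ".!#$%&'*+/=?^_`{|}~-");
const LABEL_CHARS = new Set(ALNUM + "-");

// True when text is non-empty and every character is in the allowed set.
function onlyUses(text, allowed) {
  return text.length > 0 && [...text].every((ch) => allowed.has(ch));
}

// Hypothetical step-wise check; it returns a reason so a failure explains itself.
function checkEmail(address, { requireTld = false } = {}) {
  const at = address.lastIndexOf("@");
  if (at < 0) return { valid: false, reason: "missing @" };

  const local = address.slice(0, at);
  if (!onlyUses(local, LOCAL_CHARS)) return { valid: false, reason: "invalid local part" };

  const labels = address.slice(at + 1).split(".");
  if (!labels.every((label) => onlyUses(label, LABEL_CHARS))) {
    return { valid: false, reason: "invalid domain" };
  }
  if (requireTld && labels.length < 2) return { valid: false, reason: "missing tld" };

  return { valid: true, reason: null };
}

console.log(checkEmail("dev@localhost"));
console.log(checkEmail("dev@localhost", { requireTld: true }));
```

Each step states its intent and reports why it failed, which is the maintainability argument being made here. The cost is that the same steps would have to be written again on the PHP side.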

Just throwing it out there.

Regards,
Andrew Eddie

Elin Waring

Nov 11, 2012, 6:05:50 PM
to joomla-de...@googlegroups.com
That's fine in principle if you are throwing out JFormRule's basic structure which starts with that test method that is based on having a  regex property.  

I personally am way happier writing the code parts of rules, as you could probably guess from the ones I have written, but rewriting JFormRule was not particularly my agenda this week :P (dealing with the travel tld is what started me looking at it).

Elin

Andrew Eddie

Nov 11, 2012, 6:31:17 PM
to JPlatform
I just shared that as a general principle - no more, no less.

Regards,
Andrew Eddie
http://learn.theartofjoomla.com - training videos for Joomla developers

Elin Waring

Nov 11, 2012, 9:09:23 PM
to joomla-de...@googlegroups.com
I know. 
Also the W3C will take an IP address whereas our current one won't. That is a plus I think.


Elin

Elin Waring

Nov 11, 2012, 10:36:09 PM
to joomla-de...@googlegroups.com
I hate to do this to myself but we should also consider adding the multiple attribute since it is part of the standard.

Elin

Donald Gilbert

Nov 11, 2012, 10:39:32 PM
to joomla-de...@googlegroups.com
:(

How widely is it used? I know it's part of the standard, but how would we even implement something like that?
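
For context, in HTML5 the multiple attribute on an email input lets the field hold a comma-separated list of addresses. One possible way to validate such a value, sketched here with an invented helper name `invalidAddresses`, is to split on commas, trim each entry, and run the single-address check on every one.

```javascript
// The W3C single-address pattern, with the slash escaped for the literal.
const w3cEmail = /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/;

// Splits a comma-separated value and returns the entries that fail the check.
// An empty value yields [""], so the caller decides whether empty input is acceptable.
function invalidAddresses(value) {
  return value
    .split(",")
    .map((part) => part.trim())
    .filter((address) => !w3cEmail.test(address));
}

console.log(invalidAddresses("a@example.com, b@example.org"));
console.log(invalidAddresses("a@example.com, not an address"));
```

Returning the failing entries rather than a bare true/false makes it possible to tell the user which address in the list needs fixing.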

Also, a bit off topic, and I know you'll see it eventually, but I did what I had talked about doing this weekend - Refactoring JFormField & Co -  https://github.com/joomla/joomla-platform/pull/1684

Herman Peeren

Nov 12, 2012, 2:35:27 AM
to joomla-de...@googlegroups.com
Both J (the current regexp) and W (the W3C regexp) handle IP addresses. I don't see a problem there.
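
The IP address question is easy to test directly. In this sketch both patterns are transcribed into JavaScript literals and run against made-up addresses with a dotted IPv4 host. Because J ends in `\.\w{2,4}`, it should only accept such a host when the last octet has at least two digits, while W accepts either form. Neither pattern accepts the bracketed literal form.

```javascript
// J: the current rule. W: the W3C pattern (slash escaped for the literal).
const ruleJ = /^[\w.-]+(\+[\w.-]+)*@\w+[\w.-]*?\.\w{2,4}$/;
const ruleW = /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/;

// Made-up addresses using an IP address as the host.
const samples = [
  "user@192.168.1.10",   // last octet has two digits
  "user@192.168.1.1",    // last octet has one digit
  "user@[192.168.1.1]",  // bracketed address literal
];

for (const address of samples) {
  console.log(address, { J: ruleJ.test(address), W: ruleW.test(address) });
}
```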

What do you mean by "the multiple attribute" in this context?

+1 for going from regexp to coding rules. Or, more generally: going to an interface of rules, independent of the implementation. You could still implement some things with a regexp: as long as it passes your tests, it is irrelevant how it is implemented.
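
One way to picture such an interface, as a sketch only: treat a rule as anything with a test(value) method that returns a boolean. A RegExp already has that method, so regex-backed and code-backed rules can be mixed and composed freely. The helpers `blockedDomainsRule` and `allOf`, and the blocked domain itself, are invented for this example.

```javascript
// Interface assumed here: a rule is anything with a test(value) method returning a boolean.
// A RegExp already satisfies it, so a regex-backed rule needs no wrapper.
const w3cEmailRule = /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/;

// Code-backed rule: rejects addresses whose domain is on a block list.
function blockedDomainsRule(domains) {
  const blocked = new Set(domains.map((domain) => domain.toLowerCase()));
  return {
    test(value) {
      const domain = value.slice(value.lastIndexOf("@") + 1).toLowerCase();
      return !blocked.has(domain);
    },
  };
}

// Composition: the combined rule passes only when every rule passes.
function allOf(...rules) {
  return { test: (value) => rules.every((rule) => rule.test(value)) };
}

const signupEmailRule = allOf(w3cEmailRule, blockedDomainsRule(["mailinator.example"]));

console.log(signupEmailRule.test("user@example.com"));
console.log(signupEmailRule.test("user@mailinator.example"));
```

Callers only ever see test(), so a rule can later be swapped from a regexp to step-wise code (or back) without touching them.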

Andrew Eddie

Nov 12, 2012, 3:03:52 AM
to JPlatform
On 12 November 2012 17:35, Herman Peeren <herman...@gmail.com> wrote:
+1 for going from regexp to coding rules. Or, more generally: going to an interface of rules, independent of the implementation. You could still implement some things with a regexp: as long as it passes your tests, it is irrelevant how it is implemented.

I know this will be a shock but I totally agree :)

Regards,
Andrew Eddie

Thomas PAPIN

Nov 12, 2012, 8:40:56 AM
to joomla-de...@googlegroups.com
- If you use code instead of a regex, you will have to write and maintain two implementations: one in PHP, one in JS. With a regex, it's possible to use the same pattern in both, don't you think? But I agree that regexes are hard to read: each time you need to debug them, you need several minutes to read them again.

- Another comment, not directly linked to the initial subject.


"For development I often use @localhost or @box email addresses"

With Zend Framework, there is a way to have several configurations based on a "state" (development, production, etc.).
This is done in the ini configuration file with section keywords like [production] and [development].

Currently in Joomla there is a "debug mode", but it would be great to have a way to switch between different configurations (several configuration.ini files). In configuration.ini you could then set whether the "tld" is required, the path to the installation, the db password, etc.





Elin Waring

Nov 12, 2012, 4:23:05 PM
to joomla-de...@googlegroups.com
Hearing no objections to switching to the HTML5 standard, I sent a PR for that change and some of the related details discussed above. I know that's tremendously shortsighted of me, but the CMS would like to solve the problem users are having with newer tlds, and I would just as soon do it correctly once rather than do a hack. In this case, since the W3C has specifically promulgated a regex, I think it's fine to go ahead and use it, pending of course the rewriting of the class that will perhaps come out of this thread.


As part of the PR I kept the JS and PHP validation the same by default, though they won't be if the tld option attribute is used (the JS will be less strict).

This is a fun discussion, I look forward to seeing the code that comes out of it. Thanks Herman for stepping up to do the rewriting.


Elin

Gary Mort

Nov 20, 2012, 7:32:37 AM
to joomla-de...@googlegroups.com


On Sunday, November 11, 2012 5:00:57 PM UTC-5, Andrew Eddie wrote:

The point is, there is a school of thought to move away from regexes and go back to, shock horror, writing code to parse strings (developers, hear this: it's ok to write code). So in trying to fix email validation, I would lean towards ditching the regex altogether and going back to step-wise PHP/JS. I would also have no problem with adding rule support by composition. It's a valid use case that you may want some sort of black or white list approach to validation.


While I highly dislike regex, I think for  cross-language complex tests it is the best choice.  If you store the regex in one place, you can use it for both the PHP and the Javascript validation - and when you want to change it you change it in one place.

Writing dual sets of code to perform the same test opens up the risk of having different validation formulas, or weird edge cases where something passes one test and not the other. It is very frustrating for the user to submit what the client says is valid input only to have it rejected by the server. Worse yet is to have perfectly valid input that the server does accept, but that the JavaScript rejects on some forms. As an example, at various points in time the super admin has been allowed to submit user edits in the backend that the frontend will reject, leaving a user unable to edit their own profile!

For that reason, I'd suggest to use regex in this case - especially since there is a well known standard regex test available.
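
A minimal sketch of the "one place" idea: the pattern is defined once as a string, serialised, and every consumer rebuilds its rule from that shared string. The names `EMAIL_PATTERN` and `loadEmailRule` and the JSON shape are assumptions for illustration. A PHP consumer could read the same string, but that only gives identical results as long as both regex engines interpret the pattern the same way. That seems safe for simple character classes like these, but is not guaranteed in general.

```javascript
// Single definition of the pattern; everything else derives from this string.
const EMAIL_PATTERN =
  /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/.source;

// Serialised form that could be written to a shared file or emitted into a page.
const shared = JSON.stringify({ email: EMAIL_PATTERN });

// A consumer rebuilds the rule from the shared string instead of hard-coding it.
function loadEmailRule(json) {
  return new RegExp(JSON.parse(json).email);
}

const emailRule = loadEmailRule(shared);
console.log(emailRule.test("user@example.travel"));
```

Changing the pattern then means editing one definition, and the client and server cannot drift apart by accident.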

Andrew Eddie

Nov 27, 2012, 7:03:35 PM
to JPlatform
While I highly dislike regex, I think for  cross-language complex tests it is the best choice.  If you store the regex in one place, you can use it for both the PHP and the Javascript validation - and when you want to change it you change it in one place.

That's a fair point. 

Regards,
Andrew Eddie
