^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$
http://www.w3.org/TR/html-markup/input.email.html
I'm wondering if we would want to consider switching the email validation to that.
Both expressions require an @ in the email address. W is a little more precise and follows the standards. On the internet, though, it could be handy to check for a TLD. That option can be implemented by changing the last * in W into a +.
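To make the * versus + difference concrete, here is a minimal sketch. The constant names W and W_WITH_TLD are made up for illustration, and the / inside the character class is escaped only so the pattern is unambiguous as a JavaScript regex literal.

```javascript
// W: the pattern quoted above, with a plain ASCII apostrophe in the local-part class.
const W = /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/;

// Same pattern with the final * changed to +, so the host must be followed
// by at least one dot-separated label (i.e. something that looks like a TLD).
const W_WITH_TLD = /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)+$/;

// A dotless host such as "localhost" passes W but not the stricter variant.
const localAddress = "admin@localhost";
const publicAddress = "user@example.com";

console.log(W.test(localAddress), W_WITH_TLD.test(localAddress));
console.log(W.test(publicAddress), W_WITH_TLD.test(publicAddress));
```

Note that neither variant puts a length limit on the last label, so long TLDs are accepted by both.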
- In J there must be at least one TLD (the last "extension") between 2 and 4 characters long; as mentioned, such a TLD is not necessary in local environments (and the length limit is no longer correct).
This is not just a problem in local environments: with all the newly approved TLDs coming online in the near future, the 2-4 character limit is an issue.
+1 for going from regexp to coding rules. Or, more generally: going to an interface of rules, independent of the implementation. You could still implement some things with a regexp: as long as it passes your tests, it is irrelevant how it is implemented.
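One way to picture such an interface of rules, purely as a sketch: every rule is a function from a string to a boolean, and the caller neither knows nor cares whether a regexp sits behind it. All names here (hasSingleAt, localPartLooksValid, maxLength, allOf) and the 254-character limit are illustrative assumptions, not anything from the existing code.

```javascript
// A rule is any function (value) => boolean. Its implementation is hidden from the caller.

// Implemented with plain string code.
const hasSingleAt = (value) => value.split("@").length === 2;

// Implemented with a regexp; callers cannot tell the difference.
const localPartLooksValid = (value) =>
  /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@/.test(value);

// A rule factory: returns a rule configured with a limit (the limit is illustrative).
const maxLength = (limit) => (value) => value.length <= limit;

// Compose rules: the value is valid only if every rule accepts it.
function allOf(...rules) {
  return (value) => rules.every((rule) => rule(value));
}

const isValidEmail = allOf(hasSingleAt, localPartLooksValid, maxLength(254));

console.log(isValidEmail("user@example.com"));
console.log(isValidEmail("a@b@c"));
```

Swapping the regexp-based rule for hand-written code later would not change the callers, only the tests have to keep passing.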
The point is, there is a school of thought to move away from regexes and go back to, shock horror, writing code to parse strings (developers, hear this: it's ok to write code). So in trying to fix email validation, I would lean towards ditching the regex altogether and going back to step-wise PHP/JS. I would also have no problem with adding rule support by composition. It's a valid use case that you may want some sort of blacklist or whitelist approach to validation.
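A rough sketch of what step-wise validation with an optional blacklist or whitelist could look like on the JS side. The function name validateEmail, the option names deniedDomains and allowedDomains, and the returned shape are all hypothetical. The local part is only checked for being non-empty here, so this is deliberately looser than the W pattern.

```javascript
const LABEL_CHARS = "abcdefghijklmnopqrstuvwxyz0123456789-";

// A domain label must be non-empty and use only letters, digits and hyphens.
const labelOk = (label) =>
  label.length > 0 && [...label].every((ch) => LABEL_CHARS.includes(ch));

function validateEmail(address, options = {}) {
  const { deniedDomains = [], allowedDomains = null } = options;

  // Step 1: exactly one @, with text on both sides of it.
  const at = address.indexOf("@");
  if (at <= 0 || at !== address.lastIndexOf("@") || at === address.length - 1) {
    return { valid: false, reason: "expected exactly one @ with text on both sides" };
  }
  const domain = address.slice(at + 1).toLowerCase();

  // Step 2: every dot-separated label of the domain must be well formed.
  if (!domain.split(".").every(labelOk)) {
    return { valid: false, reason: "invalid domain label" };
  }

  // Step 3: optional blacklist, then optional whitelist.
  if (deniedDomains.includes(domain)) {
    return { valid: false, reason: "domain is on the deny list" };
  }
  if (allowedDomains !== null && !allowedDomains.includes(domain)) {
    return { valid: false, reason: "domain is not on the allow list" };
  }
  return { valid: true, reason: null };
}

console.log(validateEmail("user@example.com"));
console.log(validateEmail("user@spam.test", { deniedDomains: ["spam.test"] }));
```

Each step returns its own reason, which is something a single regex match cannot give you.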
While I strongly dislike regex, I think for cross-language complex tests it is the best choice. If you store the regex in one place, you can use it for both the PHP and the JavaScript validation, and when you want to change it you change it in one place.
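As a sketch of the "one place" idea on the JavaScript side: keep the pattern as a plain string (for example in a config value that both languages read) and build the RegExp from it. The names EMAIL_PATTERN and isEmail are hypothetical, and where the string actually lives is left open.

```javascript
// The single source of truth: the pattern as a plain string, without delimiters.
// Only the backslash needs doubling inside a JavaScript string literal.
const EMAIL_PATTERN =
  "^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\\.[a-zA-Z0-9-]+)*$";

// Build the regex once from the shared string.
const emailRegex = new RegExp(EMAIL_PATTERN);

function isEmail(value) {
  return emailRegex.test(value);
}

console.log(isEmail("user@example.com"));
console.log(isEmail("not an email"));
```

One caveat for the PHP side: preg_match needs delimiters around the pattern, and this pattern contains /, # and ~, so whichever delimiter is chosen has to be escaped inside the pattern or picked so it does not occur in it.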