Any comments on this matter are appreciated.
regards
Michael
Hmm, have you tried google? There may not be such in Python, but I'm sure
you can translate whatever samples you find. The basic idea is to use
regular expressions, right? I remember seeing an approach that explicitly
provided different levels of strictness. Something like this:
Level 1: Just make sure you have something@something
Level 2: Make sure you have some...@something.something
Level 3: Make sure the tld is valid.
Level 4: Make sure the domain is valid (whois, ping, etc)
Level 5: Make sure the address itself is valid (resolve the mx record and
test the address via smtp).
Cheers,
// mark
E-mail connectivity is only weakly about syntax, despite the
eagerness of many practitioners to show off their REs.
--
Cameron Laird <Cam...@Lairds.com>
Business: http://www.Phaseit.net
Personal: http://starbase.neosoft.com/~claird/home.html
Yeah, I should have mentioned that my own personal preference is to avoid
all this RE-based validation crap and just generate an email to the supplied
address that contains a URL that when navigated to confirms the validity of
the address.
That's much more straightforward, imho, and avoids all this, "What are the
valid tld's?" mess. Just take whatever the user supplies and send them a
confirmation email.
But, to each his own,
Cheers,
// mark
I agree with all the other replies to your post about the caveats on
validation. However, I have found some uses, e.g. client-side
validation in forms to catch user typos. In Javascript, I have used
the re.
"^\\w(\\w|\\.\\w)*@\\w(\\w|\\.\\w)*$"
I have not tested this in Python but it should need only minor mods,
if any...since you mention Web forms in your post, it may be that you
can do this browser-side.
Hope this helps
| I remember seeing an approach that explicitly
| provided different levels of strictness. Something like this:
| Level 1: Just make sure you have something@something
| Level 2: Make sure you have some...@something.something
"user@com." is (syntactically) valid too, i think.
| Level 3: Make sure the tld is valid.
| Level 4: Make sure the domain is valid (whois, ping, etc)
it can be hard to keep track of all the whois servers for all tld's.
| Level 5: Make sure the address itself is valid (resolve the mx record and
| test the address via smtp).
transient network problems can bite you here (also applies to 4).
-- erno
If you really want to ensure 100% syntactical correctness, here's the
regular expression for matching email addresses, from Jeffrey Friedl's
"Mastering Regular Expressions" book ;-)
[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\
xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xf
f\n\015()]*)*\)[\040\t]*)*(?:(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\x
ff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|"[^\\\x80-\xff\n\015
"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[\040\t]*(?:\([^\\\x80-\
xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80
-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*
)*(?:\.[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\
\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\
x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x8
0-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|"[^\\\x80-\xff\n
\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[\040\t]*(?:\([^\\\x
80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^
\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040
\t]*)*)*@[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([
^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\
\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\
x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-
\xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()
]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\
x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\04
0\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\
n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\
015()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?!
[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\
]]|\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\
x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\01
5()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)*|(?:[^(\040)<>@,;:".
\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]
)|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[^
()<>@,;:".\\\[\]\x80-\xff\000-\010\012-\037]*(?:(?:\([^\\\x80-\xff\n\0
15()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][
^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)|"[^\\\x80-\xff\
n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[^()<>@,;:".\\\[\]\
x80-\xff\000-\010\012-\037]*)*<[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?
:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-
\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:@[\040\t]*
(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015
()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()
]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\0
40)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\
[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\
xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*
)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x80
-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x
80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t
]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\
\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])
*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x
80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80
-\xff\n\015()]*)*\)[\040\t]*)*)*(?:,[\040\t]*(?:\([^\\\x80-\xff\n\015(
)]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\
\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*@[\040\t
]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\0
15()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015
()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(
\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|
\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80
-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()
]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x
80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^
\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040
\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".
\\\[\]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff
])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\
\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x
80-\xff\n\015()]*)*\)[\040\t]*)*)*)*:[\040\t]*(?:\([^\\\x80-\xff\n\015
()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\
\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)?(?:[^
(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-
\037\x80-\xff])|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\
n\015"]*)*")[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|
\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))
[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x80-\xff
\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\x
ff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(
?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\
000-\037\x80-\xff])|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\
xff\n\015"]*)*")[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\x
ff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)
*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)*@[\040\t]*(?:\([^\\\x80-\x
ff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-
\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)
*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\
]\000-\037\x80-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\]
)[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-
\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\x
ff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(
?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80
-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)<
>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x8
0-\xff])|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\t]*(?:
\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]
*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)
*\)[\040\t]*)*)*>)
Jason Voegele
regards
Steve
--
Consulting, training, speaking: http://www.holdenweb.com/
Python Web Programming: http://pydish.holdenweb.com/pwp/
<< MIME encoded re-sample deleted :] >>
any summary on what that's supposed to mean?
(beyond "validate email address")
I see you've got a smiley, but just in case anyone is lacking in
attention, the regex was *not* MIME encoded.
>any summary on what that's supposed to mean?
>(beyond "validate email address")
Read the Friedl book. Seriously.
--
--- Aahz <*> (Copyright 2002 by aa...@pobox.com)
Hugs and backrubs -- I break Rule 6 http://www.rahul.net/aahz/
Androgynous poly kinky vanilla queer het Pythonista
"I support family values -- Addams family values" --www.nancybuttons.com
Email validation is a tricky business. "Algorithm" is called RFC 2822
(see http://rfc.net/rfc2822.html). Be warned to set aside enough time
to go to the bottom. Once there (@theBottom) you'll probably realize
that proper testing/checking is just not worthwhile. Even though all
looks and feels right, the very first part (before @ char) may contain
unintentional (spelling) error and as a result - mail will never reach
the recipient...
One of many possible ways to tackle the problem:
http://www.serialscripter.com/scripts/EMailValidatorCLS.html
(JScript)
Branimir
Forget it. Although there are some "rules" defined by the RFCs about
how the localpart (before the @) should look like, in reality you'll
find that close to every character can appear there. I'm just starting
with Python, but in PHP based apps, I resorted to simply checking if
there's a MX (mail exchange) server listed for the domain. If that
returns true, then all is fine.
Alexander Skwar
--
How to quote: http://learn.to/quote (german) http://quote.6x.to (english)
Homepage: http://www.iso-top.de | Jabber: ask...@charente.de
iso-top.de - Die günstige Art an Linux Distributionen zu kommen
Uptime: 12 days 3 hours 28 minutes
This won't work. A lot of the huge providers (AOL, T-Online...) always
give a true result here, even if the adress does not exist. This is
also done for anti-spam measures, because if a spammer could so easily
check if a given adress exists, it would help him.
Alexander Skwar
--
How to quote: http://learn.to/quote (german) http://quote.6x.to (english)
Homepage: http://www.iso-top.de | Jabber: ask...@charente.de
iso-top.de - Die günstige Art an Linux Distributionen zu kommen
Uptime: 12 days 3 hours 30 minutes
What's a "typo"? Suppose I'd be using Lotus Notes, then my adress could
actually look like "Alexander.Skwar/some/where@domain". This would
resolve to a typo in your case, although that adress is valid.
> if any...since you mention Web forms in your post, it may be that you
> can do this browser-side.
Never do this! If you rely on something client-side for checks, you're
in hell, because disabling JavaScript iwll break the check, and forging
the reply sent by the browser is very easy.
Alexander Skwar
--
How to quote: http://learn.to/quote (german) http://quote.6x.to (english)
Homepage: http://www.iso-top.de | Jabber: ask...@charente.de
iso-top.de - Die günstige Art an Linux Distributionen zu kommen
Uptime: 12 days 3 hours 33 minutes
It is able to validate all valid RfC 822 Addresses like
Muhammed.(I am the greatest) Ali @(the)Vegas.WBA
"\"quote.and space"@[].[\[].yp.to (Might work)
<@gateway.af.mil:Go...@heaven.af.mil>
<mailto:@c0re.jp>
this are all valid addresses. RfC 822 address parsing -which is
the essential for finding "valid" addresses is sheer horror.
drt
--
teenage mutant ninja hero coders from da c0re - http://c0re.jp/
me - http://koeln.ccc.de/~drt/
> It is able to validate all valid RfC 822 Addresses like
>
> Muhammed.(I am the greatest) Ali @(the)Vegas.WBA
> "\"quote.and space"@[].[\[].yp.to (Might work)
><@gateway.af.mil:Go...@heaven.af.mil>
><mailto:@c0re.jp>
>
> this are all valid addresses. RfC 822 address parsing -which is
> the essential for finding "valid" addresses is sheer horror.
It could be worse. You could have to parse addresses with mixed
822 and uucp bang notation (and it's various offspring).
Anyway, the correlation between being a "valid" 822 address and
whether mail sent to that address will end up anywhere useful
is probably small enough that it's not worth the effort.
--
Grant Edwards grante Yow! The Osmonds! You are
at all Osmonds!! Throwing up
visi.com on a freeway at dawn!!!