Regexp for email

Archos

Jul 5, 2010, 7:01:52 AM
to golang-nuts
Here I use a *simple* regexp to validate email addresses, but it
doesn't validate as I expect.

It looks to me like the regexp is correct, isn't it?

===
package main

import (
	"fmt"
	"os"
	"regexp"
)

const _EXP_EMAIL = `^[a-z0-9._%+\-]+@[a-z0-9.\-]+\.[a-z]{2,4}$`

func main() {
	exp, err := regexp.Compile(_EXP_EMAIL)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}

	if !exp.MatchString("foo...@hotmail.com") {
		println("Invalid email")
	}

	println()
}
===
Invalid email

chris dollin

Jul 5, 2010, 7:20:13 AM
to Archos, golang-nuts
On 5 July 2010 12:01, Archos <raul...@sent.com> wrote:
> Here I use a *simple* regexp to validate email addresses, but it
> doesn't validate as I expect.
>
> It looks to me like the regexp is correct, isn't it?

> const _EXP_EMAIL = `^[a-z0-9._%+\-]+@[a-z0-9.\-]+\.[a-z]{2,4}$`

{2,4}: the documentation for regexp doesn't mention counted repetitions.
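For what it's worth, the regexp package in current Go (since Go 1) implements RE2 syntax, which does include counted repetition such as `x{2,4}`, so the pattern from the first post now compiles as written. A minimal check reusing that pattern with a few hypothetical test addresses; it also shows two limitations of the pattern itself (its classes list only lowercase letters, and `{2,4}` rejects longer top-level domains):

```go
package main

import (
	"fmt"
	"regexp"
)

// Same pattern as in the original post.
const _EXP_EMAIL = `^[a-z0-9._%+\-]+@[a-z0-9.\-]+\.[a-z]{2,4}$`

var emailRE = regexp.MustCompile(_EXP_EMAIL)

func main() {
	for _, s := range []string{
		"foo@example.com",     // matches
		"Foo@Example.com",     // no match: the classes only list lowercase letters
		"user@example.museum", // no match: {2,4} rejects a six-letter TLD
	} {
		fmt.Println(s, emailRE.MatchString(s))
	}
}
```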

I assume you're aware of the complexity required to do a good job of
syntax-checking email addresses, and that the only real validation
is being able to send an email to that address and get some confirmation
back?

--
Chris "allusive" Dollin

Archos

Jul 5, 2010, 7:28:33 AM
to golang-nuts

On Jul 5, 11:20 am, chris dollin <ehog.he...@googlemail.com> wrote:
Yes, of course. Writing the best email-validating regex is very
complicated, and that complexity is much greater in Go since the
library is not complete. But if anybody is interested, here you
go:

http://www.regular-expressions.info/email.html
http://fightingforalostcause.net/misc/2006/compare-email-regex.php

软刀

Sep 27, 2012, 1:21:54 PM
to golan...@googlegroups.com


On Monday, July 5, 2010 at 7:01:52 PM UTC+8, Archos wrote:
> Here I use a *simple* regexp to validate email addresses, but it
> doesn't validate as I expect.
>
> It looks to me like the regexp is correct, isn't it?
>
> ===
> package main
>
> import (
>         "fmt"
>         "os"
>         "regexp"
> )
>
> const _EXP_EMAIL = `^[a-z0-9._%+\-]+@[a-z0-9.\-]+\.[a-z]{2,4}$`

What does the \- represent?
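Inside a character class, `\-` is an escaped hyphen: it matches a literal `-` instead of forming a range (and a `.` inside `[...]` is likewise a literal dot). A small Go check with a hypothetical input; putting an unescaped `-` last in the class is an equivalent spelling:

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	escaped  = regexp.MustCompile(`^[a-z\-]+$`) // \- is a literal hyphen
	trailing = regexp.MustCompile(`^[a-z-]+$`)  // a final - is literal too
	noHyphen = regexp.MustCompile(`^[a-z]+$`)   // hyphen not allowed
)

func main() {
	for _, re := range []*regexp.Regexp{escaped, trailing, noHyphen} {
		// String() on a *Regexp returns its source pattern.
		fmt.Println(re, re.MatchString("foo-bar"))
	}
}
```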

Kevin Gillette

Sep 27, 2012, 3:26:13 PM
to golan...@googlegroups.com
Go is fast enough that custom parsing can beat regexes, and proper email validation (covering the _entire_ RFC) is very difficult to do with regexes but fairly trivial to do by hand, so it's easy enough to just write a custom parser and be done with it.
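A rough sketch of the hand-written approach, assuming a deliberately simplified rule set: an unquoted dot-atom local part, a hostname of letters, digits and hyphens with at least one dot (a policy choice), and the commonly used length limits (64 for the local part, 63 per label, 253 for the domain). It does not handle quoted local parts, comments, IP literals or internationalized addresses, and `validAddress` is a hypothetical name:

```go
package main

import (
	"fmt"
	"strings"
)

// isAtext reports whether c may appear in an unquoted local part.
func isAtext(c byte) bool {
	switch {
	case 'a' <= c && c <= 'z', 'A' <= c && c <= 'Z', '0' <= c && c <= '9':
		return true
	}
	return strings.IndexByte("!#$%&'*+-/=?^_`{|}~", c) >= 0
}

// validLocal checks runs of atext separated by single dots.
func validLocal(s string) bool {
	if s == "" || len(s) > 64 {
		return false
	}
	for _, atom := range strings.Split(s, ".") {
		if atom == "" { // leading, trailing or doubled dot
			return false
		}
		for i := 0; i < len(atom); i++ {
			if !isAtext(atom[i]) {
				return false
			}
		}
	}
	return true
}

// validDomain checks dot-separated labels of letters, digits and hyphens.
func validDomain(s string) bool {
	if s == "" || len(s) > 253 {
		return false
	}
	labels := strings.Split(s, ".")
	if len(labels) < 2 { // require a dot (assumption, not an RFC rule)
		return false
	}
	for _, l := range labels {
		if l == "" || len(l) > 63 || l[0] == '-' || l[len(l)-1] == '-' {
			return false
		}
		for i := 0; i < len(l); i++ {
			c := l[i]
			if !('a' <= c && c <= 'z' || 'A' <= c && c <= 'Z' || '0' <= c && c <= '9' || c == '-') {
				return false
			}
		}
	}
	return true
}

// validAddress splits at the last '@' and checks both halves.
func validAddress(addr string) bool {
	at := strings.LastIndexByte(addr, '@')
	if at < 0 {
		return false
	}
	return validLocal(addr[:at]) && validDomain(addr[at+1:])
}

func main() {
	for _, a := range []string{"john.doe@example.com", "john..doe@example.com", "no-at-sign", "user@-bad.example"} {
		fmt.Println(a, validAddress(a))
	}
}
```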

Archos

Sep 28, 2012, 1:00:23 AM
to golan...@googlegroups.com
The people who wrote the RFC should have implemented it too. Here are the regexps [1] I'm using to verify email addresses, but I agree with Kevin that it would be better to write a parser for it:

[1] https://github.com/kless/validate/blob/master/email.go

Thomas Bushnell, BSG

Sep 28, 2012, 1:38:38 AM
to Archos, golan...@googlegroups.com

Why do you think they did not implement it?

Archos

Sep 28, 2012, 2:52:25 AM
to golan...@googlegroups.com
Tell me, where is the regular expression implemented by the people who wrote the RFC? If you write a specification for it, you should consider that it could be used in regular expressions.

Implementing a network protocol is one thing, but creating a programming-related specification (like a library) without implementing it is, at least to me, a very bad idea. The clearest example is OpenGL.

Jesse McNelis

Sep 28, 2012, 2:57:38 AM
to Archos, golan...@googlegroups.com
On Fri, Sep 28, 2012 at 4:52 PM, Archos <raul...@sent.com> wrote:
> Tell me, where is the regular expression implemented by the people who wrote
> the RFC? If you write a specification for it, you should consider that it could
> be used in regular expressions.

Email addresses predate the RFC, so the RFC had to be compatible with
the existing formats.
Regular expressions only parse regular grammars, and the format of email
addresses isn't regular, IIRC.
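One concrete wrinkle behind that point: RFC 5322 defines comments recursively (a comment may itself contain a comment), so handling something like `john(a (nested) comment)@example.com` to arbitrary depth needs a counter, which a finite automaton does not have. A minimal sketch of a comment stripper under that reading; it deliberately ignores quoted strings, so it is not a full parser, and `stripComments` is a hypothetical name:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// stripComments removes parenthesised comments, which may nest.
// The depth counter is what a plain regular expression cannot track
// for arbitrarily deep nesting. Quoted strings are not handled.
func stripComments(s string) (string, error) {
	var b strings.Builder
	depth := 0
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch {
		case c == '\\' && depth > 0 && i+1 < len(s):
			i++ // quoted-pair inside a comment: skip the escaped byte
		case c == '(':
			depth++
		case c == ')':
			if depth == 0 {
				return "", errors.New("unbalanced ')'")
			}
			depth--
		case depth == 0:
			b.WriteByte(c) // outside any comment: keep the byte
		}
	}
	if depth != 0 {
		return "", errors.New("unterminated comment")
	}
	return b.String(), nil
}

func main() {
	for _, in := range []string{"john(a (nested) comment)@example.com", "john(oops@example.com"} {
		out, err := stripComments(in)
		fmt.Printf("%q -> %q, %v\n", in, out, err)
	}
}
```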

--
=====================
http://jessta.id.au

Rémy Oudompheng

Sep 28, 2012, 3:14:37 AM
to Archos, golan...@googlegroups.com
On 2012/9/28 Archos <raul...@sent.com> wrote:
> Tell me, where is the regular expression implemented by the people who wrote
> the RFC? If you write a specification for it, you should consider that it could
> be used in regular expressions.
>
> Implementing a network protocol is one thing, but creating a programming-related
> specification (like a library) without implementing it is, at least to me, a
> very bad idea. The clearest example is OpenGL.

Programming is not limited to regular expressions. Email validation
should be implemented by mail servers. Why do you need to implement it
again?

There is no reason to validate email addresses and not validate URLs.

Rémy.

Archos

Sep 28, 2012, 4:00:18 AM
to golan...@googlegroups.com
You need email validation to check the addresses of users who register with a web service.
Of course, you could send an email, but validating the format is also a fast way to keep out bots.

Dan Kortschak

Sep 28, 2012, 5:41:39 AM
to Archos, golan...@googlegroups.com

DisposaBoy

Sep 28, 2012, 9:35:24 AM
to golan...@googlegroups.com
Before you go about trying to validate the email address, first think about why you're trying to do it. If you want to make sure it exists, the only way to do that is to send an email to it and get the user to click a validation link or something like that. Validating the format is useless beyond weeding out things that are obviously invalid or warning the user about potentially misspelt
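A minimal sketch of the confirm-by-link approach: generate an unguessable random token, store it alongside the unconfirmed address, and mail the user a link containing it. The endpoint `https://example.com/confirm` and the `token` parameter name are hypothetical; storage and actually sending the mail are left out:

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"net/url"
)

// newToken returns 16 random bytes from crypto/rand, hex-encoded (32 chars).
func newToken() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// confirmURL builds the link to email; base is a hypothetical endpoint.
func confirmURL(base, token string) string {
	v := url.Values{}
	v.Set("token", token)
	return base + "?" + v.Encode()
}

func main() {
	tok, err := newToken()
	if err != nil {
		panic(err)
	}
	// Store tok with the pending address, then mail this link; the address
	// counts as confirmed only when the link is visited.
	fmt.Println(confirmURL("https://example.com/confirm", tok))
}
```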

DisposaBoy

Sep 28, 2012, 9:38:09 AM
to golan...@googlegroups.com
Argh, mis-post... continuing: validating the format of an email address is pointless beyond weeding out things that are obviously invalid or warning the user about typos, etc. Just because an email address is valid doesn't mean that the service will accept mail for it.
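For that limited "reject the obviously invalid, warn about typos" job, Go's standard library already ships an RFC 5322 address parser, `net/mail.ParseAddress`. A sketch combining it with a hypothetical, hand-maintained table of common domain misspellings; the table and the `check` helper are illustrative, not from the thread:

```go
package main

import (
	"fmt"
	"net/mail"
	"strings"
)

// commonTypos is a hypothetical list of misspelt domains.
var commonTypos = map[string]string{
	"gmial.com":   "gmail.com",
	"hotmial.com": "hotmail.com",
}

// check parses input and, if the domain looks like a known typo,
// returns a suggested correction (otherwise suggestion is "").
func check(input string) (addr, suggestion string, err error) {
	a, err := mail.ParseAddress(input)
	if err != nil {
		return "", "", err // obviously invalid: reject
	}
	at := strings.LastIndexByte(a.Address, '@') // always present after parsing
	domain := strings.ToLower(a.Address[at+1:])
	if fix, ok := commonTypos[domain]; ok {
		suggestion = a.Address[:at+1] + fix
	}
	return a.Address, suggestion, nil
}

func main() {
	for _, in := range []string{"bob@gmial.com", "Ann <ann@example.com>", "bob"} {
		addr, fix, err := check(in)
		switch {
		case err != nil:
			fmt.Printf("%q: rejected (%v)\n", in, err)
		case fix != "":
			fmt.Printf("%q: did you mean %s?\n", in, fix)
		default:
			fmt.Printf("%q: looks OK (%s)\n", in, addr)
		}
	}
}
```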

Thomas Bushnell, BSG

Sep 28, 2012, 10:34:23 AM
to Archos, golan...@googlegroups.com

They did implement it. They did not use regexes to do so. Why is this relevant to Go?

Patrick Mylund Nielsen

Sep 28, 2012, 10:49:52 AM
to Kevin Gillette, golan...@googlegroups.com
Yep. Just check for the presence of an @ or a ., then let the validation email be the validation. You can't confirm that an email address actually exists using regular expressions anyway. If you insist on validating via regexp, this should validate correctly according to RFC 2822:

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
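Using this pattern from Go takes a little care: it contains backticks, so it cannot go into a single raw string literal. A sketch that splits it around the backtick-containing class, with two additions that are not in the post: `^...$` anchors, so the whole input must match, and the `(?i)` flag, since the character classes list only lowercase letters:

```go
package main

import (
	"fmt"
	"regexp"
)

// atext is the local-part class from the pattern above; it contains a
// backtick, so it lives in an interpreted (double-quoted) string.
const atext = "[a-z0-9!#$%&'*+/=?^_`{|}~-]"

// rfc2822 is the posted pattern, plus (?i) and ^...$ (our additions).
var rfc2822 = regexp.MustCompile(`(?i)^(?:` + atext + `+(?:\.` + atext + `+)*` +
	`|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")` +
	`@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?` +
	`|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?` +
	`|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])$`)

func main() {
	for _, s := range []string{"john.doe@example.com", "user@[192.168.0.1]", "john..doe@example.com", "a@b"} {
		fmt.Println(s, rfc2822.MatchString(s))
	}
}
```

As written, the pattern requires a dotted domain (or a bracketed IP literal), so `a@b` is rejected.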


Kevin Gillette

Sep 29, 2012, 10:15:56 PM
to golan...@googlegroups.com, Archos, jes...@jessta.id.au
I believe it is regular, even if there were an infinite set of possible addresses (which there is not, because of length bounds), since the quoting and commenting rules are simple (no nested quotes or comments, for example). But at any rate, it might as well not be regular. A parser could probably be written in less gofmt'd code (counting whitespace) than the regexes, hundreds of characters long, that full validation needs. I also agree that there's not much point in parsing them either, since if you're taking an email address, you're probably going to send mail to it; you might as well just send a test email right then.

Michael Jones

Sep 30, 2012, 12:54:16 AM
to Kevin Gillette, golan...@googlegroups.com, Archos, jes...@jessta.id.au
One obvious reason to parse email addresses is in the import or maintenance phases of a contact database.

--
Michael T. Jones | Chief Technology Advocate  | m...@google.com |  +1 650-335-5765

Kevin Gillette

Sep 30, 2012, 1:24:14 AM
to golan...@googlegroups.com, Kevin Gillette, Archos, jes...@jessta.id.au
Maybe, though I'm not so sure of the benefit: if you're migrating from one system to another, can't you assume the email addresses being imported are already valid? If they're not, what can you do anyway? It's not as though you can email the user and ask them for a valid address. If they were not valid, you could either 1) import them anyway (in which case, why validate?), or 2) not import them at all.

With #1, the school of thought is that it's better to have something than nothing. With #2, a problem is that sometimes email validation is done wrong (regexp email validation is especially prone to getting it _wrong_), and you may be deleting users because their perfectly valid email addresses didn't look valid to the validator.

Michael Jones

Sep 30, 2012, 2:20:06 AM
to Kevin Gillette, golan...@googlegroups.com, Archos, jes...@jessta.id.au
I was thinking about looking for duplicates, trying to guess matches between names and email names (as in "Michael Jones" <....>), looking to group by "@domain.com" and so on. Never been an issue for me but it does not seem inherently mistaken to try to parse an email address. (For example, I recently wanted to do this to change some of my contacts from "Last, First" to "First Last" and I was willing to bet not many had commas in their names. I did it with sed and patched it up with vi. ;-)
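A sketch of that contact-cleanup case over a few hypothetical raw entries: `net/mail.ParseAddress` separates the display name from the address, duplicates are dropped by lower-cased address (a simplification, since strictly the local part is case-sensitive), and the rest is grouped by domain. `groupByDomain` is a hypothetical name:

```go
package main

import (
	"fmt"
	"net/mail"
	"sort"
	"strings"
)

// groupByDomain parses each entry, skips duplicates (compared
// case-insensitively) and groups the addresses by domain.
// Entries that fail to parse are returned separately.
func groupByDomain(entries []string) (map[string][]string, []string) {
	groups := make(map[string][]string)
	seen := make(map[string]bool)
	var bad []string
	for _, e := range entries {
		a, err := mail.ParseAddress(e)
		if err != nil {
			bad = append(bad, e)
			continue
		}
		key := strings.ToLower(a.Address)
		if seen[key] {
			continue
		}
		seen[key] = true
		domain := key[strings.LastIndexByte(key, '@')+1:]
		groups[domain] = append(groups[domain], a.Address)
	}
	return groups, bad
}

func main() {
	groups, bad := groupByDomain([]string{
		`"John Smith" <jsmith@example.com>`,
		"jsmith@example.com",
		"Ann Lee <ann@example.org>",
		"not an address",
	})
	domains := make([]string, 0, len(groups))
	for d := range groups {
		domains = append(domains, d)
	}
	sort.Strings(domains)
	for _, d := range domains {
		fmt.Println(d, groups[d])
	}
	fmt.Println("unparseable:", bad)
}
```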

Kevin Gillette

Sep 30, 2012, 4:29:51 AM
to golan...@googlegroups.com, Kevin Gillette, Archos, jes...@jessta.id.au
Ah, I see your point. 98% of the time, the regexp `<?\S+@\S+?>?` will work for that, since it's exceedingly rare to see multiple forms of the same address, though you do see stuff like `"John Smith" <bl...@wherever.com>` alongside `bl...@wherever.com`. I don't know whether that regexp will work with Go's stdlib package; whenever I've had reason to use regexps in Go, I haven't yet needed (non-)greedy quantifiers.
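Go's regexp package (RE2 syntax) does accept lazy quantifiers such as `+?`, so the pattern compiles. However, Go picks the match a backtracking engine would, and there the lazy `\S+?` followed by an optional `>?` stops after a single character of the domain. A sketch with a hypothetical address (`jsmith@example.com`), a greedy variant for comparison, and `net/mail.ParseAddress`, which splits off the display name:

```go
package main

import (
	"fmt"
	"net/mail"
	"regexp"
)

var (
	lazy   = regexp.MustCompile(`<?\S+@\S+?>?`) // pattern from the post
	greedy = regexp.MustCompile(`<?\S+@\S+>?`)  // greedy domain part
)

const sample = `"John Smith" <jsmith@example.com>` // hypothetical entry

func main() {
	fmt.Println(lazy.FindString(sample))   // <jsmith@e  (\S+? stops after one byte)
	fmt.Println(greedy.FindString(sample)) // <jsmith@example.com>

	// net/mail separates the display name from the address.
	if a, err := mail.ParseAddress(sample); err == nil {
		fmt.Println(a.Name, "|", a.Address) // John Smith | jsmith@example.com
	}
}
```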

nhat....@gmail.com

May 29, 2016, 1:09:54 AM
to golang-nuts
    if err != nil {
    	fmt.Println(err)
    	os.Exit(1)
    }

I don't understand what that means. Can you explain it to me?

as....@gmail.com

Jun 6, 2016, 6:53:52 PM
to golang-nuts
A good regexp doesn't parse RFC-compliant email addresses; it un-parses addresses belonging to RFC aficionados and keeps them from registering in your database.

This is the regexp used in git-codereview, tested against the wholesome email addresses belonging to Go contributors:
https://github.com/golang/review/blob/master/git-codereview/mail.go#L114

usman.m...@gmail.com

Nov 26, 2019, 8:47:12 AM
to golang-nuts
With this regexp I'm getting a "bad syntax" tag warning. I'm trying to find out how to resolve this issue; do you know?