Domain URL regex help

Rodusa

Jun 25, 2009, 11:27:48 AM
to Regex
I am trying to capture a specific domain/subdomain URL, but I am having
a hard time eliminating the last three matches:

amazon
http
http://www


This is the regex:
((?<Protocol>\w+):\/\/)?(www\.)?([a-zA-Z0-9\-\.]+)(?<extension>(\.com)?(\.net)?(\.br)?)
This is the result I get:

http://www.amazon.com.br
http://www.ama-zon.com.br
http://www.amazon.com
http://www.amazon.net
http://amazon.com
www.amazon.com
amazon.com
product.amazon.com
http://product.amazon.com
http://www.product.amazon.com
amazon
http
http://www
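
A quick way to see why the last three still match: every part of the extension group is optional, and the character class before it also accepts dots, so any run of letters satisfies the pattern. The sketch below is a reproduction under assumptions, not the original environment: it uses Python, the (?<name>...) groups are rewritten as Python's (?P<name>...), and extension_of is a hypothetical helper name.

```python
import re

# The pattern from the post, adapted to Python's named-group syntax.
ORIGINAL = re.compile(
    r"((?P<Protocol>\w+):\/\/)?(www\.)?([a-zA-Z0-9\-\.]+)"
    r"(?P<extension>(\.com)?(\.net)?(\.br)?)"
)


def extension_of(text):
    """Return the 'extension' group if the whole text matches, else None."""
    m = ORIGINAL.fullmatch(text)
    return m.group("extension") if m else None


# One wanted URL and the three unwanted inputs from the post.
for candidate in ["http://www.amazon.com.br", "amazon", "http", "http://www"]:
    print(repr(candidate), "->", repr(extension_of(candidate)))
```

All four inputs match in full with an empty extension: the greedy character class swallows ".com.br" itself, and nothing forces a TLD to be present. That suggests making the TLD mandatory rather than adding more optional groups.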

thanks

Rod

Accmailer

Jul 13, 2009, 4:58:05 AM
to Regex
My suggestion:
\W(?:http://(?:www\.)?)?([-a-z0-9_]+\.)+(com|net|br)\W

Tested on your message.
It catches all the good options and does not catch the last three.
The TLD list can be extended.
There is no need to put a dot in front of every TLD, as the ([-a-z0-9_]+\.)+
construct ensures that every word in the URL (there can be a lot of
them) is followed by a dot.

Note: This does not capture paths (subdirectories and forward slashes)
after the domain. If that is required, please reply.
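
As a rough check of the suggestion above, here is a small Python sketch. Python and the helper name is_caught are assumptions, not part of the original test. The dot after www is escaped here, and each candidate is wrapped in newlines because the pattern expects a non-word character on both sides.

```python
import re

# Suggested pattern: one or more "word." parts followed by a required TLD.
SUGGESTED = re.compile(r"\W(?:http://(?:www\.)?)?([-a-z0-9_]+\.)+(com|net|br)\W")

GOOD = [
    "http://www.amazon.com.br",
    "http://www.ama-zon.com.br",
    "http://www.amazon.com",
    "http://www.amazon.net",
    "http://amazon.com",
    "www.amazon.com",
    "amazon.com",
    "product.amazon.com",
    "http://product.amazon.com",
    "http://www.product.amazon.com",
]
BAD = ["amazon", "http", "http://www"]


def is_caught(candidate):
    """True if the pattern finds a match in the newline-wrapped candidate."""
    return SUGGESTED.search("\n" + candidate + "\n") is not None


for candidate in GOOD + BAD:
    print(candidate, "->", is_caught(candidate))
```

Because the TLD alternation is mandatory, inputs with no dot-separated TLD at the end are rejected, while every listed URL variant is accepted.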

On Jun 25, 8:27 pm, Rodusa <rlueneb...@gmail.com> wrote:
> I am trying to capture a specific domain/subdomain URL, but I am having
> a hard time eliminating the last three matches:
>
> amazon
> http
> http://www