Extract Domain name (url)


Abhishek shukla

Nov 12, 2009, 1:25:10 AM
to rubyonrails-talk
Hello Friends,
I need to write a regular expression that extracts and returns the domain name from a URL.

For example, if a user submits any of the URLs listed below, it should save only "foo.com":

http://www.foo.com/
http://www.foo.com/something
http://foo.com/
https://something.foo.com/

Thanks for any help..

Thanks
abhis

Srinivas Iyer

Nov 12, 2009, 1:39:47 AM
to rubyonra...@googlegroups.com
A good way to start is to experiment in an online regular expression editor:

http://rubular.com

Abhishek shukla

Nov 12, 2009, 1:46:11 AM
to rubyonra...@googlegroups.com
Hey Srinivas,

Thanks for the reply.

I am able to get the output, but the only problem is that I have to list every suffix explicitly (co.uk|com|net|org|in).

So I am trying to figure out the best way to get the output without doing that.

url_pattern = /^(?:.+?\.)+(.+?\.(?:co\.uk|com|net|org|in))(\:[0-9]{2,5})?\/*.*$/i
url = "http://www.foo.com"
url_pattern.match(url)
$1 #=> "foo.com"


Thanks
Abhishek
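
A quick way to check the pattern above is to run it against the four URLs from the original question. The sketch below is only an illustration: the helper name domain_via_pattern is made up here, and the pattern is used with the case-insensitive flag alone.

```ruby
# Pattern from the post. Only the i flag is used here; in Ruby a trailing
# s selects a string encoding rather than "dot matches newline".
URL_PATTERN = /^(?:.+?\.)+(.+?\.(?:co\.uk|com|net|org|in))(\:[0-9]{2,5})?\/*.*$/i

# Hypothetical helper: returns the first capture group, or nil when
# the pattern does not match at all.
def domain_via_pattern(url)
  match = URL_PATTERN.match(url)
  match && match[1]
end

[
  "http://www.foo.com/",
  "http://www.foo.com/something",
  "http://foo.com/",
  "https://something.foo.com/"
].each do |url|
  puts "#{url} => #{domain_via_pattern(url).inspect}"
end
```

Three of the URLs yield "foo.com", but "http://foo.com/" yields nil: the pattern needs one dot for the leading (?:.+?\.)+ group and a second dot inside the capture, and that URL contains only one. Parsing out the host first, as later replies do, sidesteps this.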

Conrad Taylor

Nov 12, 2009, 2:30:42 AM
to rubyonra...@googlegroups.com
require 'uri'


urls.each { |url| puts URI::parse( url ).host.split( "." )[-2,2].join(".") }

Good luck,

-Conrad
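
For anyone wanting to try the one-liner above, here is a minimal runnable sketch. The urls array holds the examples from the original question plus one co.uk address, which is an addition made here to show a limitation; the helper name last_two_labels is also made up for illustration.

```ruby
require 'uri'

# URLs from the original question, plus a co.uk address added for illustration.
urls = [
  "http://www.foo.com/",
  "http://www.foo.com/something",
  "http://foo.com/",
  "https://something.foo.com/",
  "http://www.foo.co.uk/"
]

# Hypothetical helper wrapping the one-liner: keep the last two
# dot-separated labels of the host.
def last_two_labels(url)
  URI.parse(url).host.split(".")[-2, 2].join(".")
end

urls.each { |url| puts "#{url} => #{last_two_labels(url)}" }
```

The first four URLs give "foo.com". The last one gives "co.uk", because taking a fixed number of labels cannot tell a two-part suffix from a one-part suffix.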
 


Srinivas Iyer

Nov 12, 2009, 2:46:03 AM
to rubyonra...@googlegroups.com
Hi Abhishek

You can try using the Addressable gem for this.

Step 1: Install the Addressable gem with the following command:

$ sudo gem install addressable

Step 2: I'll demonstrate in IRB; you can then integrate it into
your Rails application.

$ irb
> require 'rubygems'
> require 'addressable/uri'
> uri = Addressable::URI.parse("http://google.com")
=> #<Addressable::URI:0xfdb9aee5c URI:http://google.com>

Step 3: Extract only the host with the following command:

> uri.host
=> "google.com"

There are many other options you can explore:
http://addressable.rubyforge.org/api/classes/Addressable/URI.html

Hope this helps !

Best regards,
Srinivas Iyer
http://talkonsomething.com
http://twitter.com/srinivasiyermv

Conrad Taylor

Nov 12, 2009, 5:31:53 AM
to rubyonra...@googlegroups.com
Hi, the addressable gem doesn't isolate the domain part of the web address; uri.host still includes the subdomain. For example,

irb(main):002:0> require 'addressable/uri'
=> true
irb(main):003:0> uri = Addressable::URI.parse("http://www.usc.edu/home.html" )
=> #<Addressable::URI:0x90e89c URI:http://www.usc.edu/home.html>
irb(main):004:0> uri.host
=> "www.usc.edu"

-Conrad

Craig White

Nov 12, 2009, 9:48:56 AM
to rubyonra...@googlegroups.com
On Thu, 2009-11-12 at 02:31 -0800, Conrad Taylor wrote:

>
> irb(main):002:0> require 'addressable/uri'
> => true
> irb(main):003:0> uri =
> Addressable::URI.parse("http://www.usc.edu/home.html" )
> => #<Addressable::URI:0x90e89c URI:http://www.usc.edu/home.html>
> irb(main):004:0> uri.host
> => "www.usc.edu"

----
uri.host.split('.')[-2, 2].join('.')  # => "usc.edu" (index [0] would return "www")

Craig


Tony Amoyal

Nov 12, 2009, 9:59:33 AM
to Ruby on Rails: Talk
# Given a URL, return a domain
def self.url_to_domain(url)
  begin
    host = URI.parse(self.fix_url(url)).host
    host.gsub(/\Awww\./, "")
  rescue
    ""
  end
end

Tony Amoyal

Nov 12, 2009, 10:16:49 AM
to Ruby on Rails: Talk
Oops, I forgot to add the other function I was using:
# Prepend URL with http if necessary
def self.fix_url(u)
  !!( u !~ /\A(?:http:\/\/|https:\/\/)/i ) ? "http://#{u}" : u
end

Note that you need to require uri:

require 'uri'

I put this in a module called Utilities so the whole thing is:

require 'uri'

module Utilities

  # Given a URL, return a domain
  def self.url_to_domain(url)
    begin
      host = URI.parse(self.fix_url(url)).host
      host.gsub(/\Awww\./, "")
    rescue
      ""
    end
  end

  # Prepend URL with http if necessary
  def self.fix_url(u)
    !!( u !~ /\A(?:http:\/\/|https:\/\/)/i ) ? "http://#{u}" : u
  end

end

And you call it with Utilities::url_to_domain(u)
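
To see what that call returns in practice, here is a small sketch that restates the module in a slightly condensed form (same behaviour as far as I can tell, but treat the condensed wording as an assumption) and runs it on a few inputs. The sample inputs are chosen here for illustration.

```ruby
require 'uri'

module Utilities
  # Given a URL, return the host with a leading "www." removed.
  # Any parsing error (or a missing host) falls through to "".
  def self.url_to_domain(url)
    host = URI.parse(fix_url(url)).host
    host.gsub(/\Awww\./, "")
  rescue
    ""
  end

  # Prepend http:// when the string has no http/https scheme.
  def self.fix_url(u)
    u =~ /\Ahttps?:\/\//i ? u : "http://#{u}"
  end
end

puts Utilities.url_to_domain("http://www.foo.com/something")
puts Utilities.url_to_domain("foo.com")
puts Utilities.url_to_domain("https://something.foo.com/")
puts Utilities.url_to_domain("http://exa mple.com").inspect
```

The first two calls print "foo.com". The third prints "something.foo.com", since only a leading "www." is stripped and other subdomains are kept. The last input contains a space, so URI.parse raises and the method returns an empty string.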

Abhishek shukla

Nov 18, 2009, 2:03:23 AM
to rubyonra...@googlegroups.com
Hello,
Thanks, friends, for the superb solutions. Really appreciated.

Thanks
Abhis
 



fort

Nov 23, 2009, 3:27:05 PM
to Ruby on Rails: Talk
I faced the exact same situation a while ago. Here's what I came up
with after reading the rest of this thread:

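#!/usr/bin/env ruby

require 'uri'

module DomainExtractor
  # Labels treated as generic suffixes (com, net, org, co)
  VALID_GENERIC_SUFIXES_RE = /^(com|net|org|co)$/

  # Returns the domain for a URL, or "" when it cannot be determined
  def self.extract(url)
    u = fix_url(url)
    uri = URI::parse(u)
    domain = uri.host
    chunks = domain.split('.')

    if ! (chunks[-1] =~ VALID_GENERIC_SUFIXES_RE).nil?
      # e.g. google.com: keep the last two labels
      domain = chunks[-2, 2].join('.')
    elsif ! (chunks[-2] =~ VALID_GENERIC_SUFIXES_RE).nil?
      # e.g. google.com.uy: keep the last three labels
      domain = chunks[-3, 3].join('.')
    else
      domain = ""
    end
    domain.gsub(/\Awww\./, "")
  rescue
    ""
  end

  # Prepend http:// when the URL has no http/https scheme
  def self.fix_url(url)
    !!( url !~ /\A(?:http:\/\/|https:\/\/)/i ) ? "http://#{url}" : url
  end
end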

# test
urls = [
  "http://google.com",
  "http://www.google.com",
  "http://google.com.uy",
  "http://www.google.com.uy",
  "http://google.com.uy/index.html",
  "http://subdomain1.google.com.uy/index.html",
  "http://subdomain1.subdomain2.google.com",
  "http://www.subdomain1.google.com.uy/index.html",
  "http://subdomain1.google.net/index.html",
  "http://subdomain1.sub2.sub3.google.org.kz?test=3",
  "http://kb.mediatemple.net/questions/251/Running+rake+tasks+from+cron",
  "https://creaproject.basecamphq.com/projects/3620850/todo_items/413078/comments",
  "google.com",
  "google.com.uy",
  "google.com.uy/index.php",
  "sub1.sub2.google.com.uy?test=value",
  "www.sub1.sub2.google.com.uy?test=value",
  "http://sub1.sub2.google.com.uy?test=value",
  "http://www.wwwsub1.sub2.google.com.uy?test=value"
]

urls.each do |url|
  puts
  puts "URL : #{url}"
  result = DomainExtractor::extract(url)
  puts "result: #{result}"
end
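
The print loop above can also be turned into a table of expected results. The following is a compact sketch of the same suffix rule, not the module itself: the names GENERIC_SUFFIXES, extract_domain and EXPECTED are made up here, and the .edu entry is an added example to show what happens with a suffix outside the list.

```ruby
require 'uri'

# Hypothetical compact restatement of the suffix rule: com, net, org and co
# count as generic suffixes, optionally followed by one country label.
GENERIC_SUFFIXES = %w[com net org co].freeze

def extract_domain(url)
  url = "http://#{url}" unless url =~ /\Ahttps?:\/\//i
  labels = URI.parse(url).host.to_s.split('.')
  if GENERIC_SUFFIXES.include?(labels[-1])
    labels.last(2).join('.')   # e.g. google.com
  elsif GENERIC_SUFFIXES.include?(labels[-2])
    labels.last(3).join('.')   # e.g. google.com.uy
  else
    ""                         # suffix not covered by the list
  end
rescue URI::InvalidURIError
  ""
end

# Expected results under the rule above.
EXPECTED = {
  "http://www.google.com"                      => "google.com",
  "http://subdomain1.google.com.uy/index.html" => "google.com.uy",
  "http://subdomain1.google.net/index.html"    => "google.net",
  "google.com.uy/index.php"                    => "google.com.uy",
  "http://www.usc.edu/home.html"               => ""
}

EXPECTED.each do |url, expected|
  actual = extract_domain(url)
  puts "#{actual == expected ? 'ok  ' : 'FAIL'} #{url} => #{actual.inspect}"
end
```

The empty result for the .edu address shows the trade-off of a hand-written suffix list: anything not listed comes back blank, so the list has to grow with every new suffix you need to support.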