how to extract domain name without sub domain from url

Chem Leakhina

June 22, 2009, 23:46:57

Hi everyone,

Does anyone know how to extract the domain name without the subdomain from a URL?

Example: http://test.domain.com => http://domain.com

Please give me some example code in Ruby.

Thanks,
Leakhina
--
Posted via http://www.ruby-forum.com/.

Justin Collins

June 23, 2009, 4:11:05

Chem Leakhina wrote:
> Hi everyone,
>
> Does anyone know how to extract domain name without sub domain from url?
>
> Example: http://test.domain.com => http://domain.com
>
> Please give me an example code in ruby.
>
> Thanks,
> Leakhina
>

This is actually quite difficult, because there is a multitude of
possible second-level domains which can be used (such as .co.uk), and
they are not really standardized. Just picking one at random, the
country of Jordan has .com.jo, .net.jo, .gov.jo, .edu.jo, .org.jo,
.mil.jo, .name.jo, and .sch.jo.

If one were to ignore such things, then it becomes easier:

$ irb
irb(main):001:0> require 'uri'
=> true
irb(main):002:0> u = URI.parse "http://test.domain.com/"
=> #<URI::HTTP:0xb7bbf848 URL:http://test.domain.com/>
irb(main):003:0> u.host
=> "test.domain.com"
irb(main):004:0> u.host.split(".")[-2,2]
=> ["domain", "com"]
irb(main):005:0> u.host.split(".")[-2,2].join(".")
=> "domain.com"
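
The irb steps above can be wrapped in a small method. This is only a sketch of the same last-two-labels approach; the name `naive_domain` is made up for illustration, and it assumes the input is a URL that `URI.parse` accepts.

```ruby
require 'uri'

# Hypothetical helper: returns the last two labels of the URL's host,
# or nil when the URL has no host part.
def naive_domain(url)
  host = URI.parse(url).host
  return nil if host.nil?

  # Split the host into labels and keep at most the last two.
  host.split('.').last(2).join('.')
end

puts naive_domain('http://test.domain.com/')   # prints "domain.com"
puts naive_domain('http://www.google.co.uk/')  # prints "co.uk"
```

The second call shows the limitation: for a host under .co.uk the result is the suffix itself, not the domain.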

However, as mentioned above, there are a lot of domains this will not
work for.
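
One way to cover some of those cases is to check the last two labels against a list of known second-level suffixes and keep three labels when they match. The sketch below is an assumption-laden illustration: `MULTI_LABEL_SUFFIXES` and `registered_domain` are hypothetical names, and the list is hand-picked from the examples in this post, so it is nowhere near complete. A maintained list such as the Public Suffix List would be needed for real coverage.

```ruby
require 'uri'

# Illustrative, hand-picked list only (assumption): real-world use would
# need a far larger, maintained list of suffixes.
MULTI_LABEL_SUFFIXES = %w[
  co.uk com.jo net.jo gov.jo edu.jo org.jo mil.jo name.jo sch.jo
].freeze

# Hypothetical helper: returns the domain without subdomains, or nil
# when the URL has no host part.
def registered_domain(url)
  host = URI.parse(url).host
  return nil if host.nil?

  labels = host.downcase.split('.')
  # Keep three labels when the last two form a known second-level suffix,
  # otherwise keep two.
  keep = MULTI_LABEL_SUFFIXES.include?(labels.last(2).join('.')) ? 3 : 2
  labels.last(keep).join('.')
end

puts registered_domain('http://test.domain.com/')   # prints "domain.com"
puts registered_domain('http://www.google.co.uk/')  # prints "google.co.uk"
```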

-Justin

Robert Klemme

June 23, 2009, 5:11:25

2009/6/23 Justin Collins <justin...@ucla.edu>:

> Chem Leakhina wrote:
>>
>> Hi everyone,
>>
>> Does anyone know how to extract domain name without sub domain from url?
>>
>> Example: http://test.domain.com => http://domain.com
>>
>> Please give me an example code in ruby.
>>
>> Thanks,
>> Leakhina
>>
>
> This is actually quite difficult, because there is a multitude of possible
> second-level domains which can be used (such as .co.uk), and they are not
> really standardized. Just picking one at random, the country of Jordan has
> .com.jo, .net.jo, .gov.jo, .edu.jo, .org.jo, .mil.jo, .name.jo, and .sch.jo.
>
> If one were to ignore such things, then it becomes easier:
>
> $ irb
> irb(main):001:0> require 'uri'
> => true
> irb(main):002:0> u = URI.parse "http://test.domain.com/"
> => #<URI::HTTP:0xb7bbf848 URL:http://test.domain.com/>
> irb(main):003:0> u.host
> => "test.domain.com"
> irb(main):004:0> u.host.split(".")[-2,2]
> => ["domain", "com"]
> irb(main):005:0> u.host.split(".")[-2,2].join(".")
> => "domain.com"
>
>
> However, as mentioned above, there are a lot of domains this will not work
> for.

We can get better results by ignoring particular known domain prefixes
such as "ftp" and "www":

# this works with 1.8 and 1.9
%w{
  www.google.com
  google.co.uk
  www.google.co.uk
  foo.bar
}.each do |domain|
  # strip a leading "www." or "ftp.", then take the first remaining label
  dom = domain.sub(/^(?:www|ftp)\./, '')[/^[^.]+/]
  printf "%p -> %p\n", domain, dom
  # alternative: one regex with an optional prefix and a single capture group
  dom = domain[/^(?:(?:ftp|www)\.)?([^.]+)/, 1]
  printf "%p -> %p\n", domain, dom
end
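
The original question asked for a URL as the result (http://test.domain.com => http://domain.com). A possible variation on the prefix idea, sketched here with the hypothetical name `strip_known_prefix`, removes only a leading "www." or "ftp." label and returns the rest of the URL unchanged. It assumes the input parses with `URI.parse`.

```ruby
require 'uri'

# Hypothetical helper: drops a leading "www." or "ftp." label from the
# host and returns the resulting URL as a string.
def strip_known_prefix(url)
  uri = URI.parse(url)
  return url if uri.host.nil?

  # Only the listed prefixes are removed; any other subdomain is kept.
  uri.host = uri.host.sub(/\A(?:www|ftp)\./, '')
  uri.to_s
end

puts strip_known_prefix('http://www.google.co.uk/')  # prints "http://google.co.uk/"
puts strip_known_prefix('http://test.domain.com/')   # prints "http://test.domain.com/"
```

As the second call shows, a subdomain that is not in the prefix list (here "test") is left in place, so this does not handle the example from the original question.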

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
