The URI.extract method from the uri library returns an array of the URIs it
finds in a string:
require 'uri'
URI.extract('My favorite site is http://google.com')
# => ["http://google.com"]
An optional second argument limits the schemes that it will match and
return:
URI.extract('Why do people use mailto:m...@lala.org links?')
# => ["mailto:m...@lala.org"]
URI.extract('Why do people use mailto:m...@lala.org links?', 'http')
# => []
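The schemes can also be passed as an array to allow more than one. A minimal
sketch, assuming made-up sample text (the addresses and variable names are
placeholders, not anything from the original question):

```ruby
require 'uri'

# Hypothetical sample text; the addresses are placeholders.
text = 'Docs at http://www.ruby-lang.org/ and https://example.com/page ' \
       'or write to mailto:someone@example.com'

# Keep only web links by listing the schemes to match.
web_links = URI.extract(text, ['http', 'https'])
# => ["http://www.ruby-lang.org/", "https://example.com/page"]

# With no scheme list, the mailto: link is picked up as well.
all_links = URI.extract(text)
```

Note that the sample text deliberately avoids punctuation right after the
addresses, since trailing characters such as commas can end up in a match.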
marcel
--
Marcel Molina Jr. <mar...@vernix.org>
>On Sun, Jun 12, 2005 at 03:44:03AM +0900, sujeet kumar wrote:
>
>>How can I find the URLs in the search results? Can I do it
>>with a regular expression or some other way?
>>
>The URI.extract method from the uri library can extract an array of URIs from
>a string:
>
A universal regexp that finds URIs in arbitrary text is a complicated
thing indeed. Besides, it can produce false positives (matching things
that look like URIs but aren't).
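To make the over-matching concrete, here is a small sketch with a
deliberately naive pattern. The regexp and the sample sentence are purely
illustrative (they are not taken from the uri library), and example.com is
a placeholder:

```ruby
# Deliberately naive pattern: "http://" or "https://" followed by
# any run of non-whitespace characters.
NAIVE_URL = %r{https?://\S+}

text = 'Read the manual (http://example.com/docs). It is short.'

matches = text.scan(NAIVE_URL)
# => ["http://example.com/docs)."]
# The closing parenthesis and the full stop are swallowed, so the
# match is not the address the author meant.
```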
If you are sure that the page is well-formed XHTML (I'm not sure whether
that's the case with Google), you could instead parse it with REXML and
use XPath to retrieve the href attributes of all <a>...</a> elements,
keeping only those that start with "http://" (there may also be mailto:,
ftp:, JavaScript calls, etc.).
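A minimal sketch of that approach, assuming the page is already available
as a well-formed XHTML string. The markup below is a made-up stand-in; it
omits the namespace declaration a real XHTML page would carry, which may
need extra handling in the XPath expression. All variable names are
illustrative:

```ruby
require 'rexml/document'

# Hypothetical stand-in for a fetched page; it must be well-formed XML.
xhtml = <<~XHTML
  <html>
    <body>
      <a href="http://www.ruby-lang.org/">Ruby</a>
      <a href="mailto:someone@example.com">Mail</a>
      <a href="ftp://ftp.example.com/pub/">Files</a>
      <a href="javascript:void(0)">Script</a>
      <a href="http://example.com/page">Example</a>
    </body>
  </html>
XHTML

doc = REXML::Document.new(xhtml)

# XPath picks every <a> element that has an href attribute;
# the scheme filter is done in plain Ruby afterwards.
hrefs = REXML::XPath.match(doc, '//a[@href]').map { |a| a.attributes['href'] }
http_links = hrefs.select { |href| href.start_with?('http://') }
# => ["http://www.ruby-lang.org/", "http://example.com/page"]
```

REXML raises a parse error on markup that is not well-formed, so this only
works when the assumption about the page holds.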
Best regards,
Alexey Verkhovsky
Why not use the Google API?
--
Eric Hodel - drb...@segment7.net - http://segment7.net
FEC2 57F1 D465 EB15 5D6E 7C11 332A 551C 796C 9F04