On 4/14/07, Vladimir Vladimirov <smartk...@gmail.com> wrote:
> Hi all
> I'm trying to use Firebug to get XPath for selected elements on page
> like described here:
>
> http://www.igvita.com/blog/2007/02/04/ruby-screen-scraper-in-60-seconds/
> and discovered following issue when trying to get XPath for table
> elements - it seems that Firebug (or even Firefox ) adds tbody tag for
> every table, even if it does not exist in the source HTML code.
This is a consequence of Firebug operating on the live browser DOM,
which is built according to HTML's normalization rules and other
intricacies, whereas Hpricot and many other parsers operate on the
input at a more literal level.
Something similar happens with broken markup such as badly nested
tags: Firefox first normalizes the input into something it can build
a proper DOM from, and Firebug then reports whatever XPath matches
that generated DOM, which may differ substantially from the structure
of the literal file.
> So XPath copied from Firebug cannot be used in other HTML parsers.
You'll probably find that Firebug's XPaths can be applied in other
environments that normalize HTML the way Firefox does (using the
same XPath to cut out nodes in Opera, for instance). Hpricot is
unfortunately not one of them, though.
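
One possible workaround, as a minimal sketch: if the source HTML really
has no tbody tags, the tbody steps can be dropped from the XPath copied
from Firebug before handing it to a literal parser. The helper name
strip_tbody and the sample path below are hypothetical, made up for
illustration; neither Firebug nor Hpricot provides this.

```ruby
# Hypothetical helper (not part of Firebug or Hpricot): removes every
# `tbody` step, with or without a positional index, from an XPath string.
# Assumption: the source markup contains no explicit <tbody> elements.
def strip_tbody(xpath)
  xpath.gsub(%r{/tbody(?:\[\d+\])?(?=/|\z)}, "")
end

# Sample path of the kind Firebug might report (made up for illustration).
firebug_path = "/html/body/table/tbody/tr[2]/td[1]"
puts strip_tbody(firebug_path)
```

The result should then be closer to what a literal parser sees, but it
is only safe when the page itself never writes tbody explicitly, and it
does nothing for the other kinds of normalization mentioned above, such
as repaired nesting.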
--
/ Johan Sundström, http://ecmanaut.blogspot.com/