Here is an example of what I've discovered.
I was trying to extract the main block (without the menu and header) from this
page (it is in Russian, but its HTML code is still HTML code :) ):
http://www.microsoft.com/rus/windows/embedded/license.mspx
I used the inspector to find the element I need, then copied its XPath:
/html/body/table/tbody/tr/td[2]/table/tbody/tr/td
Then I used the Hpricot Ruby library to extract this element by XPath,
but Hpricot found nothing. So I debugged a little and found that
Hpricot can access the same element with the following XPath:
/html/body/table/tr/td/table
I see two differences: tbody was skipped, and something happened to
td[2]...
I checked the page's HTML source: there are no tbody tags at all.
Could anybody explain how it can happen that Firebug has
tbody in its DOM?
Also, is there any way to make Firebug or Firefox generate the DOM without
adding elements?
PS - I'm still using Firebug to find the XPath, but I do this manually;
for example, on that page I extract the element I need with the following XPath:
td[@class=MedCMS]
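
A rough sketch of why a class-based lookup like this is less fragile than an absolute path: it matches whether or not tbody is present in the tree. REXML (which ships with Ruby) is used here only as a stand-in for a literal parser. The two fragments, the cell_text helper, and the quoted form of the attribute test are assumptions made for illustration, not taken from the real page.

```ruby
require 'rexml/document'

# Hypothetical fragments: the same table as written in the source, and as a
# browser would expose it after inserting tbody.
LITERAL    = '<table><tr><td class="MedCMS">main block</td></tr></table>'
NORMALIZED = '<table><tbody><tr><td class="MedCMS">main block</td></tr></tbody></table>'

# Return the text of the first td with the given class, or nil if none matches.
def cell_text(markup, css_class)
  doc = REXML::Document.new(markup)
  cell = REXML::XPath.first(doc, "//td[@class='#{css_class}']")
  cell && cell.text
end

puts cell_text(LITERAL, 'MedCMS')     # main block
puts cell_text(NORMALIZED, 'MedCMS')  # main block
```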
Best Regards
Vladimir
This is the effect of Firebug operating on a live browser DOM that reflects
the intricacies and normalization rules of HTML, whereas
Hpricot and many others operate on the input on a more literal level.
Similar things happen when you have broken markup with badly nested
tags and similar; Firefox first normalizes the input to something it
can build a proper DOM of, and Firebug can then find the nodes using
whatever XPath ended up matching the generated DOM, which might differ
substantially from the DOM of the literal file.
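
To make the difference concrete, here is a small sketch using REXML (which ships with Ruby) as a stand-in for a parser that takes the markup literally. The SOURCE fragment and the first_text helper are made up for illustration; the fragment only mimics nested layout tables written without tbody and is not the real page.

```ruby
require 'rexml/document'

# Hypothetical stand-in for the page: one layout table nested in another,
# written without any tbody tags.
SOURCE = <<~HTML
  <html><body>
    <table>
      <tr>
        <td>menu</td>
        <td>
          <table><tr><td class="MedCMS">main block</td></tr></table>
        </td>
      </tr>
    </table>
  </body></html>
HTML

# Return the text of the first node matching the path, or nil if none matches.
def first_text(markup, path)
  node = REXML::XPath.first(REXML::Document.new(markup), path)
  node && node.text
end

# Path in the shape Firebug reports it, with the tbody steps the browser added.
firebug_path = '/html/body/table/tbody/tr/td[2]/table/tbody/tr/td'
# Same path without tbody, matching the markup as written.
literal_path = '/html/body/table/tr/td[2]/table/tr/td'

puts first_text(SOURCE, firebug_path).inspect  # nil
puts first_text(SOURCE, literal_path)          # main block
```

A browser parsing the same fragment would place each tr inside an implied tbody, which is why the first path matches there and not here.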
> So XPath copied from Firebug cannot be used in other HTML parsers.
You'll probably find that Firebug's XPaths can be applied in other
environments that do HTML normalization similar to Firefox's (using the
same XPath for cutting out nodes in Opera, for instance). Hpricot is
unfortunately not one of them, though.
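
One possible workaround, sketched under assumptions: strip the tbody steps from a copied path before handing it to a literal parser. The strip_tbody helper below is hypothetical and only handles implied tbody elements. It would also remove a tbody that really is written in the source, and it does not account for other normalization effects, such as the td[2] difference reported above.

```ruby
# Hypothetical helper: drop every tbody step (with an optional numeric
# index) from an XPath copied out of a browser tool.
def strip_tbody(xpath)
  xpath.gsub(%r{/tbody(?:\[\d+\])?(?=/|\z)}, '')
end

puts strip_tbody('/html/body/table/tbody/tr/td[2]/table/tbody/tr/td')
# prints /html/body/table/tr/td[2]/table/tr/td
```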
--
/ Johan Sundström, http://ecmanaut.blogspot.com/