copy XPath question


Vladimir Vladimirov

Apr 14, 2007, 6:27:45 AM
to Firebug
Hi all,
I'm trying to use Firebug to get the XPath for selected elements on a page,
as described here:
http://www.igvita.com/blog/2007/02/04/ruby-screen-scraper-in-60-seconds/
and I ran into the following issue when getting the XPath for table
elements: it seems that Firebug (or even Firefox) adds a tbody tag to
every table, even if it does not exist in the source HTML.
So an XPath copied from Firebug cannot be used in other HTML parsers.

Here is an example of what I found.
I was trying to extract the main block (without the menu and header) from this
page (it is in Russian, but its HTML code is still HTML code :) ):
http://www.microsoft.com/rus/windows/embedded/license.mspx
I used the inspector to find the element I need, then copied its XPath:

/html/body/table/tbody/tr/td[2]/table/tbody/tr/td

Then I used the Hpricot Ruby library to extract this element by XPath,
but Hpricot found nothing. So I debugged a little and found that
Hpricot can access the same element with the following XPath:

/html/body/table/tr/td/table

I see two differences: tbody was skipped, and something happened to
td[2]...

I checked the page's HTML source: there are no tbody tags at all.

Could anybody explain how it happens that Firebug has tbody in its
DOM?
Also, is there any way to make Firebug or Firefox generate the DOM without
adding elements?

PS: I'm still using Firebug to find the XPath, but I do it manually.
For example, on that page I extract the element I need with the following XPath:
td[@class='MedCMS']
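
Selecting by attribute avoids the problem because the expression does not spell out the table structure at all. A minimal sketch in Python (not Hpricot), using the standard library's `xml.etree.ElementTree` on two made-up snippets: the same cell as it might be written in a source file, and as a browser would hold it after adding tbody. The markup is hypothetical; only the class name comes from the page above. Note that standard XPath, and ElementTree, need quotes around the attribute value.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one cell as written in the file, and the same cell
# wrapped in the tbody element that a browser DOM would contain.
AS_WRITTEN = "<table><tr><td class='MedCMS'>main block</td></tr></table>"
AS_IN_BROWSER = (
    "<table><tbody><tr><td class='MedCMS'>main block</td></tr></tbody></table>"
)

# Descendant search by attribute; no table/tbody/tr steps are named.
query = ".//td[@class='MedCMS']"

for markup in (AS_WRITTEN, AS_IN_BROWSER):
    cell = ET.fromstring(markup).find(query)
    print(cell.text)  # same cell is found in both trees
```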

Best Regards
Vladimir

Johan Sundström

Apr 14, 2007, 7:17:30 AM
to fir...@googlegroups.com
On 4/14/07, Vladimir Vladimirov <smar...@gmail.com> wrote:
> Hi all,
> I'm trying to use Firebug to get the XPath for selected elements on a page,
> as described here:
> http://www.igvita.com/blog/2007/02/04/ruby-screen-scraper-in-60-seconds/
> and I ran into the following issue when getting the XPath for table
> elements: it seems that Firebug (or even Firefox) adds a tbody tag to
> every table, even if it does not exist in the source HTML.

This is the effect of Firebug operating on a live browser DOM that is aware of
the intricacies and normalization rules of HTML, whereas
Hpricot and many other parsers operate on the input at a more literal level.
Similar things happen when you have broken markup with badly nested
tags; Firefox first normalizes the input into something it
can build a proper DOM from, and Firebug can then find the nodes using
whatever XPath ended up matching the generated DOM, which might differ
substantially from the DOM of the literal file.
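
To make the difference concrete, here is a rough Python sketch (not Hpricot itself) using the standard library's `xml.etree.ElementTree`, which builds the tree exactly as the markup is written and adds nothing. The snippet is a made-up, well-formed stand-in for the page, not its real source.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the page: nested tables, no <tbody>.
SOURCE = """
<html><body>
  <table><tr>
    <td>menu</td>
    <td><table><tr><td class="MedCMS">main block</td></tr></table></td>
  </tr></table>
</body></html>
""".strip()

root = ET.fromstring(SOURCE)  # root is the <html> element

# Path as copied from the browser DOM (relative to <html>), with tbody steps.
browser_path = "./body/table/tbody/tr/td[2]/table/tbody/tr/td"
# The same path with the tbody steps removed, matching the literal markup.
literal_path = "./body/table/tr/td[2]/table/tr/td"

print(root.find(browser_path))       # None: the literal tree has no tbody
print(root.find(literal_path).text)  # main block
```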

> So an XPath copied from Firebug cannot be used in other HTML parsers.

You'll probably find that Firebug's XPaths can be applied in other
environments that do HTML normalization similar to Firefox's (using the
same XPath for cutting out nodes in Opera, for instance). Hpricot is
unfortunately not one of them, though.
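
For parsers that do not normalize, one possible workaround is to drop the tbody steps from the copied path before using it. A small Python sketch, where `strip_tbody` is a hypothetical helper name; it assumes the source markup really contains no tbody tags, since it would break paths for pages that write tbody explicitly.

```python
import re

# Matches a "/tbody" step, with an optional positional predicate such as [1],
# only when it is a whole step (followed by "/" or the end of the path).
_TBODY_STEP = re.compile(r"/tbody(\[\d+\])?(?=/|$)")


def strip_tbody(xpath: str) -> str:
    """Remove tbody steps from an XPath copied from a browser DOM."""
    return _TBODY_STEP.sub("", xpath)


print(strip_tbody("/html/body/table/tbody/tr/td[2]/table/tbody/tr/td"))
# /html/body/table/tr/td[2]/table/tr/td
```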

--
/ Johan Sundström, http://ecmanaut.blogspot.com/

Vladimir Vladimirov

Apr 18, 2007, 4:01:14 AM
to Firebug
Thanks for the clarification.
Anyway, I'm still using Firebug to inspect the DOM and then writing the XPath on
my own, keeping Firefox's normalization in mind.
I'm also using a small RoR web-based UI to test XPaths on target pages.

