copy XPath question
Vladimir Vladimirov  
From: "Vladimir Vladimirov" <smartk...@gmail.com>
Date: Sat, 14 Apr 2007 03:27:45 -0700
Local: Sat, Apr 14 2007 6:27 am
Subject: copy XPath question
Hi all,
I'm trying to use Firebug to get the XPath for selected elements on a page,
as described here:
http://www.igvita.com/blog/2007/02/04/ruby-screen-scraper-in-60-seconds/
and I ran into the following issue when getting the XPath for table
elements: it seems that Firebug (or perhaps Firefox itself) adds a tbody tag
to every table, even if it does not exist in the source HTML.
So an XPath copied from Firebug cannot be used in other HTML parsers.

Here is an example of what I found.
I was trying to extract the main block (without the menu and header) from this
page (it is in Russian, but its HTML is still HTML :) ):
http://www.microsoft.com/rus/windows/embedded/license.mspx
I used the inspector to find the element I need, then copied its XPath:

/html/body/table/tbody/tr/td[2]/table/tbody/tr/td

Then I used the Hpricot Ruby library to extract this element by XPath,
but Hpricot found nothing. So I debugged a little and found that
Hpricot can access the same element with the following XPath:

/html/body/table/tr/td/table

I see two differences: tbody was skipped, and something happened to
td[2]...

I checked the page's HTML source: there are no tbody tags at all.

Could anybody explain how it happens that Firebug has
tbody in its DOM?
Also, is there any way to make Firebug or Firefox generate the DOM without
adding elements?

PS: I'm still using Firebug to find the XPath, but I do it manually.
For example, on that page I extract the element I need with the following XPath:
td[@class='MedCMS']

Best Regards
Vladimir


Johan Sundström  
From: "Johan Sundström" <oyas...@gmail.com>
Date: Sat, 14 Apr 2007 13:17:30 +0200
Local: Sat, Apr 14 2007 7:17 am
Subject: Re: copy XPath question
On 4/14/07, Vladimir Vladimirov <smartk...@gmail.com> wrote:

> Hi all,
> I'm trying to use Firebug to get the XPath for selected elements on a page,
> as described here:
> http://www.igvita.com/blog/2007/02/04/ruby-screen-scraper-in-60-seconds/
> and I ran into the following issue when getting the XPath for table
> elements: it seems that Firebug (or perhaps Firefox itself) adds a tbody tag
> to every table, even if it does not exist in the source HTML.

This is the effect of Firebug operating on a live browser DOM, which is
built according to HTML's parsing and normalization rules, whereas
Hpricot and many other parsers treat their input more literally.
Similar things happen when you have broken markup with badly nested
tags: Firefox first normalizes the input into something it
can build a proper DOM from, and Firebug then reports
whatever XPath matches the generated DOM, which might differ
substantially from the structure of the literal file.
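
To illustrate the difference on a small scale, here is a rough Ruby sketch. It uses REXML (shipped with standard Ruby distributions) rather than Hpricot, and the two markup strings are made-up stand-ins: one for the literal source, one for roughly what a browser's DOM looks like after an implicit tbody has been added. The names `LITERAL`, `NORMALIZED` and `texts` are assumptions for this example only.

```ruby
require "rexml/document"

# Made-up sample markup: the table as written in the source file ...
LITERAL = "<html><body><table><tr><td>menu</td><td>main</td></tr></table></body></html>"
# ... and the same table with the tbody a browser would add to its DOM.
NORMALIZED = "<html><body><table><tbody><tr><td>menu</td><td>main</td></tr></tbody></table></body></html>"

# A path as copied from a browser DOM, and the same path without tbody.
BROWSER_PATH = "/html/body/table/tbody/tr/td[2]"
LITERAL_PATH = "/html/body/table/tr/td[2]"

# Hypothetical helper: parse the markup and return the text of every
# element matched by the XPath.
def texts(markup, xpath)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, xpath).map(&:text)
end

p texts(NORMALIZED, BROWSER_PATH) # matches in the normalized tree
p texts(LITERAL, BROWSER_PATH)    # no match: the literal tree has no tbody
p texts(LITERAL, LITERAL_PATH)    # matches once the tbody step is dropped
```

A literal parser sees only the first string, so a path containing a tbody step has nothing to match there.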

> So an XPath copied from Firebug cannot be used in other HTML parsers.

You'll probably find that Firebug's XPaths can be applied in other
environments that do HTML normalization similar to Firefox's (using the
same XPath for cutting out nodes in Opera, for instance). Hpricot is
unfortunately not one of them, though.

--
 / Johan Sundström, http://ecmanaut.blogspot.com/


Vladimir Vladimirov  
From: Vladimir Vladimirov <smartk...@gmail.com>
Date: Wed, 18 Apr 2007 01:01:14 -0700
Local: Wed, Apr 18 2007 4:01 am
Subject: Re: copy XPath question
Thanks for the clarification.
Anyway, I'm still using Firebug to inspect the DOM and then writing the XPath
on my own, keeping Firefox's normalization in mind.
I'm also using a small RoR web-based UI to test XPath expressions on target pages.
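
One way to apply the tbody part of that normalization mechanically is a small string helper, sketched here in plain Ruby. `strip_implicit_tbody` is a hypothetical name, not part of Firebug or Hpricot. The sketch assumes the source HTML contains no explicit tbody tags, since it would break paths for pages that do, and it does nothing about other differences such as the td[2] step.

```ruby
# Hypothetical helper: drop the tbody steps that a browser DOM adds
# implicitly, so the path has a chance of matching in a parser that
# keeps the markup literal.
def strip_implicit_tbody(xpath)
  # Remove each "/tbody" or "/tbody[n]" step that is followed by another
  # step or that ends the path; names such as "tbodyx" are left alone.
  xpath.gsub(%r{/tbody(\[\d+\])?(?=/|\z)}, "")
end

firebug_path = "/html/body/table/tbody/tr/td[2]/table/tbody/tr/td"
puts strip_implicit_tbody(firebug_path)
# prints /html/body/table/tr/td[2]/table/tr/td
```

The result is only a candidate path; it still needs to be tested against the target page in the parser that will actually be used.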

